[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
gnubol: Non-recursive parsing
I have tried several different approaches to nesting conditional potent COBOL
statements within conditional potent COBOL statements with bison. There may
be a number of mysteries to deal with, but the ones I have encountered have
led me to conclude that it is worth considering not recursing at all for
statement blocks within conditional clauses of statemenest which can be
conditional. These comments do not relate so much to IF/EVALUATE/SEARCH, but
a special case arrangement can be made there too.
Let me explain the basic sequence as I have encountered it.
Imagine a set of rules that would allow a verb that can take on the
alternating conditionals too look something like the following: LEAD is the
leading protion of a statement like
ADD data_ref TO data_ref (we ignore GIVING and ROUNDED for now):
LEAD ON SIZE ERROR stmt_block
LEAD ON SIZE ERROR stmt_block NOT ON SIZE ERROR stmt_block
LEAD NOT ON SIZE ERROR stmt_block ON SIZE ERROR stmt_block
LEAD NOT ON SIZE ERROR stmt_block
If you do not want gnubol to permit transposition, then these rules simplify
to three instead of four (but that is a side issue)
Because these statements can nest we commence to get some ambiguity in the
longer rules as the second half can conpete with the first half, as the
'context free' goal oriented grammar tool attempts to see what is what.
Another issue that arises (and this is not life or death, but just
important), as this pattern suggests, COBOL is heavily right recursive. That
is not so good for bison. It's internal stack could get quite deep on deeply
nested programs. Potentially retaining a sybol for interior statement block
reductions on the stack long after the matter is done with (in deeply nested
programs). To begin to deal with the ambiguity and to take a crack at
containing right recursion, a chop of this basic form can be considered.
It is actually possible to just cut off the right side of each of the four
rules, if the iterator that
found this form can keep looking for the statements that will be blocked on
the right side of
these forms. So,
iterator_rule :
basic_statements
| condtional_cluster
| iterator_rule basic_statements
| iterator_rule condtional_cluster
condtional_cluster :
LEAD ON SIZE ERROR
| LEAD ON SIZE ERROR stmt_block NOT ON SIZE ERROR
| LEAD NOT ON SIZE ERROR stmt_block ON SIZE ERROR
| LEAD NOT ON SIZE ERROR
Now we need code in the actions to manage things as far as what statement is
in what block, and what block glues to what clause. That is not really too
difficult, but it is an important change. Also a minor note is that the
iterator must also have rules to detect when conditional clusters have zero
statements glued on. Again that can't be too tough.
But it definitely worth noting immediately that if that outer statement can
be elevated from naive lower rules, then it might be worth considering
generalizing that.
But before we do we must consider a special problem or two.
First the stmt_block within the interior of rules such as
| LEAD ON SIZE ERROR stmt_block NOT ON SIZE ERROR
are actually special. They can not be the all encompassing, most inclusive,
statement grouping rule. If the LEAD is (any) arithmetic then we must
exclude simple arithemtics as the _last_ statement of the interior
stmt_block, or it will glue to the subsequent conditional clause. So for
each specific verb LEAD we must restrict that verbs family of verbs from
being the last in the in terior block.
That actually creates a lot of special rules, really proliferating rules for
the interior blocks of a number of conditional families.
There is a second problem. The four rule conditional cluster that I have
sketched is basically for unterminated conditionals. Lets back track and try
to sketch the naive rules for the explicitly scope terminated
statements.Perhaps these, where END-arith has to be specific to the exact
verb in LEAD
LEAD ON SIZE ERROR stmt_block END-arith
LEAD ON SIZE ERROR stmt_block NOT ON SIZE ERROR stmt_block END-arith
LEAD NOT ON SIZE ERROR stmt_block ON SIZE ERROR stmt_block END-arith
LEAD NOT ON SIZE ERROR stmt_block END-arith
Now the outer stmt_blocks, just before the END-arith have a special
requirement. These can not end in a simple arithmetic of the same specific
verb, nor may the left portion of the block have an open unterminated
conditional of the same specific verb: this because the END-arith would try
to glue to those verbs, rather than the outer verb in LEAD.
In this situation it at first seems like the solution is to just include
ENDed forms as one of the things that can occur on the left. That is almost
okay, but we still say there can not be an an unterminated conditional in
that statement block (of the same spcific verb).
The interior blocks of ENDed satement forms can also have some difficulties
if we allow those blocks to glue short one-legged conditionals of the same
specific verb, as the next conditional would glue to that rather than the
outer verb in the portrayed LEAD of the four presented rules. (This is only
partly relieved if we suppress transposition).
It is exactly at these points that we face the severest problem with the
tools. Depending upon actual style of coding the rules; we encounter reduce /
reduce conflicts (r/r) or shift / reduce conflicts (s/r). We can really
decieve ourself with some of the flexibility of the tools. In the absence of
special markings, bison will take its own choice in s/r conflicts. But what I
am seeing is that even with %prec precedence marking the grammars are not
coming out right. We really do not want shift or reduce; what we really want
is block defintitions that keep the simple or same-verb statements off the
right edge of the block. We want to avoid the conflict altogether, not
resolve it.
So I have bounced back to the idea of trying to elevate the parts of lower
level rules to the iterator up above. It at first looks very difficult to
tear apart the dual branched conditional rules with their special interiors.
But I have come across an insight that I think can be of benefit to those
working on both the bison and the PCCTS grammars.
This problem opens up when we reallize that every block has two parts. The
front part is a sequence of distint imperatives, that are either simple
imperatives, or explicitly terminated statements. These block_front
statements do not stick together. They accumulate sequentially within the
current block but do not nest within each other. (there can be zero, one or
more of these).
The back of the block can have open conditionals (unterminated) These open
conditionals deploy blocks of their own that nest subsequent statements.
Futher open conditionals can dig a deeper stack of open conditionals. The
block_back is really only one statement. This block_back also can have zero
statements. (It can be nested, hoever, ot an indefinite depth).
Generally, there must be one statement in a block (a front_ or a back_). The
parser actually must be coded to detect a zero statement block. (And there
are certain logical ORing syntaxes with EVALUATE WHEN clauses that permit
what on the surface looks like an empty block).
So if we were to break apart the pieces of the conditional cluster, and put
the fragments up in an iterator, it begins to look something like
iterator_rule :
basic_statements
| condtional_clause_initial
| condtional_clause_alternate
| block_front
| block_back
| END-xyz
| iterator_rule ... {/*recursing etc. I am simplifying to keep the fragments
in focus */}
Now we see that more responsibilities will have to be discharged in the
actions. The actions will have to make certain that things are happening in
the right sequence; match begin and end scope fagments; and get deeply
involved in conditional clause recurse managment. In short a fairly
impressive data structure will be needed to managed scope and nesting. But
the grammar rules begin to look much simpler.
So now to the special problems mentioned above. By turning the rules inside
out we are nolonger interested in either recusing blocks on the right side,
nor in elaborating the entire block strucure inside of conditionals. We have
only a few isolate issues to deal with. So the insight that is available is
to see that this approach to the rules is really just trolling the source
code for signals. Conditonal block mode turns on, and turns off; verb scope
turns on and turns off.
One by one signals are detected that push things onto a stack of conditional
clauses (to be reflected in a data structure). One by one the conditionals
clauses are to end some how, and one by one entire statements can have their
scope explicitly terminated. The grammar rules detect the signals (the
actions will have to manage the matching of front and back).
So if we back up to the idea that the interior block can not end on a simple
statement that
would ambiguously compete for the alternate conditional clause; we are
basically only interested in the block_back when we try to express the
restriction. The trick is to merge
the block back onto the fragment that imposes the restriction.
For example in the original naive rule
LEAD ON SIZE ERROR stmt_block NOT ON SIZE ERROR stmt_block
The alternate clause, NOT ON SIZE ERROR, imposses restrictions on the ending
of the
interior stmt_block. The pieces in there would be
... block_front block_back_restricted_for_arith NOT ON SIZE ERROR ...
We can problably leave the block_front up in the iterator, and just render
all the special
block_back_restricted_ rules in lower rules. For example
iterator_rule :
basic_statements
| condtional_clause_initial
| block_front
| select_block_back_plus_condtional_clause_alternate
| END-xyz
| iterator_rule ... {/*recursing etc. I am simplifying to keep the fragments
in focus */}
select_block_back_plus_condtional_clause_alternate
block_back_restricted_for_arith NOT ON SIZE ERROR
| block_back_restricted_for_I_O NOT INVALID KEY
| block_back_restricted_for_unstring NOT OVERFLOW
If you blieve in transposition, you need also the 'positive' logic clauses
| block_back_restricted_for_arith ON SIZE ERROR
| block_back_restricted_for_I_O INVALID KEY
| block_back_restricted_for_unstring OVERFLOW
Thus the high level iterator is just a signal detector. And the signal for
transition from the first
conditional clause to the second is not merely the NOT ON SIZE ERROR clause
particles. We must have the NOT ON SIZE ERROR clause preceded most
immediately by a statement that does not attempt to shift it! The actions
down on these rules need to deal separately with the statements in the
block_back_restricted, and with the conditional clause. ((To make the
statement glue correctly to the preceding block, we may actually need all
these expressed explicitly in the iterator. Here it is presented as a
subrule, just so we can focus on the iterator as a general pattern of
included rules)).
The iterator will nolonge be in search of statements, but instead hunts for
statement fragments.
When you look at the top you will see that we need to keep things straight
manually. But one important side benefit is that the top of the parser is
much more robust. GIve it umpteen conditional clauses that have no verb LEAD
to hang on and it can detect that as easily as detect one orphan. Let an
END-xyz drift through that has no VERB-xyz stack to pop, and its a snap to
conclude that it is an error.
Anyway the END-xyz fragments now probably no longer need a special
block_back. That is an aritfact of the naive rule expression, because it
attempted to recurse; and the
'context free' tool gives us static on ambiguities, when it sees that like
statments can nest.
Instead the implications of explict scope terminators and implicit scope
terminators are all to be managed by the actions.
The code needed by the actions to manage scope will probably not be
difficult. Each verb and each conditional clause opens scopes. We will need
to keep clear in the code that the end of a clause is distinct from the end
of a statments. There are no explicit clause terminators for example.
Explicit scope teminators, terminate their statement and any clauses, and may
also implicitly terminate orher statements and their clauses.
I think that we can associate a unique scope ID to every new scope. We can
link them mindlessly together in numerical sequence, and we can separately
keep hanging them on a linked list for that verb. An explict scope
terminator cures to the ID of the most recent scope on that verbs list, every
more recent scope is implictly terminated instantly. If we record the line
number of scope commencement (the verb or the first particle of a conditional
clause), and if we can sense the line number of the explicit scope terminator
(easy) or lookahead token (harder) then we can diagnose every scope with
begin / end line numbers.
(incidentally, high level error productions, perhaps thus at a high level
with the special symbol error, may need to clear scope stacks completely ,
under severe conditions).
Basically the verb scopes link together monuarally, the alternating
conditionals might require a binary link. We may need to list conditional
clauses of distinct verbs in this later list.
I want to mention that the EVALUATE statement can probalby be subjected to
this analysis also. An evaluate has two alternates (which are ordinal and
not transposable). The first alternate of an evaluate is a block composed of
a sequence of WHENs. Rather than a block of imperative statements, it is
really just a sequence of conditional statements (and asociated block) that
have no independent ELSE clause. The second portion is a block composed of a
dummy header (WHEN OTHER) and a sequence of statments.
In conclusion, this begins to depict a mainline iterator parser rule. That
can detect fragments and even see when some of them have become isolated by
coding errors, or parse disruption.
The abstractions described here are motivated by the experience of serious
ambiguities when attempting to handle conditional clause blocks as lower
level recurses with bison. So the entire grammar is turned inside out. The
iteration is done at the top or near the top. The actions must map statement
fragments together in the approach described here. This will require a number
of stacks to track conditional clause nesting. To a certain extent the 'rules
of the language' are taken out of the grammar rule syntax patterns, and will
have to be managed by enfocements reflected in actions.
Hope You All find that useful,
Bob Rayhawk
--
This message was sent through the gnu-cobol mailing list. To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body. For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.