[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: gnubol: Non-recursive parsing



Most of this note is beyond me.  I did, however, see a sentence that SEEMED
(in my ignorance) to imply that you thought that only ONE verb could appear
on a line of source code.  This is not a valid assumption.  Although MOST
"real" programmers do tend to do this, there is nothing in the Standard to
require it and some "code generators" do create multiple statements on the
same line.

Bill Klein
  wmklein <at> ix.netcom.com

> -----Original Message-----
> From: owner-gnu-cobol@wallace.lusars.net
> [mailto:owner-gnu-cobol@wallace.lusars.net]On Behalf Of
> RKRayhawk@aol.com
> Sent: Monday, December 13, 1999 12:25 AM
> To: gnu-cobol@lusars.net
> Subject: gnubol: Non-recursive parsing
>
>
>
> I have tried several different approaches to nesting conditional
> potent COBOL
> statements within conditional potent COBOL statements with bison.
> There may
> be a number of mysteries to deal with, but the ones I have
> encountered have
> led me to conclude that it is worth considering not recursing at all for
> statement blocks within conditional clauses of statemenest which can be
> conditional. These comments do not relate so much to
> IF/EVALUATE/SEARCH, but
> a special case arrangement can be made there too.
>
> Let me explain the basic sequence as I have encountered it.
>
> Imagine a set of rules that would allow a verb that can take on the
> alternating conditionals too look something like the following:
> LEAD is the
> leading protion of a statement like
> ADD data_ref TO data_ref (we ignore GIVING and ROUNDED for now):
>
> LEAD  ON SIZE ERROR  stmt_block
> LEAD  ON SIZE ERROR  stmt_block  NOT ON SIZE ERROR  stmt_block
> LEAD  NOT ON SIZE ERROR  stmt_block  ON SIZE ERROR  stmt_block
> LEAD  NOT ON SIZE ERROR  stmt_block
>
> If you do not want gnubol to permit transposition, then these
> rules simplify
> to three instead of four (but that is a side issue)
>
> Because these statements can nest we commence to get some ambiguity in the
> longer rules as the second half can conpete with the first half, as the
> 'context free' goal oriented grammar tool attempts to see what is what.
>
> Another issue that arises (and this is not life or death, but just
> important), as this pattern suggests, COBOL is heavily right
> recursive. That
> is not so good for bison. It's internal stack could get quite deep
> on deeply
> nested programs. Potentially retaining a sybol for interior
> statement block
> reductions on the stack long after the matter is done with (in
> deeply nested
> programs). To begin to deal with the ambiguity and to take a crack at
> containing right recursion, a chop of this basic form can be considered.
>
> It is actually possible to just cut off the right side of each of the four
> rules, if the iterator that
> found this form can keep looking for the statements that will be
> blocked on
> the right side of
> these forms. So,
>
> iterator_rule :
>    basic_statements
>  | condtional_cluster
>  | iterator_rule basic_statements
>  | iterator_rule condtional_cluster
>
> condtional_cluster :
>    LEAD  ON SIZE ERROR
>  | LEAD  ON SIZE ERROR  stmt_block  NOT ON SIZE ERROR
>  | LEAD  NOT ON SIZE ERROR  stmt_block  ON SIZE ERROR
>  | LEAD  NOT ON SIZE ERROR
>
> Now we need code in the actions to manage things as far as what
> statement is
> in what block, and what block glues to what clause. That is not really too
> difficult, but it is an important change.  Also a minor note is that the
> iterator must also have rules to detect when conditional clusters
> have zero
> statements glued on. Again that can't be too tough.
>
> But it definitely worth noting immediately that if that outer
> statement can
> be elevated from naive lower rules, then it might be worth considering
> generalizing that.
>
> But before we do we must consider a special problem or two.
>
> First the stmt_block within the interior of rules such as
>  | LEAD  ON SIZE ERROR  stmt_block  NOT ON SIZE ERROR
> are actually special. They can not be the all encompassing,  most
> inclusive,
> statement grouping rule.   If the LEAD is (any) arithmetic then we must
> exclude simple arithemtics as the _last_ statement of the interior
> stmt_block, or it will glue to the subsequent conditional clause.  So for
> each specific verb LEAD we must restrict that verbs family of verbs from
> being the last in the in terior block.
>
> That actually creates a lot of special rules, really proliferating
> rules for
> the interior blocks of a number of conditional families.
>
> There is a second problem. The four rule conditional cluster that I have
> sketched is basically for unterminated conditionals.  Lets back
> track and try
> to sketch the naive rules for the explicitly scope terminated
> statements.Perhaps these, where END-arith has to be specific to the exact
> verb in LEAD
>
> LEAD  ON SIZE ERROR  stmt_block  END-arith
> LEAD  ON SIZE ERROR  stmt_block  NOT ON SIZE ERROR  stmt_block  END-arith
> LEAD  NOT ON SIZE ERROR  stmt_block  ON SIZE ERROR  stmt_block  END-arith
> LEAD  NOT ON SIZE ERROR  stmt_block  END-arith
>
> Now the outer stmt_blocks, just before the END-arith have a special
> requirement. These can not end in a simple arithmetic of the same specific
> verb, nor may the left portion of the block have an open unterminated
> conditional of the same specific verb: this because the END-arith
> would try
> to glue to those verbs, rather than the outer verb in LEAD.
>
> In this situation it at first seems like the solution is to just include
> ENDed forms as one of the things that can occur on the left. That
> is almost
> okay, but we still say there can not be an  an unterminated conditional in
> that statement block (of the same spcific verb).
>
> The interior blocks of ENDed satement forms can also have some
> difficulties
> if we allow those blocks to glue short one-legged conditionals of the same
> specific verb, as the next conditional would glue to that rather than the
> outer verb in the portrayed LEAD of the four presented rules.
> (This is only
> partly relieved if we suppress transposition).
>
> It is exactly at these points that we face the severest problem with the
> tools. Depending upon actual style of coding the rules; we
> encounter reduce /
> reduce conflicts (r/r) or shift / reduce conflicts (s/r). We can really
> decieve ourself with some of the flexibility of the tools. In the
> absence of
> special markings, bison will take its own choice in s/r conflicts.
> But what I
> am seeing is that even with %prec precedence marking the grammars are not
> coming out right. We really do not want shift or reduce; what we
> really want
> is block defintitions that keep the simple or same-verb statements off the
> right edge of the block. We want to avoid the conflict altogether, not
> resolve it.
>
> So I have bounced back to the idea of trying to elevate the parts of lower
> level rules to the iterator up above. It at first looks very difficult to
> tear apart the dual branched conditional rules with their special
> interiors.
> But I have come across an insight that I think can be of benefit to those
> working on both the bison and the PCCTS grammars.
>
> This problem opens up when we reallize that every block has two parts. The
> front part is a sequence of distint imperatives, that are either simple
> imperatives, or explicitly terminated statements.  These block_front
> statements do not stick together. They accumulate sequentially within the
> current block but do not nest within each other. (there can be
> zero, one or
> more of these).
>
> The back of the block can have open conditionals (unterminated) These open
> conditionals deploy blocks of their own that nest subsequent statements.
> Futher open conditionals can dig a deeper stack of open conditionals. The
> block_back is really only one statement. This block_back also can
> have zero
> statements. (It can be nested, hoever, ot an indefinite depth).
>
> Generally, there must be one statement in a block (a front_ or a
> back_). The
> parser actually must be coded to detect a zero statement block. (And there
> are certain logical ORing syntaxes with EVALUATE WHEN clauses that permit
> what on the surface looks like an empty block).
>
> So if we were to break apart the pieces of the conditional
> cluster, and put
> the fragments up in an iterator, it begins to look something like
>
> iterator_rule :
>    basic_statements
>  | condtional_clause_initial
>  | condtional_clause_alternate
>  | block_front
>  | block_back
>  | END-xyz
>  | iterator_rule ... {/*recursing etc. I am simplifying to keep
> the fragments
> in focus */}
>
> Now we see that more responsibilities will have to be discharged in the
> actions. The actions will have to make certain that things are
> happening in
> the right sequence; match begin and end scope fagments; and get deeply
> involved in conditional clause recurse managment.  In short a fairly
> impressive data structure will be needed to managed scope and
> nesting.  But
> the grammar rules begin to look much simpler.
>
> So now to the special problems mentioned above. By turning the
> rules inside
> out we are nolonger interested in either recusing blocks on the
> right side,
> nor in elaborating the entire block strucure inside of
> conditionals. We have
> only a few isolate issues to deal with. So the insight that is
> available is
> to see that this approach to the rules is really just trolling the source
> code for signals.  Conditonal block mode turns on, and turns off;
> verb scope
> turns on and turns off.
>
> One by one signals are detected that push things onto a stack of
> conditional
> clauses (to be reflected in a data structure). One by one the conditionals
> clauses are to end some how, and one by one entire statements can
> have their
> scope explicitly terminated. The grammar rules detect the signals (the
> actions will have to manage the matching of front and back).
>
> So if we back up to the idea that the interior block can not end
> on a simple
> statement that
> would ambiguously compete for the alternate  conditional clause; we are
> basically only interested in the block_back when we try to express the
> restriction. The trick is to merge
> the block back onto the fragment that imposes the restriction.
>
> For example in the original naive rule
> LEAD  ON SIZE ERROR  stmt_block  NOT ON SIZE ERROR  stmt_block
>
> The alternate clause, NOT ON SIZE ERROR, imposses restrictions on
> the ending
> of the
> interior stmt_block. The pieces in there would be
>
>  ... block_front  block_back_restricted_for_arith  NOT ON SIZE ERROR ...
>
> We can problably leave the block_front up in the iterator, and just render
> all the special
> block_back_restricted_ rules in lower rules. For example
>
> iterator_rule :
>    basic_statements
>  | condtional_clause_initial
>  | block_front
>  | select_block_back_plus_condtional_clause_alternate
>  | END-xyz
>  | iterator_rule ... {/*recursing etc. I am simplifying to keep
> the fragments
> in focus */}
>
>
> select_block_back_plus_condtional_clause_alternate
>    block_back_restricted_for_arith  NOT ON SIZE ERROR
>  |  block_back_restricted_for_I_O  NOT INVALID KEY
>  |  block_back_restricted_for_unstring NOT OVERFLOW
>
> If you blieve in transposition, you need also the 'positive'  logic clauses
>  |  block_back_restricted_for_arith  ON SIZE ERROR
>  |  block_back_restricted_for_I_O  INVALID KEY
>  |  block_back_restricted_for_unstring OVERFLOW
>
> Thus the high level iterator is just a signal detector. And the signal for
> transition from the first
> conditional clause to the second is not merely the NOT ON SIZE
> ERROR clause
> particles. We must have the NOT ON SIZE ERROR clause preceded most
> immediately by a statement that does not attempt to shift it! The actions
> down on these rules need to deal separately with the statements in the
> block_back_restricted, and with the conditional clause. ((To make the
> statement glue correctly to the preceding block, we may actually need all
> these expressed explicitly in the iterator. Here it is presented as a
> subrule, just so we can focus on the iterator as a general pattern of
> included rules)).
>
> The iterator will nolonge be in search of statements, but instead
> hunts for
> statement fragments.
>
> When you look at the top you will see that we need to keep things straight
> manually. But one important side benefit is that the top of the parser is
> much more robust. GIve it umpteen conditional clauses that have no
> verb LEAD
> to hang on and it can detect that as easily as detect one orphan. Let an
> END-xyz drift through that has no VERB-xyz stack to pop, and its a snap to
> conclude that it is an error.
>
> Anyway the END-xyz fragments now probably no longer need a special
> block_back. That is an aritfact of the naive rule expression, because it
> attempted to recurse; and the
> 'context free' tool gives us static on ambiguities, when it sees that like
> statments can nest.
>
> Instead the implications of explict scope terminators and implicit scope
> terminators are all to be managed by the actions.
>
> The code needed by the actions to manage scope will probably not be
> difficult. Each verb and each conditional clause opens scopes. We
> will need
> to keep clear in the code that the end of a clause is distinct
> from the end
> of a statments. There are no explicit clause terminators for example.
>
> Explicit scope teminators, terminate their statement and any
> clauses, and may
> also implicitly terminate orher statements and their clauses.
>
> I think that we can associate a unique scope ID to every new scope. We can
> link them mindlessly together in numerical sequence, and we can separately
> keep hanging them on a linked list for that verb.  An explict scope
> terminator cures to the ID of the most recent scope on that verbs
> list, every
> more recent scope is implictly terminated instantly.  If we record
> the line
> number of scope commencement (the verb or the first particle of a
> conditional
> clause), and if we can sense the line number of the explicit scope
> terminator
> (easy) or lookahead token (harder) then we can diagnose every scope with
> begin / end line numbers.
>
> (incidentally, high level error productions, perhaps thus at a high level
> with the special symbol error, may need to clear scope stacks completely ,
> under severe conditions).
>
> Basically the verb scopes link together monuarally, the alternating
> conditionals might require a binary link. We may need to list conditional
> clauses of distinct verbs in this later list.
>
> I want to mention that the EVALUATE statement can probalby be subjected to
> this analysis also.  An evaluate has two alternates (which are ordinal and
> not transposable).  The first alternate of an evaluate is a block
> composed of
> a sequence of WHENs. Rather than a block of imperative statements, it is
> really just a sequence of conditional statements (and asociated
> block)  that
> have no independent ELSE clause.  The second portion is a block
> composed of a
> dummy header (WHEN OTHER) and a sequence of statments.
>
> In conclusion, this begins to depict a mainline iterator parser rule. That
> can detect fragments and even see when some of them have become
> isolated by
> coding errors, or parse disruption.
>
> The abstractions described here are motivated by the experience of serious
> ambiguities when attempting to handle conditional clause blocks as lower
> level recurses with bison. So the entire grammar is turned inside out. The
> iteration is done at the top or near the top. The actions must map
> statement
> fragments together in the approach described here. This will
> require a number
> of stacks to track conditional clause nesting. To a certain extent
> the 'rules
> of the language' are taken out of the grammar rule syntax
> patterns, and will
> have to be managed by enfocements reflected in actions.
>
> Hope You All find that useful,
>
> Bob Rayhawk
>
> --
> This message was sent through the gnu-cobol mailing list.  To
> remove yourself
> from this mailing list, send a message to majordomo@lusars.net with the
> words "unsubscribe gnu-cobol" in the message body.  For more information on
> the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.


--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.