[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
gnubol: Non-recursive parsing

To: gnu-cobol@lusars.net
Subject: gnubol: Non-recursive parsing
From: RKRayhawk@aol.com
Date: Mon, 13 Dec 1999 01:25:01 EST
Delivered-To: gnu-cobol-outgoing@wallace.lusars.net
Reply-To: gnu-cobol@lusars.net
Sender: owner-gnu-cobol@wallace.lusars.net
    
I have tried several different approaches to nesting conditional potent COBOL 
statements within conditional potent COBOL statements with bison. There may 
be a number of mysteries to deal with, but the ones I have encountered have 
led me to conclude that it is worth considering not recursing at all for 
statement blocks within conditional clauses of statemenest which can be 
conditional. These comments do not relate so much to IF/EVALUATE/SEARCH, but 
a special case arrangement can be made there too.

Let me explain the basic sequence as I have encountered it. 

Imagine a set of rules that would allow a verb that can take on the 
alternating conditionals too look something like the following: LEAD is the 
leading protion of a statement like 
ADD data_ref TO data_ref (we ignore GIVING and ROUNDED for now):

LEAD  ON SIZE ERROR  stmt_block
LEAD  ON SIZE ERROR  stmt_block  NOT ON SIZE ERROR  stmt_block
LEAD  NOT ON SIZE ERROR  stmt_block  ON SIZE ERROR  stmt_block
LEAD  NOT ON SIZE ERROR  stmt_block  

If you do not want gnubol to permit transposition, then these rules simplify 
to three instead of four (but that is a side issue)

Because these statements can nest we commence to get some ambiguity in the 
longer rules as the second half can conpete with the first half, as the 
'context free' goal oriented grammar tool attempts to see what is what.

Another issue that arises (and this is not life or death, but just 
important), as this pattern suggests, COBOL is heavily right recursive. That 
is not so good for bison. It's internal stack could get quite deep on deeply 
nested programs. Potentially retaining a sybol for interior statement block 
reductions on the stack long after the matter is done with (in deeply nested 
programs). To begin to deal with the ambiguity and to take a crack at 
containing right recursion, a chop of this basic form can be considered. 

It is actually possible to just cut off the right side of each of the four 
rules, if the iterator that
found this form can keep looking for the statements that will be blocked on 
the right side of 
these forms. So,

iterator_rule :
   basic_statements
 | condtional_cluster
 | iterator_rule basic_statements
 | iterator_rule condtional_cluster

condtional_cluster :
   LEAD  ON SIZE ERROR  
 | LEAD  ON SIZE ERROR  stmt_block  NOT ON SIZE ERROR  
 | LEAD  NOT ON SIZE ERROR  stmt_block  ON SIZE ERROR  
 | LEAD  NOT ON SIZE ERROR   

Now we need code in the actions to manage things as far as what statement is 
in what block, and what block glues to what clause. That is not really too 
difficult, but it is an important change.  Also a minor note is that the 
iterator must also have rules to detect when conditional clusters have zero 
statements glued on. Again that can't be too tough.

But it definitely worth noting immediately that if that outer statement can 
be elevated from naive lower rules, then it might be worth considering 
generalizing that.

But before we do we must consider a special problem or two.

First the stmt_block within the interior of rules such as 
 | LEAD  ON SIZE ERROR  stmt_block  NOT ON SIZE ERROR  
are actually special. They can not be the all encompassing,  most inclusive, 
statement grouping rule.   If the LEAD is (any) arithmetic then we must 
exclude simple arithemtics as the _last_ statement of the interior 
stmt_block, or it will glue to the subsequent conditional clause.  So for 
each specific verb LEAD we must restrict that verbs family of verbs from 
being the last in the in terior block.

That actually creates a lot of special rules, really proliferating rules for 
the interior blocks of a number of conditional families.

There is a second problem. The four rule conditional cluster that I have 
sketched is basically for unterminated conditionals.  Lets back track and try 
to sketch the naive rules for the explicitly scope terminated 
statements.Perhaps these, where END-arith has to be specific to the exact 
verb in LEAD
  
LEAD  ON SIZE ERROR  stmt_block  END-arith
LEAD  ON SIZE ERROR  stmt_block  NOT ON SIZE ERROR  stmt_block  END-arith
LEAD  NOT ON SIZE ERROR  stmt_block  ON SIZE ERROR  stmt_block  END-arith
LEAD  NOT ON SIZE ERROR  stmt_block  END-arith  

Now the outer stmt_blocks, just before the END-arith have a special 
requirement. These can not end in a simple arithmetic of the same specific 
verb, nor may the left portion of the block have an open unterminated 
conditional of the same specific verb: this because the END-arith would try 
to glue to those verbs, rather than the outer verb in LEAD. 

In this situation it at first seems like the solution is to just include 
ENDed forms as one of the things that can occur on the left. That is almost 
okay, but we still say there can not be an  an unterminated conditional in 
that statement block (of the same spcific verb).

The interior blocks of ENDed satement forms can also have some difficulties 
if we allow those blocks to glue short one-legged conditionals of the same 
specific verb, as the next conditional would glue to that rather than the 
outer verb in the portrayed LEAD of the four presented rules. (This is only 
partly relieved if we suppress transposition).

It is exactly at these points that we face the severest problem with the 
tools. Depending upon actual style of coding the rules; we encounter reduce / 
reduce conflicts (r/r) or shift / reduce conflicts (s/r). We can really 
decieve ourself with some of the flexibility of the tools. In the absence of 
special markings, bison will take its own choice in s/r conflicts. But what I 
am seeing is that even with %prec precedence marking the grammars are not 
coming out right. We really do not want shift or reduce; what we really want 
is block defintitions that keep the simple or same-verb statements off the 
right edge of the block. We want to avoid the conflict altogether, not 
resolve it.

So I have bounced back to the idea of trying to elevate the parts of lower 
level rules to the iterator up above. It at first looks very difficult to 
tear apart the dual branched conditional rules with their special interiors. 
But I have come across an insight that I think can be of benefit to those 
working on both the bison and the PCCTS grammars.

This problem opens up when we reallize that every block has two parts. The 
front part is a sequence of distint imperatives, that are either simple 
imperatives, or explicitly terminated statements.  These block_front 
statements do not stick together. They accumulate sequentially within the 
current block but do not nest within each other. (there can be zero, one or 
more of these).  

The back of the block can have open conditionals (unterminated) These open 
conditionals deploy blocks of their own that nest subsequent statements. 
Futher open conditionals can dig a deeper stack of open conditionals. The 
block_back is really only one statement. This block_back also can have zero 
statements. (It can be nested, hoever, ot an indefinite depth).

Generally, there must be one statement in a block (a front_ or a back_). The 
parser actually must be coded to detect a zero statement block. (And there 
are certain logical ORing syntaxes with EVALUATE WHEN clauses that permit 
what on the surface looks like an empty block).

So if we were to break apart the pieces of the conditional cluster, and put 
the fragments up in an iterator, it begins to look something like

iterator_rule :
   basic_statements
 | condtional_clause_initial
 | condtional_clause_alternate
 | block_front
 | block_back
 | END-xyz
 | iterator_rule ... {/*recursing etc. I am simplifying to keep the fragments 
in focus */}

Now we see that more responsibilities will have to be discharged in the 
actions. The actions will have to make certain that things are happening in 
the right sequence; match begin and end scope fagments; and get deeply 
involved in conditional clause recurse managment.  In short a fairly 
impressive data structure will be needed to managed scope and nesting.  But 
the grammar rules begin to look much simpler.

So now to the special problems mentioned above. By turning the rules inside 
out we are nolonger interested in either recusing blocks on the right side, 
nor in elaborating the entire block strucure inside of conditionals. We have 
only a few isolate issues to deal with. So the insight that is available is 
to see that this approach to the rules is really just trolling the source 
code for signals.  Conditonal block mode turns on, and turns off; verb scope 
turns on and turns off.

One by one signals are detected that push things onto a stack of conditional 
clauses (to be reflected in a data structure). One by one the conditionals 
clauses are to end some how, and one by one entire statements can have their 
scope explicitly terminated. The grammar rules detect the signals (the 
actions will have to manage the matching of front and back).

So if we back up to the idea that the interior block can not end on a simple 
statement that
would ambiguously compete for the alternate  conditional clause; we are 
basically only interested in the block_back when we try to express the 
restriction. The trick is to merge
the block back onto the fragment that imposes the restriction.

For example in the original naive rule
LEAD  ON SIZE ERROR  stmt_block  NOT ON SIZE ERROR  stmt_block 

The alternate clause, NOT ON SIZE ERROR, imposses restrictions on the ending 
of the
interior stmt_block. The pieces in there would be

 ... block_front  block_back_restricted_for_arith  NOT ON SIZE ERROR ...

We can problably leave the block_front up in the iterator, and just render 
all the special
block_back_restricted_ rules in lower rules. For example

iterator_rule :
   basic_statements
 | condtional_clause_initial
 | block_front
 | select_block_back_plus_condtional_clause_alternate 
 | END-xyz
 | iterator_rule ... {/*recursing etc. I am simplifying to keep the fragments 
in focus */}


select_block_back_plus_condtional_clause_alternate 
   block_back_restricted_for_arith  NOT ON SIZE ERROR 
 |  block_back_restricted_for_I_O  NOT INVALID KEY 
 |  block_back_restricted_for_unstring NOT OVERFLOW

If you blieve in transposition, you need also the 'positive'  logic clauses
 |  block_back_restricted_for_arith  ON SIZE ERROR 
 |  block_back_restricted_for_I_O  INVALID KEY 
 |  block_back_restricted_for_unstring OVERFLOW

Thus the high level iterator is just a signal detector. And the signal for 
transition from the first
conditional clause to the second is not merely the NOT ON SIZE ERROR clause 
particles. We must have the NOT ON SIZE ERROR clause preceded most 
immediately by a statement that does not attempt to shift it! The actions 
down on these rules need to deal separately with the statements in the 
block_back_restricted, and with the conditional clause. ((To make the 
statement glue correctly to the preceding block, we may actually need all 
these expressed explicitly in the iterator. Here it is presented as a 
subrule, just so we can focus on the iterator as a general pattern of 
included rules)).

The iterator will nolonge be in search of statements, but instead hunts for 
statement fragments.

When you look at the top you will see that we need to keep things straight 
manually. But one important side benefit is that the top of the parser is 
much more robust. GIve it umpteen conditional clauses that have no verb LEAD 
to hang on and it can detect that as easily as detect one orphan. Let an 
END-xyz drift through that has no VERB-xyz stack to pop, and its a snap to 
conclude that it is an error.

Anyway the END-xyz fragments now probably no longer need a special 
block_back. That is an aritfact of the naive rule expression, because it 
attempted to recurse; and the 
'context free' tool gives us static on ambiguities, when it sees that like 
statments can nest.

Instead the implications of explict scope terminators and implicit scope 
terminators are all to be managed by the actions.

The code needed by the actions to manage scope will probably not be 
difficult. Each verb and each conditional clause opens scopes. We will need 
to keep clear in the code that the end of a clause is distinct from the end 
of a statments. There are no explicit clause terminators for example.

Explicit scope teminators, terminate their statement and any clauses, and may 
also implicitly terminate orher statements and their clauses.

I think that we can associate a unique scope ID to every new scope. We can 
link them mindlessly together in numerical sequence, and we can separately 
keep hanging them on a linked list for that verb.  An explict scope 
terminator cures to the ID of the most recent scope on that verbs list, every 
more recent scope is implictly terminated instantly.  If we record the line 
number of scope commencement (the verb or the first particle of a conditional 
clause), and if we can sense the line number of the explicit scope terminator 
(easy) or lookahead token (harder) then we can diagnose every scope with 
begin / end line numbers.

(incidentally, high level error productions, perhaps thus at a high level 
with the special symbol error, may need to clear scope stacks completely , 
under severe conditions).

Basically the verb scopes link together monuarally, the alternating 
conditionals might require a binary link. We may need to list conditional 
clauses of distinct verbs in this later list.

I want to mention that the EVALUATE statement can probalby be subjected to 
this analysis also.  An evaluate has two alternates (which are ordinal and 
not transposable).  The first alternate of an evaluate is a block composed of 
a sequence of WHENs. Rather than a block of imperative statements, it is 
really just a sequence of conditional statements (and asociated block)  that 
have no independent ELSE clause.  The second portion is a block composed of a 
dummy header (WHEN OTHER) and a sequence of statments.

In conclusion, this begins to depict a mainline iterator parser rule. That 
can detect fragments and even see when some of them have become isolated by 
coding errors, or parse disruption.

The abstractions described here are motivated by the experience of serious 
ambiguities when attempting to handle conditional clause blocks as lower 
level recurses with bison. So the entire grammar is turned inside out. The 
iteration is done at the top or near the top. The actions must map statement 
fragments together in the approach described here. This will require a number 
of stacks to track conditional clause nesting. To a certain extent the 'rules 
of the language' are taken out of the grammar rule syntax patterns, and will 
have to be managed by enfocements reflected in actions.

Hope You All find that useful,

Bob Rayhawk

--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.
Follow-Ups:
- RE: gnubol: Non-recursive parsing
  - From: "William M. Klein" <wmklein@ix.netcom.com>
Prev by Date: gnubol: FW: Testers needed - Micro Focus, IBM, and Wang
Next by Date: RE: gnubol: Non-recursive parsing
Prev by thread: gnubol: FW: Testers needed - Micro Focus, IBM, and Wang
Next by thread: RE: gnubol: Non-recursive parsing
Index(es):
- Date
- Thread