
Re: gnubol: Hacks needed to parse COBOL



In a message dated 12/3/99 7:38:27 AM EST, 
the indomitable Tim Josling, TIMJOSLING@prodigy.net writes:

<< 
 I found another problem fixing these ones. This comes from accommodating
 nested conditional expressions.
 
 add a to b 
     size error
        unstring
            on
 
 After 'on' could be 'overflow' or 'size'.
 
 If it's 'overflow', then the unstring continues, if 'size' then the
 unstring is finished and should be reduced. This is an SR conflict but
 shifting does not resolve it. We need to know what is coming.
  >>

That is a really good find.

I don't want to push hard on an idea I posted in another interaction just a 
few hours ago, but really you simply do not need any kind of specialized 
lookahead for any of this. Instead you need concurrent rules competing for 
those tokens. In this case the ON is optional in the rule set for both 
competing interpretations, and the SIZE or OVERFLOW precipitates the 
appropriate reduction. Tools like bison have a single current state, but 
that state can represent one position in many competing rules. The idea of 
looking ahead for a sneak token in order to get started on the correct rule 
suggests that you do not yet see how the tool works. That is meant in an 
entirely positive way: the tool can do these things for you.

There is no basic challenge in getting concurrent rules into scope; that is 
fairly natural. Getting the rules to terminate as you want is very hard in 
modern COBOL, because simple unterminated conditionals will have to compete 
with complex unterminated conditionals, complex terminated conditionals, and 
simple terminated conditionals (all of many varieties).

A number of situations you have contemplated seem to you to involve 
'infinite' lookahead, as you describe it. That is not true. Many statements 
are going to condense into recurses that are glued into your high-level 
rules as a single symbol, so the token you need to find as lookahead will be 
right there (one or two steps away, not infinite).
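To make that concrete, here is a toy Python sketch (all names are made up for 
illustration; this is not the actual gnubol grammar): however long the nested 
statement body is, once it reduces to a single symbol the terminator is the 
very next token.

```python
# Toy illustration: however many tokens the nested statements occupy, once
# they reduce to one nonterminal the terminator is adjacent again.

def reduce_stmt_list(tokens, start):
    """Pretend-reduce a run of statement tokens into one symbol."""
    end = tokens.index("END-ADD", start)   # the terminator follows the run
    return tokens[:start] + ["master_stmt_list"] + tokens[end:]

tokens = (["ADD", "a", "TO", "b", "ON", "SIZE", "ERROR"]
          + ["MOVE", "x", "TO", "y"] * 50   # an arbitrarily long body
          + ["END-ADD"])
reduced = reduce_stmt_list(tokens, 7)
# END-ADD is now one step from the reduced symbol, not 200 tokens away.
print(reduced[-2:])   # ['master_stmt_list', 'END-ADD']
```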

This situation seems infinite to you because you are starting from the 
bottom up. Try writing the rules from the top down. Let's say that you have 
a master_stmt_list somewhere that covers all statements. You then try to 
code the add rule as in this sketch:

stmt_add_simple :
   ADD dataref TO dataref %prec LESS_THAN_GIVING
     {/* call me rule 1 */;}
 | ADD dataref TO dataref GIVING dataref
     {/* call me rule 2 */;}
 ;

stmt_add_deep_thought :
   ADD dataref TO dataref ON SIZE ERROR master_stmt_list
     {/* call me rule 3 */;}
 | ADD dataref TO dataref
     ON SIZE ERROR master_stmt_list
     NOT ON SIZE ERROR master_stmt_list
     {/* call me rule 4 */;}
 ;

Assuming numerous other rules, the above is really a set of four rules that 
will be launched if the compiler is still on its feet when it encounters ADD. 

When the parsed sequence of input is in fact
    ADD dataref TO dataref
then for that first step and each of the next three, the four rules remain 
in current scope, competing with one another for hopeful reduction. Should 
the next token be GIVING, the first, third, and fourth rules drop. Simple. 
Much of the instructional material for yacc/bison type tools gets folks very 
oriented to shift/reduce and reduce/reduce conflict resolution. There is a 
tendency to think that you should only have one thing going at a time. Not 
at all. You _want_ concurrency; that is exactly how to engage the lookahead 
function in the parse algorithm.

These rules are over simplified for presentation of the main point about 
concurrency.
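A toy Python model may help show what "competing rules" means inside the 
single parser state (this is an illustration of the idea, not of bison's 
actual tables; the rule numbers match the sketch above):

```python
# Toy illustration: an LR-style parser keeps ONE state, but that state covers
# a position (a "dot") in several competing rules at once.  The four ADD
# rules from the sketch above, written as plain token sequences; 'dataref'
# and 'master_stmt_list' stand for already-reduced symbols.

RULES = {
    1: ["ADD", "dataref", "TO", "dataref"],
    2: ["ADD", "dataref", "TO", "dataref", "GIVING", "dataref"],
    3: ["ADD", "dataref", "TO", "dataref", "ON", "SIZE", "ERROR",
        "master_stmt_list"],
    4: ["ADD", "dataref", "TO", "dataref", "ON", "SIZE", "ERROR",
        "master_stmt_list",
        "NOT", "ON", "SIZE", "ERROR", "master_stmt_list"],
}

def surviving_rules(symbols):
    """Return the rule numbers still viable after consuming `symbols`."""
    return [num for num, rhs in RULES.items()
            if rhs[:len(symbols)] == symbols]

# After ADD dataref TO dataref, all four rules are still in play:
print(surviving_rules(["ADD", "dataref", "TO", "dataref"]))            # [1, 2, 3, 4]
# One more symbol -- GIVING -- and rules 1, 3, and 4 drop out:
print(surviving_rules(["ADD", "dataref", "TO", "dataref", "GIVING"]))  # [2]
```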

The same holds for input that might be in the pattern
    ADD dataref TO dataref ON SIZE ERROR master_stmt_list

At the end of that, both rule 3 and rule 4 have been trucking along; if the 
next token is NOT, rule three will drop off and rule four will keep trying. 
Incidentally, rule three would need something like a %prec LESS_THAN_NOT_SIZE 
to compel it not to reduce (that is, to allow a shift that enables rule four 
to win out for the foreseeable future).

(which is going to be difficult to figure out because of the multiple roles 
of NOT)

And incidentally, %prec LESS_THAN_GIVING must actually also be lower than 
the ON SIZE ERROR tokens, so that the rule yields to a shift in that case 
too. The precedence arithmetic will be tricky indeed.
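A toy Python model of that tie-break (not bison itself — the numeric levels 
and pseudo-token names here are invented for illustration; in a real grammar 
they come from the order of the %left/%nonassoc declarations):

```python
# Toy model of yacc/bison's shift/reduce tie-break: a conflict is resolved
# by comparing the rule's precedence (attached via %prec) against the
# lookahead token's precedence.  Levels are invented for illustration.

PREC = {
    "LESS_THAN_GIVING": 1,  # pseudo-token attached to rule 1 via %prec
    "GIVING": 2,
    "ON": 3,                # the ON of ON SIZE ERROR must outrank rule 1 too
}

def resolve(rule_marker, lookahead):
    """Shift when the lookahead token outranks the rule, otherwise reduce."""
    return "shift" if PREC[lookahead] > PREC[rule_marker] else "reduce"

# Rule 1 (%prec LESS_THAN_GIVING) with GIVING as lookahead: shift, which
# keeps rule 2 alive.
print(resolve("LESS_THAN_GIVING", "GIVING"))   # shift
# The same rule with ON as lookahead must also shift -- which is exactly why
# LESS_THAN_GIVING has to sit below the ON SIZE ERROR tokens as well.
print(resolve("LESS_THAN_GIVING", "ON"))       # shift
```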

Now the lookahead can also be applied to the explicit scope terminators. 
(All of the following groupings are just for illustration of the lookahead.)

stmt_add_simple :
   ADD dataref TO dataref %prec LESS_THAN_GIVING
     {/* call me rule 1.a */;}
 | ADD dataref TO dataref END-ADD
     {/* call me rule 1.b */;}
 | ADD dataref TO dataref GIVING dataref
     {/* call me rule 2 */;}
 ;

stmt_add_deep_thought :
   ADD dataref TO dataref ON SIZE ERROR master_stmt_list
     {/* call me rule 3.a */;}
 | ADD dataref TO dataref ON SIZE ERROR master_stmt_list END-ADD
     {/* call me rule 3.b */;}
 | ADD dataref TO dataref
     ON SIZE ERROR master_stmt_list
     NOT ON SIZE ERROR master_stmt_list %prec LESS_THAN_END_ARITH
     {/* call me rule 4.a */;}
 | ADD dataref TO dataref
     ON SIZE ERROR master_stmt_list
     NOT ON SIZE ERROR master_stmt_list END-ADD
     {/* call me rule 4.b */;}
 ;


Your lookahead mechanism, I fear, is very vulnerable. It competes with the 
intrinsic lookahead in the algorithm deployed by the grammar tool. I am 
confident that a number of your lookahead techniques are motivated by an 
overestimation of the distance to the requisite lookahead token after 
subrule recurse reductions. Rule 4.a above can easily involve a potentially 
gigantic distance from the ADD to the END-ADD, but by the time you need it 
the END-ADD is just one step away from the reduced symbol master_stmt_list 
(which would be more than a selection: a huge recurse with all manner of 
imperative and conditional statements all piled up neatly for you). When you 
are thus at the edge of rule 4.a and one symbol shy of the end of 4.b, if 
the lookahead is END-ADD, rule 4.a will drop, yielding to a shift of the 
higher-precedence END-ADD token, and rule 4.b moves a step further, getting 
ready to reduce.
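The same tie-break, in toy Python form, settles that rule 4.a versus 4.b 
decision (again, the precedence levels and names are invented for 
illustration, not taken from any real grammar):

```python
# Toy model of the decision at the edge of rule 4.a, one symbol shy of 4.b.
# Rule 4.a is complete and carries %prec LESS_THAN_END_ARITH; rule 4.b still
# wants the END-ADD token.  Levels are invented for illustration.

PREC = {"LESS_THAN_END_ARITH": 1, "END_ADD": 2, "NEXT_STMT": 1}

def resolve(rule_marker, lookahead):
    """Shift when the lookahead outranks the rule's %prec marker."""
    return "shift" if PREC[lookahead] > PREC[rule_marker] else "reduce"

# Lookahead END-ADD: shift wins, rule 4.a drops, and rule 4.b goes on to
# reduce one step later.
print(resolve("LESS_THAN_END_ARITH", "END_ADD"))    # shift
# Lookahead that does not outrank the marker (say, the start of the next
# statement): rule 4.a reduces right there.
print(resolve("LESS_THAN_END_ARITH", "NEXT_STMT"))  # reduce
```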

Incidentally, if TO has lower precedence than GIVING, and GIVING has lower 
precedence than the ON SIZE ERROR tokens, some of the %prec markers in the 
actual examples are not needed, because the tokens appear in the rule at 
that level. That is not realistic, though. The tokens, even if defined in a 
convenient order, are very likely to be in subrules, and we will be managing 
shift/reduce up in higher rules that have only symbols; so should we decide 
to commit to bison, the %prec markers will be present on many rules, I am 
sure.


Hope that helps,
Bob Rayhawk
RKRayhawk@aol.com

--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.