Re: gnubol: Hacks needed to parse COBOL
In a message dated 12/3/99 7:38:27 AM EST,
the indomitable Tim Josling, TIMJOSLING@prodigy.net writes:
<<
I found another problem fixing these ones. This comes from accommodating
nested conditional expressions.
add a to b
size error
unstring
on
After 'on' could be 'overflow' or 'size'.
If it's 'overflow', then the unstring continues, if 'size' then the
unstring is finished and should be reduced. This is an SR conflict but
shifting does not resolve it. We need to know what is coming.
>>
That is a really good find.
I don't want to push hard on an idea I posted in another interaction just a
few hours ago. But really you simply do not need any kind of specialized
lookahead for any of this. Instead you need concurrent rules competing for
those tokens. In this case the ON is optional in the rule set for both
competing interpretations, and the SIZE or OVERFLOW precipitates the
appropriate reduction. The tools like bison have a single current state, but
that state can represent one position in many competing rules. The idea of
looking ahead for a sneak token to get started on the correct rule definitely
shows that you do not see how the tool works. That is meant in an entirely
positive way. The tool can do these things for you.
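To see why no specialized lookahead is needed, here is a toy Python model of the idea (a sketch under simplifying assumptions; a real bison state is an LALR item set, and every name here is made up to echo the add rules sketched later in this message):

```python
# A toy model (not bison itself) of how one LR parser state holds a
# "dot position" in several competing rules at once, and how a single
# token of lookahead prunes the set.  Rule and token names are
# hypothetical simplifications.
RULES = {
    "add_simple":  ["ADD", "ref", "TO", "ref"],
    "add_giving":  ["ADD", "ref", "TO", "ref", "GIVING", "ref"],
    "add_on_size": ["ADD", "ref", "TO", "ref", "ON", "SIZE", "ERROR", "stmts"],
}

def advance(items, token):
    """Keep only the rules whose next expected symbol is `token`,
    moving their dot one position right; all others drop out."""
    return {name: pos + 1
            for name, pos in items.items()
            if pos < len(RULES[name]) and RULES[name][pos] == token}

# Start with the dot at position 0 in every candidate rule.
items = {name: 0 for name in RULES}
for tok in ["ADD", "ref", "TO", "ref"]:
    items = advance(items, tok)

# After 'ADD dataref TO dataref' all three rules are still live:
# the parser never had to choose among them.
print(sorted(items))   # ['add_giving', 'add_on_size', 'add_simple']

# One token of lookahead settles it: GIVING keeps only add_giving.
print(sorted(advance(items, "GIVING")))   # ['add_giving']
```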
There is no basic challenge to get concurrent rules in scope, that is fairly
natural. Getting the rules to terminate as you want is very hard in modern
COBOL because simple unterminated conditionals will have to be competing with
complex unterminated conditionals, complex terminated conditionals and
simple terminated conditionals (all of many varieties).
A number of situations you have contemplated seem to you to involve
'infinite' lookahead, as you describe it. That is not true. Many statements
are going to condense into recurses that are glued into your high-level rules
as a single symbol, so the token you need to find as lookahead will be right
there (one or two steps away, not infinite).
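For instance, the statement list itself would typically be a left-recursive subrule that reduces to a single symbol; a minimal sketch (names only suggestive):

```
master_stmt_list
    : statement
    | master_stmt_list statement
    ;
```

However long the list, it arrives at the enclosing rule as one reduced symbol.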
This situation seems infinite to you because you are starting from the bottom
up. Try writing the rules from the top down. Let's say that you have the
master_stmt_list somewhere that has all statements. You then try to code the
add rule like this sketch:
stmt_add_simple :
      ADD dataref TO dataref %prec LESS_THAN_GIVING
        {/* call me rule 1 */;}
    | ADD dataref TO dataref GIVING dataref
        {/* call me rule 2 */;}
    ;
stmt_add_deep_thought :
      ADD dataref TO dataref ON SIZE ERROR master_stmt_list
        {/* call me rule 3 */;}
    | ADD dataref TO dataref
        ON SIZE ERROR master_stmt_list
        NOT ON SIZE ERROR master_stmt_list
        {/* call me rule 4 */;}
    ;
Assuming numerous other rules, the above is really a set of four rules that
will be launched if the compiler is still on its feet when it encounters ADD.
When the parsed sequence of input is in fact
ADD dataref TO dataref
then for that first step and each of the next three steps, the four rules remain
in current scope, competing with one another for hopeful reduction. Should the
next token be GIVING, the first, third and fourth rules drop. Simple. Much of
the instructional material for yacc/bison type tools gets folks very oriented
to shift/reduce and reduce/reduce conflict resolution. There is a tendency to
think that you should only have one thing going at a time. Not at all. You
_want_ concurrency; that is exactly how to engage the lookahead function in
the parse algorithm.
These rules are over simplified for presentation of the main point about
concurrency.
The same holds for input that might be in the pattern
ADD dataref TO dataref ON SIZE ERROR master_stmt_list
At the end of that, both rule 3 and rule 4 have been trucking along; if the
next token is NOT, rule three will drop off and rule four will keep trying.
Incidentally, rule three would need something like a %prec LESS_THAN_NOT_SIZE
to compel it not to reduce (that is, to allow a shift that enables rule four
to win out for the foreseeable future).
(Which is going to be difficult to figure out because of the multiple roles of
NOT.)
And incidentally, %prec LESS_THAN_GIVING actually must also be less than the
ON SIZE ERROR tokens, to yield to a shift for that. The precedence
arithmetic will be tricky indeed.
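One way that ordering might be sketched in bison's declaration section (this is only a guess at a workable arrangement; every token name here is hypothetical, and in bison later declarations bind tighter than earlier ones):

```
/* Hypothetical precedence ladder, lowest first. */
%nonassoc LESS_THAN_GIVING     /* placeholder: lets rule 1 yield to GIVING and ON */
%nonassoc GIVING
%nonassoc ON                   /* introduces SIZE ERROR / OVERFLOW clauses */
%nonassoc LESS_THAN_NOT_SIZE   /* placeholder: lets rule 3 yield to NOT */
%nonassoc NOT
```

The placeholder tokens would never be returned by the lexer; they exist only to be named in %prec markers on rules that must lose the shift/reduce contest.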
Now the lookahead can also be applied to the explicit scope terminators (all
of the following groupings are just for illustration of the lookahead):
stmt_add_simple :
      ADD dataref TO dataref %prec LESS_THAN_GIVING
        {/* call me rule 1.a */;}
    | ADD dataref TO dataref END-ADD
        {/* call me rule 1.b */;}
    | ADD dataref TO dataref GIVING dataref
        {/* call me rule 2 */;}
    ;
stmt_add_deep_thought :
      ADD dataref TO dataref ON SIZE ERROR master_stmt_list
        {/* call me rule 3.a */;}
    | ADD dataref TO dataref ON SIZE ERROR master_stmt_list END-ADD
        {/* call me rule 3.b */;}
    | ADD dataref TO dataref
        ON SIZE ERROR master_stmt_list
        NOT ON SIZE ERROR master_stmt_list %prec LESS_THAN_END_ARITH
        {/* call me rule 4.a */;}
    | ADD dataref TO dataref
        ON SIZE ERROR master_stmt_list
        NOT ON SIZE ERROR master_stmt_list END-ADD
        {/* call me rule 4.b */;}
    ;
Your lookahead mechanism, I fear, is very vulnerable. It competes with the
intrinsic lookahead in the algorithm deployed by the grammar tool. I am
confident that a number of your lookahead techniques are motivated by an
overestimation of the distance to the requisite lookahead token after subrule
recurse reductions. Rule 4.a above can easily involve a potentially gigantic
distance from the ADD to the END-ADD, but when you need it, the END-ADD is
just one step away from the reduced symbol master_stmt_list (which would be
not merely a selection but a huge recurse, with all manner of imperative and
conditional statements piled up neatly for you). When you are thus at the edge
of rule 4.a and one shy of the end of 4.b, if the lookahead is END-ADD, rule
4.a will drop, yielding to a shift of the higher-precedence END-ADD token, and
rule 4.b moves a step further, getting ready to reduce.
Incidentally, if TO has lower precedence than GIVING, and GIVING has lower
precedence than the ON SIZE ERROR tokens, some of the %prec markers are not
needed in the actual examples. This is because the tokens are in the rule on
that level. That is not realistic, though. The tokens, even if defined in a
convenient order, are very likely to be in subrules, and we will be managing
shift/reduce up in higher rules that have only symbols; so should we decide to
commit to bison, the %prec markers will be present on many rules, I am sure.
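A hedged sketch of what that looks like once the tokens migrate into subrules (all names hypothetical): the higher rule no longer mentions NOT directly, so a %prec marker on the rule itself is what ranks it against the NOT lookahead.

```
size_clause     : ON SIZE ERROR master_stmt_list ;
not_size_clause : NOT ON SIZE ERROR master_stmt_list ;

stmt_add_deep_thought
    : ADD dataref TO dataref size_clause %prec LESS_THAN_NOT_SIZE
    | ADD dataref TO dataref size_clause not_size_clause
    ;
```

Without the %prec, the first alternative would be ranked by its last token (TO, via the subrule boundary), which is not the comparison we want against NOT.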
Hope that helps,
Bob Rayhawk
RKRayhawk@aol.com
--
This message was sent through the gnu-cobol mailing list. To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body. For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.