[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
gnubol: non-recursive OFIN
Paraphrased, some of the early grammar rules had data_reference rules
that had lots of possible ambiguities, like this.
| TOK_NUMBERORNAMEORPIC
opt_P_of_in_clauses
opt_P_subscript
opt_P_ref_mod
{;}
(That is a paraphrase, if it is wrenched, I did it.)
The optionality of the alternatives creates many competing
continuations with epsilons:
(The following are my simple rules to illustrate the enumerations
that could cause the epsilons).
|TOK_NUMBERORNAMEORPIC
opt_P_of_in_clauses
{;}
|TOK_NUMBERORNAMEORPIC
opt_P_subscript
{;}
|TOK_NUMBERORNAMEORPIC
opt_P_ref_mod
{;}
|TOK_NUMBERORNAMEORPIC
opt_P_subscript
opt_P_ref_mod
{;}
It is sometimes best to just flatten and explicate these rules: So partly,
then we might have a different approach, such as
|TOK_NUMBERORNAMEORPIC definitely_P_ref_mod
{;}
|TOK_NUMBERORNAMEORPIC definitely_P_subscript definitely_P_ref_mod
{;}
... and more
This to try to get rid of the epsilons.
But also the OFIN clause sub rule will cause problems because it is
in fact optional, so it is natural to express it with a rule collection
that includes an epsilon /*empty */ alternate.
There is another way, it will require a higher level iterator, similar to
the ideas posted relating to statements within blocks. Blocks are sequences
of statements, so it is easy to atleast consider an iterator that looks
for one statement after the other in an iterative fashion.
Data references do not seem like iterators at first. They seem singular in
nature. We know of some verb constructs that allow, apparently as a kind of
exceptional thing, a list of data references. But of course you can look at
that the other way around, although it seems strange at first to us
COBOLers.
Since we must sometimes deal with a single reference, and sometimes with a
list of references. It is just as legitimate to assume that all data
references are lists of data references; and that two special cases are when
we have just one, and when we have none. And there is a special case,
actually common, that some rules would only want one data reference not a
list.
So, maybe, a reference to data actually looks something like
data_ref :
single_ref
| data_ref single_ref
So keeping the absolutely critical issues of reference modification and
subscripting to the side for a moment, we can try to deal with qualification
via OFIN for single_ref in the naive way:
single_ref :
TOK_data_ref
| token_data_ref opt_OFIN_clause
/*impossible rule here, I am leaving epsilon off: this would never end */
opt_OFIN_clause :
OFIN single_ref
At that point we are already
off to the races with a shift/reduce conflict up in single_ref that
we basically want shift to win. But regretably the interest in single_ref
as a lookahead for the opt_OFIN_clause will reduce the lowest rule
opt_OFIN_clause and _also_ the top data_ref rules. That will be a
reduce / reduce
conflict that is hard to suppress.
(Further more we are recursing on the right, which is atleast inadvisable
in an LR algorithm like bison).
When we change to the real rule
opt_OFIN_clause :
{; /* empty needed to make it acually optional */
| OFIN single_ref
A brand new and somewhat devastating reduce / reduce conflict will arise for
any rule set that includes a opt_OFIN_clause as one of it alternates as the
last symbol, and also has an alternate that can end just before that: like
single_ref :
TOK_data_ref
| token_data_ref opt_OFIN_clause
The grammar tools will see that epsilon on the front of the opt_OFIN_clause
is synonymous with nothing: so the two alternates are ambiguous. The grammar
tools are very smart.
So a solution is to turn the structure inside out, and let actions track the
relationships, not the grammar rules.
data_ref :
single_ref
| single_ref OFIN
| data_ref single_ref
| data_ref single_ref OFIN
single_ref :
TOK_data_ref
Now we are not recursing at all in the lower sub rule for single_ref,
we are merely iterating (by means of what is unfortunately called a
'recursion' rule up above, it is really a just a tail recursion and so is
actually just an iteration). The actions will
have to take on new responsibilities here.
((relating this to posts about choosing top-down or bottom-up OFIN clause
validation. The coding strategy here suggested actually leaves us with an
OFIN walk by the grammar rules that is from the bottom to the top of the
OFIN chain. Relative to that other conversation
this coding strategy actually reduces
the available exploitable walks that make validation coincide with parse.
Bison, for example, can recurse on the right, although that may not be
advisable. Using bison to recurse on the right would leave an available
pop sequence that is top to bottom for the OFIN validation.
The coding strategy here suggeted
makes bison into a quick reducer. Still you
can record the OFIN chain in a data structure and still walk downward if you
want. The coding strategy suggested here does not outlaw that, it just
eliminates all recursion, and so the right recursion is not available as
an exploitable artifact of the tool operation.))
Anyway, about this inverted rule structure
data_ref :
single_ref
| single_ref OFIN
| data_ref single_ref
| data_ref single_ref OFIN
single_ref :
TOK_data_ref
We would need a means to detect the situation where no single_ref followed
a lead-on rule with OFIN at the end. But we need that capability anyway!,
because folks do make coding errors of that nature.
It might actually be easier to do it up here rather than down in a myriad
of rules about subscripting and or reference modification. data_ref might
need a wrapper like this
/*
this is the wrapper, name left simple so upper rules
don't hurt your eyes
*/
data_ref : data_ref_wrappee
{if ($1.lastpart == loose_ofin)
{ printf this is not good
throw commutable error flag switch }
else
{smooth sailing for semantic emissions }
}
data_ref_wrappee :
single_ref
{$$.lastpart = hatch_battened;}
| single_ref OFIN
{$$.lastpart = loose_ofin;}
| data_ref single_ref
{$$.lastpart = hatch_battened;}
| data_ref single_ref OFIN
{$$.lastpart = loose_ofin;}
So boiling it down to essentials. A rule like
| single_ref OFIN
has an unambiguous token on the end. So a pair of competing rules, like
single_ref
| single_ref OFIN
are flattened and do not have an epsilon as a lookahead; thus avoiding
a likely generator of reduce / reduce conflicts. The end of the rule does
not look like the end of any other rule, so it will be less likely to lead
us into r/r or s/r conflicts of that nature.
For completeness then the subscripting can go above this set of rules in an
explicated fashion:
some_rule :
| data_ref definitely_P_ref_mod
{;}
| data_ref definitely_P_subscript definitely_P_ref_mod
{;}
| data_ref definitely_P_subscript
{;}
This could be enhanced to have three variants available beneath if we choose
to drive some
validation in parser rules:
data_ref_that_may_be_ref_mod :
data_ref_that_may_be_subscripted :
data_ref_that_may_be_ref_mod_and_subscripted :
That would require SYMT support. These rules would start their iteration with
a distinct token, the OFIN clause scan could be less rigorous and common to
the several rules.
There is not yet much support for type precision in the rules however. So the
suggestion can
be viewed as containable to the single data_ref image sketched above.
When in doubt, turn the world upside down; or is that downside up.
Hope that helps. :-)
Bob Rayhawk
RKRayhawk@aol.com
--
This message was sent through the gnu-cobol mailing list. To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body. For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.