[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

gnubol: non-recursive OFIN




Paraphrased, some of the early grammar rules had data_reference rules
that had lots of possible ambiguities, like this.

| TOK_NUMBERORNAMEORPIC
 opt_P_of_in_clauses 
 opt_P_subscript  
 opt_P_ref_mod 
 {;}

(That is a paraphrase, if it is wrenched, I did it.)

The optionality of the alternatives creates many competing 
continuations with epsilons:

(The following are my simple rules to illustrate the enumerations 
that could cause the epsilons).

|TOK_NUMBERORNAMEORPIC
 opt_P_of_in_clauses
 {;}
|TOK_NUMBERORNAMEORPIC
 opt_P_subscript 
 {;}
|TOK_NUMBERORNAMEORPIC
 opt_P_ref_mod 
 {;}
|TOK_NUMBERORNAMEORPIC
 opt_P_subscript 
 opt_P_ref_mod 
 {;}


It is sometimes best to just flatten and explicate these rules: So partly, 
then we might have a different approach, such as

|TOK_NUMBERORNAMEORPIC definitely_P_ref_mod 
{;}
|TOK_NUMBERORNAMEORPIC definitely_P_subscript definitely_P_ref_mod 
{;}

... and more


This to try to get rid of the epsilons. 

But also the OFIN clause sub rule will cause problems because it is
in fact optional, so it is natural to express it with a rule collection 
that includes an epsilon /*empty */ alternate.

There is another way, it will require a higher level iterator, similar to 
the ideas posted relating to statements within blocks. Blocks are sequences 
of statements, so it is easy to atleast consider an iterator that looks 
for one statement after the other in an iterative fashion.

Data references do not seem like iterators at first. They seem singular in 
nature.  We know of some verb constructs that allow, apparently as a kind of
exceptional thing, a list of data references. But of course you can look at 
that the other way around, although it seems strange at first to us 
COBOLers.

Since we must sometimes deal with a single reference, and sometimes with a 
list of references. It is just as legitimate to assume that all data
references are lists of data references; and that two special cases are when
we have just one, and when we have none. And there is a special case, 
actually common, that some rules would only want one data reference not a 
list.

So, maybe, a reference to data actually looks something like

data_ref :
   single_ref
 | data_ref single_ref


So keeping the absolutely critical issues of reference modification and 
subscripting to the side for a moment, we can try to deal with qualification 
via OFIN for single_ref in the naive way:

single_ref :
 TOK_data_ref
 | token_data_ref opt_OFIN_clause

/*impossible rule here, I am leaving epsilon off: this would never end */
opt_OFIN_clause :
  OFIN single_ref

At that point we are already 
off to the races with a shift/reduce conflict up in single_ref that
we basically want shift to win. But regretably the interest in single_ref 
as a lookahead for the opt_OFIN_clause will reduce the lowest rule 
opt_OFIN_clause and _also_ the top data_ref rules. That will be a 
reduce / reduce
conflict that is hard to suppress.

(Further more we are recursing on the right, which is atleast inadvisable 
in an LR algorithm like bison). 

When we change to the real rule

opt_OFIN_clause :
 {; /* empty needed to make it acually optional */
 | OFIN single_ref

A brand new and somewhat devastating reduce / reduce conflict will arise for 
any rule set that includes a opt_OFIN_clause as one of it alternates as the 
last symbol, and also has an alternate that can end just before that: like

single_ref :
 TOK_data_ref
 | token_data_ref opt_OFIN_clause

The grammar tools will see that epsilon on the front of the opt_OFIN_clause
is synonymous with nothing: so the two alternates are ambiguous. The grammar 
tools are very smart. 

So a solution is to turn the structure inside out, and let actions track the
relationships, not the grammar rules.


data_ref :
   single_ref
 | single_ref OFIN 
 | data_ref single_ref
 | data_ref single_ref OFIN

single_ref :
 TOK_data_ref


Now we are not recursing at all in the lower sub rule for single_ref,
we are merely iterating (by means of what is unfortunately called a
'recursion' rule up above, it is really a just a tail recursion and so is
actually just an iteration). The actions will 
have to take on new responsibilities here. 

((relating this to posts about choosing top-down or bottom-up OFIN clause 
validation. The coding strategy here suggested actually leaves us with an 
OFIN walk by the grammar rules that is from the bottom to the top of the 
OFIN chain. Relative to that other conversation 
this coding strategy actually reduces
the available exploitable walks that make validation coincide with parse. 
Bison, for example, can recurse on the right, although that may not be 
advisable. Using bison to recurse on the right would leave an available 
pop sequence that is top to bottom for the OFIN validation. 
The coding strategy here suggeted
makes bison into a quick reducer. Still you 
can record the OFIN chain in a data structure and still walk downward if you
want. The coding strategy suggested here does not outlaw that, it just 
eliminates all recursion, and so the right recursion is not available as
an exploitable artifact of the tool operation.))


Anyway, about this inverted rule structure

data_ref :
   single_ref
 | single_ref OFIN 
 | data_ref single_ref
 | data_ref single_ref OFIN

single_ref :
 TOK_data_ref

We would need a means to detect the situation where no single_ref followed
a lead-on rule with OFIN at the end.  But we need that capability anyway!,
because folks do make coding errors of that nature.
It might actually be easier to do it up here rather than down in a myriad 
of rules about subscripting and or reference modification. data_ref might 
need a wrapper like this

/*
  this is the wrapper, name left simple so upper rules
  don't hurt your eyes
*/
data_ref : data_ref_wrappee 
 {if ($1.lastpart == loose_ofin)
    { printf this is not good
      throw commutable error flag switch }
  else
    {smooth sailing for semantic emissions }
 }


data_ref_wrappee :
   single_ref
   {$$.lastpart = hatch_battened;}
 | single_ref OFIN 
   {$$.lastpart = loose_ofin;}
 | data_ref single_ref
   {$$.lastpart = hatch_battened;}
 | data_ref single_ref OFIN
   {$$.lastpart = loose_ofin;}


So boiling it down to essentials. A rule like
  | single_ref OFIN 
has an unambiguous token on the end. So a pair of competing rules, like
    single_ref
  | single_ref OFIN 

are flattened and do not have an epsilon as a lookahead; thus avoiding 
a likely generator of reduce / reduce conflicts. The end of the rule does 
not look like the end of any other rule, so it will be less likely to lead
us into r/r or s/r conflicts of that nature.

For completeness then the subscripting can go above this set of rules in an 
explicated fashion:

some_rule :
 | data_ref definitely_P_ref_mod 
 {;}
 | data_ref definitely_P_subscript definitely_P_ref_mod 
 {;}
 | data_ref definitely_P_subscript 
 {;}

This could be enhanced to have three variants available beneath if we choose 
to drive some 
validation in parser rules:

  data_ref_that_may_be_ref_mod :

  data_ref_that_may_be_subscripted :

  data_ref_that_may_be_ref_mod_and_subscripted :

That would require SYMT support. These rules would start their iteration with 
a distinct token, the OFIN clause scan could be less rigorous and common to 
the several rules.

There is not yet much support for type precision in the rules however. So the 
suggestion can
be viewed as containable to the single data_ref image sketched above.

When in doubt, turn the world upside down; or is that downside up. 
Hope that helps.  :-) 

Bob Rayhawk
RKRayhawk@aol.com


--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.