
Re: gnubol: reference characterizers



In a message dated 11/1/99 7:45:55 PM EST, mck@tivoli.mv.com commenced a 
discussion of "... how we might resolve the syntactic ambiguity ..." within 
code such as the following snip 

<< 
     IF a = b OR ( c < d AND e < f ) OR g
>>

which later got bandied about as something like, 

<< 
     IF a = b OR g OR ( c < d AND e < f )

>>

where 'g' is a named conditional.

I would like to again encourage a technique that intercepts the tokens 
between the lexer and the parser for the purpose of characterizing 
_references_ before they reach the grammar rules.

The grammar would then see either
 cond :   IF DATA_REF REL-OPERATOR DATA_REF BOOL_OP DATA_REF ...
or
 cond :  IF DATA_REF REL-OPERATOR DATA_REF BOOL_OP CONDITIONAL_REF ...


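This kind of distinction can be made by consulting the symbol table. It can 
be advantageous to do this in a filter between the lexer and the parser, 
rather than in the parser itself, because by the time a grammar action runs 
the parser has often already committed to a rule on the strength of the 
token type it was handed.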
  
  Structurally, your compiler takes on the following shape, where each
  routine calls the one nested inside it:
      main() {
         parse() {
            ref_characterizer() { 
              //characterizes REFERENCES to more specific REF_ tokens
              lexer() {
                //may not be able to type a reference
              }
            }
         }
      }
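
As a rough sketch of that shape in C, assuming a toy lexer that returns a 
generic REFERENCE token for every user-defined word and a tiny fixed symbol 
table: every name below (set_input, lexer, ref_characterizer, sym_lookup, 
the T_ and SYM_ codes) is hypothetical and only illustrates the idea; none 
of it is existing gnubol code.

```c
#include <string.h>

/* Hypothetical token codes; a real build would take them from the parser generator. */
enum token {
    T_EOF, T_IF, T_REL_OPERATOR, T_BOOL_OP,
    T_REFERENCE,            /* generic: the lexer could not type it */
    T_DATA_REF, T_CONDITIONAL_REF
};

/* Hypothetical symbol kinds, assumed to be recorded from the DATA DIVISION. */
enum sym_kind { SYM_UNKNOWN, SYM_DATA, SYM_CONDITION };

struct symbol { const char *name; enum sym_kind kind; };

/* Toy symbol table: a and b are data names, g is a condition name. */
static const struct symbol symtab[] = {
    { "a", SYM_DATA }, { "b", SYM_DATA }, { "g", SYM_CONDITION }
};

enum sym_kind sym_lookup(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].kind;
    return SYM_UNKNOWN;
}

static const char *const *next_word;  /* NULL-terminated list of pre-split words */
static const char *current_word;      /* text of the token just returned */

/* Must be called before the first lexer() call. */
void set_input(const char *const *words) { next_word = words; }

/* Toy lexer: classifies by spelling only, so every name is T_REFERENCE. */
enum token lexer(void)
{
    if (*next_word == NULL)
        return T_EOF;
    current_word = *next_word++;
    if (strcmp(current_word, "IF") == 0)
        return T_IF;
    if (strcmp(current_word, "=") == 0 || strcmp(current_word, "<") == 0)
        return T_REL_OPERATOR;
    if (strcmp(current_word, "OR") == 0 || strcmp(current_word, "AND") == 0)
        return T_BOOL_OP;
    return T_REFERENCE;
}

/* The filter: the parser would call this instead of calling lexer() directly. */
enum token ref_characterizer(void)
{
    enum token t = lexer();
    if (t != T_REFERENCE)
        return t;
    switch (sym_lookup(current_word)) {
    case SYM_DATA:      return T_DATA_REF;
    case SYM_CONDITION: return T_CONDITIONAL_REF;
    default:            return T_REFERENCE;  /* undeclared: left generic */
    }
}
```

Fed the words IF a = b OR g, successive calls to ref_characterizer() hand 
the parser IF, DATA_REF, REL_OPERATOR, DATA_REF, BOOL_OP, CONDITIONAL_REF, 
so the grammar never has to ask what 'g' is.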


This notion definitely lends itself to the purpose of coding an exhaustive 
list of error production rules, such as

 cond:   IF DATA_REF REL-OPERATOR CONDITIONAL_REF ...
       {list_error(line_number,
                   "conditional cannot be compared to a data name.");}

Explicating the references by type before they reach the parser will also 
make it possible to diagnose serious syntax problems with references before 
they bubble up to the higher-level production rules, which will keep the 
grammar much less cluttered.

The ref_characterizer() functions can be specific to each division (or even 
to each section). Reference characterizers are recommended most strongly for 
the procedure division. They will help in the data division as well, mostly 
for REDEFINES and RENAMES clauses, but also for a few tidbits in the FDs. In 
other words, the DATA DIVISION contains data declarations, which are 
distinct from data references. If at all possible we should not pass these 
up to the grammar lumped together as simply PROG_NAME. (It is notable that 
the FDs contain forward references, which would be nice to distinguish for 
the parser.)

As an aside, function names, paragraph names and section names should also be 
quite distinct tokens, as should file names, data names, and condition names. 
The lexer cannot make this kind of distinction lexically. (In BASIC, by 
contrast, the #, $, and ! suffixes actually support lexically based typing 
of declarations and references.) In COBOL a check of the symbol table will 
be necessary.

Elsewhere, I have recommended that we become really precise about data 
references, distinguishing qualifiable from non-qualifiable, 
reference-modifiable from non-reference-modifiable, and subscriptable from 
non-subscriptable.  If we can express these distinctions down in the 
production rules for references rather than up in the rules for statements 
and conditional expressions, we will not only make the grammar easier to 
understand, but we will probably be able to make the compiler more robust.
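
One way to carry these three distinctions is as attribute bits attached to 
each characterized reference. The following C sketch assumes hypothetical 
flag names and helper functions (REF_QUALIFIABLE, ref_misuse, ref_error, and 
so on); how the bits get set from the declarations is not modeled here.

```c
#include <stddef.h>

/* Hypothetical attribute bits for a data reference. */
enum ref_attr {
    REF_QUALIFIABLE    = 1 << 0,  /* may be followed by OF/IN qualifiers   */
    REF_SUBSCRIPTABLE  = 1 << 1,  /* may take ( subscript ... )            */
    REF_REF_MODIFIABLE = 1 << 2   /* may take ( start : length )           */
};

/*
 * 'allowed' holds the bits the declaration permits; 'used' holds the bits
 * for what the source text actually wrote after the name.
 * Returns the bits that were used but are not allowed (0 means no misuse).
 */
unsigned ref_misuse(unsigned allowed, unsigned used)
{
    return used & ~allowed;
}

/* Returns a diagnostic for the first misuse found, or NULL if there is none. */
const char *ref_error(unsigned allowed, unsigned used)
{
    unsigned bad = ref_misuse(allowed, used);
    if (bad & REF_QUALIFIABLE)
        return "reference cannot be qualified";
    if (bad & REF_SUBSCRIPTABLE)
        return "reference cannot be subscripted";
    if (bad & REF_REF_MODIFIABLE)
        return "reference cannot be reference-modified";
    return NULL;
}
```

A check like this can sit in the low-level rules for references, so a 
subscript on a non-subscriptable item is reported there and the statement 
rules above never see it.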

We should be able to capture most data reference syntax errors with artful 
error productions, without sending the parser into an error recovery mode.

Best Wishes,
Bob Rayhawk
RKRayhawk@aol.com

