
Re: gnubol: distinguishing data refs and conditional refs





In a message dated 11/3/99 3:05:50 AM EST, 
Randall Bart (Barticus@att.net) writes:

<< 
 Can you give me an example where data names and procedure names can occur 
 in the same context?  Procedure names always occur after GO, PERFORM, INPUT 
 PROCEDURE, OUTPUT PROCEDURE, ALTER (boo), or a period (when declared).
 >>

I appreciate your response to my posting. My aim is to bullet-proof the 
grammar with exacting specifications, and the lexer cannot always do that 
alone.

You used the word 'context' in your reply, and each of us can mean several 
different things by that word. The tools used and discussed in this project 
are generally 'context free' processing mechanisms, and my comments here are 
oriented to that sense of the word (although near the end I drop back to a 
less technical use).

In a context free grammar there could be a rule such as

valid_ref : simple_ref
  | qualified_ref
  | subscripted_ref
  | reference_modified_ref
  | qualified_subscripted_reference_modified_ref
   { blip_cross_reference_list($1, line_number); }
   ;


Rules like simple_ref and qualified_ref are reduced in a context free 
environment. They do not know that, in our minds, they occur beneath the 
potential reduction of a rule like
 move_stmt : MOVE valid_ref TO valid_ref 
   ....
or 
 perf_stmt : PERFORM valid_ref THRU valid_ref
   .....

So I am proposing that the SYMT (symbol table) be used in stages beneath the 
parser to characterize references precisely, so that a data reference is 
distinguished from a procedure reference before it ever reaches a rule that 
attempts to reduce it as syntactically valid.

Data references would be returned as ref_data tokens and could reduce to 
valid_data_ref, and procedure references would be returned as ref_proc 
tokens and could reduce to valid_proc_ref.
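As a rough illustration of such a filter beneath the parser, here is a minimal C sketch assuming a toy, hard-coded SYMT. The names classify_ref and struct symt_entry, and the token codes REF_DATA, REF_PROC and REF_UNDEFINED, are hypothetical; a real table would be filled from the data division and from a pre-scan of procedure division section and paragraph names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes the filter hands to the parser. */
enum ref_token { REF_DATA, REF_PROC, REF_UNDEFINED };

/* Hypothetical kinds recorded in the symbol table. */
enum symt_kind { SYMT_DATA_ITEM, SYMT_SECTION, SYMT_PARAGRAPH };

struct symt_entry {
    const char *name;
    enum symt_kind kind;
};

/* Toy SYMT; names are assumed to be upper-cased by the lexer already. */
static const struct symt_entry symt[] = {
    { "WS-TOTAL",   SYMT_DATA_ITEM },
    { "MAIN-LOGIC", SYMT_SECTION   },
    { "READ-INPUT", SYMT_PARAGRAPH },
};

/* Classify an identifier before the parser ever sees it. */
enum ref_token classify_ref(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof symt / sizeof symt[0]; i++) {
        if (strcmp(symt[i].name, name) == 0)
            return symt[i].kind == SYMT_DATA_ITEM ? REF_DATA : REF_PROC;
    }
    /* Undefined name: the SYMT alone cannot decide; an assumption is needed. */
    return REF_UNDEFINED;
}
```

A lookup of WS-TOTAL yields REF_DATA and READ-INPUT yields REF_PROC; a name missing from the table yields REF_UNDEFINED, which is exactly the case the SYMT cannot settle by itself.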

Rules for data-referencing verbs would (as a first approach) accept only 
data references, and rules for procedure-referencing verbs only procedure 
references.

For example:

 move_stmt : MOVE valid_data_ref TO valid_data_ref 
       {positive production}
or 
 perf_stmt : PERFORM valid_proc_ref THRU valid_proc_ref
       {positive production}


An "example where data names and procedure names can occur in the same 
context" is any program error that puts them there. For example:

  MOVE valid_data_ref TO valid_proc_ref 
or
  MOVE valid_proc_ref TO valid_data_ref 

The 'context' in which we compile is not the set of all valid programs but 
the set of all possible programs, and the compiler needs to stay on its feet.

In other posts I have speculated about a need for lexer states (in the 
procedure division) that establish an assumption that a reference is either 
a data reference or a procedure reference, based on the most recently 
recognized verb. That cannot be a parser action, because the ongoing rule 
for the verb _has_not_reduced_yet_. States are dangerous precisely because 
of the temporal asynchrony between lexer state and rule reduction. But we 
may need to be able to characterize references to _undefined_ items as 
either (assumed) data references or (assumed) procedure references. States 
_are_ context sensitive. The conundrum is that the lexer's state is often 
ahead of the parser.
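A minimal C sketch of such a lexer-state assumption, assuming the lexer records the most recent verb it has scanned. The names struct lexer_state, lexer_saw_verb and assume_undefined_ref are hypothetical, and the verb list (GO, PERFORM, ALTER) is deliberately incomplete. The recorded verb belongs to the lexer, which may be ahead of the parser's reductions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum ref_assumption { ASSUME_DATA_REF, ASSUME_PROC_REF };

/* Hypothetical lexer state: the most recent verb the LEXER has seen.
   The parser may not have reduced that verb's rule yet. */
struct lexer_state {
    const char *last_verb;
};

void lexer_saw_verb(struct lexer_state *st, const char *verb)
{
    st->last_verb = verb;
}

/* Crude assumption for an UNDEFINED name. Only GO, PERFORM and ALTER are
   treated as procedure-taking; the INPUT/OUTPUT PROCEDURE phrases of SORT
   and the data operands of PERFORM ... VARYING would need more state. */
enum ref_assumption assume_undefined_ref(const struct lexer_state *st)
{
    static const char *const proc_verbs[] = { "GO", "PERFORM", "ALTER" };
    assert(st != NULL);
    if (st->last_verb == NULL)
        return ASSUME_DATA_REF;
    for (size_t i = 0; i < sizeof proc_verbs / sizeof proc_verbs[0]; i++)
        if (strcmp(st->last_verb, proc_verbs[i]) == 0)
            return ASSUME_PROC_REF;
    return ASSUME_DATA_REF;
}
```

Even this small example shows the hazard: an undefined control variable in PERFORM ... VARYING would be mislabelled as a procedure name.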

For references to previously defined data items, and (with a SYMT that 
already contains the procedure division's section and paragraph names) to 
previously sensed procedure names, there is no need for context sensitivity 
(lexer state). We just need error productions like:

rejected_move_stmt :  MOVE valid_data_ref TO valid_proc_ref 
  { error(line_number, " procedure name not valid in TO clause"); }
  ;

or
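rejected_move_stmt :  MOVE valid_data_ref TO valid_proc_ref 
  { error(line_number, " procedure name not valid in move statement"); }
 |  MOVE valid_proc_ref TO valid_data_ref 
  { error(line_number, " procedure name not valid in move statement"); }
 |  MOVE valid_proc_ref TO valid_proc_ref 
  { error(line_number, " procedure name not valid in move statement"); }
  ;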
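Outside yacc, the decision these productions encode can be sketched as a small C table. This is a hedged illustration only: check_move, move_message and the enum names are hypothetical, and the sketch folds the two variants together, reporting the TO clause message when only the receiving operand is a procedure name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds of already-reduced references. */
enum ref_kind { VALID_DATA_REF, VALID_PROC_REF };

enum move_check { MOVE_OK, MOVE_BAD_TO_CLAUSE, MOVE_BAD_STATEMENT };

/* Mirrors the positive production plus the rejected_move_stmt
   alternatives: only data TO data is accepted. */
enum move_check check_move(enum ref_kind src, enum ref_kind dst)
{
    assert(src == VALID_DATA_REF || src == VALID_PROC_REF);
    assert(dst == VALID_DATA_REF || dst == VALID_PROC_REF);
    if (src == VALID_DATA_REF && dst == VALID_DATA_REF)
        return MOVE_OK;
    if (src == VALID_DATA_REF)      /* only the receiving operand is wrong */
        return MOVE_BAD_TO_CLAUSE;
    return MOVE_BAD_STATEMENT;      /* the sending operand is a procedure name */
}

/* Diagnostic texts taken from the error actions; NULL means no error. */
const char *move_message(enum move_check c)
{
    switch (c) {
    case MOVE_BAD_TO_CLAUSE: return "procedure name not valid in TO clause";
    case MOVE_BAD_STATEMENT: return "procedure name not valid in move statement";
    default:                 return NULL;
    }
}
```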

But for _undefined_ items we may need to apply the assumption (in the lexer 
or in a filter beneath the parser).

Keep in mind that the reason for all of this is to handle the complexity of 
references thoroughly down in the reference rules, precisely to keep it out 
of the statement and expression rules (where we would only multiply the 
problem, and those rules are challenging enough already). For example, we do 
not want to code the detection and error handling of an invalid 
qualification (an OF/IN clause group) inside the rule for MOVE and again for 
ADD. The same applies to diagnosing reference modification (we do not want 
to do it again and again in MOVE and INSPECT). Yet if we do reduce procedure 
references separately, we need a few error productions up at the higher 
level, as the rejected_move_stmt rule suggests.
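Here is a minimal C sketch of diagnosing a reference once, in the reference rules, so that MOVE, ADD or INSPECT never repeat the checks. It assumes hypothetical SYMT attributes (has_occurs, refmod_allowed) and a hypothetical reduce_ref helper. Qualification checks would sit in the same place but are omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical SYMT attributes for a data item. */
struct data_item {
    const char *name;
    bool has_occurs;      /* dominated by an OCCURS clause */
    bool refmod_allowed;  /* the exact rule is the standard's; assumed here */
};

/* Node passed up from the reference rules. */
struct ref_node {
    const struct data_item *item;
    bool error;           /* set once here; statement rules only read it */
};

/* Called from the reference rules only, never from MOVE, ADD or INSPECT. */
struct ref_node reduce_ref(const struct data_item *item, bool subscripted,
                           bool refmodded, int line_number)
{
    struct ref_node node = { item, false };
    assert(item != NULL);
    if (subscripted && !item->has_occurs) {
        fprintf(stderr, "%d: %s is not subscriptable (no OCCURS)\n",
                line_number, item->name);
        node.error = true;
    }
    if (refmodded && !item->refmod_allowed) {
        fprintf(stderr, "%d: %s cannot be reference modified\n",
                line_number, item->name);
        node.error = true;
    }
    return node;
}
```

The statement rules then only need to read node.error.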

My examples have implied that the nodes passed up from the reference 
reduction rules contain an error flag member. The statement and expression 
rules can interrogate that flag to short-circuit code emission, or pass it 
along via a member of the AST structure so that code generation can be 
halted later (passing the buck). The flag can also record that erroneous 
code was ignored and that the node is really a reference to a dummy 
(reference substitution), which supports compiling the complete source while 
preventing generation of invalid executables.
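The flag-and-substitute idea might look like this in C. It is a sketch under assumptions: struct ast_ref, struct compile_state, substitute_if_bad and emit_move are hypothetical names, and "emission" is reduced to a boolean result. A flagged operand is swapped for a dummy, the MOVE is not emitted, and the executable is suppressed for the rest of the compilation while checking continues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical AST reference node carrying the error flag upward. */
struct ast_ref {
    const char *name;
    bool error;      /* set by the reference rules or an error production */
    bool is_dummy;   /* true after reference substitution */
};

/* Hypothetical per-compilation state. */
struct compile_state {
    int error_count;
    bool emit_executable;   /* cleared on the first error: passing the buck */
};

static const struct ast_ref dummy_ref = { "<dummy>", true, true };

/* Replace an erroneous reference by a dummy so compilation can continue. */
const struct ast_ref *substitute_if_bad(struct compile_state *cs,
                                        const struct ast_ref *ref)
{
    assert(cs != NULL && ref != NULL);
    if (!ref->error)
        return ref;
    cs->error_count++;
    cs->emit_executable = false;
    return &dummy_ref;
}

/* Statement-level rule: short-circuit emission when an operand is flagged. */
bool emit_move(struct compile_state *cs, const struct ast_ref *src,
               const struct ast_ref *dst)
{
    src = substitute_if_bad(cs, src);
    dst = substitute_if_bad(cs, dst);
    if (src->is_dummy || dst->is_dummy || !cs->emit_executable)
        return false;   /* nothing emitted; the rest of the source is still compiled */
    /* ... real code generation would go here ... */
    return true;
}
```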


To tie it all together, let me describe it this way.

If a reference is inherently syntactically invalid, that error should be 
diagnosed in the rules for references: for example, reference modifying an 
item that cannot be reference modified, or subscripting an item not 
dominated by an OCCURS clause. This gets the complexity out of the 
higher-level rules. But those rules (even the positive productions) must be 
prepared for error flags or dummy references.

When a syntactically valid reference occurs incorrectly in a statement (that 
is, it is 'out of context'), we probably need an error production to trap it. 
Such error productions parallel the valid productions for those kinds of 
statements or expressions. These rules will need to turn on the sub-node 
error flags and substitute dummy references to support further compilation.
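For the out-of-context case, the action of an error production such as rejected_move_stmt might do something like the following C sketch. The struct ref type and reject_out_of_context are hypothetical names. The point is that a syntactically valid but misplaced procedure reference is flagged and rewritten as a dummy data reference, so later rules see the kind they expect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum ref_kind { REF_KIND_DATA, REF_KIND_PROC };

struct ref {
    enum ref_kind kind;
    const char *name;
    bool error;
};

/* Hypothetical action body for an error production: flag the offending
   sub-node and turn it into a dummy of the expected kind, so the rest of
   the statement can still be checked while code generation stays off. */
void reject_out_of_context(struct ref *r, enum ref_kind expected,
                           int line_number, const char *what)
{
    assert(r != NULL);
    if (r->kind == expected)
        return;                 /* operand is in context; nothing to do */
    fprintf(stderr, "%d: %s\n", line_number, what);
    r->error = true;
    r->kind = expected;
    r->name = "<dummy>";
}
```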

Bob Rayhawk
RKRayhawk@aol.com

--
This message was sent through the gnu-cobol mailing list.