
Re: gnubol: distinguishing data refs and conditional refs



In a message dated 11/1/99 10:44:35 PM EST, mck@tivoli.mv.com adding further 
excellent comments on syntax such as, 
<< 
    IF a = b OR ( c < d AND e < f ) OR g
 >>
writes
<<
 We are propagating the subject and relation from a factor to a term.
 I guess that's the way it is, but it makes me a little
 uncomfortable.  It also means, that if Fred is right, making the
 implied grouping explicit, changes the semantics of the phrase
 profoundly.  Now you can say "Ick", Chris.
 
 >>

To which I will add the small notion ...
Now backtrack and simplify to just a relational operator like equal, drop
a NOT in there somewhere, and say what is implied. As in

    IF a = b AND NOT g
or
    IF a NOT = b AND g

and explain implied grouping, and state which standard agrees. Now say, Ick.

No matter what, if you can sharpen references to the point of distinguishing
data_item_reference versus conditional_name_reference, the grammar will be
stronger and you can also explicate (sorry) the error productions.

In this connection Fred Neale's (fneale@tss.com.au) 11/1/99 8:15:41 PM EST
posting of the
  BODMAS precedence rule is nice
    Brackets
    Of
    Division
    Multiplication
    Addition
    Subtraction

What is your source for that? Is that complete?  

Surely function references must be high precedence, and reducing subscripts
is right up there with OF (at least before arithmetic operators).


C (not C++) has a FUMAS-REBBBLL-TAC ("fumas rebel tack", as any Smothers
Brother would recognize), where Relationals and Equalities have higher
precedence than Logicals. But COBOL's natural language heritage does some
violence to any such straightforward approach.

Still you can rip the strangeness out of the conditionals issue discussed
here by providing distinct tokens. (That is decidedly not the same as ripping
the strangeness out of implied subjects and operators).


In some of the original material quoted in this thread
 <value1> = <value2> OR <value3>
   ....
can become multiple rules something like
 <data_value1> = <data_value2> OR <data_value3>
   .....  production D
and
 <data_value1> = <data_value2> OR <cond_valueA>
  ....  production C

((where data_valueN and cond_valueX are references (quite distinct from 
declarations but also) clearly distinct from one another in kind))
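
A minimal sketch of why the two productions differ, written in Python only
for brevity. It assumes the lexer (or filter) has already tagged each
reference with its kind; the Ref class, the "data"/"cond" tags and the
expand function are illustrative names, not anything from the gnubol
sources. The expansions reflect my reading of the abbreviated-condition
rules and should be checked against whichever standard is being targeted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    name: str
    kind: str  # "data" or "cond"; assumed to be supplied by the lexer/filter


def expand(subject, relop, first_object, connective, trailing):
    """Spell out `subject relop first_object connective trailing`."""
    left = f"{subject.name} {relop} {first_object.name}"
    if trailing.kind == "data":
        # production D: the implied subject and relation are propagated
        right = f"{subject.name} {relop} {trailing.name}"
    else:
        # production C: a condition-name is a complete condition by itself
        right = trailing.name
    return f"({left}) {connective} ({right})"


a, b, c = Ref("a", "data"), Ref("b", "data"), Ref("c", "data")
g = Ref("g", "cond")

print(expand(a, "=", b, "OR", c))  # (a = b) OR (a = c)
print(expand(a, "=", b, "OR", g))  # (a = b) OR (g)
```

The same token sequence yields two different trees, and the only thing that
selects between them is the kind of the last reference.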

We just have to think of the return from the lexer (actually from the 
intermediate filter as I have suggested) as being more exact. To this we can 
also add an undefined reference.  In other words, if we are scanning the 
procedure division in expectation of keywords or datanames and it is not a 
keyword and it is not a dataname in the SYMT, then it is an undefined 
dataname.
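
The classification just described could be sketched roughly as follows
(Python, for brevity). The token names, the tiny KEYWORDS subset and the
dictionary-shaped SYMT are assumptions made for illustration only.

```python
from enum import Enum


class Tok(Enum):
    KEYWORD = "keyword"
    DATA_REF = "data_item_reference"
    COND_REF = "conditional_name_reference"
    UNDEFINED_REF = "undefined_reference"


# A tiny illustrative subset; a real table would hold every reserved word.
KEYWORDS = {"IF", "OR", "AND", "NOT", "MOVE", "TO", "PERFORM"}


def classify(word, symt):
    """Return the most exact token kind available for one word.

    `symt` is assumed to map upper-case names to "data" or "cond".
    """
    w = word.upper()
    if w in KEYWORDS:
        return Tok.KEYWORD
    entry = symt.get(w)
    if entry == "data":
        return Tok.DATA_REF
    if entry == "cond":
        return Tok.COND_REF
    # Not a keyword and not in the SYMT: hand the parser an explicit
    # "undefined" token instead of failing here.
    return Tok.UNDEFINED_REF


symt = {"A": "data", "B": "data", "G": "cond"}
for word in ["IF", "a", "g", "z"]:
    print(word, classify(word, symt).value)
```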

Production rules for data_nameN can include an undefined dataname (which would
be diagnosed at the lower level). The action (or subsequent intermediate form
processors) can trap the error and avoid emitting code.

for example
cond :  <data_value1> = <data_value2> OR <cond_valueA>
  { if ($1.errflag || $3.errflag || $5.errflag)
     { ; /* do nothing, it's already been diagnosed */ }
    else
     { emit(/* code or AST for this kind of conditional */); }
  };


alternatively
cond :  <data_value1> = <data_value2> OR <cond_valueA>
  {  /* pass the buck */
    thisAST.errflag |= $1.errflag || $3.errflag || $5.errflag;
    emit(/* AST for this kind of conditional */);
    /* AST processor will have to suppress code emission.
       Even at that, the production rules for erroneous data/conditional
       references can actually return references to processable dummies.
       As long as the error is diagnosed, and a high level switch is set
       to suppress later stages, robust compilation can proceed with
       substitutes. */
  };

Behind these suggestions is the notion of turning most specific detailed
errors into some kind of thing which will keep compiling. A reference to an
undefined data item should not stop the compile, or even disrupt the parse.
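
One way to picture the "keep compiling" idea is the sketch below (Python,
for brevity). The Node class, make_ref, make_cond and emit are hypothetical
helpers: an undefined reference is diagnosed once, replaced by a dummy node,
and its error flag rides up the tree so that later stages can simply decline
to emit code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    errflag: bool = False


def make_ref(name, symt, diagnostics):
    """Build a reference node; undefined names become flagged dummies."""
    if name in symt:
        return Node(name)
    diagnostics.append(f"undefined reference: {name}")  # diagnosed once, here
    return Node(f"<dummy:{name}>", errflag=True)


def make_cond(op, *operands):
    """Build a condition node; the error flag propagates upward."""
    return Node(op, list(operands), errflag=any(o.errflag for o in operands))


def emit(node):
    """Return code text, or None when the subtree was already diagnosed."""
    if node.errflag:
        return None
    if not node.children:
        return node.label
    return "(" + f" {node.label} ".join(emit(c) for c in node.children) + ")"


symt = {"a", "b"}
diagnostics = []
good = make_cond("=", make_ref("a", symt, diagnostics),
                 make_ref("b", symt, diagnostics))
bad = make_cond("OR", good, make_ref("z", symt, diagnostics))

print(emit(good))    # (a = b)
print(emit(bad))     # None
print(diagnostics)   # ['undefined reference: z']
```

The parse of the larger expression still completes; only emission is
suppressed.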

A poorly constructed conditional expression (with good items) should not halt 
the compile or disrupt the parse. So in the discussions of things like

    IF a = b OR ( c < d AND e < f ) OR z

where z is a non-sequitur (such as a data name), we need to catch it and
produce it (even if the production is to do nothing). Otherwise we will be way
into the THEN or ELSE clause attempting recovery. With many real world
programs that will create an unacceptable result.


The following comment changes the subject slightly.
Anyway, this also implies a plausible argument that the lexer (or the
procedure division filter) needs a state mechanism.  To me it seems
problematic to commit to lexer states early in a design (I won't justify that
point of view here, I'll just explain why I think we _MIGHT_ need this kind
of technology).

The basic surface of the procedure division is keywords as opposed to 
references. I am advocating as early a distinction on references as possible 
(before the parser torks to a new table cell). This creates a minor 
challenge. 

In the procedure division, valid data references are basically backward
references to things already declared in previous divisions. In the procedure
division, a reference to a section or paragraph name might easily be a
forward reference. In a single pass compilation, such a forward reference
would be indistinguishable from a reference to an undeclared data name (an
error).

In some of the interaction concerning the zPROCs, it was suggested that every 
paragraph (and section) could become a processible unit of work out from the 
preprocessor. If that design concept is pursued, we can place the procedure 
division paragraph and section names in a portion of the SYMT early. This 
will aid in distinguishing data references from procedure references.
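
A rough sketch of such an early pass, in Python for brevity. It assumes a
simplified layout in which a paragraph header is a line holding only
"name." and a section header is "name SECTION."; real COBOL column rules
are ignored, and collect_procedure_names and the reserved set are
hypothetical.

```python
import re

# One word, optionally followed by SECTION, then a period and nothing else.
HEADER = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9-]*)(\s+SECTION)?\s*\.\s*$", re.IGNORECASE
)


def collect_procedure_names(lines, reserved):
    """Return {NAME: "paragraph" | "section"} for every header line found."""
    names = {}
    for line in lines:
        m = HEADER.match(line)
        # A lone reserved word such as "EXIT." is a statement, not a header.
        if m and m.group(1).upper() not in reserved:
            names[m.group(1).upper()] = "section" if m.group(2) else "paragraph"
    return names


source = [
    "MAIN-PARA.",
    "    PERFORM CLEANUP.",
    "    STOP RUN.",
    "CLEANUP SECTION.",
    "FINISH.",
    "    EXIT.",
]
print(collect_procedure_names(source, reserved={"EXIT", "STOP", "GOBACK"}))
```

With these names in the SYMT before the main scan, the forward reference in
PERFORM CLEANUP is no longer confusable with an undeclared data name.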

Still we may have unknowns (uncoded references) in incomplete or erroneous 
programs, so there may still be a need to have states in the procedure 
division lexer (or the filter).  These would conceptually process in the 
following manner: If the most recent verb (state) suggests data names are 
expected then return a reference as a data name reference, if the most recent 
verb (state) seems to imply a procedure name is expected then return a 
reference as a procedure name.
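
The verb-driven state could be sketched like this (Python, for brevity).
The verb sets, the tag strings and tag_references are assumptions for
illustration; a real filter would need finer states (PERFORM ... VARYING,
for instance, also expects data names).

```python
PROC_VERBS = {"PERFORM", "GO"}                 # assumed to expect procedure names
DATA_VERBS = {"MOVE", "ADD", "IF", "COMPUTE"}  # assumed to expect data names
OTHER_KEYWORDS = {"TO", "THRU", "GIVING"}


def tag_references(words, symt):
    """Tag each word; undefined names inherit a kind from the latest verb."""
    state = "data"  # arbitrary starting state for this sketch
    out = []
    for w in words:
        u = w.upper()
        if u in PROC_VERBS:
            state = "proc"
            out.append((u, "verb"))
        elif u in DATA_VERBS:
            state = "data"
            out.append((u, "verb"))
        elif u in OTHER_KEYWORDS:
            out.append((u, "keyword"))
        elif u in symt:
            out.append((u, symt[u]))  # known name: the SYMT decides
        else:
            out.append((u, f"undefined_{state}_reference"))
    return out


symt = {"Y": "data"}
print(tag_references(["MOVE", "X", "TO", "Y"], symt))
print(tag_references(["PERFORM", "FIX-UP"], symt))
```

Here X comes back as an undefined data reference and FIX-UP as an undefined
procedure reference, so neither reaches the parser as a nondescript token.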

I know that if we have a SYMT with at least datanames already, this matter is
simplified; and as indicated, if we entable the procedure names, things are
even easier.  But we still have the matter of undefined references.  It
seems like it will be much better to return these as either data references
or procedure references, not as nondescript references.  That is a lot of
words to set up to ask a basic question: can anyone see a way to do this
without the equivalent of lexer states? That is, can we do it without context
sensitivity? (Alternatively these same states may provide other useful
support).

So any and how, the direction I am pointing is to get every single peep from 
the lexer as precisely manifested as possible.


Best Wishes,
Bob Rayhawk
RKRayhawk@aol.com


--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.