Re: gnubol: distinguishing data refs and conditional refs
In a message dated 11/1/99 10:44:35 PM EST, mck@tivoli.mv.com adding further
excellent comments on syntax such as,
<<
IF a = b OR ( c < d AND e < f ) OR g
>>
writes
<<
We are propagating the subject and relation from a factor to a term.
I guess that's the way it is, but it makes me a little
uncomfortable. It also means, that if Fred is right, making the
implied grouping explicit, changes the semantics of the phrase
profoundly. Now you can say "Ick", Chris.
>>
To which I will add the small notion ...
Now backtrack and simplify to just a relational operator like equal, drop
a NOT in there somewhere, and say what is implied. As in
IF a = b AND NOT g
or
IF a NOT = b AND g
Then explain the implied grouping and state which standard agrees. Now say "Ick."
No matter what, if you can sharpen references to the precision of distinguishing
data_item_reference from conditional_name_reference, the grammar will be
stronger, and you can also explicate (sorry) the error productions.
In this connection, Fred Neale's (fneale@tss.com.au) 11/1/99 8:15:41 PM EST
posting of the BODMAS precedence rule is nice:
Brackets
Of
Division
Multiplication
Addition
Subtraction
What is your source for that? Is it complete?
Surely function references must have high precedence, and reducing subscripts
is right up there with OF (at least ahead of the arithmetic operators).
C (not C++) has FUMAS-REBBBLL-TAC ("fumas rebel tack", as any Smothers
Brother would recognize), in which relational and equality operators have
higher precedence than logical operators. But COBOL's natural-language
heritage does some violence to any such straightforward approach.
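For comparison with BODMAS and C's table, here is a minimal sketch (hypothetical function names) of the usual COBOL order for conditions without abbreviations: relations bind tightest, then NOT, then AND, then OR. Operands are single words, arithmetic is left out, and a bare name is assumed to be a condition-name; that last assumption is exactly where implied subjects and operators stop fitting a plain precedence table.

```c
#include <stdio.h>
#include <string.h>

#define CAP 256  /* every output buffer below must hold CAP bytes */

/* Hypothetical token cursor: one string per token. */
typedef struct { const char **tok; int pos, n; } Cursor;

static const char *peek(const Cursor *c) { return c->pos < c->n ? c->tok[c->pos] : ""; }

static const char *take(Cursor *c) {
    const char *t = peek(c);
    if (c->pos < c->n) c->pos++;
    return t;
}

static int match(Cursor *c, const char *s) {
    if (strcmp(peek(c), s) != 0) return 0;
    c->pos++;
    return 1;
}

static int is_relop(const char *s) {
    return !strcmp(s, "=") || !strcmp(s, "<") || !strcmp(s, ">");
}

/* Rewrite dst as "(dst op rhs)". */
static void wrap(char *dst, const char *op, const char *rhs) {
    char tmp[CAP];
    snprintf(tmp, sizeof tmp, "(%s %s %s)", dst, op, rhs);
    strcpy(dst, tmp);
}

static void or_expr(Cursor *c, char *dst);

/* Tightest level: "( cond )", a relation "x op y", or a bare name
   (assumed here to be a condition-name). */
static void primary(Cursor *c, char *dst) {
    if (match(c, "(")) {
        or_expr(c, dst);
        match(c, ")");
        return;
    }
    snprintf(dst, CAP, "%s", take(c));
    if (is_relop(peek(c))) {
        const char *op = take(c);
        wrap(dst, op, take(c));
    }
}

/* NOT binds looser than relations, tighter than AND and OR. */
static void not_expr(Cursor *c, char *dst) {
    if (match(c, "NOT")) {
        char inner[CAP];
        not_expr(c, inner);
        snprintf(dst, CAP, "(NOT %s)", inner);
    } else {
        primary(c, dst);
    }
}

static void and_expr(Cursor *c, char *dst) {
    not_expr(c, dst);
    while (match(c, "AND")) {
        char rhs[CAP];
        not_expr(c, rhs);
        wrap(dst, "AND", rhs);
    }
}

static void or_expr(Cursor *c, char *dst) {
    and_expr(c, dst);
    while (match(c, "OR")) {
        char rhs[CAP];
        and_expr(c, rhs);
        wrap(dst, "OR", rhs);
    }
}

/* Parse a whole condition into a fully parenthesized string. */
void parenthesize(const char **tok, int n, char *dst) {
    Cursor c = { tok, 0, n };
    or_expr(&c, dst);
}
```

Given the tokens of IF a = b OR ( c < d AND e < f ) OR g, it yields (((a = b) OR ((c < d) AND (e < f))) OR g), which is only the right reading if g really is a condition-name.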
Still, you can rip the strangeness out of the conditionals issue discussed
here by providing distinct tokens. (That is decidedly not the same as ripping
the strangeness out of implied subjects and operators.)
In some of the original material quoted in this thread, the rule
<value1> = <value2> OR <value3>
....
can become multiple rules, something like
<data_value1> = <data_value2> OR <data_value3>
..... production D
and
<data_value1> = <data_value2> OR <cond_valueA>
.... production C
(where data_valueN and cond_valueX are references, quite distinct from
declarations, and also clearly distinct from one another in kind)
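To make productions D and C concrete, here is a minimal sketch assuming the lexer/filter hands the parser a hypothetical RefKind on every reference. One look at the kind of the third operand picks the production: a data reference inherits the implied subject and relation, while a condition-name reference stands on its own.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical reference kinds delivered by the lexer/filter. */
typedef enum { DATA_REF, COND_REF } RefKind;

typedef struct { RefKind kind; const char *name; } Ref;

/* Which production applies to "subj op obj OR third"? */
char production_for(Ref third) {
    return third.kind == DATA_REF ? 'D' : 'C';
}

/* Render the condition with its implied parts made explicit.
   Production D: subject and relation carry over to the data reference.
   Production C: the condition-name is a complete condition by itself. */
void expand_or(Ref subj, const char *op, Ref obj, Ref third,
               char *dst, size_t cap) {
    if (production_for(third) == 'D')
        snprintf(dst, cap, "(%s %s %s) OR (%s %s %s)",
                 subj.name, op, obj.name, subj.name, op, third.name);
    else
        snprintf(dst, cap, "(%s %s %s) OR %s",
                 subj.name, op, obj.name, third.name);
}
```

The point is that the same surface text, a = b OR x, gets two different meanings, and the parser can only tell them apart if the token for x already says what kind of reference it is.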
We just have to think of the return from the lexer (actually from the
intermediate filter, as I have suggested) as being more exact. To this we can
also add an undefined reference. In other words, if we are scanning the
procedure division expecting keywords or data names, and a word is neither a
keyword nor a data name in the SYMT, then it is an undefined data name.
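A minimal sketch of that classification step, assuming a toy SYMT held as an array of (name, kind) pairs and a tiny hypothetical keyword list. A word that is neither a keyword nor in the SYMT comes back as TK_UNDEF_DATA_REF, so the parser still gets a precise token and the diagnosis can happen at the lower level.

```c
#include <stddef.h>
#include <string.h>

typedef enum { TK_KEYWORD, TK_DATA_REF, TK_COND_REF, TK_UNDEF_DATA_REF } TokKind;

/* Hypothetical SYMT entry: a name declared in an earlier division. */
typedef struct { const char *name; TokKind kind; } SymEntry;

/* Hypothetical, much-reduced keyword list. */
static const char *keywords[] = { "IF", "OR", "AND", "NOT", "MOVE", "TO", "PERFORM" };

static int is_keyword(const char *w) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(w, keywords[i]) == 0) return 1;
    return 0;
}

/* Classify one procedure-division word. Matching is exact and
   case-sensitive for brevity; COBOL itself is case-insensitive. */
TokKind classify(const char *w, const SymEntry *symt, size_t nsym) {
    if (is_keyword(w)) return TK_KEYWORD;
    for (size_t i = 0; i < nsym; i++)
        if (strcmp(w, symt[i].name) == 0) return symt[i].kind;
    return TK_UNDEF_DATA_REF;  /* not a keyword, not in SYMT: undefined data name */
}
```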
Production rules for data_nameN can include undefined data names (which would
be diagnosed at the lower level). The action (or subsequent intermediate-form
processors) can trap the error and avoid emitting code.
For example:
cond : <data_value1> = <data_value2> OR <cond_valueA>
    { if ($1.errflag || $3.errflag || $5.errflag)
        { ; /* do nothing: it has already been diagnosed */ }
      else
        { emit(/* code or AST for this kind of conditional */); }
    };
Alternatively:
cond : <data_value1> = <data_value2> OR <cond_valueA>
    { // pass the buck
      thisAST.errflag |= $1.errflag || $3.errflag || $5.errflag;
      emit(/* AST for this kind of conditional */);
      // The AST processor will have to suppress code emission.
      // Even then, the error data/conditional reference production rules
      // can actually return references to processable dummies.
      // As long as the error is diagnosed and a high-level switch is set
      // to suppress later stages, robust compilation can proceed with
      // substitutes.
    };
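Outside of yacc, the pass-the-buck variant might look like the following minimal sketch. The Ref and CondNode records, undefined_ref(), make_cond(), emit_cond() and the suppress_codegen switch are all hypothetical: an undefined reference is diagnosed once and replaced by a processable dummy carrying errflag, the conditional node ORs the flags of its operands, and the later emission stage checks the high-level switch.

```c
#include <stdio.h>

/* Hypothetical attribute record returned by reference productions. */
typedef struct { const char *name; int errflag; } Ref;

/* Hypothetical AST node for "v1 = v2 OR condA". */
typedef struct { Ref v1, v2, cond; int errflag; } CondNode;

static int error_count = 0;       /* diagnostics issued so far */
static int suppress_codegen = 0;  /* high-level switch: skip later stages */

/* Diagnose an undefined reference once and return a processable dummy. */
Ref undefined_ref(const char *name) {
    fprintf(stderr, "error: undefined data name %s\n", name);
    error_count++;
    suppress_codegen = 1;
    Ref dummy = { name, 1 };
    return dummy;
}

/* Build the node and pass the buck: errors are recorded, not acted on. */
CondNode make_cond(Ref v1, Ref v2, Ref cond) {
    CondNode n = { v1, v2, cond, v1.errflag || v2.errflag || cond.errflag };
    return n;
}

/* Later stage: emits nothing once any error has been diagnosed.
   Returns the number of nodes emitted (0 or 1). */
int emit_cond(const CondNode *n) {
    if (suppress_codegen || n->errflag) return 0;
    printf("cond %s = %s OR %s\n", n->v1.name, n->v2.name, n->cond.name);
    return 1;
}
```

The parse itself never stops: the dummy keeps the grammar happy, and only code emission is switched off.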
Behind these suggestions is the notion of turning most specific, detailed
errors into something that will keep compiling. A reference to an undefined
data item should not stop the compile, or even disrupt the parse. A poorly
constructed conditional expression (with good items) should not halt the
compile or disrupt the parse. So in discussions of things like
IF a = b OR ( c < d AND e < f ) OR z
where z is a non sequitur (such as a data name), we need to catch it with a
production (even if that production does nothing). Otherwise we will be deep
into the THEN or ELSE clause attempting recovery, and with many real-world
programs that will create an unacceptable result.
The following comment changes the subject slightly.
Anyway, this also implies a plausible argument that the lexer (or the
procedure division filter) needs a state mechanism. To me it seems
problematic to commit to lexer states early in a design (I won't justify that
point of view here; I'll just explain why I think we _MIGHT_ need this kind
of technology).
The basic surface of the procedure division is keywords as opposed to
references. I am advocating distinguishing references as early as possible
(before the parser transitions to a new table cell). This creates a minor
challenge.
In the procedure division, valid data references are basically backward
references to things already declared in previous divisions. A reference to a
section or paragraph name, however, might easily be a forward reference. In a
single-pass compilation, such a forward reference would be indistinguishable
from a reference to an undeclared data name (an error).
In some of the discussion of the zPROCs, it was suggested that every
paragraph (and section) could become a processable unit of work out of the
preprocessor. If that design concept is pursued, we can place the procedure
division paragraph and section names in a portion of the SYMT early. This
will help distinguish data references from procedure references.
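A minimal sketch of entabling procedure names before the parse, assuming fixed-format source that begins after the PROCEDURE DIVISION header, where section and paragraph headers start in Area A (column 8) and statements in Area B (column 12 on). ProcTable, collect_proc_names() and is_proc_name() are hypothetical; a real pre-scan would also have to cope with continuation lines and COPY.

```c
#include <string.h>

#define MAX_PROCS 64
#define NAME_LEN 31  /* COBOL-85 user-defined words are at most 30 characters */

/* Hypothetical table of procedure names, filled before parsing. */
typedef struct { char names[MAX_PROCS][NAME_LEN]; int count; } ProcTable;

/* A header has a non-space character in column 8 (index 7); a '*' in
   column 7 (index 6) marks a comment line. Area B lines are skipped. */
void collect_proc_names(const char **lines, int nlines, ProcTable *t) {
    t->count = 0;
    for (int i = 0; i < nlines && t->count < MAX_PROCS; i++) {
        const char *ln = lines[i];
        if (strlen(ln) < 8 || ln[6] == '*' || ln[7] == ' ') continue;
        size_t len = strcspn(ln + 7, " .");  /* name ends at space or period */
        if (len >= NAME_LEN) len = NAME_LEN - 1;
        memcpy(t->names[t->count], ln + 7, len);
        t->names[t->count][len] = '\0';
        t->count++;
    }
}

int is_proc_name(const ProcTable *t, const char *w) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], w) == 0) return 1;
    return 0;
}
```

With the table filled first, a forward PERFORM of a paragraph coded further down is recognizable as a procedure name rather than an undeclared data name.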
Still, we may have unknowns (uncoded references) in incomplete or erroneous
programs, so there may still be a need for states in the procedure division
lexer (or the filter). Conceptually, these would work as follows: if the most
recent verb (state) suggests data names are expected, return a reference as a
data name reference; if the most recent verb (state) implies a procedure name
is expected, return a reference as a procedure name reference.
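A minimal sketch of such states, assuming a hypothetical, much-reduced keyword list: PERFORM and GO put the filter into a procedure-name state, TO, THRU and THROUGH leave the state alone, and other keywords put it back into a data-name state. Every non-keyword word is treated as a reference that already failed the SYMT lookup. Real COBOL has cases (PERFORM para N TIMES, for one) where a single state per verb is not enough.

```c
#include <string.h>

typedef enum { EXPECT_DATA, EXPECT_PROC } LexState;
typedef enum { TK_KEYWORD, TK_DATA_REF, TK_PROC_REF } TokKind;

typedef struct { LexState state; } Filter;

static int in_list(const char *w, const char *const *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(w, list[i]) == 0) return 1;
    return 0;
}

/* Hypothetical subsets of the reserved words. */
static const char *const proc_verbs[] = { "PERFORM", "GO" };
static const char *const neutral[] = { "TO", "THRU", "THROUGH" };
static const char *const other_kw[] = { "IF", "MOVE", "ADD", "UNTIL", "VARYING",
                                        "OR", "AND", "NOT" };

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Keywords update the state; any other word is returned as a data or
   procedure reference according to the most recent verb. */
TokKind filter_word(Filter *f, const char *w) {
    if (in_list(w, proc_verbs, COUNT(proc_verbs))) {
        f->state = EXPECT_PROC;
        return TK_KEYWORD;
    }
    if (in_list(w, neutral, COUNT(neutral)))  /* GO TO p, MOVE x TO y */
        return TK_KEYWORD;
    if (in_list(w, other_kw, COUNT(other_kw))) {
        f->state = EXPECT_DATA;
        return TK_KEYWORD;
    }
    return f->state == EXPECT_PROC ? TK_PROC_REF : TK_DATA_REF;
}
```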
I know that if we have a SYMT with at least the data names already, this
matter is simplified; and, as indicated, if we entable the procedure names,
things are even easier. But we still have the matter of undefined references.
It seems much better to return these as either data references or procedure
references, not as nondescript references. That is a lot of words to set up
a basic question: can anyone see a way to do this without the equivalent of
lexer states? That is, can we do it without context sensitivity?
(Alternatively, these same states may provide other useful support.)
So anyhow, the direction I am pointing is to get every single peep from the
lexer manifested as precisely as possible.
Best Wishes,
Bob Rayhawk
RKRayhawk@aol.com
--
This message was sent through the gnu-cobol mailing list. To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body. For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.