
Re: gnubol: distinguishing data refs and conditional refs



It is possible to keep the parser on its feet in situations like
 move paragraph1 to file2.
if our rule is something like
 stmt_move : MOVE diag_data_ref TO diag_data_ref opt_EOS

where diag_data_ref is a diagnosed data reference something like


diag_data_ref : valid_data_ref
   {/* log this for cross reference */
    /* generate AST node for data reference */
    /* hook AST node in */
   }
 | invalid_data_ref
   {/* may not need an error message here, as it will happen lower */
    /* possibly do nothing here at all */
    /* or ... */
    /* assuming we have a few spares around */
    /* hook the AST node to a dummy variable so certain aspects of AST
       or intermediate code processing can _still_ happen */
    /* definitely throw a big bad error switch if not done so yet */
    this_AST_node_errflag |= syntax_err_was_found_dont_go_too_far;
   }
 ;

And invalid_data_ref is something like

/* assuming this is some place where we really only want a data ref */
/* (in bison the action below attaches only to the alternative it
   follows, so in practice it would be repeated for each one) */
invalid_data_ref : reference_to_procedure
  | reference_to_collating_sequence_name
  | reference_to_system_mnemonic_like_SYSIPT
  | etc...
    {/* diagnose it */
     err_service(file_ident, line_number, STD_ERR_000n, ptr_yylex);
    }
  ;

We may not be able to commit to exactly how to script the errors until we
chart out the parsers a little more. Comments to the effect that there are a
lot of possible error messages are certainly easy to agree with. Truthfully,
it might even be a good idea to bring in non-programmers, ones who can
nevertheless learn the standard, to help manage the error-text issue.
 
Yet generally, if errors from the parser can be invoked through the error
service by some conventionalized 'naming' strategy, like using enums such as
STD_ERR_000n, then we could go back later and improve the language and
specificity of the error messages sent out for interaction with the tool
user.

The important thing initially would be to break out all the possible
detection points with a precise (and long) terminal symbol list. An interface
to an err_service() function with parameters like (file_ident, line_number,
STD_ERR_000n, ptr_yylex) should leave us room for now. (No, it is not
complete; I am just saying don't build anything into the err_service
interface that assumes anything at all about the type of error, just find a
way to give it everything.)   ((This mild suggestion assumes that
STD_ERR_000n allows the error processor to look up some text that may or may
not have a substitutable particle, like an &ampervar in JCL, or a %s thingy
in a printf format string.))
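As a sketch only: such an err_service() interface might look like the C below,
with an enum keying into a message table that supports %s substitution. The
enum values, table, and counter here are all hypothetical names consistent
with the parameters suggested above, not a committed design.

```c
#include <stdio.h>

/* Hypothetical error codes; the enum name keys into a message table. */
typedef enum {
    STD_ERR_0001,   /* data reference expected */
    STD_ERR_0101,   /* lazy generic "invalid operand" placeholder */
    STD_ERR_COUNT
} std_err_t;

/* Message text with optional %s substitution slots; the wording can be
   improved later without touching the grammar actions that raise errors. */
static const char *std_err_text[STD_ERR_COUNT] = {
    [STD_ERR_0001] = "'%s' is not a data reference",
    [STD_ERR_0101] = "invalid operand '%s'",
};

static int err_count = 0;

/* The grammar hands over everything it has; the service decides what to
   print.  Nothing here assumes the *type* of error. */
void err_service(const char *file_ident, int line_number,
                 std_err_t code, const char *token_text)
{
    fprintf(stderr, "%s:%d: error: ", file_ident, line_number);
    fprintf(stderr, std_err_text[code], token_text);
    fputc('\n', stderr);
    err_count++;
}
```

The payoff of the enum indirection is exactly the "go back and improve it
later" step: only std_err_text changes, never the call sites.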

So let's say we had a bundle of rules where we were lazy and just used, say,
STD_ERR_0101.  We could go back later and make that more specific.

So, having raced forward, let me rewind. With this proposal, in the statement
   move paragraph1 to file2.

the reference file2 remains entirely visible. The compiler, as a program,
does not go into error recovery mode when processing paragraph1. Instead it
just builds (reduces) a data reference, which happened to be of the wrong
kind. The status flags on the AST nodes will ultimately prevent code
generation, as will any high-level error switches that can be thrown by error
productions. We not only get to the 'to' and the reference 'file2', but that
reference, if ersatz, will be diagnosed, and _still_ we can execute later
stages as long as we have an error-flag handshake method.
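That "error-flag handshake" could be as simple as an error bit on every AST
node plus one global switch, so later stages can still walk the tree while
final code emission is suppressed. A minimal sketch, with hypothetical names
(nothing here is a committed interface):

```c
#include <stdbool.h>
#include <stddef.h>

#define AST_ERR_SYNTAX  0x01   /* a syntax error was found at/below here */

typedef struct ast_node {
    int              errflags;      /* per-node status bits */
    struct ast_node *left, *right;  /* children (NULL if none) */
} ast_node;

static bool big_bad_error_switch = false;  /* thrown by error productions */

/* Propagate child error bits upward so any later pass can test one node
   instead of re-walking the whole subtree. */
int ast_collect_errs(ast_node *n)
{
    if (!n) return 0;
    n->errflags |= ast_collect_errs(n->left) | ast_collect_errs(n->right);
    if (n->errflags) big_bad_error_switch = true;
    return n->errflags;
}

/* Later stages still run; only final code generation is gated. */
bool codegen_allowed(void)
{
    return !big_bad_error_switch;
}
```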

----------

tej@melbpc.org.au (Tim Josling) wrote
<<
Also the rule that it is
perfectly legal to have duplicate data names as long as you don't
reference them. e.g.:

01 data1 pic x.
01 data1 pic x.
>>

I would generalize beyond this.  I am not sure it is legal to name a
paragraph the same as a data name (... whether you reference it or not). But
this kind of thing should cause us no problem at all. The compiler should be
designed for all possible programs, not just all valid programs. So really,
maybe the very first capability that we need in the symbol table is duplicate
declaration of the same name. (Quite obviously, scope-distinct reuses of the
same name must be coped with as okay.)

But here I am radical.  Just as you point out that the unqualified use of
duplicate data names is an error (and you are right: sometimes 01s with the
same name can't be distinguished, but if they are under different FDs/SDs
...).
So too, we may want to start with the approach that when we encounter
duplicate names of different items with different types, we not get too
concerned, ... that is, not immediately (in another light, let me say that
the symbol table processes are not error processes).

So let's say you have a data name and a procedure name that are identical (a
procedure being a paragraph or section). In the complete absence of anything
else that relates to these at all, just wait until the end to try to diagnose
it. If the two had been just data names, as you suggest, leave them be. If
they were of type data and type procedure, and the standard says that is an
error, then flag it. But wait until the end to do that. There is a reason
for this stealth.

In the situation where a data name and a procedure name are identical, an
otherwise valid reference to the data name (like a MOVE) is actually not
syntactically invalid. The same goes for valid references to the procedure
name (like a PERFORM).  So let's say you have all four of these things. What
do you diagnose?

The obvious answer today is that we should be so fortunate as to get that far
with the project. But conceptually it is possible to permit the option of
just diagnosing the duplication at the end, _possibly_ not flagging the
otherwise valid references (the argument being that the references per se
are syntactically valid). Programmers may realize the implications of the
duplication diagnostic.  But my recommendation is not any one of these
particular diagnostic choices; it is merely to set the symbol table up with
the possibility of many duplications. And indeed, to be prepared to handle
some of the scoping challenges via attributes in the symbol table.
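Setting the symbol table up "with the possibility of many duplications" might
look like the sketch below: insertion never rejects, duplicates simply chain,
and a deferred pass diagnoses only the data/procedure clashes. All names and
the chained-list representation are hypothetical, chosen for brevity.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { SYM_DATA, SYM_PROCEDURE } sym_type;

typedef struct sym {
    const char *name;
    sym_type    type;
    struct sym *next;     /* duplicates simply chain up */
} sym;

static sym *symtab = NULL;

/* Declaring is never an error: the compiler must accept all possible
   programs, not just all valid programs. */
void sym_declare(const char *name, sym_type type)
{
    sym *s = malloc(sizeof *s);
    s->name = name;
    s->type = type;
    s->next = symtab;
    symtab  = s;
}

/* End-of-compile pass: flag only names declared as both data and
   procedure; duplicate data names alone are left be, as suggested. */
int sym_diagnose_dups(const char *name)
{
    int saw_data = 0, saw_proc = 0;
    for (sym *s = symtab; s; s = s->next)
        if (strcmp(s->name, name) == 0) {
            if (s->type == SYM_DATA)      saw_data = 1;
            if (s->type == SYM_PROCEDURE) saw_proc = 1;
        }
    return saw_data && saw_proc;   /* 1 = illegal data/procedure clash */
}
```

Scope attributes (nested programs, GLOBAL) would hang off the sym entries the
same way, rather than being enforced at insertion time.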


--------------
tej@melbpc.org.au (Tim Josling) also wrote
<<
The symbol table usage is more complex in cobol because of the
complex scope rules (nested programs, global variables) but also
because of the a of b of c type of construct. You may have to
look ahead up to 100 symbols or so. 
>>

Perhaps, well, we should probably get a handle on the exact number. I think
that the requirement that level numbers basically be 01 through 49 brings
that down some (we can have 01s within FDs/SDs, and an 88 is actually under
an element, in the sense that syntactically
  cond-88-name OF data-49-name
is valid). So 50 or 60.

But, even with precise typing, we can still recurse. The rules don't have to 
explicate the qualification levels literally (although my initial toying with 
it suggests there could be numerous rules). We can recurse even in bison.

So actually there is not that much lookahead in the grammatical sense. But
out there in the real world are programs lurking with outlandish
qualification depths (OF/IN clauses). And we need a way to digest them and
stay on our feet. So I can't chart a catch-50-and-error rule yet. But it is
not likely to be too different from the thing we devise to detect depth 2 in
valid procedure references (only one OFIN clause is legal on paragraph
names).  Likewise, you cannot OFIN the section name itself, as in the
invalid
  PERFORM section_ref OF section_ref.

But we need to trap it, and when we do, it will need to scoop up numerous
further OF clauses tacked on the end, maliciously or not.  My point is that
the programs out there will present us with numerous things to kind of look
ahead into, but the appropriate rules do not necessarily require that many
tokens of lookahead, because the available tools can recurse, even without
backtracking.
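The recursion-not-lookahead point can be sketched with a tiny C walk over a
qualified reference as the parser would have reduced it. The struct and
function names are hypothetical; the recursion mirrors a right-recursive rule
like "qualified : NAME | NAME OFIN qualified ;", which consumes any depth
with no fixed lookahead, after which a separate check diagnoses the depth.

```c
#include <stddef.h>

/* A qualified reference, a OF b OF c ..., as a right-linked chain. */
typedef struct qual {
    const char  *name;
    struct qual *of;      /* NULL when no further OF/IN clause */
} qual;

/* Recursion mirrors the right-recursive grammar rule: the chain can be
   any depth, and no lookahead bound is needed to consume it. */
int qual_depth(const qual *q)
{
    return q ? 1 + qual_depth(q->of) : 0;
}

/* Post-reduction check for procedure references: only one OF/IN clause
   is legal on a paragraph name, i.e. depth at most 2.  Over-qualified
   references are scooped up whole, then diagnosed here. */
int qual_ok_for_procedure(const qual *q)
{
    return qual_depth(q) <= 2;
}
```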

(((For a real aside, read this: we may need to tread softly on the
cardinality of any limit to the depth of qualification via IN/OF clauses.
IMNSHO, COBOL meets XML somewhere near here - that is, in this grammar work.
I don't think XML has depth limitations.  We might want to parameterize this
constraint. But actually I'm not sure how to tread softly.  In fact I remain
somewhat uncertain how to end the recursion at an arbitrary point. It is
possible to leave it open; I just don't know how to chop it at a specific
depth via grammar rules.  This is actually a kind of optimization issue.
Really ridiculous programs might crash the compiler due to stack
limitations, and we would want to prevent the runaway OFIN recursion
somehow. So to get started, with either OFINs or, more unabashedly, with
XML, we can just have open-ended recursion.  But when you think about it, it
is probably not really totally silly to think that the lexer might be able
to count some of that.  Really, OF x OF x ... ad infinitum is probably
easily sensible on the way out of the lexer.  I am not sure about PCCTS, but
bison actually has a hook point up high, before the switch() statement on
the token.  We don't have to sleep up there; we can snoop, ...., it is
possible to hack a transform of a token, ... in the same location
functionally would be the lex-parser filter that I have commented about,
... so anyway this is an aside.)))
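As a sketch of the aside's counting idea: a filter sitting between lexer and
parser could watch the token stream and tally the running OF/IN chain depth
before the parser ever recurses on it. The token codes and the reset policy
here are pure assumptions for illustration; a real filter would diagnose and
clamp when the depth exceeds some parameterized limit.

```c
/* Hypothetical token codes as a lexer might define them. */
enum { TOK_NAME = 1, TOK_OFIN, TOK_OTHER };

/* Feed each token through this filter on its way out of the lexer.
   Returns the current OF/IN qualification depth seen so far. */
int ofin_watch(int token)
{
    static int depth = 0;
    if (token == TOK_OFIN)
        depth++;                 /* one more "OF x" tacked on the end */
    else if (token != TOK_NAME)
        depth = 0;               /* chain broken by any other token */
    return depth;
}
```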

Best Wishes,
Bob Rayhawk
RKRayhawk@aol.com

--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.