
Re: gnubol: procedure types



In a message dated 11/3/99 3:05:50 AM EST,
Randall Bart (Barticus@att.net) writes:

<< 
 I agree that invalid expressions should have error productions, but I don't 
 see why we should end up "way into" anything.  Expression parsing should 
 always halt on any verb or keyword other than expression keywords, and 
 should resume at the first verb after the error (including the verb that 
 terminated the expression).  Eg,
 
      IF  ((A = B) OR (C = GO TO X
 
 As soon as the verb GO is encountered, we should be in an error 
 production.  Parsing should resume at GO.
>>

The following comments are not exactly responsive to the specifics of 
Randall's comments. I just want to take this opportunity to talk about 
parentheses and damaged code.

I would like to propose that the interface between the first processor to 
recognize the procedure division and enumerate its parts and the next 
processor that will actually try to interpret it, have a handshaking that 
recognizes the following basic procedure types.

A procedure is any part of the procedure division executable as a separate 
piece. Basically a section or a paragraph is a procedure, as the word is used 
here, but there are other variations. So a procedure is really just a 
collection of statements.

The types might include
 - statements after the procedure division heading but before any section or 
paragraph name
 - statements under a paragraph heading but not under a section heading
 - statements under a section heading but not under a paragraph
 - statements under a paragraph heading that is under a section heading
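
The four types listed above could be carried across the interface as a
small enumeration. This is only a sketch in Python; the names
ProcedureType and classify are hypothetical and come from no existing
gnubol code.

```python
from enum import Enum, auto
from typing import Optional


class ProcedureType(Enum):
    """Hypothetical tags for the four kinds of procedure listed above."""
    UNNAMED_LEADING = auto()       # after the division heading, before any name
    PARAGRAPH_ONLY = auto()        # under a paragraph, no enclosing section
    SECTION_ONLY = auto()          # under a section, before its first paragraph
    PARAGRAPH_IN_SECTION = auto()  # under a paragraph that is under a section


def classify(section: Optional[str], paragraph: Optional[str]) -> ProcedureType:
    """Pick the type from the section/paragraph names currently in effect."""
    if section is None and paragraph is None:
        return ProcedureType.UNNAMED_LEADING
    if section is None:
        return ProcedureType.PARAGRAPH_ONLY
    if paragraph is None:
        return ProcedureType.SECTION_ONLY
    return ProcedureType.PARAGRAPH_IN_SECTION
```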


So in the phrase "collection of statements" I have avoided the use of the 
term 'named', and intentionally so. The first of the four types listed above, 
statements after the procedure division heading but before any section or 
paragraph name, is actually unnamed. And I mean that lexically. We would be 
justified in preventing code within the program or any of its invoked 
programs from entering unnamed code. This is certainly getting way ahead to 
linkage and security issues. But I wanted to suggest this high level 
attribute before making the following suggestions (because it would be hard 
to build in later).

Also notice that functions are a procedure type. (It is not entirely 
unreasonable to suggest that condition names are actually a kind of procedure 
type, thus syntax rules for data references and condition name references 
ought to be distinct.)

So anyway, there could be several different procedure types. These notions 
apply to the source code and the executable. What follows applies to the 
source code only.

Separately, I would propose for procedure division parsing that we have a 
balanced parentheses level attribute, so that a procedure with unbalanced 
parentheses might be parsed by an entirely distinct collection of rules than 
a parenthetically balanced procedure.  The purpose is to keep us way out of 
trouble on code such as
 
      IF  ((A = B) OR (C = GO TO X


(Again, I am not being responsive to the post I am quoting; the focus there 
can be illustrated, I think, with something like

      IF  A = B OR C = GO TO X

I am just picking on the unbalanced parens for my own reasons here.)

But where there is smoke there is fire. It is not rare that coding errors get 
compounded. And even robust compilers can be confronted with very strange 
code.

Unbalanced parentheses, however, present really serious problems to a COBOL 
parse. 

There are two basic computations for balancing parentheses, that I can think 
of. 

First, obviously your left paren count should equal your right paren count.
Surely a paragraph should not end with an unequal count. That can be 
embellished; a statement should not end with an unequal count, or an 
expression should not ... But somewhere in there we meet the enemy, who is 
us, the coder, and the essence of the terms and factors becomes the parens 
themselves.... The Gödelian threshold is reached when you try to say that a 
thing that requires balanced parens for it to be that kind of thing cannot 
contain unbalanced parens. If we stay out at the procedure level, that abyss 
is not entered. (Practically, you stay out of the abyss if you can trap the 
problem before you reduce any rules dependent upon token pairing, which in 
the case at hand means multiple passes in the compiler.)

Secondly, you can track levels. When you encounter a left paren you go one 
level in (add 1 to paren level), when you encounter the right paren you go 
one level back out (subtract 1 from paren level). The key is that you should 
never go negative AND you should be at zero when you end (the paragraph or 
statement, etc). Note that a zero ending level can still obscure a negative 
transition.
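
Both computations fit in one pass over the text. A minimal sketch in
Python, assuming the text has already been stripped of comments and
literals; the function names are hypothetical.

```python
def paren_summary(text):
    """Return (left_count, right_count, final_level, went_negative)."""
    left = right = level = 0
    went_negative = False
    for ch in text:
        if ch == "(":
            left += 1
            level += 1            # one level in
        elif ch == ")":
            right += 1
            level -= 1            # one level back out
            if level < 0:
                went_negative = True
    return left, right, level, went_negative


def is_balanced(text):
    """Counts must agree AND the level must never have dipped below zero."""
    left, right, _level, went_negative = paren_summary(text)
    return left == right and not went_negative
```

On the quoted IF statement this reports three left parens, one right
paren and a final level of 2. On the text ") (" the counts agree and the
final level is zero, yet the level went negative on the way, which is the
case the second computation exists to catch.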

Paren counting must competently ignore comments (also REMARKS paragraphs if 
backward compatibility is provided), and competently handle literals. And 
while we are musing about it all, the parametric information on line one or 
the first few lines can present parentheses as well.
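
A rough sketch of a paren scanner that skips comments and literals,
again in Python. It assumes fixed-format source (indicator in column 7,
program text in columns 8 to 72) and literals that do not continue onto
the next line. REMARKS paragraphs, continuation lines and the parametric
lines are not handled, and the name significant_parens is hypothetical.

```python
def significant_parens(lines):
    """Yield each '(' or ')' that is outside comments and literals."""
    for line in lines:
        # Column 7 (index 6) is the indicator area; '*' or '/' marks a comment.
        if len(line) > 6 and line[6] in "*/":
            continue
        quote = None                  # delimiter of the literal we are inside
        for ch in line[7:72]:         # columns 8-72
            if quote is not None:
                if ch == quote:
                    quote = None      # literal closed (a doubled quote reopens it)
            elif ch in "\"'":
                quote = ch            # literal opened
            elif ch in "()":
                yield ch
```

A doubled quote inside a literal simply closes and immediately reopens
the literal, so no special case is needed for it.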

So after preprocessing, one or more parsers for the procedure division will 
be awakened, and I propose that they be informed as to whether or not the 
procedure they are about to try to parse has balanced parentheses.

In effect, this adds a task to the preprocessor.

In a procedure that has unbalanced parens, we probably must try to locate the 
problem in the parse, but I am not sure we should jeopardize the compile by 
full interpretation of the code.

If you think about this, it could buy us time. The demand is highest for a 
compiler that can generate valid executables for valid source code. Next 
highest, and still very high, is to diagnose clearly a large list of obvious 
problems in source code, and stop code generation. Down somewhere lower is 
diagnosing severely damaged source code.  I guess I am saying that unbalanced 
parentheses are a kind of severe error, in the sense that the marketplace 
will tolerate somewhat less effective diagnosis of such code (at least for a 
while).

Basically, my take on this is we should isolate the procedure (paragraph or 
leading portion of a section) and do what we can with it. But we should not let 
this serious challenge bog down the development of positive logic in the 
grammar.

If the preprocessor in fact identifies procedures (sections and paragraphs) 
then I think we are in a position to isolate unbalanced parenthetical code in 
a technical sense and in the sense of managing the development effort.

In a world with A-Margin and B-Margin conventions, paragraph/section 
separators will be fairly easy to find (by the preprocessor). But in the 
future world that we are supposed to have our eye on, a label is going to be 
a little harder to find within a parenthetically unbalanced source file. But 
maybe we will let that bridge burn us when we attempt the crossing.
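
Under the A-Margin convention, the preprocessor's part could be sketched
roughly as follows (Python, hypothetical names). It assumes that any text
starting in Area A (columns 8 to 11) is a paragraph or section heading,
which ignores DECLARATIVES and similar headers, and it counts parens
naively without skipping literals.

```python
def split_procedures(lines):
    """Return [(label, body_lines)]; unnamed leading code gets label None."""
    procedures = [(None, [])]              # slot for unnamed leading code
    for line in lines:
        if len(line) > 6 and line[6] in "*/":
            continue                       # comment line
        text = line[7:72]                  # columns 8-72
        if text[:4].strip():               # something starts in Area A
            label, _, rest = text.partition(".")
            procedures.append((label.strip(), [rest]))
        else:
            procedures[-1][1].append(text)
    # Drop the unnamed slot when no code precedes the first label.
    return [(name, body) for name, body in procedures
            if name is not None or any(s.strip() for s in body)]


def body_is_balanced(body):
    """Level must never go negative and must end at zero."""
    level = 0
    for ch in "".join(body):
        if ch == "(":
            level += 1
        elif ch == ")":
            level -= 1
            if level < 0:
                return False
    return level == 0


def tag_procedures(lines):
    """Pair each procedure label with its balanced/unbalanced flag."""
    return [(name, body_is_balanced(body))
            for name, body in split_procedures(lines)]
```

The point of the sketch is that damage in one paragraph is fenced off:
an unbalanced PARA-1 does not change the flag computed for PARA-2.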

Ideally, we would want to comprehend the unbalanced code at a level below the 
procedure level (that is within the paragraph). But there is already a really 
large amount of code bereft of end of statement periods, and that will just 
be increasing in the future. So it seems challenging to delimit the 
unbalanced code at some level below procedure, ... at least prior to parsing.

(A middle ground here could be to isolate at a level lower than procedures, 
if the code _does_ have end-of-statement periods. That could be the next 
compiler.)

But my suggestion is that parenthetically unbalanced code be parsed by an 
entirely distinct set of rules than balanced code. These sets can share some 
rules.  This can be accomplished by separate parsers, or perhaps by a 
technique in the lexer (or filter) that manifests a delimiting token that 
indicates that what follows is unbalanced code.

A parser could have high-level rules, such as

balanced_code : various_statements
  { /* do summary work here */ };

unbalanced_code : UNBALANCED minimum_stmts END_UNBAL
  { /* do error summary here */ };

where the rule various_statements is the whole universe of valid statements 
(and error productions for anticipated problems other than unbalanced 
parens), and the rule minimum_stmts is a collection of some valid statements 
and artful code designed to diagnose the paren problems.

To be honest it might not be a bad idea to be just a little redundant and 
have the rules

balanced_code : BAL various_statements END_BAL
  { /* do summary work here */ };

unbalanced_code : UNBAL minimum_stmts END_UNBAL
  { /* do error summary here */ };

with concomitant requirements on the lexer.
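
The requirement on the lexer (or filter) might look something like this
sketch, in Python. It assumes a procedure arrives as a list of token
strings with "(" and ")" as the paren tokens; bracket_procedure is a
hypothetical name.

```python
def bracket_procedure(tokens):
    """Wrap one procedure's tokens in BAL/END_BAL or UNBAL/END_UNBAL."""
    level = 0
    went_negative = False
    for tok in tokens:
        if tok == "(":
            level += 1
        elif tok == ")":
            level -= 1
            if level < 0:
                went_negative = True
    if level == 0 and not went_negative:
        return ["BAL", *tokens, "END_BAL"]
    return ["UNBAL", *tokens, "END_UNBAL"]
```

Because the whole procedure must be scanned before the leading delimiter
can be chosen, this only works in a filter that buffers a procedure at a
time, or in a pass that runs after the preprocessor has already decided.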

In this regard we may wish to construct the lexer (or filter) to be able to 
detect when a paren level has gone negative (too many right parens), and to 
return a third type of paren, to wit PAREN_NEG (so we would have PAREN_OPEN, 
PAREN_CLOSE and PAREN_NEG). This could support the parser in its UNBALANCED 
mode, and if we were confident that we had it right in the lexer (or the 
filter) the PAREN_NEG in the normal parse mode would send us into a hard 
stop. A PAREN_NEG there would mean that the lexer and the preprocessor 
disagree in their analysis.  (Such a hard stop would naturally be 
suppressible with a compile-time parm.)
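
A sketch of the three paren tokens, in Python with a hypothetical
function name. One design choice here is an assumption not settled by
the text above: the level is clamped at zero after a PAREN_NEG, so that
one stray right paren does not cause every later right paren to be
reported as negative too.

```python
def paren_tokens(text):
    """Return (paren_token_list, final_level) for one procedure's text."""
    level = 0
    out = []
    for ch in text:
        if ch == "(":
            level += 1
            out.append("PAREN_OPEN")
        elif ch == ")":
            if level == 0:
                out.append("PAREN_NEG")    # would go negative; level stays 0
            else:
                level -= 1
                out.append("PAREN_CLOSE")
    return out, level
```

With clamping, the final level is the number of unmatched left parens
and the number of PAREN_NEG tokens is the number of unmatched right
parens.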

These kinds of things could be moderated with counters; say, five PAREN_NEGs 
in supposedly balanced code would perhaps halt the compiler. A humorous 
variation on this moderation could be that if we find neither a PAREN_NEG nor 
a final non-zero paren count in unbalanced code, that could be a show stopper! 
(That, by the way, implies that the last thing out of the lexer on every 
procedure should be its paren count and paren level.)
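
The two moderating checks could be sketched as below (Python). The
threshold of five is the "say five" figure from the paragraph above; the
function name, its arguments and the returned strings are hypothetical.

```python
MAX_PAREN_NEG = 5   # the "say five" figure; presumably a compile-time parm


def check_procedure(paren_neg_count, final_level, flagged_unbalanced):
    """Compare the lexer's end-of-procedure summary with the preprocessor's flag."""
    if not flagged_unbalanced and paren_neg_count >= MAX_PAREN_NEG:
        return "halt: too many PAREN_NEG in supposedly balanced code"
    if flagged_unbalanced and paren_neg_count == 0 and final_level == 0:
        return "halt: flagged unbalanced, but the lexer found nothing wrong"
    return "continue"
```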

We could call all of this the paren protocol, assuming we can find the 
personpower to do any of it. But I think that we should handle unbalanced 
code very differently than balanced code, and it _is_ possible for the 
preprocessor to detect it.

Bob Rayhawk
RKRayhawk@aol.com

--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.