Re: gnubol: procedure types
In a message dated 11/3/99 3:05:50 AM EST,
Randall Bart (Barticus@att.net) writes:
<<
I agree that invalid expressions should have error productions, but I don't
see why we should end up "way into" anything. Expression parsing should
always halt on any verb or keyword other than expression keywords, and
should resume at the first verb after the error (including the verb that
terminated the expression). Eg,
IF ((A = B) OR (C = GO TO X
As soon as the verb GO is encountered, we should be in an error
production. Parsing should resume at GO.
>>
The following comments are not exactly responsive to the specifics of
Randall's comments. I just want to take this opportunity to talk about
parentheses and damaged code.
I would like to propose that the interface between the first processor to
recognize the procedure division and enumerate its parts and the next
processor that will actually try to interpret it, have a handshaking that
cognizes the following basic procedure types.
A procedure is any part of the procedure division executable as a separate
piece. Basically a section or a paragraph is a procedure, as the word is used
here, but there are other variations. So a procedure is really just a
collection of statements.
The types might include
- statements after the procedure division heading but before any section or
paragraph name
- statements under a paragraph heading but not under section heading
- statements under a section heading but not under a paragraph
- statements under a paragraph heading that is under a section heading
So in the phrase "collection of statements" I have avoided the use of the
term 'named', and intentionally so. The first of the four types listed above,
statements after the procedure division heading but before any section or
paragraph name, is actually unnamed. And I mean that lexically. We would be
justified in preventing code within the program or any of its invoked
programs from entering unnamed code. This is certainly getting way ahead to
linkage and security issues. But I wanted to suggest this high level
attribute before making the following suggestions (because it would be hard
to build in later).
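To make the four types concrete, here is a minimal sketch in Python of how
the handshake might tag each procedure. The names ProcedureKind, classify and
may_be_entered_by_name are hypothetical, chosen only for illustration; they
are not part of any existing gnubol code.

```python
from enum import Enum


class ProcedureKind(Enum):
    """The four procedure types listed above (names are illustrative)."""
    UNNAMED = "statements before any section or paragraph name"
    PARAGRAPH_ONLY = "paragraph heading, not under a section heading"
    SECTION_ONLY = "section heading, not under a paragraph"
    PARAGRAPH_IN_SECTION = "paragraph heading under a section heading"


def classify(in_section: bool, in_paragraph: bool) -> ProcedureKind:
    """Pick the procedure type from the headings seen so far."""
    if in_section and in_paragraph:
        return ProcedureKind.PARAGRAPH_IN_SECTION
    if in_section:
        return ProcedureKind.SECTION_ONLY
    if in_paragraph:
        return ProcedureKind.PARAGRAPH_ONLY
    return ProcedureKind.UNNAMED


def may_be_entered_by_name(kind: ProcedureKind) -> bool:
    """Unnamed code has no label, so nothing can branch into it by name."""
    return kind is not ProcedureKind.UNNAMED
```

Carrying the kind as an explicit attribute from the start is the point of the
suggestion: the later passes only have to test it, not rediscover it.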
Also notice that functions are a procedure type. (It is not entirely
unreasonable to suggest that condition names are actually a kind of procedure
type, thus syntax rules for data references and condition name references
ought to be distinct.)
So anyway, there could be several different procedure types. These notions
apply to the source code and the executable. What follows applies to the
source code only.
Separately I would propose for procedure division parsing that we have a
balanced parentheses level attribute, so that a procedure with unbalanced
parentheses might be parsed by an entirely distinct collection of rules than
a parenthetically balanced procedure. The purpose is to keep us way out of
trouble on code such as
    IF ((A = B) OR (C = GO TO X
((Again I am not being responsive to the post I am quoting, the focus there
can be exampled with something like
    IF A = B OR C = GO TO X
I think. I am just picking on the unbalanced parens for my own reasons here.
))
But where there is smoke there is fire. It is not rare that coding errors get
compounded. And even robust compilers can be confronted with very strange
code.
Unbalanced parentheses, however, present really serious problems to a COBOL
parse.
There are two basic computations for balancing parentheses, that I can think
of.
First, obviously your left paren count should equal your right paren count.
Surely a paragraph should not end with an unequal count. That can be
embellished; a statement should not end with an unequal count, or an
expression should not ... But somewhere in there we meet the enemy who is us
the coder and the essence of the terms and factors become the parens
themselves.... The Gödelian threshold is reached when you try to say that a
thing that requires balanced parens for it to be that kind of thing, cannot
contain unbalanced parens. If we stay out at the procedure level, that abyss
is not entered. (Practically you stay out of the abyss if you can trap the
problem before you reduce any rules dependent upon token pairing, which in
the case at hand means multiple passes in the compiler.)
Secondly, you can track levels. When you encounter a left paren you go one
level in (add 1 to paren level), when you encounter the right paren you go
one level back out (subtract 1 from paren level). The key is that you should
never go negative AND you should be at zero when you end (the paragraph or
statement, etc). Note that a zero ending level can still obscure a negative
transition.
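Both computations fit in one pass. The following is a minimal sketch in
Python, assuming the text handed to it has already had comments and literals
removed; ParenProfile, paren_profile and is_balanced are hypothetical names
used only for illustration.

```python
from typing import NamedTuple


class ParenProfile(NamedTuple):
    opens: int           # number of left parens seen
    closes: int          # number of right parens seen
    final_level: int     # level at the end of the procedure or statement
    went_negative: bool  # True if the level ever dropped below zero


def paren_profile(text: str) -> ParenProfile:
    """Count parens and track the nesting level in a single scan."""
    opens = closes = level = 0
    went_negative = False
    for ch in text:
        if ch == "(":
            opens += 1
            level += 1
        elif ch == ")":
            closes += 1
            level -= 1
            if level < 0:
                went_negative = True
    return ParenProfile(opens, closes, level, went_negative)


def is_balanced(profile: ParenProfile) -> bool:
    """Balanced means ending at zero AND never having gone negative."""
    return profile.final_level == 0 and not profile.went_negative
```

For `IF ((A = B) OR (C = GO TO X` this gives three opens, one close and a
final level of 2. For a fragment such as `A ) OR ( B` the counts are equal
and the final level is zero, yet went_negative is set, which is exactly the
case a plain count would miss.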
Paren counting must competently ignore comments (also REMARKS paragraphs if
backward compatibility is provided), and competently handle literals. And
while we are musing about it all, the parametric information on line one or
the first few lines can present parentheses as well.
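As a rough illustration of that filtering step, here is a sketch in Python.
It assumes fixed-format source with the indicator area in column 7 (where
`*` or `/` marks a comment line) and program text in columns 8 through 72. It
also assumes that a literal never continues onto the next line, and it does
not attempt REMARKS paragraphs or the parametric lines; a real filter would
need all three. The name strip_noise is hypothetical.

```python
def strip_noise(lines):
    """Drop comment lines and remove literal contents before paren counting."""
    kept = []
    for line in lines:
        # Assumption: column 7 (index 6) is the indicator area.
        if len(line) > 6 and line[6] in "*/":
            continue  # whole line is a comment
        out = []
        quote = None  # delimiter of the literal we are inside, if any
        for ch in line[7:72]:  # columns 8..72 only
            if quote is None:
                if ch in "\"'":
                    quote = ch  # a literal opens; its contents are skipped
                else:
                    out.append(ch)
            elif ch == quote:
                # A doubled delimiter inside a literal closes and immediately
                # reopens it, so embedded quotes fall out correctly.
                quote = None
        kept.append("".join(out))
    return kept
```

With this in front of the counter, a paren inside `DISPLAY "("` or inside a
comment line no longer disturbs the count for the procedure.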
So after preprocessing, one or more parsers for the procedure division will
be awakened, and I propose that they be informed as to whether or not the
procedure they are about to try to parse has balanced parentheses.
In effect, this adds a task to the preprocessor.
In a procedure that has unbalanced parens, we probably must try to locate the
problem in the parse, but I am not sure we should jeopardize the compile by
full interpretation of the code.
If you think about this, it could buy us time. The demand is highest for a
compiler that can generate valid executables for valid source code. Next
highest, and still very high, is to diagnose clearly a large list of obvious
problems in source code, and stop code generation. Down somewhere lower is
diagnosing severely damaged source code. I guess I am saying that unbalanced
parentheses are a kind of severe error, in the sense that the marketplace
will tolerate somewhat less effective diagnosis of such code (at least for a
while).
Basically, my take on this is that we should isolate the procedure (paragraph
or leading portion of a section) and do what we can with it. But we should
not let this serious challenge bog down the development of positive logic in
the grammar.
If the preprocessor in fact identifies procedures (sections and paragraphs)
then I think we are in a position to isolate unbalanced parenthetical code in
a technical sense and in the sense of managing the development effort.
In a world with A-Margin and B-Margin conventions, paragraph/section
separators will be fairly easy to find (by the preprocessor). But in the
future world that we are supposed to have our eye on, a label is going to be
a little harder to find within a parenthetically unbalanced source file. But
maybe we will let that bridge burn us when we attempt the crossing.
Ideally, we would want to comprehend the unbalanced code at a level below the
procedure level (that is within the paragraph). But there is already a really
large amount of code bereft of end of statement periods, and that will just
be increasing in the future. So it seems challenging to delimit the
unbalanced code at some level below procedure, ... at least prior to parsing.
(A middle ground here could be, to isolate at a level lower than procedures,
if the code _does_ have end-of-statement periods. That could be the next
compiler.)
But my suggestion is that parenthetically unbalanced code be parsed by an
entirely distinct set of rules than balanced code. These sets can share some
rules. This can be accomplished by separate parsers, or perhaps by a
technique in the lexer (or filter) that manifests a delimiting token that
indicates that what follows is unbalanced code.
A parser could have high-level rules, such as
    balanced_code   : various_statements
                        { /* do summary work here */ }
                    ;
    unbalanced_code : UNBALANCED minimum_stmts END_UNBAL
                        { /* do error summary here */ }
                    ;
where the rule various_statements is the whole universe of valid statements
(and error productions for anticipated problems other than unbalanced
parens), and the rule minimum_stmts is a collection of some valid statements
and artful code designed to diagnose the paren problems.
To be honest it might not be a bad idea to be just a little redundant and
have the rules
    balanced_code   : BAL various_statements END_BAL
                        { /* do summary work here */ }
                    ;
    unbalanced_code : UNBAL minimum_stmts END_UNBAL
                        { /* do error summary here */ }
                    ;
with concomitant requirements on the lexer.
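One way the lexer (or filter) side of this could look is sketched below in
Python. It assumes the procedure has already been split into a list of token
strings with comments and literals out of the way; the marker names BAL,
END_BAL, UNBAL and END_UNBAL follow the rules above, while frame_procedure is
a hypothetical name.

```python
def frame_procedure(tokens):
    """Wrap one procedure's tokens in the marker pair the parser expects."""
    level = 0
    went_negative = False
    for tok in tokens:
        if tok == "(":
            level += 1
        elif tok == ")":
            level -= 1
            if level < 0:
                went_negative = True
    # Balanced only if we end at level zero and never dipped below it.
    balanced = level == 0 and not went_negative
    start, end = ("BAL", "END_BAL") if balanced else ("UNBAL", "END_UNBAL")
    return [start, *tokens, end]
```

Because the decision needs the whole procedure, the markers can only be
emitted after the procedure has been scanned once, which is why this work
sits naturally in a pass ahead of the parser.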
In this regard we may wish to construct the lexer (or filter) to be able to
detect when a paren level has gone negative (too many right parens), and
return a third type of paren, to wit PAREN_NEG (so we would have PAREN_OPEN,
PAREN_CLOSE and PAREN_NEG). This could support the parser in its UNBALANCED
mode, and if we were confident that we had it right in the lexer (or the
filter) the PAREN_NEG in the normal parse mode would send us into a hard
stop. That would represent a difference in analysis by the lexer and the
preprocessor. (Such a hard stop would naturally be suppressible with a
compile-time parm.)
These kinds of things could be moderated with counters, say five PAREN_NEGs
in supposedly balanced code would perhaps halt the compiler. A humorous
variation on this moderation could be that if we find neither a PAREN_NEG nor
a final non-zero paren count in unbalanced code that could be a show stopper!
(That, by the way, implies that the last thing out of the lexer on every
procedure should be its paren count and paren level).
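A small sketch of such a lexer, in Python, is given below. The token names
PAREN_OPEN, PAREN_CLOSE and PAREN_NEG come from the text; WORD,
PAREN_SUMMARY and lex_parens are hypothetical. One design choice here is an
assumption rather than part of the proposal: when a right paren arrives at
level zero, the level is held at zero instead of going negative, so that the
parens that follow are still classified sensibly.

```python
def lex_parens(tokens):
    """Classify parens and finish with a summary of counts and level."""
    opens = closes = level = negs = 0
    out = []
    for tok in tokens:
        if tok == "(":
            opens += 1
            level += 1
            out.append(("PAREN_OPEN", tok))
        elif tok == ")":
            closes += 1
            if level == 0:
                # Too many right parens: flag it and stay at level zero.
                negs += 1
                out.append(("PAREN_NEG", tok))
            else:
                level -= 1
                out.append(("PAREN_CLOSE", tok))
        else:
            out.append(("WORD", tok))
    # Last thing out for every procedure: paren counts and final level.
    out.append(("PAREN_SUMMARY", (opens, closes, level, negs)))
    return out
```

The parser can then compare the summary with what the preprocessor claimed:
a procedure announced as balanced that yields any PAREN_NEG, or one announced
as unbalanced whose summary shows no PAREN_NEG and a final level of zero,
signals that the two analyses disagree.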
We could call all of this the paren protocol, assuming we can find the
personpower to do any of it. But I think that we should handle unbalanced
code very differently than balanced code, and it _is_ possible for the
preprocessor to detect it.
Bob Rayhawk
RKRayhawk@aol.com