Re: gnubol: procedure types
In a message dated 11/12/99 6:14:19 AM EST,
Randall Bart, Barticus@att.net, writes some excellent comments about finding
scope terminators, ends of statements and paragraph names.
He was kind enough to address some of my previous posts, and I wanted to
comment further. Not to push the subject but to clarify. I had a narrower
focus than he, which surely stems from the breadth of his valuable experience.
IMHO in the new world, which will lack A-margins and B-margins, paragraph
names and section names _will_ generally be a little harder to find (I was
actually also tightly focused on the problem that this is even more
true in text with unbalanced parentheses).
Referring to a paragraph name or section name, Randall says
<<
A header comes right after a separator period.
>>
and then very effectively enumerates the kinds of preceding things which would
have the period. That's very right. Except, ...
My frame of reference is tilted from yours. My theme is that we are not
compiling all possible valid programs, we are compiling all possible
programs, period. :) (The English language emphatic 'period' is meant as a
pun). And the glyph is meant to send your paren balancer off scope.
So let me change the phrasing, twice. The first approach is that "a
separator period comes right before the header." The second approach is that
"a separator period SHOULD come right before the header."
In the old world, the compiler can use the A-margin to help out. I confess
that I am a little confused by the examples depicting procedure headers
floating around in various places, with posted text describing them as
acceptable. My confusion is that I cannot tell whether you are saying that
stuff is valid code now. Is that COBOL-85, or a dialect extension?
So anyway this is a narrow focus. Randall's posting has a much more
exhaustive treatment of the subject. But today, on conservative compilers, a
paragraph that ends without a period causes a warning. There may be numerous
techniques to sense this problem and stay on your feet, but in the new
world it will not be easy. For example, an ADD statement (which could
certainly be at the end of a procedure) is allowed to have multiple targets.
A simple ADD would be
ADD a to b.
But let's say you want to add a current amount to an employee total, a
department total, and a division total. COBOL allows
ADD a to b, c, d.
Permit me to strip the optional commas for clarity.
ADD a to b c d.
There is supposed to be a period on the end of that if it is the last thing
in a paragraph. So my point is: what if there is not? I will format it as
A-margin and B-margin, but assume the following is free format
ADD a to b
c
d
e.
Is e a procedure name (a paragraph name) or an undeclared data item?
In the following I chop the period off the e, and suggest that _maybe_ a
SECTION token follows:
ADD a to b
c
d
e
[SECTION]
So what would all that code mean? If c were misspelled, is it a reference to
an undefined data item or a paragraph/section name? If we get wild because of
the misspelling of c, then what are we going to think of d and e? In the old
world the redundancy of the margin did provide a _possible_ alleviation.
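To make the ambiguity concrete, here is a minimal sketch in Python (not
project code). It assumes a hypothetical set of declared data names and a
hypothetical helper, classify_targets, that labels each word following TO in
a period-less ADD. Any word that is not a known data item could be either a
misspelling or a floating paragraph header, and nothing in free format tells
the two apart.

```python
def classify_targets(words, declared_data):
    """Label each word that follows TO in a period-less ADD statement."""
    labels = []
    for word in words:
        if word.lower() in declared_data:
            labels.append((word, "data item"))
        else:
            # With no margin and no period, an unknown word may be a
            # misspelled data item or the start of a paragraph header.
            labels.append((word, "ambiguous"))
    return labels


# Hypothetical symbol table: c is misspelled or undeclared, e is unknown.
declared = {"a", "b", "d"}
result = classify_targets(["b", "c", "d", "e"], declared)
for word, label in result:
    print(word, label)
```

A following SECTION token would settle the question for e only; c stays
ambiguous either way.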
If that is just a syntax issue, then someone will code the grammar rule or
lex mechanism to catch it or limit the effect. But that was not the issue I
had hoped to be discussing in the original post.
Back at the farm we occasionally discuss whether there will be separate
parsers for each of the divisions. If we do that then some means must be
devised to sense the division separators and pass those markers on to the
individual division parsers.
There is no design or current code to this effect, but discussion seems to
guess that the preprocessor could be the separator detector. In some of those
discussions, someone suggested that we can identify each procedure division
section and paragraph in that early phase, leading to the possibility of
distinct parses for each procedure. I did not originate that notion but it is
very powerful. My comments about new world marginless free format followed
that sequence. I do not think it is impossible to deal with this situation.
I am just saying that the gravitation of the collection of code to a
period-less style, and the float of the procedure headers (section and
paragraph names), are phenomena on a collision course. And it crunches
together on the desk of the lucky person who perfects the preprocessor (or
wherever we try to sense the separators).
Randall has a better perspective than I on this, and a wider one. But I hope
we can merge in the aspect that it is all possible programs that we are
digesting, not just the good ones. The periods you elucidate are all
putative. The compiler must deal with their absence in each and every
situation you are kind enough to list!
But still, my narrow focus was telescoped in even more tightly on an
itsy-bitsy point. What I really said was that this whole problem,
periodlessness and header float, is much more difficult to deal with in code
portions that have unbalanced parentheses.
My idea of the unbalanced parentheses detection is really a safety mechanism.
It is one of my trump cards in the backtracking undertow around here. You
see, I hate error recovery modes in parsers (that is my failing, surely). But
error recovery modes mean unstable compilers to me, and dissatisfied tool
users.
The things in our grammars that correspond to arithmetic expressions, boolean
expressions, subscripting, reference modification, and function invocation
stand naked and very exposed to unbalanced parentheses. To say nothing of pic
clauses!
If we can, I suggest that we at least try to detect unbalanced parentheses as
early as at all possible. I would consider treating such text entirely
differently than normal code. And _that_ notion fits the discussion of
distinct parsers for each division (and possibly each section, and possibly
each paragraph).
As a first cut I simply suggest that there would be a distinct parser, and
its job is to find the paren problem (further syntax checking in that code
portion is either low priority or abandoned).
We would hear bad noise if we tried to tell the world we would short circuit
the whole procedure division because of just one paragraph with unbalanced
parens. Same for the data division and pernicious pic clauses. So the
preprocessor might contribute here. The preprocessor sees every character.
It can count parens (even sensing negative threshold transitions, where a
close paren arrives before any matching open). If we have the preprocessor
detect division and section names and possibly paragraph names for the
scheduler of division/section parsers, then just add a field that designates
the code as clean or in paren trouble. I am just trying to
get that idea in early enough in the evolution of the preprocessor so that
the interface to the next phase can countenance any data items we need to
pass with each separator description.
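As a rough illustration of that extra field, here is a small Python sketch.
Nothing like it exists in the project; SeparatorRecord, paren_status and
describe are hypothetical names, and the sketch assumes the text of each
division, section or paragraph has already been carved out. It counts
characters naively, so parens inside literals and comments are wrongly
counted too.

```python
from dataclasses import dataclass


@dataclass
class SeparatorRecord:
    name: str
    kind: str    # "division", "section" or "paragraph"
    clean: bool  # False means "in paren trouble"


def paren_status(text):
    """Return (final_depth, went_negative) for a chunk of source text."""
    depth = 0
    went_negative = False
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # Negative threshold transition: a close with no open.
                went_negative = True
    return depth, went_negative


def describe(name, kind, text):
    """Build the record that would be handed to the next phase."""
    depth, went_negative = paren_status(text)
    return SeparatorRecord(name, kind, depth == 0 and not went_negative)


print(describe("P1", "paragraph", "COMPUTE X = (A + B) * C."))
print(describe("P2", "paragraph", "COMPUTE X = (A + (B * C)."))
```

The scheduler of division/section parsers could then route any record with
clean set to False to a different handler.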
I am not in favor of making the preprocessor a whole lot more complex; we
should have mercy. But I am saying that the point at which we have expanded
the source file to have all of its COPY code and any other kind of INCLUDE
matter is our first chance to sense the separators, _and_ our first chance
to clamp down on unbalanced parens.
As a brief further clarification, let me say that the preprocessor has its
work cut out for it if it is to count parens. Some were kind enough to
respond to my comments about difficulties in paren counting relating to
comments and literal strings. I am sure that the problem here is that my
original expressions were not clear enough.
It is a fact that the rules for comments and literal strings are clear enough
that they can be detected as needed in the preprocessor. My point was only
that if we plug a paren count mechanism into the preprocessor, it can not
just be lexical, it must be syntactic.
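For what it is worth, a counter that at least tracks whether it is inside a
literal or a comment might look like the Python sketch below. It is only a
small step toward the syntactic awareness I mean. The assumptions are mine:
free-format source, literals delimited by either quote character with a
doubled quote as the escape, and an inline comment introduced by *> (the
fixed-format column 7 indicator is not handled). paren_delta is a
hypothetical name.

```python
def paren_delta(line):
    """Return (net_depth, lowest_depth) for one line of source.

    Parens inside quoted literals and after an inline comment marker
    are skipped.
    """
    depth = 0
    lowest = 0
    quote = None  # the delimiter of the literal we are inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is not None:
            # A doubled quote closes and immediately reopens the literal,
            # which leaves us inside it, as intended.
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif line.startswith("*>", i):
            break  # rest of the line is a comment
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            lowest = min(lowest, depth)
        i += 1
    return depth, lowest


print(paren_delta('DISPLAY "(" X(1)'))
print(paren_delta("COMPUTE X = (A + B"))
```

A lowest value below zero is the negative transition; a nonzero net depth
carries over to the next line of the same unit.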
It is a sneak attack, but paren counting partly obviates the need for
backtracking. More importantly it positions us to possibly keep the parsers
out of error recovery free fall.
However, I also see that for this to pay off, at some point something must be
coded to dig further into the source code. Rejecting whole paragraphs, with
mere paren diagnostics, will in some cases be very inadequate. The
alternative, in the absence of more sophisticated code, is to send the text
through a very vulnerable parser. So early on in the project I would say
detecting paren unbalanced text is a high priority, in the sense that we need
to design for it now, we need to code it now, and we need to respond to it
in the parser to protect the parser.
Creating a response more impressive than a mere paren diagnostic is less of a
priority. So consider a switch that allows the user to force a full parse;
caveat switcher. But leave room for a handler to be evolved that is much more
interesting.
A handler of unbalanced text would divide and conquer: text that precedes the
problem could be parsed, and text that follows the problem could be parsed.
That is, the problem could be isolated down to the sentence or statement
level by a scan that just tries to cut the problem out of the full parse, but
lets the other code go through the normal parse. Leaving room for this type
of further development allows us to avoid any discussion of pushing the
preprocessor down to the statement or sentence level in its paren counting.
It cannot get there, because it would have to come after the division
parsers! (I think Goedel said something like that).
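A toy version of such a handler, in Python, might look like this. It is a
sketch under generous assumptions, not a design: it pretends the separator
periods are all present (exactly what cannot be relied on), it treats a
period followed by whitespace or end of text as a sentence end, and it
ignores literals and comments. The names split_sentences, balanced and
isolate are hypothetical.

```python
import re


def split_sentences(text):
    """Split on a period followed by whitespace or end of text."""
    parts = re.split(r"\.(?:\s+|$)", text)
    return [p.strip() for p in parts if p.strip()]


def balanced(sentence):
    """True if parens never go negative and end at depth zero."""
    depth = 0
    for ch in sentence:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def isolate(text):
    """Separate sentences fit for full parse from those to quarantine."""
    parse, quarantine = [], []
    for sentence in split_sentences(text):
        (parse if balanced(sentence) else quarantine).append(sentence)
    return parse, quarantine


ok, bad = isolate("MOVE 1 TO X. COMPUTE Y = (A + B. ADD X TO Y.")
print(ok)
print(bad)
```

Only the quarantined sentence would get the minimal paren diagnostic; the
sentences before and after it would still go through the normal parse.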
And if that makes sense to the viewers of this text, then surely you see that
we will need to get the data division paren problems honed to a more precise
level than section, at a much earlier date than the procedure division. From
a project management point of view, if parens give us trouble in the
procedure division, it might be that we do not have the resources to be
perfect there now. If they give us trouble in the data division, then we
probably cannot afford _not_ to engage it with some specificity, even in the
earliest release.
So if you are confident that all of this is rot, then I encourage you to
advocate that the parameter default to 'full compile paren unbalanced text',
and if you hate free fall in parsers then advocate, like humble I, that this
meager parameter default to 'isolate paren problems and do minimum
diagnostics'.
I do realise that most of the participants in this effort do not see this
thing as nearly the issue I portray. But I am here with this red flag. And
it is waving.
Best Wishes,
Bob Rayhawk
RKRayhawk@aol.com
--
This message was sent through the gnu-cobol mailing list. To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body. For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.