
Re: gnubol: procedure types



In a message dated 11/12/99 6:14:19 AM EST, 
Randall Bart, Barticus@att.net, writes some excellent comments about finding 
scope terminators, ends of statements, and paragraph names.

He was kind enough to address some of my previous posts, and I wanted to 
comment further. Not to push the subject but to clarify. I had a more narrow 
focus than he, which surely stems from the breadth of his valuable experience.

IMHO in the new world, which will lack A-margins and B-margins, the paragraph 
names and section names _will_ be a little harder to find generally (I was 
actually also tightly focused on the problem that this is even more 
true in text with unbalanced parentheses).

Referring to a paragraph name or section name, Randall says 
<<
 A  header comes right after a separator period.
>>
and then very effectively enumerates the kinds of preceding things which would 
have the period. That's very right.  Except, ...

My frame of reference is tilted from yours.  My theme is that we are not 
compiling all possible valid programs, we are compiling all possible 
programs, period. :) (The English language emphatic 'period' is meant as a 
pun). And the glyph is meant to send your paren balancer off scope.

So let me change the phrasing, twice.  The first approach is that "a 
separator period comes right before the header."  The second approach is that 
"a separator period SHOULD come right before the header."

In the old world, the compiler can use the A-margin to help out. I confess 
that I am a little confused by the examples depicting procedure headers 
floating around in various places, with posted text describing them as 
acceptable. My confusion is that I can not tell if you are saying that stuff 
is valid code now. Is that COBOL-85, or a dialect extension?

So anyway this is a narrow focus. Randall's posting has a much more 
exhaustive treatment of the subject. But today on conservative compilers a 
paragraph that ends without a period causes a warning. There may be numerous 
techniques to sense this problem and stay on your feet, but,... In the new 
world it will not be easy. For example, an ADD statement (which could 
certainly be at the end of a procedure) is allowed to have multiple targets.

A simple ADD would be
   ADD a to b.

But let's say you want to add a current amount to an employee total, a 
department total, and a division total.  COBOL allows
  ADD a to b, c, d.

Permit me to strip the optional commas for clarity.
  ADD a to b c d.

There is supposed to be a period on the end of that if it is the last thing 
in a paragraph. So my point is: what if there is not? I will format it as 
A-margin and B-margin, but assume the following is free format.


    ADD a to b 
 c 
 d
 e.

Is e a procedure name (a paragraph name) or an undeclared data item?


In the following I chop the period off the e, and suggest that _maybe_ a 
SECTION token follows:
    ADD a to b 
 c 
 d
 e
 [SECTION] 


So what would all that code mean?  If c were misspelled, is it a reference to 
an undefined data item or a paragraph/section name? If we get wild because of 
the misspelling of c, then what are we going to think of d and e?  In the old 
world the redundancy of the margin did provide a _possible_ alleviation.
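
To make the ambiguity concrete, here is a rough Python sketch (not project 
code). The function name, the labels, and the idea that a table of declared 
data names is already available at this point are all my assumptions for 
illustration. It labels each word that follows ADD a TO when there is no 
margin to lean on.

```python
def classify_operands(words, declared):
    """Label each word that follows 'ADD a TO' in free-format text.

    words    -- the words after TO, in source order (hypothetical input)
    declared -- data names known from the data division (assumed available)
    """
    declared_upper = {name.upper() for name in declared}
    cleaned = [w.rstrip(".").upper() for w in words]
    labels = []
    for i, name in enumerate(cleaned):
        if name == "SECTION":
            continue  # the keyword itself is not an operand
        next_is_section = i + 1 < len(cleaned) and cleaned[i + 1] == "SECTION"
        if next_is_section:
            # a word directly before SECTION is taken to be a header name
            labels.append((name, "section header"))
        elif name in declared_upper:
            labels.append((name, "data item"))
        else:
            # undeclared data item, or a paragraph name after a lost period?
            labels.append((name, "ambiguous"))
    return labels
```

With b, c and d declared, the word e comes back as "ambiguous" unless a 
SECTION token follows it, and a misspelled c comes back as "ambiguous" too. 
Nothing in the text alone settles which reading is right.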

If that is just a syntax issue, then someone will code the grammar rule or 
lex mechanism to catch it or limit the effect.  But that was not the issue I 
had hoped to be discussing in the original post.

Back at the farm we occasionally discuss whether there will be separate 
parsers for each of the divisions. If we do that then some means must be 
devised to sense the division separators and pass those markers on to the 
individual division parsers. 

There is no design or current code to this effect, but discussion seems to 
guess that the preprocessor could be the separator detector. In some of those 
discussions, someone suggested that we can identify each procedure division 
section and paragraph in that early phase. Leading to the possibility of 
distinct parses for each procedure. I did not originate that notion but it is 
very powerful. My comments about new world marginless free format followed 
that sequence.  I do not think it is impossible to deal with this situation, 
I am just saying that the gravitation of the collection of code to a 
period-less style, and the float of the procedure headers (section and 
paragraph names) are phenomena on a collision course. And it crunches together 
on the desk of the lucky person who perfects the preprocessor (or wherever we 
try to sense the separators).

Randall has a better perspective than I on this, and a wider one. But I hope 
we can merge the aspect that it is all possible programs that we are 
digesting, not just the good ones. The periods you elucidate are all 
putative. The compiler must deal with their absence in each and every 
situation you are kind enough to list! 

But yet still my narrow focus was dramatically more telescoped in on an 
itsy-bitsy point. What I really said was that this whole problem, 
periodlessness and header float, is much more difficult to deal with in code 
portions that have unbalanced parentheses.

My idea of the unbalanced parentheses detection is really a safety mechanism. 
It is one of my trump cards in the backtracking undertow around here.  You 
see, I hate error recovery modes in parsers (that is my failing, surely). But 
error recovery modes mean unstable compilers to me, and dissatisfied tool 
users.

The things in our grammars that correspond to arithmetic expressions, boolean 
expressions, subscripting, reference modification, and function invocation 
stand naked and very exposed to unbalanced parentheses.  To say nothing of pic 
clauses! 

If we can, I suggest that we at least try to detect unbalanced parentheses as 
early as at all possible.  I would consider treating it entirely differently 
than normal code. And _that_ notion fits the discussion of distinct parsers 
for each division (and possibly each section, and possibly each paragraph). 
As a first cut I simply suggest that there would be a distinct parser, and 
its job is to find the paren problem (further syntax checking in that code 
portion  is either low priority or abandoned). 

We would hear bad noise if we tried to tell the world we would short circuit 
the whole procedure division because of just one paragraph with unbalanced 
parens. Same for the data division and pernicious pic clauses.  So the 
preprocessor might contribute here. The preprocessor sees every character.  
It can count parens (even sensing negative threshold transitions). If we have 
the preprocessor detect division and section names and possibly paragraph 
names for the scheduler of division/section parsers, then just add a field 
that designates the code as clean or in paren trouble.  I am just trying to 
get that idea in early enough in the evolution of the preprocessor so that 
the interface to the next phase can countenance any data items we need to 
pass with each separator description.
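
As a rough sketch of that extra field (Python, purely illustrative): the 
Segment record, its field names, and the function names below are my 
assumptions, not an agreed interface. The count is a raw character count 
that ignores the question of literals and comments, and it treats a dip 
below zero as trouble even if the total later returns to zero.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One separator description handed to the next phase (hypothetical)."""
    name: str              # division, section or paragraph name
    text: str              # source text belonging to that header
    paren_ok: bool = True  # the proposed 'clean or in paren trouble' field


def parens_clean(text):
    """True if parens balance and the depth never goes negative."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # negative threshold transition
    return depth == 0


def flag_segments(segments):
    """Fill in paren_ok for every segment the preprocessor found."""
    for seg in segments:
        seg.paren_ok = parens_clean(seg.text)
    return segments
```

A scheduler of division/section parsers could then look at paren_ok before 
deciding which parser, if any, gets the segment.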

I am not in favor of making the preprocessor a whole lot more complex, we 
should have mercy. But I am saying that, as of the point we have expanded the 
source file to have all of its COPY code and any other kind of INCLUDE 
matter, then is our first chance to sense the separators, _and_ there is our 
first chance to clamp down on unbalanced parens.

As a brief further clarification, let me say that the preprocessor has its 
work cut out for it if it is to count parens.  Some were kind enough to 
respond to my comments about difficulties in paren counting relating to 
comments and literal strings.  I am sure that the problem here is that my 
original expressions were not clear enough.

It is a fact that the rules for comments and literal strings are clear enough 
that they can be detected as needed in the preprocessor. My point was only 
that if we plug a paren count mechanism into the preprocessor, it can not 
just be lexical, it must be syntactic.
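
A small Python sketch of the first step toward that awareness, under loud 
assumptions: literals are delimited by a matching pair of quote or 
apostrophe characters, a doubled delimiter inside a literal stands for the 
delimiter itself, and a comment runs from "*>" to the end of the line. The 
function name is made up, and anything deeper (pic strings, for instance) is 
not attempted here.

```python
def real_parens(line):
    """Yield +1 or -1 for each parenthesis on one line that is real code.

    Parens inside literals and after a '*>' comment marker are skipped.
    """
    in_literal = None  # the delimiter of the literal we are inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if in_literal:
            if ch == in_literal:
                in_literal = None  # a doubled delimiter simply reopens next
        elif ch in "\"'":
            in_literal = ch
        elif line.startswith("*>", i):
            break  # rest of the line is comment text
        elif ch == "(":
            yield 1
        elif ch == ")":
            yield -1
        i += 1
```

A blind count of the characters in DISPLAY "(" X(1) sees two opens and one 
close; the sketch sees one of each, because the first paren is data, not 
code.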

It is a sneak attack, but paren counting partly obviates the need for 
backtracking. More importantly it positions us to possibly keep the parsers 
out of error recovery free fall.

However, I also see that for this to pay off at some point something must be 
coded to dig further into the source code.  Rejecting whole paragraphs, with 
mere paren diagnostics, will in some cases be very inadequate. The 
alternative in the absence of more sophisticated code is to send the text 
through a very vulnerable parser.  So early on in the project I would say 
detecting paren unbalanced text is a high priority, in the sense that we need 
to design for it now, and we need to code it now, and we need to respond to it 
in the parser to protect the parser.
Creating a response more impressive than a mere paren diagnostic is less of a 
priority. So, consider a switch that allows the user to force a full parse; 
caveat switcher. But leave room for a handler to be evolved that is much more 
interesting.

A handler of unbalanced text would divide and conquer; text that precedes the 
problem could be parsed, text that follows the problem could be parsed. That 
is, the problem could be isolated down to the sentence or statement level by a 
scan that just tries to cut the problem out of full parse, but let the other 
code go through normal parse.  Leaving room for this type of further 
development allows us to avoid any discussion of pushing the preprocessor 
down to the statement or sentence level in its paren counting; it cannot get 
there because it would have to come after the division parsers! (I think 
Goedel said something like that).
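
A divide-and-conquer handler of that kind might start out something like 
the following Python sketch. It is only an illustration: the names are 
invented, a sentence is naively taken to end at a period followed by white 
space, and literals and comments are ignored.

```python
import re


def is_balanced(text):
    """True if parens balance and the depth never goes negative."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def triage(text):
    """Split text into sentences; separate clean ones from paren trouble.

    Returns (to_parse, quarantined): sentences fit for the normal parser,
    and sentences that only get a paren diagnostic.
    """
    sentences = re.split(r"(?<=\.)\s+", text.strip())
    to_parse, quarantined = [], []
    for sentence in sentences:
        (to_parse if is_balanced(sentence) else quarantined).append(sentence)
    return to_parse, quarantined
```

Given three sentences where only the middle one has a stray open paren, the 
first and third go on to normal parse and the middle one is cut out. A 
missing period would of course glue two sentences together, which is the 
very problem discussed earlier in this note.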

And if that makes sense to the viewers of this text, then surely you see that 
we will need to get the data division paren problems honed to a more precise 
level than section, at a much earlier date than the procedure division. From 
a project management point of view, it might be that if they give us trouble 
in the procedure division we do not have the resources to be perfect there 
now, if they give us trouble in the data division with parens then it might 
be that  we probably can not afford to _not_ engage it with some specificity 
even in the earliest release.

So if you are confident that all of this is rot, then I encourage you to 
advocate that the parameter default to 'full compile paren unbalanced text', 
and if you hate free fall in parsers then advocate, like humble I, that this 
meager parameter default to 'isolate paren problems and do minimum 
diagnostics'.

I do realise that most of the participants in this effort do not see this 
thing as nearly the issue I portray. But I am here with this red flag. And 
it is waving.

Best Wishes,
Bob Rayhawk
RKRayhawk@aol.com
