Re: gnubol: How do we parse this language, anyway?

To: gnu-cobol@lusars.net
Subject: Re: gnubol: How do we parse this language, anyway?
From: RKRayhawk@aol.com
Date: Sat, 4 Dec 1999 13:55:16 EST
Delivered-To: gnu-cobol-outgoing@wallace.lusars.net
Reply-To: gnu-cobol@lusars.net
Sender: owner-gnu-cobol@wallace.lusars.net

The test case that is being used may well be too much of a worst case. It has 
open conditionals on open conditionals on open conditionals. That is not the 
problem, if there is a problem.  It is the simpler cases that matter. They 
are at least plausible, whereas repeated recursion that leaves things open is 
not the problem.

The problem is that at just one further level, the last arithmetic can appear 
to be either a simple statement or an open-ended conditional, depending upon 
whether you interpret the clause to its right as belonging to it or to its 
parent. Regrettably, much of the effort to test this has not helped because 
it is not looking at the simple cases that are likely.

The only problem here is that, at any level of recursion, if the last two 
levels are both open you can bind the last clause in more than one way. If 
you are three levels deep and finish open ended, there simply must be an open 
conditional on the end of something, and it would not matter how you argue it 
should bind: we do not have to code for that, since open conditionals cannot 
live on the edge of open conditionals.

Conversely if you are deep deep deep and the outer statements are going to 
end explicitly, then any depth is okay, any branch is okay ... except if one 
has a loose end. A simple arithmetic can be a tight ending! That is the 
problem. If you bind the last found clause to the inner verb it can look 
conditional, if you bind it to an antecedent, then the inner looks like an 
imperative (if you take it as simple).

A difficulty we have with the test is that you are three or more deep with 
open ended conditionals, no matter how you glue it together. This in part 
happens because of the monotone verb ADD. The following shows a simpler, and 
more plausible, problem:

SUBTRACT a FROM b
   ON SIZE ERROR
       ADD 1 to error-counter
   NOT ON SIZE ERROR
       ADD 2 to success-counter
END-SUBTRACT.

The problem is that if you are recursing with concurrent scopes the NOT ON 
SIZE ERROR could glue in or it could glue out. If it is illegal to have a 
conditional on a conditional clause, then it is illegal to code

       ADD 1 to error-counter  NOT ON SIZE ERROR ADD 2 to success-counter

onto the first clause because that triumvirate is conditional. To see it as an 
error you must assume the coder made a mistake. But the coder knows the 
standard, so why would she code a mistake? So, in this case (and not in the 
three deep cases), you have a choice. There is an interpretation that 
resolves it as correct.

The problem is that  "ADD 1 to error-counter" _is_ an imperative statement. 
That is the problem. The bind point is a fantastic challenge for a compiler 
writer. It can be a fixation. But it utterly destroys the ability to see the 
problem. 

"ADD 1 to error-counter" _is_ an imperative statement!

If we have a parser that has concurrent scopes it is possible to bind that 
NOT ON SIZE ERROR clause back to the SUBTRACT.
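
To make the two readings concrete, here is a minimal sketch in Python of a toy recursive parser with a switch for the bind point. It is not a proposal for the real grammar: the token tuples, the `bind` parameter, and the dictionary node shape are all assumptions invented for this illustration, and ON/NOT stand in for ON SIZE ERROR / NOT ON SIZE ERROR.

```python
# Hypothetical token stream for the SUBTRACT example: (kind, verb, operands).
TOKENS = [
    ("VERB", "SUBTRACT", "a FROM b"),
    ("ON", "", ""),                        # ON SIZE ERROR
    ("VERB", "ADD", "1 TO error-counter"),
    ("NOT", "", ""),                       # NOT ON SIZE ERROR
    ("VERB", "ADD", "2 TO success-counter"),
    ("END", "SUBTRACT", ""),               # END-SUBTRACT
]

def parse(tokens, bind="outer"):
    """Parse one statement; bind is 'outer' (glue out) or 'inner' (glue in)."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def statement(parent_awaits_not):
        nonlocal pos
        kind, verb, text = tokens[pos]
        assert kind == "VERB"
        pos += 1
        node = {"verb": verb, "text": text, "on": None, "not_on": None}
        if peek() == "ON":
            pos += 1
            # The nested statement sits under our ON clause, and we have
            # not yet seen our own NOT clause.
            node["on"] = statement(parent_awaits_not=True)
        if peek() == "NOT":
            if parent_awaits_not and bind == "outer":
                # Treat this statement as imperative and leave the NOT
                # clause for the enclosing statement.
                return node
            pos += 1
            node["not_on"] = statement(parent_awaits_not=False)
        if pos < len(tokens) and tokens[pos][:2] == ("END", verb):
            pos += 1                       # explicit scope terminator
        return node

    return statement(parent_awaits_not=False)
```

With `bind="outer"` the NOT clause lands on the SUBTRACT and "ADD 1 to error-counter" stays imperative; with `bind="inner"` the same tokens make the inner ADD conditional and leave the SUBTRACT without a NOT clause. The input is identical in both cases, which is the whole difficulty.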

I am now much less convinced that we should. But just to keep us on the track 
of monitoring the code base correctly, I insist that the tests are too open. 
Three deep is not the problem. Stated as a standards issue, there is no 
requirement on the fashion in which you slam dunk a three deep problem. You 
can say the inner arithmetic is busted or the outer or you can drift off 
happily into "unexpected" somnambulism, it matters not. When you are three 
deep with no explicit scope terminator in sight it is wrong.  In fact you can 
only contemplate that as relevant if you do misunderstand the standard and 
commence to bind to the inner. You cannot bind to the inner even once if it 
is an open ended conditional.

This is not associativity.

The issue is a matter of concurrent scope. We need concurrent scope so that 
we can do the deep nesting of END-xyz hardened conditionals.  Those who have 
experimented with that then find that they can just toggle a sensor when they 
'extend' COBOL to allow the inner to be conditional. The problem is that this 
'extends' infinitely (or, of course, the feature is that it extends 
infinitely). We actually do not want that, IMHO. There isn't much use for it. 
But please see my actual point: it is certainly just as valid to want the 
thing to go one level deep with one interpretation or the other. (I am not 
advocating my point of view as though I like the bind back interpretation. My 
concern had been more the possibility of either changed semantics in a 
migration from other platforms, or, worse, different semantics in the field 
already creating a harder target to hit; but all with just the last two 
levels competing, not a dragon.)

For my own part I have backtracked to see how I got to where this seems so 
important. I have a song to sing but it is cacophonous to do so while eating 
crow.

What I think happened starts with IBM's OS/VS COBOL. I believe that my 
understanding or misunderstanding of the nested conditional starts there, but 
I am unsure because I cannot recover a reference manual. Lurkers are 
welcome to clarify this. I think that that compiler allowed NOT alternatives 
on some conditionals. I think that simple arithmetics were allowed on 
conditional clauses. I do not know how they resolved ambiguities. But my 
orientation to this problem comes from that time, so either I always 
misunderstood, or there was a presumption in that compiler that an arithmetic 
on a conditional clause was a simple arithmetic. There was no such thing as 
an END-arith. I assume that the NOT alternate clause was an IBM extension. I 
am not sure of that.

For clarity it is important to understand that this early compiler went 
through release stages; the best was Release 2.

At this point I should say that old code never dies; we just keep regressing 
new compilers. Current managers enjoy gifting conversion efforts to the next 
generation. IBM released a VS COBOL II compiler which was a sequence of 
products that led stepwise to COBOL '85. The earliest version of this did 
not support NOT alternatives. That is actually a cleansing phase. If you 
wanted to use this unparameterized, your NOT clauses died. But inertia is 
important in legacy code markets, so Big Blue offered a CMPR2 parameter that 
allowed you to make this new compiler accept the old syntax of Release 2 of 
the previous compiler. That bought time to convert your code. You could 
actually use the new compiler and run time, and figure out how you finance 
conversion later. The dropping of NOT was probably not much of an issue 
compared to other syntax changes that would be required to compile without 
the regressing parameter. 

But IBM got off the hook here. Whatever the semantics of nesting in 
proximity to the earlier NOT, it did not matter. The new compiler in its 
early stages dropped the syntax. That is documented in their migration 
manuals (but like all extensions that died when you move forward you have to 
dig deep to find the notations). Some portion of IBM's code base then was 
potentially cleansed of any NOT conditional clause. But lots of folks did not 
do that, so a subtle shell game ensued.

Later stages of VS COBOL II move into COBOL '85 syntax.  Now NOT clauses were 
'added' to the compiler when it was unclamped (that is 'new' syntax, 
NOCMPR2). Some portion of the code base for this vendor leapfrogged the early 
version of the compiler that would have thrown the NOT conditional out. Now 
the coder was plunged head first into a new semantic world. This portion of 
the code base either 1) got repaired to deal with new semantics, 2) had 
perhaps few incidences relating to our 'alternative bind point' interactions, 
3) is in production and no one has ever seen or noticed a problem (this is 
exceptions within exceptions after all), or 4) no one cares about the very 
rare strange behavior that could be occurring in the new semantics context 
(note well that the original code might even have been wrong).

But the vendor was probably neither obligated to flag the ambiguous bind 
point, nor discourse about extensions. The upgrade to the better VS COBOL II 
was from the lower VS COBOL II, and there was no NOT clause in the lower 
release. The cleansing phase of product deployment created two different 
migration descriptions. In the second migration there was no loss of NOT or 
change of NOT, it simply was 'new'.

However, inertia still dominates this picture. By the time the millennium 
threshold approaches there is yet another set of products available from this 
important vendor that are sometimes colloquially called COBOL THREE, or COBOL 
for this and that. These are COBOL '85 like the higher levels of the 
preceding product, and have the NOT clauses. A very large number of shops 
leapfrog over intermediates and land in COBOL THREE land.

What I think this means is that that portion of the code base a) has few 
nested conditionals, b) partly was cleansed (and maybe reborn), c) mostly 
originated in a COBOL '85 context.

The NOT alternatives in the early compiler, OS/VS COBOL, may be the source of 
some of the mimicry that a few of you have discussed in PC platform 
products.

At any rate it is the simple cases that matter. Like

INITIALIZE file-statuses
PERFORM UNTIL file-2-file-status-EOF
   READ primary-file
      AT END
            MOVE 2 to file-layout-flag
            READ second-file
      NOT AT END
            MOVE 1 TO file-layout-flag
   END-READ 
   IF file-layout-flag = 1
       PERFORM layout-1-interpretation
   ELSE
       PERFORM layout-2-interpretation
   END-IF
END-PERFORM.

It is not inconceivable that INVALID KEY and NOT INVALID KEY clauses could 
drift into this arrangement. Certainly modern disciplines would lead 
programmers into structuring this. But not necessarily.

And though I would say that the simpler two-deep examples are more useful as 
code base reviews for the conditionals, and that varying semantics are 
extremely important issues for assisting decision makers in choosing this 
project's compilers, the issue looks a lot smaller to me now.

But just saying that my mind is changed in the sense of the urgency of this 
matter is not enough. Nor does the verbiage about the early semantics in IBM's 
COBOL '74 compiler language extensions engage the more important issue.

I think that I have wasted people's time here.  And I do regret that. 

The modern compilers are probably not binding back; they are probably binding 
in as an 'extension', or just barfing when lost. I think that for some of the 
compilers the deeply nested open conditionals that incidentally have a 
closing END-xyz that matches the inner and some outer scope are actually 
probably broken, but that does not matter because as long as you are three 
deep in one interpretation or the other there is no requirement to compile 
successfully.

I think that some very small portion of code may be present that actually 
survived from earlier semantics, leapfrogged the cleansing product releases, 
and functions under new semantics. Exactly how much variation there might be 
in the products I don't know. All of these clauses deal with exceptions, and 
so frequently we are into an exception within an exception at run time if 
the program has nested conditionals.  That in general is a rare approach. The 
concern would only be really important in rare circumstances if _we_ were to 
change a program's semantics _silently_. ((It is a long song sung above, but I 
think that happened for some leapfroggers.)) If the code base product array 
today is consistent on the simple two deep 'ambiguous' code situation, then 
we have essentially one target.

The code base product array would be consistent if there are some that 
disallow the syntax and only others that bind in just one way. If we have 
different bind points in the various production compilers we have a 
convergence problem. That problem is obscured by run-on test cases.  We need 
to be just two levels deep and separate sentences with implicit scope 
terminators like the period.
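
As a rough illustration of what such a test set could look like, the following Python sketch emits two-level sentences, each closed by a period, and flags the pairs where the inner verb could itself take the trailing NOT clause. The verb tables, the data names (result-flag and so on), and the `two_level_cases` helper are assumptions made up for this example, not anything in the code base.

```python
from itertools import product

# Hypothetical verb tables: name -> (statement text, exception clause it accepts).
OUTER = {
    "SUBTRACT": ("SUBTRACT a FROM b", "ON SIZE ERROR"),
    "READ": ("READ primary-file", "AT END"),
}
INNER = {
    "ADD": ("ADD 1 TO error-counter", "ON SIZE ERROR"),
    "READ": ("READ second-file", "AT END"),
}

def two_level_cases():
    """Build one period-terminated sentence per (outer, inner) pair."""
    cases = []
    for outer, inner in product(OUTER, INNER):
        outer_text, clause = OUTER[outer]
        inner_text, inner_clause = INNER[inner]
        source = "\n".join([
            outer_text,
            f"    {clause}",
            f"        {inner_text}",
            f"    NOT {clause}",
            "        MOVE 1 TO result-flag.",   # the period closes every scope
        ])
        cases.append({
            "outer": outer,
            "inner": inner,
            # The bind point is only in doubt when the inner verb accepts
            # the same kind of NOT clause as the outer verb.
            "ambiguous": clause == inner_clause,
            "source": source,
        })
    return cases
```

Printing the `source` field of each case gives four short sentences; only SUBTRACT over ADD and READ over READ are marked ambiguous, while the two mixed pairs cover the heterogeneous embedding.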

Separately we might find, and it is probably not worth researching, that 
available products actually are quite different in the heterogeneous embedding 
condition. For example conditional arithmetic on the end of conditional I/Os. 
The reason for that speculation is that limitations on budget would lead to 
short cuts.  I think that the decision to bind the loose arithmetic 
conditional is a short cut actually. It is the junk in the standard about some 
previous that does not yet have the clause that gets people into chopping or 
attempting heroics.

So in the context of simple test cases you may yet find compilers that bind 
back to the outer conditional. But I am increasingly convinced that large 
portions of the mainframe stuff went through either a cleansing sequence or 
were reborn with new semantics that can be predated to any potential 
migration to the compiler to be constructed here.

There is still a need to be able to deeply recurse conditionals that are 
hardened with explicit scope terminators. All of the conditionals can 
recurse within one another.
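
A stack is enough to track that kind of nesting. The Python sketch below is only a toy model: it assumes the input has already been reduced to conditional verbs, END-xyz terminators, and periods, and it deliberately demands a matching END-xyz for every verb it is given, which is stricter than the standard. The function name `max_hardened_depth` is made up for the illustration.

```python
def max_hardened_depth(tokens):
    """Return the deepest nesting reached; raise ValueError on a bad close."""
    stack = []
    deepest = 0
    for tok in tokens:
        if tok == ".":
            stack.clear()                  # a period closes every open scope
        elif tok.startswith("END-"):
            # In this toy model an END-xyz must close the innermost open verb.
            if not stack or stack[-1] != tok[4:]:
                raise ValueError(f"{tok} does not close the innermost open verb")
            stack.pop()
        else:
            stack.append(tok)              # a conditional verb opens a scope
            deepest = max(deepest, len(stack))
    if stack:
        raise ValueError(f"unterminated scopes: {stack}")
    return deepest
```

For example, `max_hardened_depth(["READ", "SUBTRACT", "ADD", "END-ADD", "END-SUBTRACT", "END-READ"])` reports a depth of 3, and nothing in the bookkeeping limits how deep the hardened conditionals may go.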

Bob Rayhawk

Yet we are already nearing a consensus. 
