Re: gnubol: How do we parse this language, anyway?
The test case that is being used may well be too much of a worst case. It has
open conditionals on open conditionals on open conditionals. That is not the
problem, if there is a problem. It is the simpler cases that matter. They
are at least plausible, whereas repeated recursion leaving things open is not
the problem.
The problem is that at just one further level, the last arithmetic can appear
to be either a simple statement or an open-ended conditional, depending upon
whether you interpret the clause to its right as belonging to it or to its
parent.
Regrettably, much of the effort to test this has not helped, because it does
not look at the simple cases that are likely.
The only problem here is that, at any level of recursion, if the last two
levels are both open, you can bind the last clause in more than one way. If
you are three levels deep and finish open-ended, there simply must be an open
conditional on the end of something, and it would not matter how you argue it
should bind. We do not have to code for that, since open conditionals cannot
live on the edge of open conditionals.
Conversely, if you are deep, deep, deep and the outer statements are going to
end explicitly, then any depth is okay and any branch is okay ... except if
one has a loose end. A simple arithmetic can be a tight ending! That is the
problem. If you bind the last clause found to the inner verb, the inner verb
looks conditional; if you bind it to an antecedent, then the inner looks like
an imperative (if you take it as simple).
A difficulty we have with the test is that you are three or more levels deep
in open-ended conditionals, no matter how you glue it together. This happens
in part because of the monotone use of the verb ADD. The following shows a
simpler and more plausible problem:
SUBTRACT a FROM b
ON SIZE ERROR
ADD 1 to error-counter
NOT ON SIZE ERROR
ADD 2 to success-counter
END-SUBTRACT.
The problem is that if you are recursing with concurrent scopes, the NOT ON
SIZE ERROR could glue in or it could glue out. If it is illegal to have a
conditional on a conditional clause, then it is illegal to code
ADD 1 to error-counter NOT ON SIZE ERROR ADD 2 to success-counter
onto the first clause, because that triumvirate is conditional. To see it as
an error you must assume the coder made a mistake. But the coder knows the
standard; why would she code a mistake? So, in this case (and not in the
three-deep cases), you have a choice. There is an interpretation that
resolves it as correct.
The problem is that "ADD 1 to error-counter" _is_ an imperative statement.
That is the problem. The bind point is a fantastic challenge for a compiler
writer. It can be a fixation. But it utterly destroys the ability to see the
problem.
"ADD 1 to error-counter" _is_ an imperative statement!
If we have a parser that has concurrent scopes it is possible to bind that
NOT ON SIZE ERROR clause back to the SUBTRACT.
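A minimal sketch of that idea in Python, under loud assumptions. Statements arrive pre-tokenized as (verb, operands) tuples plus phrase, END-verb and period strings. Only the SIZE ERROR phrase pair is modeled. The names parse, Stmt, show, illegal_nesting and the "inner"/"outer" modes are hypothetical, not gnubol code. The stack holds the concurrently open scopes. "inner" binds a phrase to the innermost open statement that lacks it. "outer" treats every statement nested inside a phrase as imperative, so the phrase binds back to the enclosing statement. This naive "outer" cannot see an END-ADD coming, so it would misparse hardened nested arithmetic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Phrase keyword -> Stmt attribute holding that branch (SIZE ERROR family only).
PHRASES = {"ON SIZE ERROR": "on_error", "NOT ON SIZE ERROR": "not_on_error"}


@dataclass
class Stmt:
    verb: str
    operands: str
    nested: bool = False                         # created inside a phrase branch
    on_error: Optional[List["Stmt"]] = None      # ON SIZE ERROR branch
    not_on_error: Optional[List["Stmt"]] = None  # NOT ON SIZE ERROR branch
    current: Optional[List["Stmt"]] = None       # branch receiving new statements
    terminated: bool = False                     # closed by an explicit END-verb

    def is_conditional(self) -> bool:
        return self.on_error is not None or self.not_on_error is not None


Token = Union[Tuple[str, str], str]


def parse(tokens: List[Token], bind: str) -> List[Stmt]:
    """bind="inner": a phrase goes to the innermost open statement lacking it.
    bind="outer": statements nested inside a phrase are taken as imperative,
    so the phrase skips them and binds back to an enclosing statement."""
    top: List[Stmt] = []
    stack: List[Stmt] = []  # concurrently open scopes, outermost first
    for tok in tokens:
        if isinstance(tok, tuple):
            # A statement with no open phrase is complete once a new one starts.
            while stack and stack[-1].current is None:
                stack.pop()
            parent = stack[-1] if stack else None
            stmt = Stmt(tok[0], tok[1], nested=parent is not None)
            (parent.current if parent else top).append(stmt)
            stack.append(stmt)
        elif tok in PHRASES:
            attr = PHRASES[tok]
            for i in range(len(stack) - 1, -1, -1):
                cand = stack[i]
                if getattr(cand, attr) is None and not (bind == "outer" and cand.nested):
                    break
            else:
                raise SyntaxError(f"no open statement can take {tok}")
            del stack[i + 1:]  # scopes inside the bind point close
            branch: List[Stmt] = []
            setattr(cand, attr, branch)
            cand.current = branch
        elif tok.startswith("END-"):
            for i in range(len(stack) - 1, -1, -1):
                if stack[i].verb == tok[4:]:
                    stack[i].terminated = True
                    del stack[i:]
                    break
            else:
                raise SyntaxError(f"{tok} matches no open statement")
        elif tok == ".":
            stack.clear()  # a period closes every open scope
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return top


def show(s: Stmt) -> str:
    """Render the bind structure, e.g. SUBTRACT[ON: ADD; NOT ON: ADD]."""
    parts = []
    if s.on_error is not None:
        parts.append("ON: " + ", ".join(show(c) for c in s.on_error))
    if s.not_on_error is not None:
        parts.append("NOT ON: " + ", ".join(show(c) for c in s.not_on_error))
    return s.verb + (f"[{'; '.join(parts)}]" if parts else "")


def illegal_nesting(stmts: List[Stmt], in_phrase: bool = False) -> List[str]:
    """COBOL-85 phrases take imperative statements, so an unterminated
    conditional statement inside a phrase branch is a syntax error."""
    found = []
    for s in stmts:
        if in_phrase and s.is_conditional() and not s.terminated:
            found.append(f"{s.verb} {s.operands}")
        for branch in (s.on_error, s.not_on_error):
            if branch is not None:
                found.extend(illegal_nesting(branch, True))
    return found


example = [("SUBTRACT", "a FROM b"), "ON SIZE ERROR",
           ("ADD", "1 TO error-counter"), "NOT ON SIZE ERROR",
           ("ADD", "2 TO success-counter"), "END-SUBTRACT"]
for mode in ("inner", "outer"):
    tree = parse(example, mode)
    print(mode, show(tree[0]), illegal_nesting(tree))
# inner SUBTRACT[ON: ADD[NOT ON: ADD]] ['ADD 1 TO error-counter']
# outer SUBTRACT[ON: ADD; NOT ON: ADD] []
```

Binding in makes "ADD 1 TO error-counter" a conditional inside an imperative-only slot. Binding back leaves it imperative and yields the one legal reading.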
I am now much less convinced that we should. But just to keep us on track in
monitoring the code base correctly, I insist that the tests are too open.
Three deep is not the problem. Stated as a standards issue, there is no
requirement on the fashion in which you slam dunk a three-deep problem. You
can say the inner arithmetic is busted, or the outer, or you can drift off
happily into "unexpected" somnambulism; it matters not. When you are three
deep with no explicit scope terminator in sight, it is wrong. In fact, you
can only contemplate that as relevant if you misunderstand the standard and
commence to bind to the inner. You cannot bind to the inner even once if it
is an open-ended conditional.
This is not associativity.
The issue is a matter of concurrent scope. We need concurrent scope so that
we can do the deep nesting of END-xyz hardened conditionals. Those who have
experimented with that then find that they can just toggle a switch to
'extend' COBOL to allow the inner to be conditional; the problem is that it
'extends' infinitely (or, of course, the feature is that it extends
infinitely). We actually do not want that, IMHO. There isn't much use for it.
But please see my actual point: it is certainly just as valid to want the
thing to go one level deep with one interpretation or the other. (I am not
advocating my point of view as though I like the bind-back interpretation; my
concern had been more the possibility of either changed semantics in a
migration from other platforms or, worse, different semantics already in the
field creating a harder target to hit. But all of that is with just the last
two levels competing, not a dragon.)
For my own part, I have backtracked to see how I got to where this seems so
important. I have a song to sing, but it is cacophonous to do so while eating
crow.
What I think happened starts with IBM's OS/VS COBOL. I believe that my
understanding or misunderstanding of the nested conditional starts there, but
I am unsure because I cannot recover a reference manual, and lurkers are
welcome to clarify this. I think that that compiler allowed NOT alternatives
on some conditionals. I think that simple arithmetics were allowed on
conditional clauses. I do not know how they resolved ambiguities. But my
orientation to this problem comes from that time, so either I always
misunderstood, or there was a presumption in that compiler that an arithmetic
on a conditional clause was a simple arithmetic. There was no such thing as
an END-arith. I assume that the NOT alternate clause was an IBM extension, but
I am not sure of that.
For clarity, it is important to understand that this early compiler went
through release stages; the best was Release 2.
At this point I should say that old code never dies; we just keep regressing
new compilers. Current managers enjoy gifting conversion efforts to the next
generation. IBM released a VS COBOL II compiler, which was a sequence of
products that led stepwise to COBOL '85. The earliest version of this did
not support NOT alternatives. That is actually a cleansing phase: if you
used it unparameterized, your NOT clauses died. But inertia is important in
legacy code markets, so Big Blue offered a CMPR2 parameter that allowed you
to make this new compiler accept the old syntax of Release 2 of the previous
compiler. That bought time to convert your code. You could actually use the
new compiler and run time, and figure out how to finance conversion later.
The dropping of NOT was probably not much of an issue compared to other
syntax changes that would be required to compile without the regressing
parameter.
But IBM got off the hook here. Whatever the semantics of nesting in
proximity to the earlier NOT, it did not matter. The new compiler in its
early stages dropped the syntax. That is documented in their migration
manuals (but, as with all extensions that died, when you move forward you
have to dig deep to find the notations). Some portion of IBM's code base then
was potentially cleansed of any NOT conditional clause. But lots of folks did
not do that, so a subtle shell game ensued.
Later stages of VS COBOL II moved into COBOL '85 syntax. Now NOT clauses were
'added' to the compiler when it was unclamped (that is, 'new' syntax,
NOCMPR2). Some portion of the code base for this vendor leapfrogged the early
version of the compiler that would have thrown the NOT conditional out. Now
the coder was plunged head first into a new semantic world. This portion of
the code base either 1) got repaired to deal with new semantics, 2) had
perhaps few incidences relating to our 'alternative bind point' interactions,
3) is in production and no one has ever seen or noticed a problem (this is
exceptions within exceptions, after all), or 4) no one cares about the very
rare strange behavior that could be occurring in the new semantics context
(note well that the original code might even have been wrong).
But the vendor was probably neither obligated to flag the ambiguous bind
point nor to discourse about extensions. The upgrade to the better VS COBOL II
was from the lower VS COBOL II, and there was no NOT clause in the lower
release. The cleansing phase of product deployment created two different
migration descriptions. In the second migration there was no loss of NOT or
change of NOT; it simply was 'new'.
However, inertia still dominates this picture. By the time the millennium
threshold approaches, there is yet another set of products available from
this important vendor, sometimes called colloquially COBOL THREE, or COBOL
for this and that. These are COBOL '85 like the higher levels of the
preceding product, and have the NOT clauses. A very large number of shops
leapfrog over intermediates and land in COBOL THREE land.
What I think this means is that that portion of the code base a) has few
nested conditionals, b) partly was cleansed (and maybe reborn), and c) mostly
originated in a COBOL '85 context.
The NOT alternatives in the early compiler, OS/VS COBOL, may be the source of
some of the mimicry that a few of you have discussed in PC platform
products.
At any rate, it is the simple cases that matter, like this one:
INITIALIZE file-statuses
PERFORM UNTIL file-2-file-status-EOF
READ primary-file
AT END
MOVE 2 to file-layout-flag
READ second-file
NOT AT END
MOVE 1 TO file-layout-flag
END-READ
IF file-layout-flag = 1
PERFORM layout-1-interpretation
ELSE
PERFORM layout-2-interpretation
END-IF
END-PERFORM.
It is not inconceivable that INVALID KEY and NOT INVALID KEY clauses could
drift into this arrangement. Certainly modern disciplines would lead
programmers to structure this more carefully. But not necessarily.
And though I would say that the simpler two-deep examples are more useful as
code base reviews for the conditionals, and that varying semantics are
extremely important issues for assisting decision makers in choosing this
project's compilers, the issue looks a lot smaller to me now.
But just saying that my mind is changed about the urgency of this matter is
not enough. Nor does the verbiage about the early semantics in IBM's
COBOL '74 compiler language extensions engage the more important issue.
I think that I have wasted people's time here, and I do regret that.
The modern compilers are probably not binding back; they are probably binding
in as an 'extension', or just barfing when lost. I think that, for some of
the compilers, the deeply nested open conditionals that incidentally have a
closing END-xyz that matches the inner and some outer scope are actually
probably broken. But that does not matter, because as long as you are three
deep in one interpretation or the other, there is no requirement to compile
successfully.
I think that some very small portion of code may be present that actually
survived from earlier semantics, leapfrogged cleansing product releases, and
functions under new semantics. Exactly how much variation there might be in
the products I don't know. All of these clauses deal with exceptions, so
frequently we are into an exception within an exception at run time if the
program has nested conditionals. That in general is a rare approach. The
concern would only be really important, in rare circumstances, if _we_ were
to change a program's semantics _silently_. ((It is a long song sung above,
but I think that happened for some leapfroggers.)) If the code base product
array today is consistent on the simple two-deep 'ambiguous' code situation,
then we have essentially one target.
The code base product array would be consistent if some products disallow
the syntax and all the others bind in just one way. If we have different bind
points in the various production compilers, we have a convergence problem.
That problem is obscured by run-on test cases. We need to be just two levels
deep, and to separate sentences with implicit scope terminators like the
period.
Separately we might find, though it is probably not worth researching, that
available products actually are quite different in the heterogeneous
embedding condition, for example conditional arithmetic on the end of
conditional I/Os. The reason for that speculation is that limitations on
budget would lead to short cuts. I think that the decision to bind the loose
arithmetic conditional is actually a short cut. It is the junk in the
standard about some previous statement that does not yet have the clause that
gets people into chopping or attempting heroics.
So in the context of simple test cases you may yet find compilers that bind
back to the outer conditional. But I am increasingly convinced that large
portions of the mainframe stuff either went through a cleansing sequence or
were reborn with new semantics, and that this predates any potential
migration to the compiler to be constructed here.
There is still a need to be able to deeply recurse conditionals that are
hardened with explicit scope terminators. All of the conditionals can
recurse within one another.
Bob Rayhawk
Yet we are already nearing a consensus.
--
This message was sent through the gnu-cobol mailing list. To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body. For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.