gnubol: Re: which parser tool
In a message dated 11/20/99 9:28:19 AM EST, mck@tivoli.mv.com writes:
<<
I don't want to appear to be on the stump for pccts again
...
>>
Advocacy of each tool is, IMHO, highly relevant now. Any advocate should also
be free to openly review and appreciate more than one alternative. Please do
advocate!
The basic concern I have with PCCTS is that it is left-hand reduction. I
actually don't know that that is bad. But it seems to me that we are lost if
we reduce on the left and encounter junk. We are lost in the sense that the
earlier reductions may have committed emissions to semantics, which at best
become orphaned partial sequences.
To try to avoid that we must discipline ourselves to put the actions only at
the real end of a complete syntactic group. That just becomes right hand
reduction in disguise.
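A minimal sketch of the orphaned-emission concern, in Python rather than
PCCTS notation. The toy rule (MOVE source TO target), the action tuples, and
every function name here are hypothetical, chosen only for illustration. The
first parser fires actions mid-rule; the second buffers them and releases
them only once the whole group has been seen.

```python
class ParseError(Exception):
    pass

def parse_move_eager(tokens, emit):
    """Top-down parse of the toy rule with actions embedded mid-rule."""
    it = iter(tokens)
    if next(it, None) != "MOVE":
        raise ParseError("expected MOVE")
    source = next(it, None)
    if source is None:
        raise ParseError("expected source")
    emit(("load", source))      # committed before the rule is known to be whole
    if next(it, None) != "TO":
        raise ParseError("expected TO")
    target = next(it, None)
    if target is None:
        raise ParseError("expected target")
    emit(("store", target))

def parse_move_deferred(tokens, emit):
    """Same rule, but actions are buffered until the group is complete."""
    pending = []
    parse_move_eager(tokens, pending.append)
    for action in pending:      # reached only if the whole group parsed
        emit(action)

def run(parser, tokens):
    """Collect whatever the parser emitted, even if it failed part way."""
    out = []
    try:
        parser(tokens, out.append)
    except ParseError:
        pass
    return out
```

On the damaged input MOVE A JUNK B, the eager version leaves a lone
("load", "A") behind, while the deferred version emits nothing.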
To be honest the problem is almost within my grasp but not quite. If we do
not actually exploit a lookahead greater than k=1, then the basic difference
between LH and RH reduction is that RH reduction is using a stack to preserve
what it has not yet completely reduced. That internal data structure suggests
to me a better possibility of error handling. ((There are other features of
specific tools that go beyond this basic difference, which are great to hear
about)). But like I say, this is just out of reach for me; I cannot say that
I see clearly that the presence of the stack positions us to be more robust,
but I do _feel_ like it.
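A minimal sketch of what an explicit stack keeps around, assuming a toy
token set (IF, STMT, END-IF) that is not taken from any real COBOL grammar.
It is a hand-written stack-based recognizer in the spirit of shift-reduce,
not the output of any tool. It does not settle the question either: a
recursive-descent parser holds comparable state on its call stack.

```python
def shift_reduce(tokens):
    """Toy recognizer for nested IF ... END-IF blocks.

    Returns (blocks, errors). Junk tokens are reported and skipped while
    the stack of still-open blocks is left untouched.
    """
    stack = []      # unreduced material: blocks opened but not yet closed
    blocks = []     # completed top-level blocks
    errors = []
    for pos, tok in enumerate(tokens):
        if tok == "IF":
            stack.append(["IF"])                # shift: open a new block
        elif tok == "STMT":
            if stack:
                stack[-1].append("STMT")
            else:
                blocks.append("STMT")
        elif tok == "END-IF":
            if not stack:
                errors.append((pos, "END-IF without IF"))
                continue
            done = stack.pop()                  # reduce: block is complete
            if stack:
                stack[-1].append(done)          # attach to enclosing block
            else:
                blocks.append(done)
        else:
            errors.append((pos, "unexpected " + tok))
    for _ in stack:
        errors.append((len(tokens), "unterminated IF"))
    return blocks, errors
```

Given IF STMT IF JUNK STMT END-IF STMT END-IF, the junk is reported once and
both END-IF tokens still find the blocks they close.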
A high level consideration here is how much specificity for errors do we want
in the parser(s). That discussion is ongoing. But I think it is true that if
my view is not taken, and therefore if we have fairly minimal specificity for
errors in the parser(s), this matter of the value of the stack to error
handling in fact would not matter.
My view of a COBOL program is very different than that implied by some other
commenters.
I do not view a COBOL program as something that might be valid, I view it as
something that is almost certainly wrong. I have said enough about all that,
but perhaps this will help.
All of the available tools get us into the really silly fantasy that the
rules of our grammar are a hierarchy. The industry nomenclature requires that
the highest level of the hierarchy be called the start rule. It is commonly
quipped that in right-hand reducers the top is really the stop rule. But
those who join this interaction are sufficiently astute to understand that it
is all the same.
But you see, the entire concept is inapplicable. Hierarchies are great. The
ability to subsume rules into other rules is fine. But that is not what is
out there.
There have been a few posts that pointed to and commented on the syntax rules
available at the University of Amsterdam. That is a great example. Really
good. It's big. It has one summary comment about 1100 rules, if I remember
correctly. But note that it is entirely an optimistic grammar.
It is a brilliant piece of work. But it is intended as an aid for remediation
work (relating to Y2K). It can legitimately assume programs are reasonably
close to correct. The arithmetic productions eat errors like air. That is
fine for what they are up to.
But real programs have clauses in the wrong sequence, clauses occurring the
wrong number of times, wrong statements embedded within other statements,
and damaged code that could orphan subsequent valid code.
The thing I am after is this notion of the hierarchy leading naturally up.
Wrong! Not in real code! The hierarchies do not lead up to the top, and they
frequently do not lead up properly in their own neighborhood.
So I end up in the position that stands on the foundation of considerable
error productions (which some disagree with). And my exact sense, very
generally, is that we can only use the available tools a little bit. And in
that sense, I consider what is the best underlying algorithm replete with
specific internal data structures, if our objective is to handle whatever
syntax in whatever order. That is much different than an optimistic grammar.
I _feel_ like the basic notions of the lookahead, backtracking, and maybe the
error recovery of some tools mislead us. But obviously I have some homework
to do on PCCTS. I just believe that we are going to have to write the code
that keeps the parser(s) on their feet when real programs present us with
seriously disrupted syntax.
I guess that I can fling this out there for discussion in another way. The
programs I have seen have one awesome amount of nested conditionals. Mostly
this is the IF-THEN-ELSE and the EVALUATE-WHEN kind, but of late folks have
hung grossly nested blocks off of AT END and NOT AT END dealios which by the
way have no particular sequence requirement (and thus defy 'associativity').
So the problem is that from any given point of disruption we could have a
very long way to go to get the parser back on its feet.
In this regard I am saying that our underlying concept that hand coding the
parser is inconceivable represents a major issue. It is not the
left-handedness or the right-handedness that is an issue for me, or the extra
features (since many of them I don't fully understand yet).
The issue is this naive paradigm of smooth hierarchical topologies. That is
simply not what is out there. Many programs contain serious syntactic
disruptions. So the question becomes, if this actually means, as I suggest,
that we can only get a limited result from the available tools, which
category then is best for error detection, LL or LR? Trust me, I am not at all
sure. But the presence of the stack seems to mean that a heavily nested
source image might have the latter portion of its text processed more
successfully, because we will still have something around to glue the
recurse-exiting clauses when we get down to them.
My idea of the parser is that it basically says,
- give me a word I will try to understand it and separately I will try to
glue it to a phrase;
- give me a phrase I will try to understand it and separately I will try to
glue it to a clause;
and so on up to a clause, up to a statement.
That much we can see in the available tools (LL & LR). But I emphasize that
the attempt to glue something is distinct from merely recognizing and
syntactically understanding it. The tools do not keep that distinct. The
hierarchies destroy the distinction. Recognize and glue in are the same.
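A minimal sketch of keeping the two steps apart, assuming a hypothetical
table of which clauses may hang off a READ statement. The table, the
Statement type, and the function names are illustrative only and come from
no tool or standard. Recognizing a clause never depends on whether it can be
attached; gluing is a second attempt that may fail with its own diagnostic.

```python
from dataclasses import dataclass, field

# Hypothetical: clause kinds a verb accepts, each at most once, in any order.
ALLOWED = {"READ": {"AT END", "NOT AT END"}}

@dataclass
class Statement:
    verb: str
    clauses: list = field(default_factory=list)

def recognize(text):
    """Understand a clause on its own terms; return its kind or None."""
    for kind in ("NOT AT END", "AT END"):   # longer prefix is checked first
        if text.startswith(kind):
            return kind
    return None

def glue(stmt, kind):
    """Separately, try to attach a recognized clause to its statement.

    Returns a diagnostic string on failure, or None on success.
    """
    if kind not in ALLOWED.get(stmt.verb, set()):
        return f"{kind} does not belong to {stmt.verb}"
    if kind in stmt.clauses:
        return f"duplicate {kind} on {stmt.verb}"
    stmt.clauses.append(kind)
    return None

def process(stmt, clause_texts):
    """Recognize each clause, then attempt the glue step; collect diagnostics."""
    diagnostics = []
    for text in clause_texts:
        kind = recognize(text)
        if kind is None:
            diagnostics.append(f"unrecognized: {text}")
            continue
        problem = glue(stmt, kind)
        if problem:
            diagnostics.append(problem)
    return diagnostics
```

With this split, a second AT END on the same READ is still recognized as an
AT END clause; only the glue step complains, and later clauses are processed
regardless.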
It is certainly the case that I am aware of some theoretic aspect of the
pre-Backus-Naur considerations of COBOL and am just cathecting the living
daylights out of it. But I am not so interested in the
not-even-nonassociativity of COBOL per standard. It is the actual fact that
programs are frequently messed up. And the mess occurs not rarely in a deep
nest.
It is as though the available code describes a high probability that
optimistic grammars will fly off the edge fairly typically. The most
important feature of optimistic grammars, IMHO, is the naive
hierarchicization of the syntax. (now how about that for a koinage!).
I think that a form of mini-ScoreCarding could help at the syntax level, just
as I believe that any outer major-ScoreCarding is relevant to the parser(s)
mainline sizing of the worksheet it gets from the preprocessor. But comments
on that would be useful only if we will consider that maybe the programs
won't fit into the optimistic hierarchies, so I wait to see if anyone cares.
So where I get to is that I think the tool cannot handle the errors for us,
but I have lots of homework to do on PCCTS. I think the tools blind us to
errors, because of the inapplicability of the start-rule idea, and its cousin
the simplistic mid rule hierarchy practice. The code is just not like that.
We can get anything and everything, and we can get it in any sequence.
This I think leads to a certain minimum use of available tools. A preference
for LL/LR then looks like a different decision. My priority is competent
error handling (not everyone agrees). But as we progress, if error handling
is a priority, and if tools are used somewhat minimally, then does the stack
of right-hand reducers position us better?
Best Wishes
Bob Rayhawk
RKRayhawk@aol.com
--
This message was sent through the gnu-cobol mailing list. To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body. For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.