
gnubol: Re: which parser tool



In a message dated 11/20/99 9:28:19 AM EST, mck@tivoli.mv.com writes:

<< 
 I don't want to appear to be on the stump for pccts again 
...
>>


Advocacy of each tool is, IMHO, highly relevant now. Any advocate should also 
be free to openly review and appreciate more than one alternative. Please do 
advocate!

The basic concern I have with PCCTS is that it is left-hand reduction. I 
actually don't know that that is bad. But it seems to me that we are lost if 
we reduce on the left and encounter junk. We are lost in the sense that the 
earlier reductions may have committed emissions to semantics, which at best 
become orphaned partial sequences. 

To try to avoid that we must discipline ourselves to put the actions only at 
the real end of a complete syntactic group. That just becomes right-hand 
reduction in disguise.
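The danger I mean with early actions can be sketched in a few lines. This is a toy illustration, not PCCTS output or real COBOL semantics; the "MOVE src TO dst" clause shape and the emitted opcodes are made up:

```python
# Toy illustration (invented clause and opcodes): the hazard of emitting
# semantics before the whole clause has been proven valid.
def parse_eager(tokens, emit):
    """LL-style early actions: emit as soon as each piece is recognized."""
    it = iter(tokens)
    try:
        if next(it) != "MOVE":
            return False
        emit("load")                 # committed before the clause is proven
        emit(f"push {next(it)}")
        if next(it) != "TO":
            return False             # too late: two emissions are orphaned
        emit(f"store {next(it)}")
        return True
    except StopIteration:
        return False

def parse_deferred(tokens, emit):
    """Actions buffered until the whole clause reduces (right-hand in spirit)."""
    it, buf = iter(tokens), []
    try:
        if next(it) != "MOVE":
            return False
        buf.append(f"push {next(it)}")
        if next(it) != "TO":
            return False             # nothing has been committed yet
        buf.append(f"store {next(it)}")
    except StopIteration:
        return False
    for op in buf:
        emit(op)
    return True
```

On the junk input MOVE X FROM Y, the eager version has already emitted two partial opcodes when it fails; the deferred version emits nothing. That deferral is exactly the discipline I mean, and it is right-hand reduction in disguise.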

To be honest, the problem is almost within my grasp, but not quite.  If we do 
not actually exploit a lookahead greater than k=1, then the basic difference 
between LH and RH reduction is that RH reduction uses a stack to preserve 
what it has not yet completely reduced. That internal data structure suggests 
to me a better possibility of error handling. ((There are other features of 
specific tools that go beyond this basic difference, which are great to hear 
about)).  But as I say, this is just out of reach for me: I cannot say that 
I see clearly that the presence of the stack positions us to be more robust, 
but I do _feel_ like it does.
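To make that feeling concrete: textbook panic-mode recovery skips input to a synchronizing token and then pops the parse stack back to something that can legally continue. A minimal sketch, with invented token and stack names (the real sync set for COBOL would be larger):

```python
# Sketch of panic-mode recovery over an explicit parse stack.
# Token and state names are illustrative, not from any real tool.
SYNC = {".", "END-IF"}          # made-up synchronizing tokens

def recover(stack, tokens, i):
    """Skip input up to a sync token, then pop the stack until a
    construct that can continue is on top.  The stack is what lets us
    resume inside a nest instead of abandoning the whole sentence."""
    while i < len(tokens) and tokens[i] not in SYNC:
        i += 1                   # discard junk input
    while stack and stack[-1] not in ("IF", "SENTENCE"):
        stack.pop()              # unwind half-built constructs
    return stack, i
```

The point is that after recovery the stack still remembers we are inside an IF inside a sentence; a left-hand reducer with no such structure has nothing equivalent left to stand on.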

A high-level consideration here is how much specificity for errors we want 
in the parser(s). That discussion is ongoing.  But I think it is true that if 
my view is not taken, and we therefore have fairly minimal specificity for 
errors in the parser(s), this matter of the value of the stack to error 
handling would in fact not matter.

My view of a COBOL program is very different from that implied by some other 
commenters. I do not view a COBOL program as something that might be valid; 
I view it as something that is almost certainly wrong. I have said enough 
about all that, but perhaps this will help.

All of the available tools get us into the really silly fantasy that the 
rules of our grammar are a hierarchy. The industry nomenclature requires that 
the highest level of the hierarchy be called the start rule. It is commonly 
quipped that in right-hand reducers the top is really the stop rule. But 
those who join this interaction are sufficiently astute to understand that it 
is all the same.

But you see, the entire concept is inapplicable.  Hierarchies are great. The 
ability to subsume rules into other rules is fine. But that is not what is 
out there.

There have been a few posts that pointed to and commented on the syntax rules 
available at the University of Amsterdam. That is a great example.  Really 
good.  It's big.  It has a summary comment mentioning about 1100 rules, if I 
remember correctly.  But note that it is entirely an optimistic grammar. 

It is a brilliant piece of work. But it is intended as an aid for remediation 
work (relating to Y2K).  It can legitimately assume programs are reasonably 
close to correct.  The arithmetic productions let errors through like air. 
That is fine for what they are up to.

But real programs have clauses in the wrong sequence, clauses occurring the 
wrong number of times, wrong statements embedded within other statements, and 
damaged code that could orphan subsequent valid code.
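A pessimistic recognizer can take the clauses in whatever order and multiplicity they arrive and diagnose rather than die. A toy sketch; the clause table here is invented, not the real COBOL clause inventory:

```python
# Toy sketch: tolerate any clause order, diagnose repeats and strangers.
# The allowed-clause table is illustrative only.
ALLOWED = {"ASSIGN": 1, "ORGANIZATION": 1, "ACCESS": 1}

def check_clauses(clauses):
    """Accept clauses in any sequence; report problems as diagnostics
    instead of failing at the first out-of-place token."""
    seen, errors = {}, []
    for c in clauses:
        if c not in ALLOWED:
            errors.append(f"unknown clause {c}")
        else:
            seen[c] = seen.get(c, 0) + 1
            if seen[c] > ALLOWED[c]:
                errors.append(f"clause {c} repeated")
    return errors
```

An optimistic grammar encodes the legal sequence in its rule structure and so has nothing to say once that sequence is violated; a table like this keeps recognizing and keeps reporting.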

The thing I am after is this notion of the hierarchy leading naturally up.  
Wrong! Not in real code!  The hierarchies do not lead up to the top, and they 
frequently do not lead up properly even in their own neighborhood. 

So I end up in the position that stands on the foundation of considerable 
error productions (which some disagree with). And my exact sense, very 
generally, is that we can only use the available tools a little bit.  In 
that light, I ask what is the best underlying algorithm, with its specific 
internal data structures, if our objective is to handle whatever syntax in 
whatever order.  That is much different from an optimistic grammar.

I _feel_ like the basic notions of the lookahead, backtracking, and maybe the 
error recovery of some tools mislead us.  But obviously I have some homework 
to do on PCCTS. I just believe that we are going to have to write the code 
that keeps the parser(s) on their feet when real programs present us with 
seriously disrupted syntax.

I guess that I can fling this out there for discussion in another way.  The 
programs I have seen have an awesome amount of nested conditionals. Mostly 
this is the IF-THEN-ELSE and the EVALUATE-WHEN kind, but of late folks have 
hung grossly nested blocks off of AT END and NOT AT END dealios, which by the 
way have no particular sequence requirement (and thus defy 'associativity').  
So the problem is that from any given point of disruption we could have a 
very long way to go to get the parser back on its feet.

In this regard I am saying that our underlying assumption that hand-coding 
the parser is inconceivable represents a major issue. It is not the 
left-handedness or the right-handedness that is an issue for me, or the extra 
features (since many of them I don't fully understand yet).
The issue is this naive paradigm of smooth hierarchical topologies.  That is 
simply not what is out there. Many programs contain serious syntactic 
disruptions. So the question becomes: if this actually means, as I suggest, 
that we can only get a limited result from the available tools, which 
category then is best for error detection, LL or LR? Trust me, I am not at 
all sure. But the presence of the stack seems to mean that a heavily nested 
source image might have the latter portion of its text processed more 
successfully, because we will still have something around to glue the 
recurse-exiting clauses to when we get down to them.
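To illustrate the gluing of recurse-exiting clauses: even with junk in the interior of a nest, an explicit stack can still pair scope openers with their END-x terminators. A toy sketch, with a made-up opener/terminator table:

```python
# Toy sketch: pair nested scope terminators across interior junk.
# The opener/terminator table is illustrative, not the full COBOL set.
OPEN = {"IF": "END-IF", "EVALUATE": "END-EVALUATE", "READ": "END-READ"}

def match_terminators(tokens):
    """Pair each opener with its terminator using an explicit stack,
    skipping anything unrecognized.  Even if the interior of a nest is
    disrupted, the closing clauses can still glue up as we exit."""
    stack, pairs = [], []
    for tok in tokens:
        if tok in OPEN:
            stack.append(tok)
        elif stack and tok == OPEN[stack[-1]]:
            pairs.append((stack.pop(), tok))
        # anything else is junk or ordinary text; the stack survives it
    return pairs, stack
```

The junk in the middle never disturbs the record of which nests are open, which is the robustness I _feel_ the stack buys us.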

My idea of the parser is that it basically says: 
- give me a word, I will try to understand it, and separately I will try to 
glue it to a phrase; 
- give me a phrase, I will try to understand it, and separately I will try to 
glue it to a clause; 

and so on, up to a clause, up to a statement.

That much we can see in the available tools (LL & LR). But I emphasize that 
the attempt to glue something in is distinct from merely recognizing and 
syntactically understanding it. The tools do not keep that distinct. The 
hierarchies destroy the distinction: recognize and glue-in become the same.
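Here is roughly what keeping the two steps distinct would look like, as a toy sketch (the token classes and phrase shape are invented):

```python
# Toy sketch: recognition and gluing as two separate, independent steps.
def recognize(word):
    """Stage 1: classify a word on its own terms, succeed or not."""
    if word.isdigit():
        return ("NUM", word)
    if word.isalpha():
        return ("ID", word)
    return ("JUNK", word)

def glue(phrase, item):
    """Stage 2: separately, try to attach the recognized item to the
    growing phrase.  Refusal to glue does not undo the recognition."""
    kind, _ = item
    if kind == "JUNK":
        return False
    phrase.append(item)
    return True
```

In a conventional grammar these two decisions are fused into one rule, so a failure to glue is reported as a failure to recognize, and the diagnostic suffers for it.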

It is certainly the case that I am aware of some theoretical aspects of the 
pre-Backus-Naur considerations of COBOL and am just cathecting the living 
daylights out of it. But I am not so interested in the 
not-even-nonassociativity of COBOL per the standard. It is the actual fact 
that programs are frequently messed up. And the mess occurs not rarely in a 
deep nest.

It is as though the available code describes a high probability that 
optimistic grammars will fly off the edge fairly typically. The most 
important feature of optimistic grammars, IMHO, is the naive 
hierarchicization of the syntax (now how about that for a coinage!).

I think that a form of mini-ScoreCarding could help at the syntax level, just 
as I believe that any outer major-ScoreCarding is relevant to the parser(s)' 
mainline sizing of the worksheet it gets from the preprocessor.  But comments 
on that would be useful only if we will consider that maybe the programs 
won't fit into the optimistic hierarchies, so I wait to see if anyone cares. 
 

So where I get to is that I think the tool cannot handle the errors for us, 
but I have lots of homework to do on PCCTS.  I think the tools blind us to 
errors, because of the inapplicability of the start-rule idea and its cousin, 
the simplistic mid-rule hierarchy practice. The code is just not like that. 
We can get anything and everything, and we can get it in any sequence.
This, I think, leads to a certain minimum use of available tools. A 
preference for LL/LR then looks like a different decision. My priority is 
competent error handling (not everyone agrees). But as we progress, if error 
handling is a priority, and if tools are used somewhat minimally, then does 
the stack of right-hand reducers position us better?


Best Wishes
Bob Rayhawk
RKRayhawk@aol.com

--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.