
gnubol: Re: which parser tool




On Date 10/31/99 1:51:52 PM EST
Michael McKernan, mck@tivoli.mv.com writes

in part 
<<
Pccts is the exception, .... It _does_ have infinite look-ahead. That sounds
expensive, but it's only used when it's required so as a practical matter,
it isn't a problem
>>


The issue is epsilon. Excellent syntactic sugar, such as the curly brackets
{} in

    refmod  : OPAREN arith_expr COLON { arith_expr } CPAREN

does not create a node for arith_expr plus a non-node that is nothing at all
and therefore FREE. No, no, no! It creates an epsilon, and that epsilon is
real. There are a couple of ways to express these optionals in the PCCTS
language, such as zero or one, { }, or zero or more, ( )*. (Don't get mad at
me if my PCCTS is not perfectly presented.)
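
To make that concrete, here is a minimal Python sketch of the usual rewriting an optional subrule goes through. It assumes a toy grammar representation (a dict from rule name to alternatives, with ("?", [...]) standing for { } and ("*", [...]) for zero or more); that representation and the generated names opt_N / star_N are hypothetical, not PCCTS internals. The point it shows: the optional arith_expr in refmod turns into a separate rule with an empty alternative.

```python
from typing import Dict, List, Tuple, Union

# Toy representation (an assumption for this sketch, not PCCTS internals):
# a symbol is a plain string, ("?", [...]) for { ... } (zero or one),
# or ("*", [...]) for zero or more.
Item = Union[str, Tuple[str, list]]
Grammar = Dict[str, List[List[Item]]]


def desugar(g: Grammar) -> Dict[str, List[List[str]]]:
    """Replace each optional/repeated subrule with a fresh rule that has an epsilon ([]) alternative."""
    out: Dict[str, List[List[str]]] = {}
    counter = 0

    def lower(seq: List[Item]) -> List[str]:
        nonlocal counter
        result: List[str] = []
        for item in seq:
            if isinstance(item, str):
                result.append(item)
                continue
            op, body = item
            counter += 1
            name = f"{'opt' if op == '?' else 'star'}_{counter}"  # hypothetical naming
            inner = lower(body)
            if op == "?":
                out[name] = [inner, []]           # name : body | epsilon
            else:
                out[name] = [inner + [name], []]  # name : body name | epsilon
            result.append(name)
        return result

    for rule, alts in g.items():
        out[rule] = [lower(alt) for alt in alts]
    return out


refmod = {"refmod": [["OPAREN", "arith_expr", "COLON", ("?", ["arith_expr"]), "CPAREN"]]}
for rule, alts in desugar(refmod).items():
    print(rule, ":", " | ".join(" ".join(a) or "<epsilon>" for a in alts))
# opt_1 : arith_expr | <epsilon>
# refmod : OPAREN arith_expr COLON opt_1 CPAREN
```

What looked like one node in the source is two rules after lowering, and one of them has an empty branch.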

Each of these generates an epsilon entry in the table.
The syntax is definitely pulling the wool over your eyes.
The part ...{rule_name}... is not one node; it is two. The problem is not
one node versus two nodes; the problem is that the extra node is an epsilon,
and epsilons end up all over the place in nearly any grammar using this tool,
but ESPECIALLY in COBOL, because nearly every clause and phrase is optional.

This problem is deep, but quite real. A rule that is optional is not, by
itself, an ambiguous rule. Including such a rule optionally (zero or one
times, or zero or more times) does not in and of itself create an
ambiguity either.

It is when the rule that includes another rule optionally is itself included
in yet another rule that the ambiguities burst forth. In a COBOL grammar we
do that a lot. When you use a more rigid parser generator, you get the
conflict chatter from the generator, and you go back and flatten rules,
enumerating the possible combinations, one of which simply lacks the optional
rule. That is categorically different from epsilon.
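
Here is a minimal Python sketch of how a strict one-token-lookahead (LL(1)) check exposes this, using FIRST and FOLLOW sets. The grammar is a hypothetical simplification (expr is cut down to a single NUM token, and the rule names are illustrative, not taken from our grammar files). An optional subscript is harmless on its own; once the data reference also allows an optional reference modification, which also starts with a left parenthesis, the epsilon branch of the subscript collides with the refmod that may follow it. An LALR generator such as yacc would flag the same spot as a shift/reduce conflict (shift the LPAREN, or reduce the empty subscript).

```python
from typing import Dict, List, Set, Tuple

# Toy grammar format (an assumption for this sketch): rule name -> list of
# alternatives; each alternative is a list of symbols; [] is epsilon.
# Any symbol that is not a key of the dict is treated as a terminal.
Grammar = Dict[str, List[List[str]]]
EPS = "<eps>"
END = "$"


def first_of_seq(seq: List[str], g: Grammar, first: Dict[str, Set[str]]) -> Set[str]:
    """FIRST set of a symbol sequence; contains EPS if the whole sequence can vanish."""
    out: Set[str] = set()
    for sym in seq:
        if sym not in g:              # terminal: it alone starts the sequence
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:     # this nonterminal cannot vanish; stop here
            return out
    out.add(EPS)                      # every symbol (or none at all) can vanish
    return out


def first_sets(g: Grammar) -> Dict[str, Set[str]]:
    first: Dict[str, Set[str]] = {nt: set() for nt in g}
    changed = True
    while changed:                    # iterate to a fixed point
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                f = first_of_seq(alt, g, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first


def follow_sets(g: Grammar, start: str, first: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    follow: Dict[str, Set[str]] = {nt: set() for nt in g}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in g:
                        continue
                    rest = first_of_seq(alt[i + 1:], g, first)
                    add = rest - {EPS}
                    if EPS in rest:   # sym can end the rule: inherit FOLLOW(nt)
                        add |= follow[nt]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow


def ll1_conflicts(g: Grammar, start: str) -> List[Tuple[str, str]]:
    """(rule, token) pairs where one token of lookahead predicts two alternatives."""
    first = first_sets(g)
    follow = follow_sets(g, start, first)
    conflicts: Set[Tuple[str, str]] = set()
    for nt, alts in g.items():
        seen: Set[str] = set()
        for alt in alts:
            predict = first_of_seq(alt, g, first)
            if EPS in predict:        # the epsilon branch is chosen on FOLLOW
                predict = (predict - {EPS}) | follow[nt]
            conflicts |= {(nt, tok) for tok in predict & seen}
            seen |= predict
    return sorted(conflicts)


# Hypothetical, heavily simplified data-reference grammar.
BASE = {
    "move_stmt": [["MOVE", "data_ref", "TO", "data_ref"]],
    "opt_subscript": [["LPAREN", "expr", "RPAREN"], []],
    "expr": [["NUM"]],
}
SUBSCRIPT_ONLY: Grammar = {**BASE, "data_ref": [["NAME", "opt_subscript"]]}
WITH_REFMOD: Grammar = {
    **BASE,
    "data_ref": [["NAME", "opt_subscript", "opt_refmod"]],
    "opt_refmod": [["LPAREN", "expr", "COLON", "expr", "RPAREN"], []],
}

print(ll1_conflicts(SUBSCRIPT_ONLY, "move_stmt"))  # []
print(ll1_conflicts(WITH_REFMOD, "move_stmt"))     # [('opt_subscript', 'LPAREN')]
```

Neither optional is ambiguous alone; the conflict appears only from the context in which opt_subscript is used.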

When you use PCCTS, you either get no feedback on what you have done, or
else you develop a habit of tolerating numerous informational diagnostics.

This is not intended to pick on any one person or a particular phrase; there
are numerous posts in the project with phrases similar to Michael's, to the
effect "... it's only used when it's required ...". What may not be clear
to some of you is that you are using it very, very frequently. I think I did
not use enough 'very's in that statement.

Whatever they did in PCCTS, they cannot have avoided epsilon. I am pretty
confident that there are gobs of ambiguities in the parser tables we have
generated so far. Lookahead and backtracking are engaged most of the time.
The ambiguities will increase as the project progresses.

What I have in mind is large applications with numerous large programs.
Backtracking is an issue, and C intermediate code is an issue. I am not
saying that any contributor should do anything differently right now; it is
very valuable to gather the fruits of these efforts. Yet there is the step to
the next development phase, and there is an openness to discussing parser
tool selection. So I am trying to address the issue where it really counts.

There is an underlying, unarticulated assumption that you are not engaging
the special features of PCCTS frequently. I am confident that it is wrong.

The collection of rules for COBOL is horrendously ambiguous because of
optionality. That is the nature of COBOL. But this meaning of the term
'ambiguous' is the everyday one; in and of itself it does not map to
'ambiguous' as parser tool workers mean it.

If you use a tool that can generate epsilon nodes, and you do
generate epsilon nodes, then the table _might_ contain 'ambiguities'
as the parser tool people mean the word. This situation produces error
messages from simpler parser generators, but advanced ones encode it, and
the generated parser later recognizes occurrences of the situation and
responds (at some expense).
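
One way a parser can "respond at some expense" is by guessing: try one alternative, and if it fails, rewind the input and try the next. The Python sketch below assumes that style. It is loosely in the spirit of PCCTS syntactic predicates but is not PCCTS's generated code, and the token names are hypothetical. It reads the parenthesized suffix of a data name, guessing subscript first; the `wasted` counter records tokens that were read and then thrown away when the input turned out to be a reference modification.

```python
from typing import Callable, List


class BacktrackingParser:
    """Guess-and-rewind parsing of a parenthesized suffix after a data name.

    Hypothetical token kinds: LPAREN, NUM, COLON, RPAREN.
    subscript is ( NUM ); reference modification is ( NUM : NUM ).
    """

    def __init__(self, tokens: List[str]) -> None:
        self.toks = list(tokens) + ["EOF"]
        self.pos = 0
        self.wasted = 0  # tokens consumed by attempts that were later rewound

    def expect(self, kind: str) -> None:
        if self.toks[self.pos] != kind:
            raise SyntaxError(f"expected {kind}, got {self.toks[self.pos]} at {self.pos}")
        self.pos += 1

    def speculate(self, alt: Callable[[], None]) -> bool:
        """Try alt(); on failure, record the wasted work and rewind."""
        start = self.pos
        try:
            alt()
            return True
        except SyntaxError:
            self.wasted += self.pos - start
            self.pos = start
            return False

    def subscript(self) -> None:
        for kind in ("LPAREN", "NUM", "RPAREN"):
            self.expect(kind)

    def refmod(self) -> None:
        for kind in ("LPAREN", "NUM", "COLON", "NUM", "RPAREN"):
            self.expect(kind)

    def paren_suffix(self) -> str:
        # Ordered choice: guess "subscript" first, fall back to "refmod".
        if self.speculate(self.subscript):
            return "subscript"
        if self.speculate(self.refmod):
            return "refmod"
        raise SyntaxError(f"no alternative matches at {self.pos}")


p = BacktrackingParser(["LPAREN", "NUM", "COLON", "NUM", "RPAREN"])
print(p.paren_suffix(), p.wasted)  # refmod 2
```

The rewind is correct but not free: in this sketch the cost grows with how far the wrong guess gets before it fails.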

This happens to us when the epsilon is one of the branches of a rule
at the third level of nesting in any given structure (a top structure or a
substructure).

(For those new to parsers, permit me to emphasize that 'ambiguities' in the
technical sense can arise in many ways other than epsilon.)

So, for example, unless I am really disoriented: because a data reference
can include an optional OF clause, the rule's optionality creates an
epsilon on the _path_from_ every node that has a data reference. So
  MOVE data_ref TO data_ref
has ambiguities on both data references, as does every similar legal
statement (you know, like ADD) and clause (like VARYING phrases), _every_time_
the higher-level rules are driven to reduction. The fact that gargantuan
portions of compilable code do not have these OF/IN clauses does not free
you from the expense of engaging the potential for infinite lookahead and
backtracking.

Every such actual program statement
   MOVE simple-name-1 TO simple-name-2.
engages the thing you all think is free, and engages it twice
(some VARYING clauses three times, and more for arithmetic statements
that have lists of data references).

And depending on how you coded your rules, the full engagement involves
not merely the optional OF/IN, but also the optional subscript and the
optional reference modification, as well as the front part of the rule for
a subscript followed by reference modification...
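
To see what "engages it twice" looks like, here is a hand-written recursive-descent sketch in Python; it is not PCCTS output. It assumes a simplified data reference with three optional parts (OF/IN qualification, a subscript, a reference modification), hypothetical token names, and expressions cut down to a single NUM. It counts every optional-clause decision point the parser passes. For a plain MOVE simple-name-1 TO simple-name-2 it visits six (three per data reference) even though no optional part is present; `scanned` records the extra lookahead needed when a left parenthesis appears, since a subscript and a reference modification start alike.

```python
from typing import List, Set


class CountingParser:
    """Counts optional-clause decisions for MOVE data_ref TO data_ref, assuming
    the hypothetical rule

        data_ref : NAME ( (OF|IN) NAME )* { subscript } { refmod }

    with subscript = ( NUM ) and refmod = ( NUM : NUM ).
    """

    def __init__(self, tokens: List[str]) -> None:
        self.toks = list(tokens) + ["EOF"]
        self.pos = 0
        self.decisions = 0  # optional-clause decision points visited
        self.scanned = 0    # extra tokens examined to tell subscript from refmod

    def peek(self, k: int = 0) -> str:
        return self.toks[min(self.pos + k, len(self.toks) - 1)]

    def expect(self, kind: str) -> None:
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()} at {self.pos}")
        self.pos += 1

    def optional(self, starters: Set[str]) -> bool:
        self.decisions += 1  # paid even when the optional part is absent
        return self.peek() in starters

    def is_refmod_ahead(self) -> bool:
        # At LPAREN: scan forward; a COLON before RPAREN means refmod.
        k = 1
        while self.peek(k) not in ("COLON", "RPAREN", "EOF"):
            k += 1
        self.scanned += k
        return self.peek(k) == "COLON"

    def data_ref(self) -> None:
        self.expect("NAME")
        while self.optional({"OF", "IN"}):
            self.pos += 1
            self.expect("NAME")
        if self.optional({"LPAREN"}) and not self.is_refmod_ahead():
            for kind in ("LPAREN", "NUM", "RPAREN"):
                self.expect(kind)
        if self.optional({"LPAREN"}):
            for kind in ("LPAREN", "NUM", "COLON", "NUM", "RPAREN"):
                self.expect(kind)

    def move_stmt(self) -> None:
        self.expect("MOVE")
        self.data_ref()
        self.expect("TO")
        self.data_ref()
        self.expect("EOF")


p = CountingParser(["MOVE", "NAME", "TO", "NAME"])
p.move_stmt()
print(p.decisions, p.scanned)  # 6 0
```

In this sketch each absent optional costs only a one-token peek; a generated parser that falls back on backtracking at these points would pay more.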

I am certain that that is not free.  

We need lookahead. But when you use the fancy device, the algorithm must 
ratchet up the resource consumption when it encounters ambiguous epsilon 
continuations (it is the optionality that does that, although there are 
other ways into the 'ambiguity' game with that tool).

When you use the less accommodating tool, things are different. You are
forced to figure out solutions to the tool workers' 'ambiguities'. I assure
you COBOL remains ambiguous as we use the term in English.

I think I am warranted in discoursing on this because of the performance
implications. But there is another point here. When the dumb parsers tell
you they cannot handle your 'ambiguity', they tell you something much more
valuable: the grammar is in trouble at the point where we tried that.
Regrettably, by using PCCTS you are habituating yourselves to being oblivious
to these situations. We suppress valuable information during our development
effort if we tolerate technical 'ambiguities' in the grammar work.


Let me say again that this is not aimed at anyone. Picking an editor or a
favorite programming language can be an emotional thing that engages a
person's identity, so I suppose parser tool preference might trigger similar
sensitivity, and I sure do hope to avoid that.

Also, PCCTS is an excellent tool. It represents not only a lot of great
work by a number of generous people, but also a definitely significant
intellectual achievement. So we all have reason to be grateful for PCCTS.


Best Wishes
Bob Rayhawk
RKRayhawk@aol.com





