[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: gnubol: Re: Cost of Backtracking
In a message dated 11/14/99 3:13:16 PM EST,
Tim Josling,tej@melbpc.org.au, writes:
concerning backtracking
<<
1000 lines of
a of b of c.
a of b of c (e).
a of b of c (f:g).
a of b of c (e) (f:g).
(ie 250 occurrences of each line).
>>
This is a continuation of excellent work to keep tabs on a design commitment.
Per the actual tool, PCCTS, what is the cost of the predicate technology?
Note that the current scheme is to clamp k to 1. So we are not really
backtracking, we are not really doing infinite lookahead. We are .... I will
name that lower down in the post.
As the product matures, the list of predicates may grow. Are we not engaging
them on quite a numerber of amibigiuity nodes?
But still, btyacc is an excellent instant calculator-like test for
considerations that we can
capsulize, although not all of us can do it as cogently as has Tim Josling.
But the real parser must go after errors. Is it unreasonable to suggest that
there will be a rule to catch a backward expression by a junior programmer
such as
data (refmod) (subscript).
Will you not need some rules in there to catch that? I mean generally you
don't really think
that you will just have positive logic productions do you?
Perhaps a quick test of the same subset of 4 x 250 inpur records, but against
a set of rules enhanced to trap errors, and, maybe a rule or two for
strangeness like
data (subscript) (subscript) (subscript)
for transferees from the peecee weenee department who don't quite get the
gist of COBOL
yet. Test with expanded rules to diagnose errors, and stay on your feet. Use
the same
test data.
Now send in some junk, such as evil stuff to conform to those rules, and some
other
stuff like
data (sub1) of data (sub2) of data (sub3) (refmod)
if you can stay on your feet at all push some of that to your current counts.
See what happens.
Did you say 1000? Send in 50,000 or a 100,000 and have real apps going
against your
HD resources, be sure to add error message output from your own compiler to
the resource drain picture. Typically business environments do not load COBOL
compilers into idle CPUs.
I make these suggestion with sincere hopes of continued good results.
Do try to extend the length of your data_names, you are cheating. The
_elapsed_ time is
truely going to be dependent upon how fast you can get back to the I/O
channel. The longer
it takes you to get back to the channel (or the PC backplane, or the network)
the more likely
it is statistically to be in the worst place for your request by then. Keep
the lexer in realistic
recurses.
Now, for naming things. But this is not name calling it is sincere
interaction. This is meant
really positively. If anyone is in a sensitive mood, comeback later to this.
The use of PCCTS predicates is equivalent functionally to a hack in the lexer
to parser interface. (please, read that sentence carefully, it is an
analogy. I know that the predicate mechinery is in the parser algorithm.)
It is just comprehended by a much smaller number of contributors. Lexer
hacks, in a funny sense, would be more portable across alternative parser
tools, because we would have project ownership of the source; but if we
consider jumping from the PCCTS ship we do not necessarily have friendly
seas even with btyacc and or any LL parser. Use of predicates in the
technology of the parser tool represent a dependency, an inflexibility, a
loss of choice.
They are awesome. They are also apparently working. But appearances _might_
be deceiving.
If the goal is a compiler and not a COBOL to C translator, _T_H_E_N_ our task
_I_S_ error
diagnosing, not valid program code gen.
Add error production rules, just a few and see if your backtracking stats
reveal anything.
Then, and I am a lone wolf, but howl, howl, howl, add enough error
productions to see, just
to see mind 'ya, what it means if we have twice as many error productions as
positive logic
productions.
Put some stuff in there for function invocations. In think the terminal
symbol FUNCTION will flat out convice some such expresions would be one step
away form the point of the sketched experiment. But hey. Drop the token!
Now that _will_ happen in real programs. Code a small fat set of error
productions to catch that! I mean for example
write a rule to trap MID(1 2 3), as in
MOVE MID(1 2 3) TO your-favoriter_long_name
where the irksome FUNCTION token go forgot (or angrilly left out).
Tell me how any parsers will distinguish that anyway from a subscripting data
reference.
Surely we want to be able to get to the date functions for example. Or did we
plan to release our first version in 1999, with enhancement a little later?
If you don't manifest type in the
data references (that is you just go ahead and reduce MID(1 2 3) as a
subscripted reference)
you have only left it the the actions or to semantics. Every action then will
be very busy
with the same consideration. Your 'lookahead' and 'backtracking' have just
enabled your
self deception. Instead of doing the work in the grammar rules you will do
it in the actions and/or semantics. Ah but ..., don't take all that as
important. Just go back to
data, subscript and reference modification and spin it around some and lay in
some
error productions. Tell us what happens on backtracking... same data (same
source code).
And after you do these kinds of things extrapolate. Not a thousand lines, but
tens of thousands times hundreds of modules. On busy systems. If you got .21
to go to .24 with
easy variations with the available optimistic grammars, then try some grammar
rules that
deal with the programs that are really out there.
I am thinking about this in a way that is different than most posters. Please
excuse the abruptness that might be perceived in these expressions.
Best Wishes,
Bob Rayhawk
RKRayhawk@aol.com
--
This message was sent through the gnu-cobol mailing list. To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body. For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.