[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: gnubol: Re: Cost of Backtracking

To: gnu-cobol@lusars.net
Subject: Re: gnubol: Re: Cost of Backtracking
From: RKRayhawk@aol.com
Date: Sun, 14 Nov 1999 22:13:57 EST
Delivered-To: gnu-cobol-outgoing@wallace.lusars.net
Reply-To: gnu-cobol@lusars.net
Sender: owner-gnu-cobol@wallace.lusars.net

In a message dated 11/14/99 3:13:16 PM EST, 
Tim Josling,tej@melbpc.org.au, writes:
concerning backtracking
<< 
 1000 lines of 
    a of b of c.
    a of b of c (e).
    a of b of c (f:g).
    a of b of c (e) (f:g).
 (ie 250 occurrences of each line).
 >>

This is a continuation of excellent work to keep tabs on a design commitment. 
Per the actual tool, PCCTS, what is the cost of the predicate technology? 
Note that the current scheme is to clamp k to 1. So we are not really 
backtracking, we  are not really doing infinite lookahead. We are .... I will 
name that lower down in the post.

As the product matures, the list of predicates may grow. Are we not engaging 
them on quite a numerber of amibigiuity nodes?

But still, btyacc is an excellent instant calculator-like test for 
considerations that we can 
capsulize, although not all of us can do it  as cogently as has Tim Josling. 
But the real parser must go after errors. Is it unreasonable to suggest that 
there will be a rule to catch a backward expression by a  junior programmer 
such as  

  data (refmod) (subscript). 

Will you not need some rules in there to catch that? I mean generally you 
don't really think
that you will just have positive logic productions do you?

Perhaps a quick test of the same subset of 4 x 250 inpur records, but against 
a set of rules enhanced to trap errors, and, maybe a rule or two for 
strangeness like 
  data (subscript) (subscript) (subscript) 
for transferees from the peecee weenee department who don't quite get the 
gist of COBOL
yet.  Test with expanded rules to diagnose errors, and stay on your feet. Use 
the same 
test data.

Now send in some junk, such as evil stuff to conform to those rules, and some 
other 
stuff like
 data (sub1) of data (sub2) of data (sub3) (refmod)
if you can stay on your feet at all push some of that to your current counts. 
See what happens.

Did you say 1000? Send in 50,000 or a 100,000 and have real apps going 
against your 
HD resources, be sure to add error message output from your own compiler to 
the resource drain picture. Typically business environments do not load COBOL 
compilers into idle CPUs.

I make these suggestion with sincere hopes of continued good results.

Do try to extend the length of your data_names, you are cheating. The 
_elapsed_ time is 
truely going to be dependent upon how fast you can get back to the I/O 
channel. The longer
it takes you to get back to the channel (or the PC backplane, or the network) 
the more likely
it is statistically to be in the worst place for your request by then.  Keep 
the lexer in realistic
recurses.

Now, for naming things. But this is not name calling it is sincere 
interaction. This is meant
really positively.  If anyone is in a sensitive mood, comeback later to this.

The use of PCCTS predicates is equivalent functionally to a hack in the lexer 
to parser interface.  (please, read that sentence carefully, it is an 
analogy. I know that the predicate mechinery is in the parser algorithm.)   
It is  just comprehended by a much smaller number of contributors. Lexer 
hacks, in a funny sense, would be more portable across alternative parser 
tools, because we would have  project ownership of the source; but if we 
consider jumping  from the PCCTS ship we do not necessarily have friendly
seas even with btyacc and or any LL parser.  Use of predicates in the 
technology of the parser tool represent a dependency, an inflexibility, a 
loss of choice.

They are awesome. They are also apparently working. But appearances _might_ 
be deceiving.

If the goal is a compiler and not a COBOL to C translator, _T_H_E_N_ our task 
_I_S_ error
diagnosing, not valid program code gen.

Add error production rules, just a few and see if your backtracking stats 
reveal anything.

Then, and I am a lone wolf, but howl, howl, howl, add enough error 
productions to see, just 
to see mind 'ya, what it means if we have twice as many error productions as 
positive logic
productions.

Put some stuff in there for function invocations. In think the terminal 
symbol FUNCTION will flat out convice some such expresions would be one step 
away form the point of the  sketched experiment. But hey.  Drop the token! 
Now that _will_  happen in real programs.  Code a small fat set of error 
productions to catch that! I mean for example
write a rule to trap MID(1 2 3), as in
   MOVE MID(1 2 3) TO your-favoriter_long_name
where the irksome FUNCTION token go forgot (or angrilly left out).

Tell me how any parsers will distinguish that anyway from a subscripting data 
reference.
Surely we want to be able to get to the date functions for example. Or did we 
plan to release our  first version in 1999, with enhancement a little later?  
If you don't manifest type in the 
data references (that is you just go ahead and reduce MID(1 2 3) as a 
subscripted reference)
you have only left it the the actions or to semantics. Every action then will 
be very busy
with the same consideration. Your 'lookahead' and 'backtracking' have just 
enabled your
self deception.  Instead of doing the work in the grammar rules you will do 
it in the actions and/or semantics. Ah but ..., don't take all that as 
important. Just go back to
data, subscript and reference modification and spin it around some and lay in 
some
error productions. Tell us what happens on backtracking... same data (same 
source code).

And after you do these kinds of things extrapolate. Not a thousand lines, but 
tens of thousands times hundreds of modules. On busy systems. If you got .21 
to go to .24 with
easy variations with the available optimistic grammars, then try some grammar 
rules that
deal with the programs that are really out there. 

I am thinking about this in a way that is different than most posters. Please 
excuse the abruptness that might be perceived in these expressions.

Best Wishes,
Bob Rayhawk
RKRayhawk@aol.com

--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.

Follow-Ups:
- Re: gnubol: Re: Cost of Backtracking
  - From: Tim Josling <tej@melbpc.org.au>

Prev by Date: gnubol: x/open information
Next by Date: gnubol: DIVISION/SECTION headers.
Prev by thread: Re: gnubol: DIVISION/SECTION headers.
Next by thread: Re: gnubol: Re: Cost of Backtracking
Index(es):
- Date
- Thread