Re: gnubol: How do we parse this language, anyway?
Well, I'll welcome you back and I hope you'll stay for a while. I
don't know that I can keep you entertained, but your effort deserves
some response.
>>>>> "Randall" == Randall Bart <Barticus@att.net>
>>>>> wrote the following on Sat, 27 Nov 1999 17:29:58 -0800
Randall> I've gotten behind in my email, and if I wait until I
Randall> catch up, I'll never catch up. Here are my responses to
Randall> several messages about parsing (mostly from Bob).
Randall> At 10:53 AM 11/20/99 , RKRayhawk@aol.com wrote:
>> The supposed retained buffers are not going to be retained in a
>> real shared system; rather than us doing the I/O, the OS will be
>> paging. That means that we will lose control of
>> performance. Strategies that hold vast token lists in core
>> exacerbate this. Again the idea is that real systems are
>> shared; in development environments the sharees might each be
>> using our tool which is assuming retained buffers and holding
>> vast token lists. A real regen of a real COBOL application is
>> going to thrash on the virtual memory device. We will not be
>> able to get to the problem. Much of that would not be necessary.
Randall> I started programming in a world where we measured in
Randall> kilohertz, and kilobytes. Now that we measure in mega and
Randall> giga and tera it's hard to worry about a few extra bytes
Randall> per token. But I agree, we should reduce the tokens as
Randall> much as possible in the preprocessor.
Oh good! War stories.
Me too. My first one ran in 40K with no random access storage. It's
kind of interesting that that compiler had fewer restrictions than
many modern compilers. Imposing restrictions based on the resources
available to us would have made the compiler virtually worthless.
I'll agree with your conclusion, of course.
Randall> While we work on designing the parser(s), Tim should
Randall> continue enhancing the preprocessor, nibbling around the
Randall> edges of the language. Each token should be reduced to an
Randall> index into a symbol table. Verbs and keywords can be
Randall> identified as such (perhaps as reserved symbol table
Randall> indexes). The preprocessor could do a lot of token
Randall> manipulation. A OF B could be reduced to a single token.
That could reduce the number of tokens, but it would require that the
text capacity of a token be sufficient to hold 50 maximum-size
data-names plus separators, unless you have a compression scheme in
mind. I wouldn't want the preprocessor to have a symbol table, but the
main lexer might be able to reduce such things to symbol table
references.
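To make the symbol-table idea concrete, here is a minimal Python sketch of reducing words to indexes and folding a qualified name into one token. Everything in it is an assumption made for illustration: the names SymbolTable, reduce_tokens and fold_qualified are hypothetical, the reserved-word list is a tiny sample, only OF (not IN) is treated as a qualifier, and literals are interned like any other word for brevity.

```python
# Hypothetical sketch: reserved words occupy the low indexes, so a
# keyword test is a single comparison; other words are interned on
# first sight.
RESERVED = ["ADD", "TO", "OF", "IF", "ELSE", "PERFORM", "END-IF", "."]


class SymbolTable:
    def __init__(self, reserved):
        self.names = list(reserved)
        self.index = {name: i for i, name in enumerate(self.names)}
        self.reserved_count = len(self.names)

    def intern(self, text):
        key = text.upper()  # COBOL words are case-insensitive
        if key not in self.index:
            self.index[key] = len(self.names)
            self.names.append(key)
        return self.index[key]

    def is_reserved(self, idx):
        return idx < self.reserved_count


def reduce_tokens(words, table):
    """Replace each word by its symbol table index."""
    return [table.intern(w) for w in words]


def fold_qualified(ids, table):
    """Collapse 'name OF name [OF name ...]' into one tuple of indexes."""
    of = table.index["OF"]
    out, i = [], 0
    while i < len(ids):
        chain = [ids[i]]
        # Absorb "OF name" pairs that follow a user-defined word.
        while (not table.is_reserved(chain[0])
               and i + 2 < len(ids)
               and ids[i + 1] == of
               and not table.is_reserved(ids[i + 2])):
            chain.append(ids[i + 2])
            i += 2
        out.append(tuple(chain) if len(chain) > 1 else chain[0])
        i += 1
    return out


table = SymbolTable(RESERVED)
ids = reduce_tokens("ADD 1 TO Z OF REC".split(), table)
folded = fold_qualified(ids, table)  # [0, 8, 1, (9, 10)]
```

A folded reference here costs one index per qualifier rather than the text of each data-name, which is the kind of reduction the main lexer could do once it has a symbol table to point into.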
Randall> There is a class of tokens I call pseudo-verbs: Period,
Randall> ELSE, WHEN, SIZE ERROR, OVERFLOW, END-OF-PAGE, INVALID
Randall> KEY, EXCEPTION, END, END-x, (that list may not be
Randall> exhaustive). These could be identified and matched to
Randall> their antecedents in an early pass, or at least they could
Randall> be marked for easier digestion by the parser. Perhaps we
Randall> could pass the parser just a single statement at a time.
Randall> Actually, that's a statement fragment, since it would
Randall> already be split at the pseudo-verbs, e.g.,
IF A = B OR C OR > D
ADD 1 TO Z
ON SIZE ERROR
PERFORM P1
NOT ON SIZE ERROR
PERFORM P2
END-ADD
ELSE
PERFORM P3 VARYING X FROM 1 BY 1 UNTIL X > 10
END-IF
.
Randall> As I've shown this, each line is a statement fragment,
Randall> beginning with a verb or pseudo-verb, except that NOT is
Randall> found in front of the pseudo-verb, as are the optional
Randall> words ON and AT.
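The splitting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the verb list is a small sample, the pseudo-verb list is the one quoted above, and split_fragments and starts_fragment are hypothetical names. A real splitter would need more context, since a word such as SIZE also occurs outside the SIZE ERROR phrase.

```python
VERBS = {"IF", "ADD", "SUBTRACT", "PERFORM", "MOVE"}  # sample only
PSEUDO_START = {"ELSE", "WHEN", "SIZE", "OVERFLOW", "END-OF-PAGE",
                "INVALID", "EXCEPTION", "END", "."}
PREFIXES = {"NOT", "ON", "AT"}  # may precede a pseudo-verb


def starts_pseudo(word):
    return word in PSEUDO_START or word.startswith("END-")


def starts_fragment(tokens, i):
    """True if a new statement fragment begins at tokens[i]."""
    if tokens[i] in VERBS or starts_pseudo(tokens[i]):
        return True
    # NOT / ON / AT open a fragment only when they lead into a
    # pseudo-verb; a NOT inside a condition does not qualify.
    j = i
    while j < len(tokens) and tokens[j] in PREFIXES:
        j += 1
    return i < j < len(tokens) and starts_pseudo(tokens[j])


def split_fragments(tokens):
    fragments = []
    i = 0
    while i < len(tokens):
        if starts_fragment(tokens, i):
            fragments.append([])
            # Keep the optional words with their pseudo-verb.
            while tokens[i] in PREFIXES:
                fragments[-1].append(tokens[i])
                i += 1
        elif not fragments:
            fragments.append([])
        fragments[-1].append(tokens[i])
        i += 1
    return fragments


SOURCE = ("IF A = B OR C OR > D ADD 1 TO Z ON SIZE ERROR PERFORM P1 "
          "NOT ON SIZE ERROR PERFORM P2 END-ADD ELSE "
          "PERFORM P3 VARYING X FROM 1 BY 1 UNTIL X > 10 END-IF .")
fragments = split_fragments(SOURCE.split())
for fragment in fragments:
    print(" ".join(fragment))
```

Run on the example sentence, this yields one fragment per line of the listing, with NOT and ON staying attached to SIZE ERROR.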
>> I mean when you are into paragraph 17 of section 42, why do you
>> still hold raw material from s1p1, especially the keyword
>> minutiae? So as you get to certain stages in the iteration,
>> hopefully it will be discernible what got hung on some surviving
>> structure, and what is getting obsolete.
Randall> At each period everything becomes obsolete. At each verb
Randall> or pseudo-verb, everything becomes obsolete except
Randall> matching pseudo-verbs to antecedents.
Randall> Someone suggested multiple levels of parsing. Imagine two
Randall> different parsers, the statement parser and the sentence
Randall> parser. The statement parser would be called once for
Randall> each statement or statement fragment (each line of my
Randall> example). The sentence parser would be called once for
Randall> each sentence (once in my example), but the tokens to the
Randall> sentence parser are the statement fragments.
Do you think this would help? Has anyone ever attempted something
like this? Most grammars (even COBOL grammars) comprise hierarchies
of rules. Is there something to be gained by doing the inferior ones
first? Well, perhaps for errors, but I'm not convinced.
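For anyone who wants to experiment with the question, here is a rough Python sketch of what the two-level scheme might look like. It does not settle whether the approach helps. The names statement_parse, sentence_parse and outline are hypothetical, the OWNERS and CONTAINERS tables are a tiny assumed subset, and the fragments are taken as already split, one per line of the example.

```python
# Assumed, deliberately tiny tables: which verbs may own each phrase,
# and which verbs take a list of statements as their body.
OWNERS = {"ELSE": {"IF"}, "SIZE": {"ADD", "SUBTRACT"}}
CONTAINERS = {"IF"}
NOISE = {"NOT", "ON", "AT"}  # optional words before a pseudo-verb


def statement_parse(fragment):
    """Level 1: reduce one fragment to a (kind, head, words) token."""
    head = next(w for w in fragment if w not in NOISE)
    if head == ".":
        kind = "period"
    elif head in OWNERS:
        kind = "phrase"
    elif head.startswith("END-"):
        kind = "end"
    else:
        kind = "verb"
    return kind, head, fragment


def sentence_parse(fragments):
    """Level 2: nest the fragment tokens of one sentence into a tree."""
    root = {"kind": "root", "head": None, "words": [], "children": []}
    stack = [root]
    for kind, head, words in map(statement_parse, fragments):
        if kind == "period":
            stack = [root]  # a period closes everything
            continue
        if kind == "end":
            target = head[len("END-"):]
            if any(n["head"] == target for n in stack):
                while stack.pop()["head"] != target:
                    pass  # inner scopes are closed along the way
            continue
        if kind == "verb":
            # A simple statement is finished once the next verb arrives.
            while (stack[-1]["kind"] == "verb"
                   and stack[-1]["head"] not in CONTAINERS):
                stack.pop()
        else:
            # A phrase attaches to the nearest verb that can own it.
            while len(stack) > 1 and not (
                    stack[-1]["kind"] == "verb"
                    and stack[-1]["head"] in OWNERS[head]):
                stack.pop()
        node = {"kind": kind, "head": head, "words": words, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def outline(nodes, depth=0):
    lines = []
    for n in nodes:
        lines.append("  " * depth + " ".join(n["words"]))
        lines.extend(outline(n["children"], depth + 1))
    return lines


FRAGMENTS = [line.split() for line in [
    "IF A = B OR C OR > D", "ADD 1 TO Z", "ON SIZE ERROR", "PERFORM P1",
    "NOT ON SIZE ERROR", "PERFORM P2", "END-ADD", "ELSE",
    "PERFORM P3 VARYING X FROM 1 BY 1 UNTIL X > 10", "END-IF", "."]]
tree_lines = outline(sentence_parse(FRAGMENTS))
print("\n".join(tree_lines))
```

The sentence level never looks inside a fragment; it sees only the kind and the head word. Whether that separation buys anything beyond error recovery is the open question.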
< stuff about segmentation >
Randall> At 09:09 AM 11/21/99 , Michael McKernan wrote:
>> ADD ... SIZE ... SUBTRACT ... SIZE ... PERIOD
>>
>> which appears to be as invalid as the first, but trashes the law
>> of least astonishment, since the PERIOD has traditionally,
>> legitimately, terminated anything that's going on.
Randall> Period terminates anything, but the statement above is
Randall> invalid in COBOL-74, COBOL-85, and COBOL-20XX. SUBTRACT
Randall> with a SIZE ERROR phrase is a conditional statement, and
Randall> the object of the ADD's SIZE ERROR phrase is required to
Randall> be imperative.
That's the astonishing part.
>> I am maintaining that an unmatched END-x should imply an
>> appropriate END-x for any unterminated conditional part that
>> exists when it is encountered. This isn't the letter of the
>> law, but it does not affect correct programs, and is arguably
>> less astonishing than the strict interpretation.
Randall> I agree.
Now there are two of us.
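One reading of that proposal, sketched in Python: keep a stack of open conditional scopes, and when an END-x arrives that does not match the innermost one, supply the missing terminators on the way down to its match. The function name close_scopes and the list representation are assumptions for illustration, and an END-x with no open match at all is left to the caller as a diagnostic.

```python
def close_scopes(open_scopes, end_word):
    """Pop scopes down to the one end_word terminates.

    open_scopes lists the verbs with unterminated conditional parts,
    outermost first. Returns the END-x words that were implied for the
    inner scopes, innermost first, or None if end_word matches nothing.
    """
    target = end_word[len("END-"):]
    if target not in open_scopes:
        return None  # nothing to terminate: caller should diagnose
    implied = []
    while open_scopes[-1] != target:
        implied.append("END-" + open_scopes.pop())
    open_scopes.pop()  # the scope that end_word terminates explicitly
    return implied


# IF ... ADD ... ON SIZE ERROR ... END-IF, with no END-ADD written:
scopes = ["IF", "ADD"]
implied = close_scopes(scopes, "END-IF")  # ["END-ADD"]
```

A correct program never takes the implied path, so the behaviour for valid source is unchanged, and the implied list gives the compiler something specific to warn about.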
>> I'll say it again. We do not have a floor and ceiling standard.
>> No compiler has ever been denied certification for permissive
>> interpretation.
Randall> I disagree. The only certification there ever was was
Randall> FIPS certification. Some of the FIPS tests verified that
Randall> errors were issued for invalid statements like the
Randall> foregoing.
A little too much hyperbole in the heat of battle. Yes, I remember
some tests like that, but these cases escaped them, or else the '85
compiler that I worked on would not have been certified. It's quite
possible that my statement is true, though, since the audits were an
open book, and no one invited the auditor before having run them a few
hundred times.
Speaking of such things, do you know who owns the audit tests? Is
there any possibility that this group might be able to obtain them?
It's going to be a lot harder when we get closer to the end game if
we don't have something like that.
Randall> Which brings up an important point: No organization is
Randall> currently providing certification for COBOL-85 (or any
Randall> other COBOL) and there is no organization planning
Randall> certification for COBOL-20XX. Perhaps a certifying
Randall> organization will arise, but I won't predict what they
Randall> will or will not test for.
< lots more on unterminated conditional parts >
I don't disagree with any of your comments subsequent to this point,
so I would not be able to add even as much value as I have in the
foregoing.