gnubol: How do we parse this language, anyway?
I've gotten behind in my email, and if I wait until I catch up, I'll never
catch up. Here are my responses to several messages about parsing (mostly
from Bob).
At 10:53 AM 11/20/99 , RKRayhawk@aol.com wrote:
>The supposed retained buffers are not going to be retained in a real shared
>system; rather than us doing the I/O, the OS will be paging. That means that
>we will lose control of performance. Strategies that hold vast token lists
>in core exacerbate this. Again the idea is that real systems are shared; in
>development environments the sharees might each be using our tool which is
>assuming retained buffers and holding vast token lists. A real regen of a
>real COBOL application is going to thrash on the virtual memory device. We
>will not be able to get to the problem. Much of that would not be necessary.
I started programming in a world where we measured in kilohertz and
kilobytes. Now that we measure in mega, giga, and tera, it's hard to
worry about a few extra bytes per token. But I agree, we should reduce the
tokens as much as possible in the preprocessor.
While we work on designing the parser(s), Tim should continue enhancing the
preprocessor, nibbling around the edges of the language. Each token should
be reduced to an index into a symbol table. Verbs and keywords can be
identified as such (perhaps as reserved symbol table indexes). The
preprocessor could do a lot of token manipulation. A OF B could be reduced
to a single token.
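A rough sketch of that idea in Python, purely for illustration. The class
name, the short reserved-word list, and the folding helper are all
hypothetical, and a real preprocessor would also have to cope with IN,
subscripts, and reference modification:

```python
class SymbolTable:
    """Maps each distinct token spelling to a small integer index."""

    # Hypothetical reserved words; a real table would list every COBOL keyword.
    RESERVED = ("ADD", "SUBTRACT", "IF", "ELSE", "PERFORM", "OF", "TO")

    def __init__(self):
        self.names = []   # index -> spelling
        self.index = {}   # spelling -> index
        for word in self.RESERVED:
            self.intern(word)
        # Every index below this limit is a reserved word.
        self.reserved_limit = len(self.names)

    def intern(self, text):
        key = text.upper()  # COBOL words are case-insensitive
        if key not in self.index:
            self.index[key] = len(self.names)
            self.names.append(key)
        return self.index[key]

    def is_reserved(self, idx):
        return idx < self.reserved_limit


def fold_qualification(tokens, table):
    """Collapse each 'A OF B' run of indexes into one index for the qualified name."""
    of_idx = table.intern("OF")
    out = []
    i = 0
    while i < len(tokens):
        idx = tokens[i]
        if not table.is_reserved(idx):
            name = table.names[idx]
            # Absorb every following "OF x" pair where x is a user-defined word.
            while (i + 2 < len(tokens) and tokens[i + 1] == of_idx
                   and not table.is_reserved(tokens[i + 2])):
                name += " OF " + table.names[tokens[i + 2]]
                i += 2
            idx = table.intern(name)
        out.append(idx)
        i += 1
    return out


table = SymbolTable()
raw = [table.intern(w) for w in "ADD A OF B TO C".split()]
folded = fold_qualification(raw, table)
print([table.names[t] for t in folded])
```

With this layout the parser only ever sees integers, and a single
comparison against reserved_limit tells it whether an index is a keyword.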
There is a class of tokens I call pseudo-verbs: period, ELSE, WHEN, SIZE
ERROR, OVERFLOW, END-OF-PAGE, INVALID KEY, EXCEPTION, END, and END-x (that
list may not be exhaustive). These could be identified and matched to
their antecedents in an early pass, or at least they could be marked for
easier digestion by the parser. Perhaps we could pass the parser just a
single statement at a time. Actually, that's a statement fragment, since
it would already be split at the pseudo-verbs, e.g.,
    IF A = B OR C OR > D
        ADD 1 TO Z
            ON SIZE ERROR
                PERFORM P1
            NOT ON SIZE ERROR
                PERFORM P2
        END-ADD
    ELSE
        PERFORM P3 VARYING X FROM 1 BY 1 UNTIL X > 10
    END-IF
    .
As I've shown this, each line is a statement fragment, beginning with a
verb or pseudo-verb, except that NOT is found in front of the pseudo-verb,
as are the optional words ON and AT.
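The split described above might be sketched like this in Python. The verb
set is a deliberately tiny, hypothetical sample, tokens are plain strings
rather than symbol table indexes, and SIZE and INVALID stand in for the
two-word phrases SIZE ERROR and INVALID KEY:

```python
# Hypothetical, incomplete word lists chosen only to cover the example.
VERBS = {"IF", "ADD", "SUBTRACT", "PERFORM", "MOVE", "READ", "EVALUATE"}
PSEUDO_VERBS = {".", "ELSE", "WHEN", "SIZE", "OVERFLOW", "END-OF-PAGE",
                "INVALID", "EXCEPTION", "END"}


def is_pseudo_verb(tok):
    # END-ADD, END-IF and friends are recognized by their prefix.
    return tok in PSEUDO_VERBS or tok.startswith("END-")


def pseudo_verb_at(tokens, i):
    """Return the index of a pseudo-verb that starts at i, allowing an
    optional NOT and an optional ON or AT in front of it, else None."""
    j = i
    if tokens[j] == "NOT":
        j += 1
    if j < len(tokens) and tokens[j] in ("ON", "AT"):
        j += 1
    if j < len(tokens) and is_pseudo_verb(tokens[j]):
        return j
    return None


def split_fragments(tokens):
    """Split a token list into fragments, each starting at a verb or pseudo-verb."""
    fragments = []
    i = 0
    while i < len(tokens):
        head = pseudo_verb_at(tokens, i)
        if head is not None:
            # Keep NOT / ON / AT together with the pseudo-verb they precede.
            fragments.append(tokens[i:head + 1])
            i = head + 1
            continue
        if tokens[i] in VERBS or not fragments:
            fragments.append([])
        fragments[-1].append(tokens[i])
        i += 1
    return fragments


source = ("IF A = B OR C OR > D ADD 1 TO Z ON SIZE ERROR PERFORM P1 "
          "NOT ON SIZE ERROR PERFORM P2 END-ADD ELSE "
          "PERFORM P3 VARYING X FROM 1 BY 1 UNTIL X > 10 END-IF .")
for fragment in split_fragments(source.split()):
    print(" ".join(fragment))
```

The lookahead past NOT matters because NOT also appears inside conditions
(IF A NOT = B), where it must not start a new fragment.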
>I mean when you
>are into paragraph 17 of section 42, why do you still hold raw material from
>s1p1, especially the keyword minutia? So as you get to certain stages in the
>iteration, hopefully it will be discernable what got hung on some surviving
>structure, and what is getting obsolete.
At each period everything becomes obsolete. At each verb or pseudo-verb,
everything becomes obsolete except matching pseudo-verbs to antecedents.
Someone suggested multiple levels of parsing. Imagine two different
parsers, the statement parser and the sentence parser. The statement
parser would be called once for each statement or statement fragment (each
line of my example). The sentence parser would be called once for each
sentence (once in my example), but the tokens to the sentence parser are
the statement fragments.
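To make the layering concrete, here is a minimal Python sketch of the two
interfaces. The Fragment record, the function names, and the short phrase
list are assumptions for illustration only. The statement parser here
merely classifies a fragment, where a real one would build a full parse of
its operands:

```python
from collections import namedtuple

# Summary the statement parser hands upward (hypothetical shape).
Fragment = namedtuple("Fragment", "kind head negated words")

# Hypothetical, incomplete list of phrase-starting pseudo-verbs.
PHRASES = {"ELSE", "WHEN", "SIZE", "OVERFLOW", "END-OF-PAGE",
           "INVALID", "EXCEPTION", "END"}


def parse_statement(words):
    """Statement parser: called once per fragment (a list of token strings)."""
    # Skip the optional leading NOT / ON / AT to find the identifying word.
    k = 0
    while k < len(words) - 1 and words[k] in ("NOT", "ON", "AT"):
        k += 1
    head = words[k]
    if head == ".":
        kind = "period"
    elif head in PHRASES:
        kind = "phrase"
    elif head.startswith("END-"):
        kind = "end"
    else:
        kind = "verb"
    negated = kind == "phrase" and words[0] == "NOT"
    return Fragment(kind, head, negated, words)


def parse_sentences(fragments):
    """Sentence parser: its input tokens are whole fragments, not words."""
    sentences, current = [], []
    for words in fragments:
        frag = parse_statement(words)
        if frag.kind == "period":
            sentences.append(current)
            current = []  # nothing needs to survive past a period
        else:
            current.append(frag)
    if current:
        sentences.append(current)  # trailing sentence with no period
    return sentences


lines = ["IF A = B", "ADD 1 TO Z", "ON SIZE ERROR", "PERFORM P1",
         "NOT ON SIZE ERROR", "PERFORM P2", "END-ADD", "ELSE",
         "PERFORM P3", "END-IF", "."]
for sentence in parse_sentences([line.split() for line in lines]):
    print([(f.kind, f.head, f.negated) for f in sentence])
```

The sentence level never looks inside a fragment's operands, which is what
lets the statement parser discard its working storage after each call.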
At 12:47 PM 11/20/99 , RKRayhawk@aol.com wrote:
><< segmentation (not to be done), >>
>
>I would encourage syntax that tolerates some of the segmentation markers. It
>makes little sense in memory rich environs.
Segmentation is obsolete. It is adequate to accept the syntax and ignore
it. There's some nonsense in COBOL-85 about initial state versus last used
state, but there's some note saying this doesn't apply to PERFORM ranges,
so it only applies to ALTER. TTBOMK, there was never a test for this. Do
we really care about when ALTER in an overlaid segment is reinitialized?
At 09:09 AM 11/21/99 , Michael McKernan wrote:
> ADD ... SIZE ... SUBTRACT ... SIZE ... PERIOD
>
>which appears to be as invalid as the first, but trashes the law of
>least astonishment, since the PERIOD has traditionally, legitimately,
>terminated anything that's going on.
Period terminates anything, but the statement above is invalid in COBOL-74,
COBOL-85, and COBOL-20XX. SUBTRACT with a SIZE ERROR phrase is a
conditional statement, and the object of the ADD's SIZE ERROR phrase is
required to be imperative.
>I am maintaining
>that an unmatched END-x should imply an appropriate END-x for any
>unterminated conditional part that exists when it is encountered.
>This isn't the letter of the law, but it does not affect correct
>programs, and is arguably less astonishing than the strict
>interpretation.
I agree.
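One reading of that permissive rule, sketched in Python with a stack of
open conditional statements. The event format and function name are
hypothetical, and only statements that are still conditional would be
pushed in practice:

```python
def close_scopes(events):
    """events is a list of ("open", verb) or ("end", verb) pairs.

    Returns (closed, still_open), where closed lists every terminator as
    (verb, implied) and implied is True when the END-x was supplied by
    the compiler rather than written by the programmer.
    """
    stack = []
    closed = []
    for action, verb in events:
        if action == "open":
            stack.append(verb)
        elif verb not in stack:
            raise SyntaxError("END-%s has no open %s" % (verb, verb))
        else:
            # Imply a terminator for everything nested inside the match.
            while stack[-1] != verb:
                closed.append((stack.pop(), True))
            closed.append((stack.pop(), False))  # the explicit END-x
    return closed, stack


# IF ... ADD ... SIZE ERROR ... END-IF: the END-IF implies an END-ADD.
print(close_scopes([("open", "IF"), ("open", "ADD"), ("end", "IF")]))
```

A strict compiler would report the implied entries as errors, and a
permissive one could downgrade them to warnings.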
>I'll say it again. We do not have a floor and
>ceiling standard. No compiler has ever been denied certification
>for permissive interpretation.
I disagree. The only certification that ever existed was FIPS
certification. Some of the FIPS tests verified that errors were issued for
invalid statements like the foregoing.
Which brings up an important point: No organization is currently providing
certification for COBOL-85 (or any other COBOL) and there is no
organization planning certification for COBOL-20XX. Perhaps a certifying
organization will arise, but I won't predict what they will or will not
test for.
At 11:36 AM 11/23/99 , RKRayhawk@aol.com wrote:
>But your actual comments seem to imply that you would
>associate the second ON SIZE clause with the inner arithmetic statement when
>it is not explicitly scope terminated. Please do not do that. That would not
>be an astonishment, but many would consider it a compiler error. Statements
>like
>
> ADD ... SIZE ... SUBTRACT ... NOT SIZE ... END-ADD
>
>would commence to behave differently when migrated to a platform hosted by
>such a compiler.
From CD 1.7, p 401, 14.6.4.1 Scope of Statements:
>When statements are nested within other statements that allow optional
>conditional phrases, any optional conditional phrase encountered is
>considered to be the next phrase of the nearest preceding unterminated
>statement with which that phrase is permitted to be associated according
>to the general format and the syntax rules for that statement, but with
>which no such phrase has already been associated.
According to this, the NOT SIZE phrase must be associated with the
SUBTRACT. But then consider
ADD ... SIZE ... SUBTRACT ... NOT SIZE ... NOT SIZE ... END-ADD
As I read 14.6.4.1, the second NOT SIZE is associated with the ADD, the
nearest preceding statement with which no NOT SIZE phrase has yet been
associated. This is where the legalisms come into play: the SUBTRACT is
still conditional, so this code is invalid.
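My reading of the association rule can be sketched in Python as follows.
The class and function names are hypothetical, phrases are reduced to the
strings "SIZE" and "NOT SIZE", and the sketch deliberately does not check
the separate requirement that the object of a SIZE ERROR phrase be
imperative:

```python
class OpenStatement:
    """An unterminated statement and the conditional phrases attached so far."""

    def __init__(self, verb, permitted):
        self.verb = verb
        self.permitted = set(permitted)   # phrases the general format allows
        self.associated = set()           # phrases already attached


def associate(open_statements, phrase):
    """Attach phrase to the nearest preceding unterminated statement that
    permits it and has no such phrase yet. Returns that statement, or
    None if no open statement can take the phrase."""
    for stmt in reversed(open_statements):
        if phrase in stmt.permitted and phrase not in stmt.associated:
            stmt.associated.add(phrase)
            return stmt
    return None


ARITHMETIC_PHRASES = ("SIZE", "NOT SIZE")

# ADD ... SIZE ... SUBTRACT ... NOT SIZE ... NOT SIZE ... END-ADD
open_statements = [OpenStatement("ADD", ARITHMETIC_PHRASES)]
print(associate(open_statements, "SIZE").verb)
open_statements.append(OpenStatement("SUBTRACT", ARITHMETIC_PHRASES))
print(associate(open_statements, "NOT SIZE").verb)
print(associate(open_statements, "NOT SIZE").verb)
```

The three calls attach the phrases to ADD, SUBTRACT, and ADD in that
order, which matches the reading above.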
At 02:49 PM 11/23/99 , RKRayhawk@aol.com wrote:
> IF a > b
> display ...
> add a to b
> on size error
> ADD 1 to error-count
> ELSE
> ....
>
>that [THEN] clause block has what I think is in fact called an imperative ADD
>statement (the add a to b).
No, the ADD is conditional. Implicit scope termination doesn't make a
statement imperative.
> IF a > b
> display ...
> add a to b
> on size error
> ADD 1 to error-count
> NOT on size error
> ADD 1 to we-actually-can-add-count
> ELSE
> ....
In this case, the NOT SIZE is attached to the second ADD. The first ADD is
invalid, because the second ADD is conditional. But let's modify it:
    IF a > b
        display ...
        add a to b
          on size error
            ADD 1 to error-count
          NOT on size error
            PERFORM DO-THAT
        END-ADD
    ELSE
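This will compile, but it won't do what the programmer expects. Both the
NOT SIZE phrase and the END-ADD attach to the second ADD, so DO-THAT is
performed only when the first ADD has a size error and the second does not.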
>Notice in these nested IF and EVALUATES, that blocks of imperatives become an
>'imperative statement', :-) But notice that other arithmetic statements do
>not terminate the scope of a currently running conditional clause on an
>arithmetic statement (or I/O statement), further MOVE does not, and PERFORM
>does not. BUT ELSE (or a WHEN clause if we are in an EVALUATE) does terminate
>ongoing scope.
WHEN pairs to either EVALUATE or SEARCH. However, only IF allows
statements (as opposed to imperative statements), so only ELSE, END-IF, and
period can implicitly terminate scope (despite 14.6.4.1). If we allow
implicit termination of imperative statements, then WHEN terminates scope,
but so do all the others (SIZE ERROR can terminate READ, EXCEPTION can
terminate ADD, etc).
--
RB |\ Randall Bart
aa |/ Barticus@usa.net 818-985-3259 Barticus@att.net
nr |\ 8321 Burnet Av #1, North Hills, CA 91343
dt ||\
a |/ Y2K website: http://users.aol.com/PanicYr00
l |\
l |/ DOT-HS-808-065 I Love You MS^7=6/28/107
--
This message was sent through the gnu-cobol mailing list. To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body. For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.