gnubol: How do we parse this language, anyway?



I've gotten behind in my email, and if I wait until I catch up, I'll never 
catch up.  Here are my responses to several messages about parsing (mostly 
from Bob).

At 10:53 AM 11/20/99 , RKRayhawk@aol.com wrote:

>The supposed retained buffers are not going to be retained in a real shared
>system; rather than us doing the I/O, the OS will be paging. That means that
>we will lose control of performance. Strategies that hold vast token lists
>in core exacerbate this.  Again the idea is that real systems are shared; in
>development environments the sharees might each be using our tool which is
>assuming retained buffers and holding vast token lists. A real regen of a
>real COBOL application is going to thrash on the virtual memory device. We
>will not be able to get to the problem. Much of that would not be necessary.

I started programming in a world where we measured in kilohertz and 
kilobytes.  Now that we measure in mega and giga and tera, it's hard to 
worry about a few extra bytes per token.  But I agree, we should reduce the 
tokens as much as possible in the preprocessor.

While we work on designing the parser(s), Tim should continue enhancing the 
preprocessor, nibbling around the edges of the language.  Each token should 
be reduced to an index into a symbol table.  Verbs and keywords can be 
identified as such (perhaps as reserved symbol table indexes).  The 
preprocessor could do a lot of token manipulation.  A OF B could be reduced 
to a single token.
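
As a rough illustration of that reduction, here is a minimal Python sketch.  
The class name SymbolTable, the function reduce_tokens, and the short 
reserved-word list are all hypothetical, chosen only for the example; the 
real preprocessor may be organized quite differently.  Reserved words get 
the lowest indexes, and a qualified name such as A OF B is folded into one 
entry.  The IN synonym for OF is not handled here.

```python
class SymbolTable:
    """Maps token spellings to small integer indexes (hypothetical design)."""

    # Assumption: a few reserved words occupy the lowest, fixed indexes.
    RESERVED = ("ADD", "SUBTRACT", "IF", "ELSE", "PERFORM", "OF", "TO")

    def __init__(self):
        self.names = list(self.RESERVED)
        self.index = {name: i for i, name in enumerate(self.names)}

    def intern(self, spelling):
        """Return the index for a spelling, adding it if it is new."""
        key = spelling.upper()  # COBOL words are case-insensitive
        if key not in self.index:
            self.index[key] = len(self.names)
            self.names.append(key)
        return self.index[key]

    def is_reserved(self, idx):
        return idx < len(self.RESERVED)


def reduce_tokens(words, table):
    """Intern each word, folding 'A OF B' qualification into one token."""
    out = []
    i = 0
    while i < len(words):
        name = words[i]
        # Fold a chain such as A OF B OF C into a single qualified name.
        while i + 2 < len(words) and words[i + 1].upper() == "OF":
            name = name + " OF " + words[i + 2]
            i += 2
        out.append(table.intern(name))
        i += 1
    return out
```

With this sketch, the words ADD A OF B TO Z reduce to four indexes, and the 
parser never sees the OF.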

There is a class of tokens I call pseudo-verbs:  Period, ELSE, WHEN, SIZE 
ERROR, OVERFLOW, END-OF-PAGE, INVALID KEY, EXCEPTION, END, END-x (that 
list may not be exhaustive).  These could be identified and matched to 
their antecedents in an early pass, or at least they could be marked for 
easier digestion by the parser.  Perhaps we could pass the parser just a 
single statement at a time.  Actually, that's a statement fragment, since 
it would already be split at the pseudo-verbs, e.g.,

     IF  A = B OR C OR > D
         ADD 1 TO Z
             ON SIZE ERROR
                 PERFORM P1
             NOT ON SIZE ERROR
                 PERFORM P2
         END-ADD
     ELSE
         PERFORM P3 VARYING X FROM 1 BY 1 UNTIL X > 10
     END-IF
     .

As I've shown this, each line is a statement fragment, beginning with a 
verb or pseudo-verb, except that NOT is found in front of the pseudo-verb, 
as are the optional words ON and AT.
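
The splitting rule can be sketched in a few lines of Python.  This is only 
an illustration under assumptions:  the verb list is a small sample, the 
period is assumed to arrive as its own token, and the names VERBS, 
PSEUDO_VERBS, and split_fragments are made up for the example.  A NOT, ON, 
or AT is pulled into the new fragment only when it leads directly to a 
pseudo-verb, so the NOT in a condition like A NOT = B stays where it is.

```python
# Assumption: a small sample of verbs, not the full COBOL list.
VERBS = {"IF", "ADD", "SUBTRACT", "PERFORM", "MOVE", "DISPLAY", "READ"}
# First word of each pseudo-verb; END-x forms are matched by prefix below.
PSEUDO_VERBS = {".", "ELSE", "WHEN", "SIZE", "OVERFLOW", "END-OF-PAGE",
                "INVALID", "EXCEPTION", "END"}


def is_pseudo_verb(word):
    return word in PSEUDO_VERBS or word.startswith("END-")


def pseudo_verb_at(tokens, i):
    """Index of the pseudo-verb reached from i after an optional NOT and an
    optional ON/AT, or -1 if position i does not lead to a pseudo-verb."""
    j = i
    if j < len(tokens) and tokens[j] == "NOT":
        j += 1
    if j < len(tokens) and tokens[j] in ("ON", "AT"):
        j += 1
    if j < len(tokens) and is_pseudo_verb(tokens[j]):
        return j
    return -1


def split_fragments(tokens):
    """Split a token list into fragments that each begin with a verb or a
    pseudo-verb (with its optional NOT/ON/AT prefix)."""
    fragments = []
    i = 0
    while i < len(tokens):
        j = pseudo_verb_at(tokens, i)
        if j >= 0:
            fragments.append(tokens[i:j + 1])  # prefix words plus pseudo-verb
            i = j + 1
        elif tokens[i] in VERBS or not fragments:
            fragments.append([tokens[i]])      # a verb opens a new fragment
            i += 1
        else:
            fragments[-1].append(tokens[i])    # operand of the current fragment
            i += 1
    return fragments
```

Run on the tokens of the example above, this yields eleven fragments, one 
per line as I laid it out.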

>I mean when you
>are into paragraph 17 of section 42, why do you still hold raw material from
>s1p1, especially the keyword minutia? So as you get to certain stages in the
>iteration, hopefully it will be discernable what got hung on some surviving
>structure, and what is getting obsolete.

At each period everything becomes obsolete.  At each verb or pseudo-verb, 
everything becomes obsolete except matching pseudo-verbs to antecedents.

Someone suggested multiple levels of parsing.  Imagine two different 
parsers, the statement parser and the sentence parser.  The statement 
parser would be called once for each statement or statement fragment (each 
line of my example).  The sentence parser would be called once for each 
sentence (once in my example), but the tokens to the sentence parser are 
the statement fragments.
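
To make the two levels concrete, here is a small Python sketch.  It is an 
illustration only:  parse_statement, parse_sentence, and PHRASE_WORDS are 
hypothetical names, the statement level does nothing more than classify a 
fragment by its leading word, and the sentence level does nothing more than 
pair each END-x with the nearest open x.  Note that the sentence parser 
never looks inside a fragment.

```python
# Assumption: first words of the pseudo-verbs that introduce a phrase.
PHRASE_WORDS = {"ELSE", "WHEN", "SIZE", "OVERFLOW", "END-OF-PAGE",
                "INVALID", "EXCEPTION", "END"}


def parse_statement(fragment):
    """Statement level: reduce one fragment to a (kind, head) unit."""
    k = 0
    # Skip the optional NOT / ON / AT in front of a pseudo-verb.
    while k < len(fragment) - 1 and fragment[k] in ("NOT", "ON", "AT"):
        k += 1
    head = fragment[k]
    if head == ".":
        kind = "period"
    elif head.startswith("END-") and head != "END-OF-PAGE":
        kind = "end"
    elif head in PHRASE_WORDS:
        kind = "phrase"
    else:
        kind = "verb"
    return kind, head


def parse_sentence(units):
    """Sentence level: the tokens are units from parse_statement.
    Returns (verb_index, end_index) pairs for each END-x."""
    open_verbs = []  # stack of (unit index, verb), innermost last
    pairs = []
    for i, (kind, head) in enumerate(units):
        if kind == "verb":
            open_verbs.append((i, head))
        elif kind == "end":
            verb = head[len("END-"):]
            for k in range(len(open_verbs) - 1, -1, -1):
                if open_verbs[k][1] == verb:
                    pairs.append((open_verbs[k][0], i))
                    del open_verbs[k:]  # inner statements end with it
                    break
            else:
                raise SyntaxError("unmatched " + head)
        elif kind == "period":
            open_verbs.clear()  # at each period everything becomes obsolete
    return pairs
```

For the eleven fragments of my example, this pairs END-ADD with the ADD and 
END-IF with the IF, and the stack is empty after the period.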

At 12:47 PM 11/20/99 , RKRayhawk@aol.com wrote:
><<  segmentation (not to be done), >>
>
>I would encourage syntax that tolerates some of the segmentation markers. It
>makes little sense in memory rich environs.

Segmentation is obsolete.  It is adequate to accept the syntax and ignore 
it.  There's some nonsense in COBOL-85 about initial state versus last used 
state, but there's some note saying this doesn't apply to PERFORM ranges, 
so it only applies to ALTER.  TTBOMK, there was never a test for this.  Do 
we really care about when ALTER in an overlaid segment is reinitialized?

At 09:09 AM 11/21/99 , Michael McKernan wrote:

>     ADD ... SIZE ... SUBTRACT ... SIZE ... PERIOD
>
>which appears to be as invalid as the first, but trashes the law of
>least astonishment, since the PERIOD has traditionally, legitimately,
>terminated anything that's going on.

Period terminates anything, but the statement above is invalid in COBOL-74, 
COBOL-85, and COBOL-20XX.  SUBTRACT with a SIZE ERROR phrase is a 
conditional statement, and the object of the ADD's SIZE ERROR phrase is 
required to be imperative.

>I am maintaining
>that an unmatched END-x should imply an appropriate END-x for any
>unterminated conditional part that exists when it is encountered.
>This isn't the letter of the law, but it does not affect correct
>programs, and is arguably less astonishing than the strict
>interpretation.

I agree.

>I'll say it again.  We do not have a floor and
>ceiling standard.  No compiler has ever been denied certification
>for permissive interpretation.

I disagree.  The only certification that ever existed was FIPS 
certification.  Some of the FIPS tests verified that errors were issued for 
invalid statements like the foregoing.

Which brings up an important point:  No organization is currently providing 
certification for COBOL-85 (or any other COBOL) and there is no 
organization planning certification for COBOL-20XX.  Perhaps a certifying 
organization will arise, but I won't predict what they will or will not 
test for.

At 11:36 AM 11/23/99 , RKRayhawk@aol.com wrote:

>But your actual comments seem to imply that you would
>associate the second ON SIZE clause with the inner arithmetic statement when
>it is not explicitly scope terminated. Please do not do that. That would not
>be an astonishment, but many would consider it a compiler error. Statements
>like
>
>      ADD ... SIZE ... SUBTRACT ... NOT SIZE ... END-ADD
>
>would commence to behave differently when migrated to a platform hosted by
>such a compiler.

From CD 1.7, p 401, 14.6.4.1 Scope of Statements:

>When statements are nested within other statements that allow optional 
>conditional phrases, any optional conditional phrase encountered is 
>considered to be the next phrase of the nearest preceding unterminated 
>statement with which that phrase is permitted to be associated according 
>to the general format and the syntax rules for that statement, but with 
>which no such phrase has already been associated.

According to this, the NOT SIZE phrase must be associated with the 
SUBTRACT.  But then consider

     ADD ... SIZE ... SUBTRACT ... NOT SIZE ... NOT SIZE ... END-ADD

As I read 14.6.4.1, the second NOT SIZE is associated with the ADD, the 
nearest preceding statement where no NOT SIZE phrase has been 
associated.  This is where the legalisms come into play, because the 
SUBTRACT is still conditional, thus this code is invalid.
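
My reading of 14.6.4.1 can be sketched in Python as follows.  This is a 
simplified model, not a statement of what any compiler does:  the event 
format, the PERMITTED table, and the name associate_phrases are invented 
for the example, only ADD and SUBTRACT are modeled, and the sketch does not 
check phrase order or whether the resulting code is valid.  It also assumes 
that attaching a phrase to an outer statement ends any statements nested 
inside it.

```python
# Assumption: only the SIZE / NOT SIZE phrases of ADD and SUBTRACT are modeled.
PERMITTED = {
    "ADD": {"SIZE", "NOT SIZE"},
    "SUBTRACT": {"SIZE", "NOT SIZE"},
}


def associate_phrases(events):
    """events is a list of ("stmt", verb), ("phrase", name), ("end", verb).
    Returns (phrase, statement number) pairs; statements are numbered from 0
    in order of appearance, and None means no statement could take the phrase."""
    open_stmts = []  # innermost last; each entry is [number, verb, seen]
    owners = []
    next_id = 0
    for kind, word in events:
        if kind == "stmt":
            open_stmts.append([next_id, word, set()])
            next_id += 1
        elif kind == "phrase":
            owner = None
            # Nearest preceding unterminated statement that permits the
            # phrase and has not already been given one.
            for k in range(len(open_stmts) - 1, -1, -1):
                stmt_id, verb, seen = open_stmts[k]
                if word in PERMITTED.get(verb, set()) and word not in seen:
                    seen.add(word)
                    owner = stmt_id
                    del open_stmts[k + 1:]  # inner statements are now over
                    break
            owners.append((word, owner))
        elif kind == "end":
            # END-x closes the nearest open x and anything nested in it.
            for k in range(len(open_stmts) - 1, -1, -1):
                if open_stmts[k][1] == word:
                    del open_stmts[k:]
                    break
    return owners
```

Fed the ADD ... SIZE ... SUBTRACT ... NOT SIZE ... NOT SIZE ... END-ADD 
sequence, the sketch gives the first NOT SIZE to the SUBTRACT and the second 
to the ADD, which is the association described above.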

At 02:49 PM 11/23/99 , RKRayhawk@aol.com wrote:

>  IF a > b
>      display ...
>      add a to b
>             on size error
>                   ADD 1 to error-count
>  ELSE
>  ....
>
>that [THEN] clause block has what I think is in fact called an imperative ADD
>statement (the add a to b).

No, the ADD is conditional.  Implicit scope termination doesn't make a 
statement imperative.

>  IF a > b
>      display ...
>      add a to b
>             on size error
>                   ADD 1 to error-count
>             NOT on size error
>                   ADD 1 to we-actually-can-add-count
>   ELSE
>  ....

In this case, the NOT SIZE is attached to the second ADD.  The first ADD is 
invalid, because the second ADD is conditional.  But let's modify it:

IF a > b
    display ...
    add a to b
        on size error
            ADD 1 to error-count
        NOT on size error
            PERFORM DO-THAT
    END-ADD
ELSE

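This will compile, but it won't do what the programmer expects:  the NOT 
SIZE phrase and the END-ADD both attach to the inner ADD, so DO-THAT is 
performed when the add to error-count succeeds, not when the add of a to b 
succeeds.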

>Notice in these nested IF and EVALUATES, that blocks of imperatives become an
>'imperative statement', :-) But notice that other arithmetic statements do
>not terminate the scope of a currently running conditional clause on an
>arithmetic statement (or I/O statement), further MOVE does not, and PERFORM
>does not. BUT ELSE (or a WHEN clause if we are in an EVALUATE) does terminate
>ongoing scope.

WHEN pairs to either EVALUATE or SEARCH.  However, only IF allows 
statements (as opposed to imperative statements), so only ELSE, END-IF, and 
period can implicitly terminate scope (despite 14.6.4.1).  If we allow 
implicit termination of imperative statements, then WHEN terminates scope, 
but so do all the others (SIZE ERROR can terminate READ, EXCEPTION can 
terminate ADD, etc).

--
RB |\  Randall Bart
aa |/  Barticus@usa.net 818-985-3259 Barticus@att.net
nr |\  8321 Burnet Av #1, North Hills, CA 91343
dt ||\
a   |/ Y2K website:    http://users.aol.com/PanicYr00
l   |\
l   |/ DOT-HS-808-065     I Love You    MS^7=6/28/107
