
gnubol: Hacks needed to parse COBOL

To: gnu-cobol@lusars.net
Subject: gnubol: Hacks needed to parse COBOL
From: Tim Josling <TIMJOSLING@prodigy.net>
Date: Thu, 02 Dec 1999 06:13:52 +1000
Delivered-To: gnu-cobol-outgoing@wallace.lusars.net
References: <199912011347.IAA08106@tivoli.mv.com> <3.0.3.32.19991201130804.0069c97c@tiac.net>
Reply-To: gnu-cobol@lusars.net
Sender: owner-gnu-cobol@wallace.lusars.net

You are right, I think, as I suggested. But the hacks needed are numerous and
very ugly, and they involve a lot of scanning ahead. I would like confirmation
that, in an expression list, this is how you are meant to know where an
expression ends.

In the draft COBOL 4 standard:

01  a.
01  b.
    02  c occurs 5 pic 9.

move 1 to c (a (1)).

What is the correct error message here? "Too many array index expressions for c"?
Or "a is not an array and should not have an array index expression"?

The ugly hacks I found I need in the Nucleus follow.

Tim Josling

-------------------

/* problem 1.
   - need two token lookahead to distinguish paragraph from section
  solution 1.
  - change tokens in prescan as follows.
*/

/* from tk_generic_name, when followed by "section" */

%token tk_M_generic_M_section_M_name

/* from tk_generic_number_integer, when followed by "section" */

%token tk_M_generic_M_section_M_number_integer
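A minimal sketch of this prescan in Python, for illustration only. It assumes
the lexer has already produced a list of (kind, text) pairs; that
representation and the kind "tk_section" for the word "section" are my
assumptions, not names from the grammar.

```python
def mark_section_names(tokens):
    """Retag a name or integer that is immediately followed by SECTION.

    tokens is a list of (kind, text) pairs (an assumed representation).
    """
    retag = {
        "tk_generic_name": "tk_M_generic_M_section_M_name",
        "tk_generic_number_integer": "tk_M_generic_M_section_M_number_integer",
    }
    out = list(tokens)
    for i in range(len(out) - 1):
        kind, text = out[i]
        next_kind, _ = out[i + 1]
        # "tk_section" is a hypothetical kind for the reserved word SECTION.
        if next_kind == "tk_section" and kind in retag:
            out[i] = (retag[kind], text)
    return out
```

With the name retagged, the parser can tell a section header from a paragraph
header with the usual single token of lookahead.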

/* problem 2.
   - after a qualified name, when hitting a "(" you can't tell if it is an
     array reference or a reference modification
   solution 2.
   - change tokens in prescan as follows.
   - this solution also included in problem 7's solution
*/

/* from tk_generic_name, when followed by a reference modification but no
   array */

%token tk_generic_name_M_M_refmod_M

/* from tk_generic_name, when followed by an array then a reference
   modification */

%token tk_generic_name_M_M_array_refmod_M
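One way to make this choice, sketched in Python. It relies on the fact that a
reference modification has a ":" directly inside its parentheses while a
subscript list does not. Tokens are plain strings here, and qualification
(of/in) between the name and the "(" is ignored; both are simplifying
assumptions of mine.

```python
def group_end(tokens, start):
    """Scan the parenthesized group opening at tokens[start].

    Returns (index just past the matching ")", whether a ":" appears
    directly inside this group rather than in a nested one).
    """
    depth = 0
    has_colon = False
    for i in range(start, len(tokens)):
        tok = tokens[i]
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth == 0:
                return i + 1, has_colon
        elif tok == ":" and depth == 1:
            has_colon = True
    raise ValueError("unbalanced parentheses")


def classify_name(tokens, i):
    """Choose a token kind for the name at tokens[i] from what follows it."""
    j = i + 1
    if j >= len(tokens) or tokens[j] != "(":
        return "tk_generic_name"
    j, has_colon = group_end(tokens, j)
    if has_colon:
        return "tk_generic_name_M_M_refmod_M"
    if j < len(tokens) and tokens[j] == "(":
        _, second_has_colon = group_end(tokens, j)
        if second_has_colon:
            return "tk_generic_name_M_M_array_refmod_M"
    return "tk_generic_name"
```

A name followed only by subscripts keeps its ordinary kind, so the grammar can
treat the "(" after it as an array reference.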

/* problem 3.
   - you need infinite lookahead to determine whether something is an
     imperative statement
   solution 3.
   - scan ahead for matching end-verb and adjust verb token to magic value to
     fake the lookahead
*/

%token tk_add_M_M_has_end_M
%token tk_compute_M_M_has_end_M
%token tk_divide_M_M_has_end_M
%token tk_if_M_M_has_end_M
%token tk_multiply_M_M_has_end_M
%token tk_search_M_M_has_end_M
%token tk_string_M_M_has_end_M
%token tk_subtract_M_M_has_end_M
%token tk_unstring_M_M_has_end_M
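A rough Python sketch of the scan-ahead, to show the idea rather than the
actual prescan. Tokens are lowercase strings (my assumption), the scan stops
at the full stop that ends the sentence, and nesting is tracked only for the
same verb, which is a simplification.

```python
VERBS_WITH_END = {
    "add", "compute", "divide", "if", "multiply",
    "search", "string", "subtract", "unstring",
}


def mark_has_end(tokens):
    """Retag each verb that has a matching end-verb in the same sentence."""
    out = list(tokens)
    for i, tok in enumerate(tokens):
        if tok not in VERBS_WITH_END:
            continue
        depth = 0
        for later in tokens[i:]:
            if later == ".":
                break  # the sentence ended with no matching end-verb
            if later == tok:
                depth += 1  # counts the verb itself and any nested copies
            elif later == "end-" + tok:
                depth -= 1
                if depth == 0:
                    out[i] = "tk_%s_M_M_has_end_M" % tok
                    break
    return out
```

In "if a if b add 1 to c end-if ." only the inner "if" is retagged, because the
single end-if belongs to it.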

/* problem 4.
   - you need to know that a 'giving' is coming to know whether a literal is
     valid after the 'to' and whether 'rounded' is permitted
   solution 4.
   - scan ahead for matching giving before the next verb
   - this is combined with problem 3 solution
*/

%token tk_add_M_M_has_giving_M
%token tk_add_M_M_has_end_has_giving_M

%token tk_divide_M_M_has_giving_M
%token tk_divide_M_M_has_end_has_giving_M
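A sketch of the giving scan and of how the two flags combine into one token
name, in Python. The VERBS set is a hypothetical, incomplete list, tokens are
lowercase strings, and the plain fallback name "tk_<verb>" is my guess at the
unmodified token name.

```python
# Hypothetical and incomplete list of words that start a new statement.
VERBS = {"add", "compute", "divide", "if", "move", "multiply", "subtract"}


def has_giving(tokens, i):
    """True if 'giving' occurs after tokens[i] and before the next verb,
    end-verb or full stop."""
    for tok in tokens[i + 1:]:
        if tok == "." or tok in VERBS or tok.startswith("end-"):
            return False
        if tok == "giving":
            return True
    return False


def magic_name(verb, has_end, giving):
    """Build the adjusted token name from the two lookahead results."""
    parts = []
    if has_end:
        parts.append("has_end")
    if giving:
        parts.append("has_giving")
    if not parts:
        return "tk_" + verb  # assumed name of the unmodified token
    return "tk_%s_M_M_%s_M" % (verb, "_".join(parts))
```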

/* problem 5.
   - you need to know if something is a condition name or
   an identifier in a number of contexts
   solution 5.
   - look up the type in the symbol table and change the token type
*/

%token tk_generic_name_M_M_condition_M
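The lookup itself is simple. A Python sketch, assuming (kind, text) token
pairs and a symbol table that is a plain dict from lowercased name to a type
string such as "condition"; both are illustrative assumptions, not the real
data structures.

```python
def retag_by_symbol_type(tokens, symbol_table):
    """Change the kind of a generic name according to its declared type."""
    retag = {"condition": "tk_generic_name_M_M_condition_M"}
    out = []
    for kind, text in tokens:
        if kind == "tk_generic_name":
            # Unknown names and types not listed in retag keep their kind.
            declared = symbol_table.get(text.lower())
            kind = retag.get(declared, kind)
        out.append((kind, text))
    return out
```

Further entries can be added to the retag table for other types that need
their own token.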

/* problem 6.
   - you need to know if something is a switch name or
   an identifier within a simple condition
   solution 6.
   - look up the type in the symbol table and change the token type
*/

%token tk_M_generic_M_switch_M_name

/* problem 7.
   - you need to know if something is followed by class_condition
   solution 7.
   - scan backward from class condition
   - change first prior "is" or "not" to magic token until something else found
*/

%token tk_is_M_M_class_condition_follows_M
%token tk_not_M_M_class_condition_follows_M

/* problem 8.
   - you need to know if an arithmetic expression is followed by a sign condition

   solution 8.
   - scan backward from sign condition
   - change first prior "is" or "not" to magic token
*/

%token tk_is_M_M_sign_condition_follows_M
%token tk_not_M_M_sign_condition_follows_M
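The backward scans for problems 7 and 8 can share one routine. A Python sketch
over lowercase string tokens; the two word sets are my assumptions about what
marks a class or sign condition, and a real prescan would need more care with
"zero", which is also a figurative constant.

```python
CLASS_WORDS = {"numeric", "alphabetic", "alphabetic-lower", "alphabetic-upper"}
SIGN_WORDS = {"positive", "negative", "zero"}


def mark_condition_prefix(tokens):
    """Retag the run of 'is'/'not' directly before a class or sign word."""
    out = list(tokens)
    for i, tok in enumerate(tokens):
        if tok in CLASS_WORDS:
            suffix = "class_condition_follows"
        elif tok in SIGN_WORDS:
            suffix = "sign_condition_follows"
        else:
            continue
        j = i - 1
        # Walk backward until something other than 'is' or 'not' is found.
        while j >= 0 and tokens[j] in ("is", "not"):
            out[j] = "tk_%s_M_M_%s_M" % (tokens[j], suffix)
            j -= 1
    return out
```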

/* problem 9.

   - shift/reduce conflict on else in conditional if statement, i.e. the
     classic dangling else problem

   - same issue on sundry nested conditionals (e.g. add size error,
     subtract size error). These are in fact errors (which need to be
     manually flagged), but the grammar used appears to be needed to force
     the correct grouping of non-conditional forms of add and similar
     verbs.

   - parser resolves this correctly as shift. %left didn't help, so allow the
     conflict

*/

/* nothing to do here */

/* problem 10.
   - you need to know if something is a symbolic character name or
   an identifier to verify if 'all' is valid and to recognise that
   a of b is invalid in this context

   solution 10.
   - look up the type in the symbol table and change the token type
*/

%token tk_M_generic_name_M_symbolic_character_name_M


/* problem 11.
   - you need infinite lookahead to determine whether you are expecting a
     procedure name or an expression after 'perform'.
     You could look up the type in the symbol table, but what about forward
     references etc.

   solution 11.
   - scan ahead for matching end-verb and adjust verb token to magic value to
     fake the lookahead
*/

%token tk_perform_M_M_has_end_M

/* problem 12.
   - two different repeating constructs in the inspect verb start with
     'identifier', creating an ambiguity when there is an identifier after
     'before/after initial literal/identifier'.

   solution 12.
   - scan ahead within inspect verb and change the tk_generic_name token to a
     magic name if it is followed by 'for'
     (with possible intervening of/in tk_generic_name... and array reference)

*/

%token tk_M_generic_name_M_has_for_following_in_inspect_M

/* problem 13.
   - there is no natural delimiter in COBOL to hang panic mode error recovery
     from, like ';' '}' ')' in C

   solution 13.
   - add in a magic token
     before any verb not preceded by a full stop;
     before any full stop

*/

%token tk_magic_eos
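A Python sketch of the insertion rule. Tokens are lowercase strings, the VERBS
set is a hypothetical, incomplete list, and the first token of the stream is
treated as if it followed a full stop; all three are my assumptions.

```python
# Hypothetical and incomplete list of statement verbs.
VERBS = {"add", "compute", "display", "if", "move", "perform", "subtract"}


def insert_magic_eos(tokens):
    """Insert tk_magic_eos before every full stop and before every verb
    that is not directly preceded by a full stop."""
    out = []
    prev = None
    for tok in tokens:
        if tok == "." or (tok in VERBS and prev not in (None, ".")):
            out.append("tk_magic_eos")
        out.append(tok)
        prev = tok
    return out
```

Every statement then ends in exactly one tk_magic_eos, which gives the error
rules a token to resynchronise on.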

/* problem 14.

   - it is hard to do good error recovery for if/else without approximately
     514 RR conflicts. Need to know if there is an else coming up that's mine.

   solution 14.
   - adjust the if token:
     if it is *not* followed by a matching end-if but it is followed by an
     else, change the token

*/

%token tk_if_M_M_has_else_M
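Deciding which else is "mine" needs a stack of open ifs rather than a simple
forward search. A Python sketch over lowercase string tokens (an assumed
representation), using the usual rule that an else belongs to the nearest open
if that does not have one yet.

```python
def mark_if_has_else(tokens):
    """Retag each 'if' that owns an else but has no matching end-if."""
    out = list(tokens)
    open_ifs = []  # stack of [index of the if, has an else been seen]

    def close(entry, has_end):
        index, has_else = entry
        if has_else and not has_end:
            out[index] = "tk_if_M_M_has_else_M"

    for i, tok in enumerate(tokens):
        if tok == "if":
            open_ifs.append([i, False])
        elif tok == "else":
            # Inner ifs that already have an else are finished by this one.
            while open_ifs and open_ifs[-1][1]:
                close(open_ifs.pop(), False)
            if open_ifs:
                open_ifs[-1][1] = True
        elif tok == "end-if":
            if open_ifs:
                close(open_ifs.pop(), True)
        elif tok == ".":
            # A full stop ends every if that is still open.
            while open_ifs:
                close(open_ifs.pop(), False)
    while open_ifs:
        close(open_ifs.pop(), False)
    return out
```

An if that has both an else and an end-if is left alone here, on the
assumption that the has_end adjustment from problem 3 already covers it.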

"J & C Migrations, Pty." wrote:

>
> Bottom line: COBOL parsing requires context sensitivity and frequent symbol
> table lookups.



Follow-Ups:
- Re: gnubol: Hacks needed to parse COBOL
  - From: Randall Bart <Barticus@att.net>
- Re: gnubol: Hacks needed to parse COBOL
  - From: "J & C Migrations, Pty." <migrate@tiac.net>

References:
- Re: gnubol: refmod again (fwd)
  - From: Michael McKernan <mck@tivoli.mv.com>
- Re: gnubol: refmod again (fwd)
  - From: "J & C Migrations, Pty." <migrate@tiac.net>
