
gnubol: New bison Grammar available (long)



I have put a new version of the bison grammar for the nucleus and
some extensions at timjosling.homepage.com. This includes the
extra code to work around the limitations of bison. 

It compiles cleanly but has not been tested at all. There are
lots of lll; (later) annotations where things need to be done. I
have included a list of the hacks below.

There has been no effort to pretty up the error messages at this
stage. The messages will be along the lines of 'syntax error,
expecting xxx ...'.

The grammar is in file cobgoprs.y and the hacks are in
cobgopru.c. Any comments or fixes welcome.

***

Thoughts on the structure of the parse tree... any comments
welcome.

The parser should generate a parse tree that is as simple as
possible, plus a symbol table containing all the names. All later
work operates on the parse tree, including filling in the remaining
details in the symbol table and doing the consistency checks.

Example - working-storage (I am assuming COBOL is the lingua
franca of the list).

01 working-storage-header.
  02  magic usage binary-long value 1.
  02  address-of-main-token usage pointer.
  02  count-of-top-level-data-items usage binary-long.
  02  address-of-first-top-level-data-item usage pointer.

01  data-item.
  02  magic usage binary-long value 2.
  02  address-of-main-token usage pointer.
  02  count-of-subordinate-items usage binary-long.
  02  address-of-first-subordinate-item usage pointer.
  02  address-of-owning-item usage pointer. 
  02  usage-value usage binary-char.
  02  flags.
      03  has-occurs pic x.
      03  has-occurs-depending pic x.
      03  has-redefines pic x.
      03  has-usage pic x.
      etc... 
  02  min-occurs usage binary-long.
  02  max-occurs usage binary-long.
  02  address-of-pic usage pointer.
  02  length-of-pic usage binary-long.
      etc...

All fields are always present; unused fields are just zero/null.
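To make the layout concrete for C readers, here is a minimal sketch of the data-item node as a C struct. The struct, field and function names are hypothetical translations of the COBOL above, not taken from cobgoprs.y or cobgopru.c. Allocating each node with calloc is one way to get the "zero/null if not used" rule for free (calloc gives all-bits-zero, which is 0/NULL on the usual platforms).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical C mirror of the data-item parse-tree node sketched above. */
enum { MAGIC_WS_HEADER = 1, MAGIC_DATA_ITEM = 2 };

struct token;                          /* scanner token; layout not shown */

struct data_item {
    int32_t magic;                     /* always MAGIC_DATA_ITEM */
    struct token *main_token;
    int32_t subordinate_count;
    struct data_item *first_subordinate;
    struct data_item *owner;           /* owning group item, NULL at level 01 */
    int8_t usage;                      /* binary-char */
    struct {
        char has_occurs;
        char has_occurs_depending;
        char has_redefines;
        char has_usage;
    } flags;
    int32_t min_occurs;
    int32_t max_occurs;
    const char *pic;
    int32_t pic_length;
};

/* Every field exists in every node; calloc leaves the unused ones
   all-bits-zero, so callers never need to ask whether a field was set. */
struct data_item *new_data_item(struct token *main_token, struct data_item *owner)
{
    struct data_item *item = calloc(1, sizeof *item);
    if (item == NULL)
        return NULL;
    item->magic = MAGIC_DATA_ITEM;
    item->main_token = main_token;
    item->owner = owner;
    return item;
}
```

A later pass can then test flags.has_occurs or owner directly, without first checking whether the parser filled them in.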

Example - procedure division.


01  verb.
  02  magic usage binary-long value 3.
  02  verb-id usage binary-char.
  02  verb-main-token-address usage pointer.
  02  verb-details usage pointer.

01  verb-details-display.
  02  magic usage binary-long value 4.
  02  count-of-things-to-display usage binary-long.
  02  address-first-thing-to-display usage pointer.
  02  address-of-where-to-display usage pointer. *> token address

01  data-item-used.
  02  magic usage binary-long value 5.
  02  literal-flag pic x.
  02  data-item-address usage pointer.
      *> points to the data item above or to the literal's token

etc...
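The magic numbers let any pass over the tree check what kind of node it holds before using it. A hedged C sketch of that check for the verb and DISPLAY-details nodes; all struct and function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { MAGIC_VERB = 3, MAGIC_DISPLAY_DETAILS = 4, MAGIC_ITEM_USED = 5 };

struct token;

struct verb {
    int32_t magic;              /* MAGIC_VERB */
    int8_t verb_id;
    struct token *main_token;
    void *details;              /* e.g. struct display_details for DISPLAY */
};

struct display_details {
    int32_t magic;              /* MAGIC_DISPLAY_DETAILS */
    int32_t count;              /* number of things to display */
    void *first;                /* first data-item-used node */
    struct token *where;        /* token saying where to display */
};

/* Return the DISPLAY details of a verb node, or NULL if the magic
   numbers show the node (or its details) is something else. */
struct display_details *display_details_of(const struct verb *v)
{
    if (v == NULL || v->magic != MAGIC_VERB || v->details == NULL)
        return NULL;
    /* Every node starts with an int32_t magic, and a struct pointer may be
       converted to a pointer to its first member, so this read is safe. */
    if (*(const int32_t *)v->details != MAGIC_DISPLAY_DETAILS)
        return NULL;
    return v->details;
}
```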

***

Compiler subset. The compiler's runtime, and the code that
compiles the rest of the language, will be written in a limited
subset of the language.

The compiler will be built in two phases. First, build a compiler
that can handle the limited subset. Then build the full compiler.
Both parts of the full compiler, the compiler itself and the
runtime, can then contain code written in the limited subset of
COBOL.

The subset will have access to the whole C runtime library
(memory allocation, file I/O, formatting, strings, date and time).

What should be in the subset?

I am thinking:

Only the Nucleus, plus the ability to define and call functions
(required to interface to C), some new data types from the new
draft standard (pointers, binary-xxx), and parts of inter-program
communication.

Excluding lots of things:

IDENTIFICATION DIVISION:

AUTHOR - comments can do this
INSTALLATION - comments
DATE-WRITTEN - comments
DATE-COMPILED - comments
SECURITY - comments

So you can have program-id/function-id and the repository
paragraph from cd1.7.

ENVIRONMENT DIVISION:

SOURCE-COMPUTER - obsolete (no debugging mode)
OBJECT-COMPUTER - obsolete
SPECIAL-NAMES - luxury

So you get basically nothing here.

DATA DIVISION:

PIC - only pic x(nnn) or pic x. All numerics are via binary-xxx
USAGE - only display, binary-char/short/long/double and pointer
SIGN IS
SYNCHRONIZED
JUSTIFIED
BLANK WHEN ZERO
RENAMES (66 level)
condition names (88 level) - syntactic sugar only.
OCCURS is allowed but not ASCENDING/DESCENDING KEY. OCCURS
DEPENDING ON can be written as OCCURS n TO 0, where a maximum of
zero means 'no limit'.

So you get linkage and working storage, structures, redefines,
pointers, binary numbers, alphanumeric data (pic x) and occurs.
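As an illustration of the proposed "OCCURS n TO 0" convention, a runtime bounds check might look like the sketch below. The function name is hypothetical; treating a maximum of zero as unbounded is the only rule taken from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical runtime check for OCCURS min TO max DEPENDING ON.
   A max_occurs of 0 means "no limit", as proposed for the subset. */
bool occurs_count_valid(int32_t min_occurs, int32_t max_occurs,
                        int64_t depending_value)
{
    if (depending_value < min_occurs)
        return false;
    if (max_occurs == 0)            /* unbounded table */
        return true;
    return depending_value <= max_occurs;
}
```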

Procedure division:

ACCEPT - OK but "from" phrase not allowed.
ADD, DIVIDE, SUBTRACT, MULTIPLY - use compute
ALTER - obsolete
COMPUTE - OK but only one receiving item allowed, no rounding, no size error.
(CONTINUE - allowed)
DISPLAY - OK but "upon" phrase not allowed
ENTER - obsolete
EVALUATE - not allowed
EXIT - has no effect, not allowed
GO TO - OK but go to without procedure name not allowed.
IF - OK but next sentence not allowed
INITIALIZE - not allowed
INSPECT - not allowed
MOVE - OK but corresponding phrase not allowed.
(PERFORM - allowed)
SEARCH - not allowed
SET - not allowed
STOP - OK but stop literal not allowed
STRING - not allowed
UNSTRING - not allowed
USE - not allowed

FUNCTION - ability to define and call functions, but the COBOL
intrinsic functions will not be there.

So you have limited forms of

accept, compute, display, go to, if, move, stop.

Also, from inter-program communication, you would have:

Procedure division using/returning but no by reference/content
CALL - OK but no on overflow or on exception
CANCEL - not allowed
GLOBAL/EXTERNAL phrases - not allowed
EXIT PROGRAM - allowed (no 'goback')
linkage section.

Is this enough? I stress, this is not for the final product, only
to enable useful work to be done as part of the compiler project.
Tell me now about any 'must haves'.

Tim Josling

LIST OF HACKS
*************

/* 

   Problem 1.

   In certain cases, such as function parameter lists, you need
   to determine whether an identifier can be subscripted (can have
   an array).

   If so, the next parenthesis starts an array reference

   If not, the next parenthesis starts a new expression 

*/

/*
  from tk_generic_name, when it can be followed by an array
  reference
  lll; require function names placed in the symbol table to do
  the same once I am parsing them

*/

%token tk_M_generic_name_M_name_that_can_have_array_M
%token tk_M_function_name_M_function_that_can_have_arguments_M
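A minimal sketch of the problem 1 retyping, assuming the symbol table records whether each item has an OCCURS clause and which group owns it (compare has-occurs and address-of-owning-item in the data-item layout earlier). The enum values are arbitrary stand-ins for the bison-generated token codes, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for bison-generated token codes; values are arbitrary. */
enum token_code {
    tk_generic_name = 300,
    tk_M_generic_name_M_name_that_can_have_array_M
};

/* Minimal view of a symbol-table entry: only what this check needs. */
struct symbol {
    bool has_occurs;
    const struct symbol *owner;   /* group this item is subordinate to */
};

/* A data name can take subscripts if it, or any group containing it,
   has an OCCURS clause. */
bool can_have_array(const struct symbol *sym)
{
    for (; sym != NULL; sym = sym->owner)
        if (sym->has_occurs)
            return true;
    return false;
}

/* Prescan step: retype a generic name so the grammar knows that a
   following "(" starts a subscript rather than a new expression. */
enum token_code retype_generic_name(enum token_code tok, const struct symbol *sym)
{
    if (tok == tk_generic_name && sym != NULL && can_have_array(sym))
        return tk_M_generic_name_M_name_that_can_have_array_M;
    return tok;
}
```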

/* 

   problem 2.
   
   - after a qualified name, when hitting a "(" you can't tell if it
   is an array reference or a reference modification or a new
   expression (inside a function argument list)
   
  solution 2. 
  - change tokens in prescan as follows.

*/

/* from tk_left_parenthesis */

%token tk_left_parenthesis_M_M_starts_refmod_M
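One way the prescan could decide problem 2, sketched under assumptions: the scanner produces a token for ':' (called tk_colon here, a hypothetical name), and a reference modification is recognised by a ':' at the top nesting level of the parenthesised group, as in name (start : length). Subscript lists and function argument lists should not contain a top-level ':'. Token values are arbitrary stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the bison-generated token codes; values are arbitrary. */
enum token_code {
    tk_left_parenthesis = 300,
    tk_right_parenthesis,
    tk_colon,
    tk_other,
    tk_left_parenthesis_M_M_starts_refmod_M
};

/* If toks[start] is "(" and a ':' appears at nesting depth 1 before the
   matching ")", the parenthesis opens a reference modification, so
   retype it. Returns true if the token was retyped. */
bool mark_refmod(enum token_code *toks, size_t n, size_t start)
{
    if (start >= n || toks[start] != tk_left_parenthesis)
        return false;
    int depth = 0;
    for (size_t i = start; i < n; i++) {
        if (toks[i] == tk_left_parenthesis
            || toks[i] == tk_left_parenthesis_M_M_starts_refmod_M) {
            depth++;
        } else if (toks[i] == tk_right_parenthesis) {
            if (--depth == 0)
                return false;       /* closed without a top-level ':' */
        } else if (toks[i] == tk_colon && depth == 1) {
            toks[start] = tk_left_parenthesis_M_M_starts_refmod_M;
            return true;
        }
    }
    return false;                   /* unbalanced: let the parser report it */
}
```

For a subscripted item that is also reference-modified, a (1) (2 : 3), the prescan would call this once per "(" and only the second group is retyped.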

/* problem 3 
   - you need to know if something is a condition name or 
   an identifier in a number of contexts
   solution 3.
   - look up the type in the symbol table and change the token type
*/

%token tk_M_generic_name_M_condition_name_M

/* problem 4. 
   - you need to know if something is a switch name or 
   an identifier within a simple condition
   solution 4.
   - look up the type in the symbol table and change the token type
*/

%token tk_M_generic_M_switch_M_name

/* problem 5.  
   - you need to know if something is a symbolic character name or
   an identifier, to verify whether 'all' is valid and to recognise
   that 'a of b' is invalid in this context

   solution 5. 
   - look up the type in the symbol table and change the token type
*/

%token tk_M_generic_name_M_symbolic_character_name_M
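Problems 3 to 5 share one technique: look the name up and retype the token by its kind. A minimal sketch, with a hypothetical flat symbol table that ignores qualification and assumes the scanner has already upper-cased names; the enum values stand in for the bison token codes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol kinds; the real symbol table layout may differ. */
enum symbol_kind {
    SYM_DATA_ITEM,
    SYM_CONDITION_NAME,
    SYM_SWITCH_NAME,
    SYM_SYMBOLIC_CHARACTER
};

/* Stand-ins for the bison-generated token codes; values are arbitrary. */
enum token_code {
    tk_generic_name = 300,
    tk_M_generic_name_M_condition_name_M,
    tk_M_generic_M_switch_M_name,
    tk_M_generic_name_M_symbolic_character_name_M
};

struct symbol_entry {
    const char *name;               /* assumed upper-cased by the scanner */
    enum symbol_kind kind;
};

/* Prescan step for problems 3-5: if the name is a condition name,
   switch name or symbolic character, retype the token accordingly. */
enum token_code retype_by_kind(enum token_code tok, const char *text,
                               const struct symbol_entry *table, size_t n)
{
    if (tok != tk_generic_name)
        return tok;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, text) != 0)
            continue;
        switch (table[i].kind) {
        case SYM_CONDITION_NAME:     return tk_M_generic_name_M_condition_name_M;
        case SYM_SWITCH_NAME:        return tk_M_generic_M_switch_M_name;
        case SYM_SYMBOLIC_CHARACTER: return tk_M_generic_name_M_symbolic_character_name_M;
        default:                     return tok;
        }
    }
    return tok;                     /* unknown name: leave it for the parser */
}
```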


/* problem 6. 
   - you need to know if 'is' or 'not' is followed by a
   class_condition within conditional expressions.

   - otherwise the parser does not know whether to reduce or not.
   Even if you managed to get to the actual name, it is not possible
   to disambiguate abbreviated relational conditions etc.

   solution 6.
   - scan backward from class condition
   - change prior "is" or "not" to magic token until something
   else found */

%token tk_is_M_M_class_condition_follows_M
%token tk_not_M_M_class_condition_follows_M

/* problem 7. 
   - you need to know if an arithmetic expression is followed by
   a sign condition
   solution 7.
   - scan backward from sign condition
   - change prior "is" or "not" to magic token
*/

%token tk_is_M_M_sign_condition_follows_M
%token tk_not_M_M_sign_condition_follows_M
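Problems 6 and 7 also share a technique, the backward scan. A hedged sketch over an array of token codes; how the prescan finds the class or sign condition keyword in the first place is left out, and tk_is, tk_not and tk_other are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the bison-generated token codes; values are arbitrary. */
enum token_code {
    tk_is = 300,
    tk_not,
    tk_other,
    tk_is_M_M_class_condition_follows_M,
    tk_not_M_M_class_condition_follows_M,
    tk_is_M_M_sign_condition_follows_M,
    tk_not_M_M_sign_condition_follows_M
};

/* toks[cond_pos] is the first token of a class condition (e.g. NUMERIC)
   or a sign condition (e.g. POSITIVE). Walk backward, retyping each
   "is"/"not" until another token is found. Returns how many changed. */
size_t mark_is_not_before(enum token_code *toks, size_t cond_pos,
                          bool sign_condition)
{
    enum token_code new_is = sign_condition ? tk_is_M_M_sign_condition_follows_M
                                            : tk_is_M_M_class_condition_follows_M;
    enum token_code new_not = sign_condition ? tk_not_M_M_sign_condition_follows_M
                                             : tk_not_M_M_class_condition_follows_M;
    size_t changed = 0;
    for (size_t i = cond_pos; i > 0; i--) {
        if (toks[i - 1] == tk_is)
            toks[i - 1] = new_is;
        else if (toks[i - 1] == tk_not)
            toks[i - 1] = new_not;
        else
            break;                  /* something else found: stop */
        changed++;
    }
    return changed;
}
```

A NOT that negates a whole condition, as in NOT X IS NUMERIC, is separated from the keyword by the operand, so the scan stops before reaching it.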

/* problem 8.  

   - 'not' or 'on' following a verb can be '[not] [on] size' or
   '[not] [on] overflow' or others (outside nucleus)

   solution 8.
   - adjust whichever of the not and on tokens comes first, to tell
   the parser whether to shift or reduce the current verb

*/


/*
  these tokens live in the appropriate place for precedence 

  %token tk_on_M_M_has_size_M
  %token tk_not_M_M_has_size_M
  
  %token tk_on_M_M_has_overflow_M
  %token tk_not_M_M_has_overflow_M
  
*/
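A sketch of the problem 8 adjustment, assuming the prescan is positioned on a not or on token following a verb. It retypes only the first token of the phrase, as described above; the enum values other than the four magic tokens, and the function name, are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the bison-generated token codes; values are arbitrary. */
enum token_code {
    tk_not = 300, tk_on, tk_size, tk_overflow, tk_other,
    tk_on_M_M_has_size_M, tk_not_M_M_has_size_M,
    tk_on_M_M_has_overflow_M, tk_not_M_M_has_overflow_M
};

/* At toks[i], recognise "[not] [on] size" or "[not] [on] overflow" and
   retype the first token of the phrase. Returns true if retyped. */
bool mark_size_overflow(enum token_code *toks, size_t n, size_t i)
{
    if (i >= n || (toks[i] != tk_not && toks[i] != tk_on))
        return false;
    size_t j = i + 1;
    if (toks[i] == tk_not && j < n && toks[j] == tk_on)
        j++;                        /* optional "on" after "not" */
    if (j >= n)
        return false;
    bool is_not = toks[i] == tk_not;
    if (toks[j] == tk_size)
        toks[i] = is_not ? tk_not_M_M_has_size_M : tk_on_M_M_has_size_M;
    else if (toks[j] == tk_overflow)
        toks[i] = is_not ? tk_not_M_M_has_overflow_M : tk_on_M_M_has_overflow_M;
    else
        return false;
    return true;
}
```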

/* problem 9. 
   - two different repeating constructs in the inspect verb start
   with 'identifier', creating an ambiguity when there is an
   identifier after 'before/after initial literal/identifier'.

   solution 9.
   - scan ahead within inspect verb and change the tk_generic_name
   token to a magic name if it is followed by 'for'
   (with possible intervening of/in tk_generic_name... and array
   reference)

*/

%token tk_M_generic_name_M_has_for_following_in_inspect_M
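A hedged sketch of the problem 9 lookahead. It skips of/in qualifiers and any parenthesised groups after the name, then checks for FOR; the token names other than the magic one, and both function names, are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the bison-generated token codes; values are arbitrary. */
enum token_code {
    tk_generic_name = 300, tk_of, tk_in, tk_for,
    tk_left_parenthesis, tk_right_parenthesis, tk_other,
    tk_M_generic_name_M_has_for_following_in_inspect_M
};

/* toks[j] is "("; return the index just past its matching ")",
   or n if the group is unbalanced. */
static size_t skip_parens(const enum token_code *toks, size_t n, size_t j)
{
    int depth = 0;
    for (; j < n; j++) {
        if (toks[j] == tk_left_parenthesis)
            depth++;
        else if (toks[j] == tk_right_parenthesis && --depth == 0)
            return j + 1;
    }
    return n;
}

/* Within an INSPECT statement: if the generic name at toks[i], plus any
   "of/in name" qualifiers and subscripts, is followed by FOR, retype it. */
bool mark_inspect_for(enum token_code *toks, size_t n, size_t i)
{
    if (i >= n || toks[i] != tk_generic_name)
        return false;
    size_t j = i + 1;
    while (j + 1 < n && (toks[j] == tk_of || toks[j] == tk_in)
           && toks[j + 1] == tk_generic_name)
        j += 2;                     /* qualifier: of/in name */
    while (j < n && toks[j] == tk_left_parenthesis)
        j = skip_parens(toks, n, j);
    if (j < n && toks[j] == tk_for) {
        toks[i] = tk_M_generic_name_M_has_for_following_in_inspect_M;
        return true;
    }
    return false;
}
```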

/* problem 10.  
   - it is hard to distinguish the two forms of perform (inline
   and outline) syntactically

   solution 10. 
   - check if the token after perform is a procedure name, i.e. a
   user-defined word that is not an identifier from the data
   division; if so, it must be the outline perform;
   otherwise, it is the inline perform and needs an end-perform
*/

%token tk_perform_M_M_that_requires_end_verb_M
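A minimal sketch of the problem 10 retyping, assuming the prescan can ask whether a word is a data name (here a plain array of names, which is hypothetical). Because the inline form can itself begin with a data name, as in PERFORM n TIMES ... END-PERFORM, the sketch keys on procedure names: a user-defined word after PERFORM that is not a data name is taken to be a paragraph or section name, and every other PERFORM is marked as needing END-PERFORM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the bison-generated token codes; values are arbitrary. */
enum token_code {
    tk_perform = 300,
    tk_user_word,                   /* any user-defined word */
    tk_other,                       /* literals, UNTIL, VARYING, verbs... */
    tk_perform_M_M_that_requires_end_verb_M
};

struct token {
    enum token_code code;
    const char *text;               /* assumed upper-cased by the scanner */
};

/* Hypothetical symbol-table query: is word a data-division name? */
static bool is_data_name(const char *word, const char *const *data_names,
                         size_t n_names)
{
    for (size_t k = 0; k < n_names; k++)
        if (strcmp(word, data_names[k]) == 0)
            return true;
    return false;
}

/* A user word after PERFORM that is not a data name must be a procedure
   name (out-of-line form). Anything else starts an inline PERFORM, which
   is retyped so the grammar expects END-PERFORM. */
void mark_inline_perform(struct token *toks, size_t n, size_t i,
                         const char *const *data_names, size_t n_names)
{
    if (i >= n || toks[i].code != tk_perform)
        return;
    bool out_of_line = i + 1 < n
        && toks[i + 1].code == tk_user_word
        && !is_data_name(toks[i + 1].text, data_names, n_names);
    if (!out_of_line)
        toks[i].code = tk_perform_M_M_that_requires_end_verb_M;
}
```

Since the data division is complete before the procedure division is scanned, this test also works for paragraphs that are defined later in the program.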

/* problem 11. 
   - there is no natural delimiter in COBOL, like ';' '}' ')' in C,
   from which to do panic mode error recovery

   solution 11
   - add in a magic token before every verb

*/

%token tk_start_of_statement
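A minimal sketch of the problem 11 insertion, with a hypothetical two-verb token set standing in for the real verb tokens. With the marker in the stream, the grammar could use Bison's error token in a rule that resynchronises on tk_start_of_statement, much as a C grammar resynchronises on ';'.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the bison-generated token codes; values are arbitrary. */
enum token_code {
    tk_start_of_statement = 300,
    tk_verb_move,
    tk_verb_display,
    tk_other
};

static bool is_verb(enum token_code t)
{
    return t == tk_verb_move || t == tk_verb_display;
}

/* Copy toks into out, inserting tk_start_of_statement before every verb.
   out must have room for 2 * n tokens; returns the new length. */
size_t insert_statement_markers(const enum token_code *toks, size_t n,
                                enum token_code *out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_verb(toks[i]))
            out[m++] = tk_start_of_statement;
        out[m++] = toks[i];
    }
    return m;
}
```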

--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.