
Re: gnubol: New bison Grammar available (long)




tej@melbpc.org.au (Tim Josling),
Date:   12/22/99 6:19:34 PM EST

posts that he "... put a new version of the bison grammar for the nucleus and
some extensions at timjosling.homepage.com. "

Tim, thank you for your hard work.


It is notable that you are bringing this much forward during the holiday 
season.  You have accomplished a lot, and your work is valuable.


I have an initial response based upon a partial review of your grammar at cobgoprs.y.
This is a long response, in several parts, but hopefully it is worth your time.
--------------------------------------------------
Part I
--------------------------------------------------


It may benefit you to realize that %prec tokens do not actually compete
with _one_another_. So your layout of %prec tokens at the beginning of the
grammar, like the following, reflects only a partial understanding:


/* production precedences to force bison to make the correct SR choice */

%nonassoc PREC_LESS_PARENTHESIS

%nonassoc PREC_LESS_END_VERB

%nonassoc PREC_LESS_EXCEPTION

%nonassoc PREC_LESS_ELSE

%nonassoc PREC_LESS_END_IF


These comments of mine are meant only as constructive criticism.


The relationship amongst these %prec tokens
is almost certainly not important. The %prec
trick in bison allows you to tag the _end_ of one statement that could
possibly reduce, and so establish a clear winner in a competition with another
rule which is potentially about to continue by the shift of a token. The
interest is in the relationship between the %prec tag at the end of one
statement and the token in the middle of a competing rule (which token, not
having been shifted yet, is actually in the lookahead position).

When  rule...%prec < lookahead, then shift (you must have a competitor).
When  rule...%prec > lookahead, then reduce (the competitor loses out).
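
Here is a minimal, self-contained sketch of that comparison, using the
classic dangling-else shape (the token names are invented for illustration
and are not taken from cobgoprs.y):

%token IF THEN OTHER EXPR
%nonassoc PREC_LESS_ELSE   /* declared earlier, so lower precedence */
%right    ELSE             /* declared later,  so higher precedence */

%%

stmt
 : IF EXPR THEN stmt            %prec PREC_LESS_ELSE
     /* rule precedence PREC_LESS_ELSE < lookahead ELSE: the shift wins */
 | IF EXPR THEN stmt ELSE stmt
     /* the competing rule that the shifted ELSE continues */
 | OTHER
 ;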



Your layout of %prec precedence tokens makes all of them less than all other
tokens. That is a bit nonsensical, meant mildly (once you catch the drift of
this you too will laugh). If you tag anything with any of your %prec tokens,
then everything will try to shift on. If that made sense, all you would need
is

%nonassoc PREC_LESS_SHIFT_ANYTHING_JOB_DONE


and nothing more.

That is not going to work.

So again, the idea is to set up a precedence relationship that resolves the
competition between the last token of a rule that wants to reduce and the
continuation token of a rule that wants to keep going. (You are not setting
up competitions between dummy tokens; that has no meaning.)


So let's get into the classic, as you coded it:


pr_if_statement_conditional: /* lll; check for next sentence - not allowed if end-if */
 tk_if pr_conditional_expression             pr_if_do %prec PREC_LESS_ELSE
|tk_if pr_conditional_expression             pr_if_do tk_else pr_if_do  %prec PREC_LESS_END_IF
|tk_if pr_conditional_expression tk_then     pr_if_do  %prec PREC_LESS_ELSE
|tk_if pr_conditional_expression tk_then     pr_if_do tk_else pr_if_do  %prec PREC_LESS_END_IF
|tk_if error tk_start_of_statement
|tk_if error pr_if_do  %prec PREC_LESS_ELSE
|tk_if error tk_then pr_if_do %prec PREC_LESS_ELSE
|tk_if error pr_if_do tk_else pr_if_do %prec PREC_LESS_END_IF
|tk_if error tk_then pr_if_do tk_else pr_if_do %prec PREC_LESS_END_IF
;
pr_if_statement_imperative:
 tk_if pr_conditional_expression             pr_if_do tk_end_if
|tk_if pr_conditional_expression tk_then     pr_if_do tk_end_if
|tk_if pr_conditional_expression             pr_if_do tk_else pr_if_do tk_end_if
|tk_if pr_conditional_expression tk_then     pr_if_do tk_else pr_if_do tk_end_if



In the rule
 | tk_if pr_conditional_expression pr_if_do tk_else pr_if_do
 %prec PREC_LESS_END_IF

the precedence of
 %prec PREC_LESS_END_IF
is less than that of ELSE (tk_else)!

Consider a nested situation
  IF-1 cond
    THEN-1
      IF-2 cond
        THEN-2
          block
        ELSE-2
          block
    ELSE-1
      block
  END-IF.


Note that a rule tagged
  %prec PREC_LESS_END_IF
does not reduce on ELSE; it will keep shifting!
Pow. Parse error. (Because there is no "IF THEN ELSE _ELSE_" rule.)

The above shifts an additional ELSE (named ELSE-1) onto the inner IF-2.
What we really want, upon ELSE-1, is a reduction of the entire THEN-1 clause
(which encompasses the entire IF-2 statement).

Your
 %prec PREC_LESS_ELSE
is functional, but your
 %prec PREC_LESS_END_IF
is dysfunctional.


There is a need to gather like tokens and establish a thorough hierarchy.
Surround the bundled tokens with the %prec LOWER/HIGHER dummy tokens.

We got to this subject once before. I said you should not put your
tokens in alphabetical order. You said in response that you still plan to do
that. It will not work. You need to study bison's S/R resolution mechanism
a bit more closely. Competition between the 'dummy' tokens has no meaning.
Your 'dummies' must be interspersed with the logically gathered tokens to
establish an actual precedence. It is the order of token declaration that
determines precedence (later ones are higher). So,

%right THIS
%nonassoc PREC_LESS_THAN_THIS

is _not_ reasonable, for example.
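
The same pair in a corrected order, purely as an illustration of "later is
higher":

%nonassoc PREC_LESS_THAN_THIS   /* declared earlier: lower precedence  */
%right    THIS                  /* declared later:   higher precedence */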



Here is a sketch, using some of your names and some new ones.
(Note the %right association on the tokens we sometimes want to shift
onto certain rules.) (This is not presented as a complete solution, more an
illustration of ideas.)



%nonassoc PREC_LESS_VERB
%right    ADD PERFORM IF
%nonassoc PREC_HIGHER_VERB

%nonassoc PREC_LESS_ELSE
%right    ELSE
%nonassoc PREC_HIGHER_ELSE

%nonassoc PREC_LESS_END_xyz
%right    END_THIS END_THAT END_ADD END_IF
%nonassoc PREC_HIGHER_END_xyz

With this arrangement anything tagged
 %prec PREC_LESS_END_xyz
will reduce upon almost any lookahead, because it is higher than the verbs
and higher than ELSE; only the END_xyzs can glue on. If you have no competing
rule that would try to glue anything on, then this fact does not matter!
If you have a competing rule that tries to glue, say, END_ADD onto an ADD
rule, then that rule wins out with a shift (and the rule tagged with the
%prec atrophies and dies off).
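
As a small sketch of that competition (the rule and operand names here are
invented for illustration, and tk_end_add is assumed to be declared in the
END_xyz precedence group):

add_statement
 : tk_add pr_add_operands                     %prec PREC_LESS_END_xyz
     /* reduces when a verb or ELSE is the lookahead; when tk_end_add
        is the lookahead it yields, and the alternative below shifts */
 | tk_add pr_add_operands tk_end_add
     /* the competitor that glues END-ADD on with a shift */
 ;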


Typically we want most statements to reduce upon a verb as the lookahead
token. This may not require any precedence marking. But some rules we may
want to keep _shifting_, such as conditional clause phrases. These we might
mark with
 %prec PREC_LESS_VERB
An AT END phrase on an I/O verb, for example, might want the verbs to
commence shifting on. But in this case it would only make sense if you had
two competing rules: one with a continuation that has the verb/statement
block (which rule you want to win the competition, and thus you mark the
_other_ with a lower %prec that yields to its shift logic), and the _other_
without the verb/statement block continuation, which would have to be an
error production intended to capture a blockless conditional clause.
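
A hedged sketch of that shape (the token and rule names are invented for
illustration, not proposals for the real grammar):

read_statement
 : tk_read file_name at_end_phrase pr_statement_block
     /* the rule we want to win: a verb in the lookahead shifts
        into pr_statement_block */
 | tk_read file_name at_end_phrase              %prec PREC_LESS_VERB
     /* blockless alternative (the error capture): marked lower than
        the verbs, so it yields to the shift instead of reducing */
 ;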

Moving our attention back to the IF-THEN-ELSE challenge,
a basic scenario might be


   tk_if pr_conditional_expression pr_if_do
   %prec PREC_LESS_ELSE
 | tk_if pr_conditional_expression pr_if_do tk_else pr_if_do
   %prec PREC_LESS_END_xyz


competitors

   tk_if pr_conditional_expression pr_if_do tk_end_if
 | tk_if pr_conditional_expression pr_if_do tk_else pr_if_do tk_end_if


The rule marked
   %prec PREC_LESS_ELSE
yields to tk_else as well as tk_end_if.
Yet the rule tagged
   %prec PREC_LESS_END_xyz
will not yield to tk_else, because
 %prec PREC_LESS_END_xyz > precedence of tk_else
By not yielding to the shift, it reduces in the context of ELSE as the
lookahead. (That is the essence of implicit scope termination.) A second
ELSE (ELSE-1 in the example listed above) implicitly terminates an inner
complete IF-THEN-ELSE that is not explicitly terminated.

That can only work if
%nonassoc PREC_LESS_END_xyz
is higher in precedence than (is defined later in the grammar than)
%right    ELSE

Despite the name blah_blah_LESS_blah_blah, the dummy token
%nonassoc PREC_LESS_END_xyz is ! HIGHER ! than ELSE, because if it is defined
later, as I am suggesting, it is simply higher in precedence. It is higher,
so the rule is higher and the lookahead is lower; therefore we reduce the
rule (tk_else as lookahead implicitly terminates an if-then-else construct).
 
The rule
 | tk_if pr_conditional_expression pr_if_do tk_else pr_if_do
   %prec PREC_LESS_END_xyz
means: reduce me implicitly on anything with precedence less than PREC_LESS_END_xyz.

I have built redundancy into my sketch (this much is not necessary in
practice; it is only for illustration). There is nothing between the two
dummy tokens
  %nonassoc PREC_HIGHER_ELSE
and
  %nonassoc PREC_LESS_END_xyz

So for educational purposes only we could rewrite your rule as
 | tk_if pr_conditional_expression pr_if_do tk_else pr_if_do
   %prec PREC_HIGHER_ELSE

And our explanation would perhaps be devoid of distractions about blah blah
blah, and would just say: the dummy token
%nonassoc PREC_HIGHER_ELSE is ! HIGHER ! than ELSE, because if it is defined
later, as I am suggesting, it is simply higher in precedence. It is higher,
so the rule is higher and the lookahead is lower; therefore we reduce the
rule (tk_else as lookahead implicitly terminates an if-then-else construct).

Now notice that the low-precedence verbs do not reduce an if-then-else's
ELSE clause; they just glue onto the end ad infinitum. This is because
PREC_HIGHER_ELSE is not only higher than ELSE, it is higher than all of the
verbs. Do you see the point in gathering like tokens? Do you see now that
you do not want to list tokens in alphabetical order?


It is important to understand that the
   %prec PREC_LESS_END_xyz
tag does not say "glue an END_xyz token onto anything." It simply says: this
rule will yield, and not try to reduce, upon sighting an END_xyz. It is the
other, competing rule that says
  ... tk_end_if
meaning: shift for that exact rule.


But before we get back up to such an if_rule, the blocks are recursing,
potentially gluing their own END-xyzs onto numerous statements internal to
the block (including explicitly terminated IF-THEN-ELSE-END_IF statements).
It is important to know that that happens below this statement. Yet if no
open scope exists in that recursed block, the block could reduce, and an
END_xyz of some nature other than END-IF could implicitly terminate the ELSE
clause, also!

Implicit scope terminators can cascade up.
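
As a small source-level sketch of the kind of input being described (shape
only, with the inner IF intentionally left unterminated): the single explicit
END-READ below implicitly terminates the inner IF's ELSE clause and then
explicitly terminates the READ.

  READ IN-FILE
      AT END
          IF WS-A = WS-B
              MOVE 1 TO WS-X
          ELSE
              MOVE 2 TO WS-X
  END-READ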


Thus the blocks hanging on the conditional clauses will have to be tagged
with a precedence higher than a conditional clause (to get them to reduce
implicitly) and lower than an END_xyz (so that an explicit terminator can
shift, and the tagged rule yields/atrophies in favor of the explicitly
scope-terminated alternative). ((Yet within those blocks you want statements
of the same verb class to be able to see and shift their own set of
conditional clauses. That is tough to start with; transposition makes things
even more interesting.)) At any rate, designing the alternate conditionals
just like the [THEN] ELSE clauses is not a bad first approximation.

((As long as we use optimistic rule hierarchies we will not get to
transposition of alternating conditionals, which means we will have a code
base issue, and a grammar design legacy issue, for future GNU COBOL project
contributors.))


--------------------------------------------------
Part II
--------------------------------------------------

But all of this may be going in the wrong direction.


Nesting and Stacking Discussion
-----------------------------------------------

A literal correspondence between the physically instantiated topology of
an actual, real-world, deeply nested COBOL program
and the material deployed onto the parser's
stack means that a large amount of material can be on that stack. You are
not at the point where that matters to you. You will be. It is a freight
train coming down the track. An idealistic hierarchical grammar is
going to tax resources. (Although let me hasten to say that I can read fairly
well, and I am entirely aware that every time I say resources, those kind
enough to take the time to respond say "resources do not matter". I am
convinced that resources do matter.)

But anyway, just so that you will be able to look back at the fork in
the road and not feel like the world abandoned you when you choose the
path you are on, let me say this. Your blocks within blocks within blocks
are going to jam a tremendous amount of material onto the stack with an LR
parser like bison (when confronted with large, deeply nested source programs).

I am neither for nor against LL; I am neither for nor against LR. I just want
to help. The LR parsers have a stack (that is the point); they allow you to
reduce quite late (actually you can do that in LL too, if you intend to).
In LR that means that blocks within blocks, etc., keep putting things on the
stack before reduction of the enclosing construct.


It took me some time to fully realize this, as I have done only minimal
experimentation with a minimal number of COBOL-like constructs
in bison, but actually you do not need to hold onto the leading portions
of all the enclosing constructs. The LL parsers tend to entice coders
to do very early emissions to semantics. LR does the opposite: things build
up in the LR parsers. It turns out to be greatly useful, IMHO, to recognize
that early emits are fine, even in an LR tool endeavor! But how do you get
there from here?

A normal use of bison would probably produce inverted output to semantics,
such as:
  block begin
    simple statement 1
    simple statement 2
    simple statement 3
    simple statement 4
  block end (you don't know what kind of block yet Mr Semantics).
  ELSE clause reduction
  block begin
    simple statement a
    simple statement b
    simple statement c
    simple statement d
  block end (you don't know what kind of block yet Mr Semantics).
  ELSE clause reduction (surprise, could have been a THEN clause)
  ....
  [THEN] clause reduction
  IF cond reduction
  ....
  [THEN] clause reduction
  IF cond reduction


In other words, the top of the hierarchy reduces last under normal use of
the LR tool. To get a different sequence to semantics, you must either
embed actions, or break the hierarchy into pieces and not recurse. Embedded
actions are difficult, as some may trigger precociously; and they create
our nemeses, epsilons.

Breaking the hierarchy into pieces is practical, but you must take on a lot
of work as manual discharge in action code. It seems a disappointment,
because the recursive features of the bison scripting language seem so
promising.

(This is a broken record, so do not be offended if it bugs you: the ambiguity
of transposable alternate conditionals will force us to break up the rule
hierarchy anyway.)


When you break up the hierarchy, all pressure is removed from the parser's
stack: no matter what the nesting level of actual source code.
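
A hedged sketch of what a broken-up, non-recursive shape might look like
(the action helpers push_scope, emit_if_begin, and so on are invented names,
not an existing interface):

/* Illustrative only: statements are parsed as a flat sequence; the IF
   header reduces, and can be emitted, before its subordinate block is
   parsed, and the action code tracks the open scopes itself. */
sentence
 : sentence statement
 | statement
 ;

statement
 : if_header           { push_scope(IF_SCOPE);  emit_if_begin(); }
 | tk_else             { emit_else_begin(); }
 | tk_end_if           { emit_if_end();  pop_scope(); }
 | simple_statement    { emit_statement(); }
 ;

if_header
 : tk_if pr_conditional_expression
 ;

Scope mismatches (an ELSE with no open IF, for instance) would then be
diagnosed in the action code rather than by the rule hierarchy.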


You may not be ready for that message. But when you look back to this fork
in the road, you will see me back here waving; you have not been
abandoned. I am all for you. I believe that much will be done manually in
the end.



So we need a plan. And we need to know why we are planning.

My initial concern was that optimistic hierarchies would tend to collapse
upon unexpected input (omissions and commissions). Posters tell me not
to worry.

My further concern is that deeply nested programs
will generate considerable stack-push material under LR tool mediation, and
that LR will incidentally _tend_ to lead coders to late emissions that do not
clarify code blocks to semantics until very late. Posters, should they bother
to respond, will probably say that the stack pushes should perhaps not cause
concern at all.

The inverted structure of emissions to semantics is another matter. For here
we are simply meandering into the design of the syntax-to-semantics interface.

If we break up the hierarchy in the grammar, we can enclose block emissions
in wrappers that characterize the block clearly for semantics from the
outset. (That is, they can be non-inverted.)
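
For contrast with the inverted sequence sketched above, a wrapped,
non-inverted emission for the same kind of nesting might look roughly like
this (shape only):

  IF begin (condition already characterized)
    [THEN] block begin
      simple statement 1
      simple statement 2
    [THEN] block end
    ELSE block begin
      simple statement a
      simple statement b
    ELSE block end
  IF end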


That can actually be done, if we intend it, with the existing optimistic
hierarchy of rules; sub-rules can emit the wrappers. But we should not just
happen into this. We should design the syntax-to-semantics interface, and
implement rules that accomplish it, intentionally.

IMHO, there is a good chance that breaking the rules up will 1) make it much
easier to emit in wrapped-block fashion, and 2) allow clear manual code
management of implicit scope termination. (And, since nobody cares except me,
I will reduce to a parenthetical note that the stack will be free for
purposes other than simply nested source code walks.)

Breaking up the rules requires a clear plan for the syntax to semantics
interface, and a
clear plan for block-context management in the action code of the parser.


The task of writing the grammar looks very different if you give
consideration to breaking up the hierarchy. The earliest posts in this
project's discussions seem to dismiss out of hand even considering a manual
parser (although the idea is never really assessed; a manual lexer and a
manual parser are vaguely superposed). A tool-assisted manual parser is
achievable.

An important insight is gained when you realize the two-part fact that an
LR parser like bison holds onto a lot of stuff for quite a while, and that
this is not at all necessary (it is really a side effect of the use of a
stack in an LR tool). Furthermore, early reduction, or at least early
emission to semantics, could be advantageous. But we should do that by design.

I feel that we should list out what the interface from syntax to semantics 
is, before going too much further with the grammar.


--------------------------------------------------
Part III
--------------------------------------------------

This is a first, and most generalized, approximation of the syntax-to-semantics
interface for the GNU COBOL compiler. The interface is a well-formed
document. It is a sequence of !ENTITIES marked clearly and explicitly
with scope originators and scope terminators, and typically with content
in between. The content of !ENTITIES may be other !ENTITIES, which may
themselves have embedded !ENTITIES, to an unlimited depth. (OSF may publish
working limits as advisories.)

The parser enforces grammar rules and implements support for implicit and 
explicit scope origination and termination. The interface from syntax to 
semantics completely hides all implicit scope aspects.

Semantics has no need to know about anything implicit.

As a first guideline, all construct representations are emitted to semantics
top side up. That is, the leading portion of a statement is reduced, as much
as possible, and packaged for semantics prior to any subordinate blocks of
statements.
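
As a rough sketch of what such a well-formed stream might look like for a
simple IF (the !ENTITY tag names are invented for illustration):

  !ENTITY if-statement BEGIN
    !ENTITY condition BEGIN (condition content) END
    !ENTITY then-block BEGIN
      (statement entities)
    !ENTITY then-block END
    !ENTITY else-block BEGIN
      (statement entities)
    !ENTITY else-block END
  !ENTITY if-statement END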



With this sketch we might be able to continue work on an optimistic grammar
hierarchy, and discuss the means and trade-offs of breaking up the hierarchy
and manually controlling block context. A manual lexer is probably rightly
considered out of reasonable bounds, but a hybrid, tool-assisted manual
parser may well be useful, if not outright necessitated.

--------------------------------------------------
Final note with emphasis!
--------------------------------------------------

Thanks again for your hard work, Tim.


Merry Christmas,
Bob Rayhawk
RKRayhawk@aol.com


--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.