
Re: gnubol: problem 2.



RKRayhawk@aol.com wrote:
> 
> 
>     problem 2.
> 
>     - after a qualified name, when hitting a "(" you can't tell if
>     it is an array reference or a reference modification or a new
>     expression (inside a function argument list)
> 
>    solution 2.
>    - change tokens in prescan as follows.
> 
>  /* from tk_left_parenthesis */
> 
>  %token tk_left_parenthesis_M_M_starts_refmod_M
>   >>
> 
> A form like
>     data_name
> 
> with no following parenthetical form must be easy to detect.
> 
> A form like
>    data_name ( something )
> 
> also is easy to detect, and should be easy to distinguish from
>     data_name ( something something )
> 
> With prudent rules and precedence we should be able to keep this distinct from
>   data_name ( something :  )
> and
>   data_name ( something : something )
> 

Why, oh why, did they make commas and semicolons noise words? A lot
of the problem stems from this.

What could be simpler than 
dataname (x)?

In various contexts it could be an indexed data item, or two
expressions, such as within a function call parameter list. You
have to look up the data item and see if it can have an array and
if so, you have an array reference. Otherwise it had better be in
a context where two expressions can follow one another.
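
A minimal sketch of that lookup, in Python rather than in the
compiler's own language. The symbol table layout and the names
SYMBOLS and classify_paren are hypothetical, made up purely for
illustration; reference modification is left out.

```python
# Hypothetical symbol table: data name -> number of array dimensions.
SYMBOLS = {"TOTALS": 1, "COUNTER": 0}

def classify_paren(name, in_argument_list):
    """Decide what a '(' following `name` means, from the symbol table and context."""
    dims = SYMBOLS.get(name)
    if dims is None:
        return "error: undefined data item"
    if dims > 0:
        # The item can be subscripted, so treat '(' as an array reference.
        return "array reference"
    if in_argument_list:
        # `name` is already a complete expression; '(' begins the next one.
        return "new expression"
    # Neither reading is legal, so the right message is a guess.
    return "error: invalid array ref, or new expression not allowed here"
```

The last branch is the awkward one: both readings fail, so the
choice of message is speculative.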

Because the syntax is ambiguous, it is not clear which error
message to give (invalid array ref, or you may not start a new
expression here).

It is not easy to distinguish this from 
a (b c) 

as you suggest because the end of 'b' depends on parsing an
expression - 'c' can only start once the expression for 'b' runs
out of gas (in a function invocation, but also in array references
although there the expressions are simpler).

The standards committee tends to create grammars that are not
quite ambiguous. But of course when the programmer makes a
mistake you have to guess what he was up to. 

Here is an example

a (b + c)

This could be 
two expressions

b
+ c (with a unary +)

or one

b + c

So they have a special little rule that says in this case it is
one expression.
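
A rough sketch of how such a rule might look when splitting a
comma-free list into expressions. This is Python for illustration
only; split_expressions and its flat token list are assumptions,
and nested parentheses are ignored.

```python
OPERATORS = {"+", "-", "*", "/"}

def split_expressions(tokens):
    """Split a comma-free token list into expressions.

    Assumed rule: an operator that directly follows an operand is binary,
    so it continues the current expression instead of starting a new one.
    """
    expressions, current = [], []
    for tok in tokens:
        prev_is_operand = bool(current) and current[-1] not in OPERATORS
        if tok not in OPERATORS and prev_is_operand:
            # Two operands in a row: the current expression has run out.
            expressions.append(current)
            current = []
        current.append(tok)
    if current:
        expressions.append(current)
    return expressions
```

With this rule the tokens b + c stay together as one expression,
while b c splits into two.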

> 
> So assume that every data_ref is switched to a data_list in your grammar.

I just don't feel comfortable with this at the moment; I want to
minimise the number of places where I do lexer feedback hacks.

> Hey, why not just always look for a list?

The problem here is that making the grammar more lenient
generally creates conflicts, and I am not sure what you gain in
this case. The programmer has made a mistake, you tell him about
it, and he fixes it.

If the programmer left out an of/in this would create the
impression of two data items. But I don't remember ever having
done this myself. I don't see putting a list where none is
allowed as a common syndrome, like say misspelling a variable
name.

> (FUNCTION X (A (B))) You must apply the rules associated
> with the external definition of the function's signature to know if the
> source code is right.

Let's say X takes two numeric parameters, but A is an array of
numerics. The error is ambiguous. It could be 'A should have
array index specified' or 'X needs two parameters'. Now if B is
not a valid array index expression, it is even worse. It should
be clear, the programmer believes, that he forgot the array
index. This is where the ambiguous syntax messes you up. C++ and
Fortran have the same problem. It affects programmer productivity
because error messages get very speculative.
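
One way to picture the two competing diagnostics, as a Python
sketch. The function candidate_errors and its arguments are
hypothetical; it only covers the FUNCTION X (A (B)) shape
discussed here.

```python
def candidate_errors(func, expected_args, outer, arrays):
    """List the diagnostics for both readings of FUNCTION func (outer (inner))."""
    errors = []
    # Reading 1: outer (inner) is a single subscripted argument.
    if outer in arrays and expected_args != 1:
        errors.append(f"{func} needs {expected_args} parameters")
    # Reading 2: outer and inner are two separate arguments,
    # so an array name used as an argument lacks its index.
    if outer in arrays:
        errors.append(f"{outer} should have array index specified")
    return errors
```

For X with two parameters and A an array, both messages come back,
and the compiler has no solid basis for picking one.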

Tim Josling

