
Re: gnubol: subsets



In a message dated 11/17/99 9:22:59 PM EST, mck@tivoli.mv.com writes
comments on semantics:

<< It validates and  compiles pictures.  >>

The first part of these comments relates to picture clauses as edit pictures.

That phrase 'compiles pictures' is instructive. Pictures are like regular
expressions, and some of the machines with EDIT instructions are regex
engines. Maybe we should call these 'pikex', 'pikexes', and 'pikex engines'
(the k to prevent sibilantization into something strange with pies). Although
a COBOLer might think pikex is a PIC X, so we should pronounce it
'pike ex'.

How many pikex engines do we need? Can runtime work with just one universal
pikex engine? Or will efficiency considerations lead to several speed-devils
that get harnessed for a spin on pikexes with subsets of picture characters?
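
To make the 'one universal engine' question concrete, here is a rough C
sketch. Every name in it is invented, and it assumes only a toy subset of
edit characters ('9', 'Z', and the insertion characters ',' and '.'); a real
engine needs the full editing rules (floating signs, CR/DB, BLANK WHEN ZERO,
and so on). The specialized speed-devils would presumably be variations on
this same loop with parts of it compiled away.

    /* Minimal sketch of a "universal" pikex engine.  Assumes only a toy
     * subset of edit characters: '9' (digit), 'Z' (zero suppress), and
     * ',' '.' as insertion characters. */
    #include <stdio.h>
    #include <string.h>

    /* Edit the unsigned digit string src (right-aligned against the
     * picture) into dst under picture pic. */
    static void pikex_edit(const char *pic, const char *src, char *dst)
    {
        size_t np = strlen(pic), ns = strlen(src);
        size_t si = 0;              /* index into source digits        */
        size_t digits = 0;          /* digit positions in the picture  */
        size_t pad;                 /* implied leading zeros           */
        int significant = 0;        /* seen a nonzero digit yet?       */

        for (size_t i = 0; i < np; i++)
            if (pic[i] == '9' || pic[i] == 'Z') digits++;
        pad = (digits > ns) ? digits - ns : 0;

        for (size_t i = 0; i < np; i++) {
            char p = pic[i];
            if (p == '9' || p == 'Z') {
                char d = (pad > 0) ? (pad--, '0') : src[si++];
                if (d != '0') significant = 1;
                *dst++ = (p == 'Z' && !significant) ? ' ' : d;
            } else {                /* insertion character             */
                *dst++ = (significant || p == '.') ? p : ' ';
                if (p == '.') significant = 1;  /* keep digits after the point */
            }
        }
        *dst = '\0';
    }

    int main(void)
    {
        char out[32];
        pikex_edit("ZZ,ZZ9.99", "0012345", out);
        printf("[%s]\n", out);      /* prints [   123.45] */
        return 0;
    }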

Can anything about pikex for PICTURE clauses generalize to UNSTRING ...
DELIMITED BY, or to the INSPECT variations? If it applies to Procedure
Division concepts, can pikex engines be harnessed only for literals known at
compile time, or can we stretch it? I know what 'compiles' means in your
phrase; is there any way to compile at run time (for data-named 'delimiters'
or 'replacers')? Can we design that well?
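
As an illustration only, and not a proposal, here is how an UNSTRING ...
DELIMITED BY with a data-named delimiter could still follow the
compile-then-drive pattern: build a small command record when the statement
executes, then let one generic engine do the scanning (all names invented).

    /* Sketch only: "compiling" an UNSTRING delimiter at run time.  The
     * delimiter comes from a data item, so nothing is known until the
     * statement executes; we still build a small command record first
     * and then let one generic engine do the scanning. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {                /* the run-time "compiled" delimiter */
        const char *text;
        size_t      len;
        int         all;            /* models DELIMITED BY ALL           */
    } delim_cmd;

    static delim_cmd delim_compile(const char *data_item, int all)
    {
        delim_cmd c = { data_item, strlen(data_item), all };
        return c;
    }

    /* One generic engine: copy the field before the next delimiter into
     * out and return a pointer just past the delimiter (NULL at end). */
    static const char *unstring_next(const char *src, const delim_cmd *c,
                                     char *out, size_t outsz)
    {
        const char *hit = strstr(src, c->text);
        size_t n = hit ? (size_t)(hit - src) : strlen(src);
        if (n >= outsz) n = outsz - 1;
        memcpy(out, src, n);
        out[n] = '\0';
        if (!hit) return NULL;
        hit += c->len;
        if (c->all)                 /* ALL: swallow adjacent repeats     */
            while (strncmp(hit, c->text, c->len) == 0) hit += c->len;
        return hit;
    }

    int main(void)
    {
        char field[32];
        const char *rec = "ABC,,DEF,GHI";
        delim_cmd c = delim_compile(",", 1);  /* ',' known only at run time */
        const char *p = rec;
        while (p) {
            p = unstring_next(p, &c, field, sizeof field);
            printf("[%s]\n", field);          /* [ABC] [DEF] [GHI] */
        }
        return 0;
    }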

I think that 'validate' may not be a semantics function, in the exact sense
of validating picture clauses. There even the lexer must be strong, and I for
one cannot imagine the parser being out of contact with the symbol table
(SYMT).

Further, it might be an interesting technique to assume that the pikex
engines need a lot of generality. A compiled pikex becomes a kind of set of
commands that drives a pikex engine.
We need to leave room for COBOL2K or other future stuff here too. The key
seems to be to make the meaning of a PICTURE clause deterministic as of the
point it crosses from syntax to semantics. I am on really thin ice here with
this terminology, but the point is that rather than validate, the semantics
might be more interested in closing the signification of a pikex.
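
Here is what I imagine the 'set of commands' might look like, as a C sketch
with invented opcode names: the front end flattens the picture into commands
once, and a single generic engine replays them at run time.

    /* A sketch of "a compiled pikex is a set of commands": the compiler
     * flattens the picture into opcodes once, and one generic engine
     * replays that list at run time.  The opcode names and command
     * layout are invented for illustration. */
    #include <stdio.h>

    typedef enum { PK_DIGIT, PK_SUPPRESS, PK_INSERT, PK_END } pk_op;
    typedef struct { pk_op op; char ch; } pk_cmd;  /* ch used by PK_INSERT */

    /* Front end: picture text -> command list (done once, at compile). */
    static int pikex_compile(const char *pic, pk_cmd *out, int max)
    {
        int n = 0;
        for (; *pic && n < max - 1; pic++, n++) {
            if (*pic == '9')      out[n] = (pk_cmd){ PK_DIGIT,    0 };
            else if (*pic == 'Z') out[n] = (pk_cmd){ PK_SUPPRESS, 0 };
            else                  out[n] = (pk_cmd){ PK_INSERT, *pic };
        }
        out[n++] = (pk_cmd){ PK_END, 0 };
        return n;
    }

    /* Run-time engine: replay the commands against a digit string. */
    static void pikex_run(const pk_cmd *cmd, const char *digits, char *dst)
    {
        int sig = 0;
        for (; cmd->op != PK_END; cmd++) {
            switch (cmd->op) {
            case PK_DIGIT:
                sig |= (*digits != '0');
                *dst++ = *digits++;
                break;
            case PK_SUPPRESS:
                sig |= (*digits != '0');
                *dst++ = sig ? *digits : ' ';
                digits++;
                break;
            case PK_INSERT:
                *dst++ = sig ? cmd->ch : ' ';
                break;
            default:
                break;
            }
        }
        *dst = '\0';
    }

    int main(void)
    {
        pk_cmd prog[32];
        char out[32];
        pikex_compile("ZZ9.99", prog, 32);
        pikex_run(prog, "01234", out);
        printf("[%s]\n", out);      /* prints [ 12.34] */
        return 0;
    }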

Actually, if semantics sees a thing in a pikex that is not right, then it is
a literal. If that looks like a bug in our compiler, either semantics _IS_
failing to see it as symbolic or syntax must be strengthened to stop the
error from flowing through. See what I mean? The compilation of pikexes and
the functions of any pikex engines get a little easier if we clear the
drawing board of 'validation'. Because of the need for the compiled image to
be deterministic, it must be valid 1) to implement an early version that just
prints the original picture clause, and 2) in an increasingly mature
compiler/engine, when a specific position is unexpected, to treat it as a
literal.
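
As a sketch of that policy (repeating the invented command form from above so
the fragment stands on its own): nothing in this compile ever fails. An early
version could emit only literal-insert commands, which just reproduces the
picture text, and a maturer version demotes anything it does not yet
recognize to a literal and counts it for a diagnostic.

    /* Lenient compile, using the invented pk_op/pk_cmd names again so
     * this fragment stands on its own.  An unexpected position is never
     * an error here; it is kept as a literal. */
    typedef enum { PK_DIGIT, PK_SUPPRESS, PK_INSERT, PK_END } pk_op;
    typedef struct { pk_op op; char ch; } pk_cmd;

    static int pikex_compile_lenient(const char *pic, pk_cmd *out, int max,
                                     int *unexpected)
    {
        int n = 0;
        *unexpected = 0;
        for (; *pic && n < max - 1; pic++, n++) {
            switch (*pic) {
            case '9': out[n] = (pk_cmd){ PK_DIGIT,    0 };  break;
            case 'Z': out[n] = (pk_cmd){ PK_SUPPRESS, 0 };  break;
            case ',': case '.': case '$':
                      out[n] = (pk_cmd){ PK_INSERT, *pic }; break;
            default:  /* unexpected position: keep it, as a literal */
                      out[n] = (pk_cmd){ PK_INSERT, *pic };
                      (*unexpected)++;
                      break;
            }
        }
        out[n++] = (pk_cmd){ PK_END, 0 };
        return n;
    }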

On picture clauses that are not editing mechanisms: I think that validating
these might be considered wholly syntactic. Much of the picture is really
type information, and I feel that much of the management of type needs to be
near syntax, and some of it even just in time in front of syntax.
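
For instance, a rough sketch of what 'picture as type information' might
reduce to, with invented names and only a tiny supported subset:

    /* A sketch of reducing a non-edit picture to type information close
     * to the syntax side: something like S9(5)V99 becomes a signed flag
     * plus digit counts before and after the implied decimal point. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        int is_signed;              /* leading S              */
        int int_digits;             /* 9s before the V        */
        int dec_digits;             /* 9s after the V         */
    } num_type;

    static int pic_to_type(const char *pic, num_type *t)
    {
        int after_v = 0;
        t->is_signed = t->int_digits = t->dec_digits = 0;
        if (*pic == 'S') { t->is_signed = 1; pic++; }
        while (*pic) {
            if (*pic == 'V') { after_v = 1; pic++; }
            else if (*pic == '9') {
                int count = 1;
                pic++;
                if (*pic == '(') {          /* repeat factor, e.g. 9(5) */
                    char *end;
                    count = (int)strtol(pic + 1, &end, 10);
                    if (*end != ')') return -1;
                    pic = end + 1;
                }
                if (after_v) t->dec_digits += count;
                else         t->int_digits += count;
            } else return -1;               /* not handled in this sketch */
        }
        return 0;
    }

    int main(void)
    {
        num_type t;
        if (pic_to_type("S9(5)V99", &t) == 0)
            printf("signed=%d int=%d dec=%d\n",   /* signed=1 int=5 dec=2 */
                   t.is_signed, t.int_digits, t.dec_digits);
        return 0;
    }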


Please take all of that as IMHO.

You mentioned so many good ideas. Wow! But I kind of think of syntax as
validation. I am not idealistic about it. I know that we need to see what
they mean by the code before we can see if it is valid in the sense of doable
(and there are a great number of such senses in the area of semantics). But I
think we should scrub things pretty well before semantics.

A notion that is implicit in some of my other posts is substitutions on
certain nodes of structures that might be emitted by actions in Procedure
Division rules. If we have an otherwise valid PERFORM statement,
conceptually, it might be worthwhile sending it to semantics even if the
UNTIL clause is junk. This is exactly because we can't sense certain classes
of errors until we get there, and maybe the named procedures or the VARYING
clause have something we want semantics to tell us about.

In the background is the belief that we should diagnose as much as possible
on every compile. As long as we are willing to carry the necessary error
flags, semantics _could_ be presented with constructs that are a tad
imperfect but made processable by substituting certain nodes in the construct
(say, an UNTIL clause that looks fine in shape but is a complete dummy).
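
Here is a small C sketch of that substitution, with an invented node layout,
just to show the shape of the idea:

    /* A sketch of the substitution idea: the parser keeps an otherwise
     * valid PERFORM and replaces only its junk UNTIL condition with a
     * flagged placeholder, so semantics still gets to look at the
     * procedure name (and VARYING, etc.). */
    #include <stdio.h>

    typedef enum { ND_COND, ND_DUMMY_COND } node_kind;

    typedef struct {
        node_kind kind;
        int       had_error;        /* error flag carried forward       */
        /* ... real condition fields would follow ... */
    } cond_node;

    typedef struct {
        const char *proc_name;      /* PERFORM <proc-name> ...          */
        cond_node  *until;          /* dummy substituted if clause was junk */
    } perform_node;

    static cond_node dummy_until = { ND_DUMMY_COND, 1 };

    /* Called from the error action of the UNTIL production (hypothetical). */
    static void perform_set_bad_until(perform_node *p)
    {
        p->until = &dummy_until;    /* minimal, always-well-formed stand-in */
    }

    /* Semantics still processes the statement, skipping only the dummy. */
    static void semant_perform(const perform_node *p)
    {
        printf("checking procedure %s\n", p->proc_name);
        if (p->until->kind == ND_DUMMY_COND)
            printf("  (UNTIL was diagnosed earlier; not re-checked)\n");
    }

    int main(void)
    {
        perform_node p = { "100-MAIN-LOOP", NULL };
        perform_set_bad_until(&p);  /* the UNTIL clause was junk */
        semant_perform(&p);
        return 0;
    }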

I am not really pushing this approach. I am pushing full-program compiles and
as many diagnostics as possible, so I see the validate issue a little
differently. I think one approach might be to say that syntax is obligated to
emit only constructs that are valid as far as it can tell: no partial,
chopped constructs, everything complete. So, to complete them, add dummies
for necessary portions that are missing (always taking the minimum approach),
and substitute parts that are detected as errors in certain productions. It
is also in this exact sense that I enquire: what is the unit of measure of
the interface from syntax to semantics? It is like a multi-legged creature.

You also comment about "otherwise useless END-x delimiters" in any possible
interface into semantics. With that I strongly agree, but perhaps for
different reasons. I have two. First, as stated, I think the interface is
structured; the structure of the interface implies the end of each unit of
work, maybe. My second reason actually relates back to paren balancing,
believe it or not.

I remain concerned about a parser diving into paren-unbalanced code. But I
think I see that an open paren and a close paren are like an IF and an
END-IF. And so if we are smart, and we might have to be very smart for this,
then an open paren that lacks a matching close paren is analogous to an IF
that has no END-IF (which, notably, is optional). That little intrigue may
cut the unbalanced-paren problem down. We most definitely need to generalize
our thinking about unbalanced BEGIN/END type tokens, 'cause that's the biz we
iz in. Combined with transforming a negative close paren (one with no
matching open) into a token distinct from the ordinary close paren, the whole
thing might be in reach. Who knows, maybe we could code this thing!
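
A toy sketch of that paren treatment, with invented token names:

    /* A sketch of treating '(' like an IF and ')' like its optional
     * END-IF: an excess ')' is re-tokenized as a distinct "negative
     * close" token, and any '(' still open at the end of the construct
     * is closed implicitly, the way an IF without END-IF is closed by
     * the end of its sentence. */
    #include <stdio.h>

    static void classify_parens(const char *text)
    {
        int depth = 0;
        for (const char *p = text; *p; p++) {
            if (*p == '(') {
                depth++;
            } else if (*p == ')') {
                if (depth > 0)
                    depth--;                      /* ordinary TK_CLOSE */
                else
                    printf("col %d: TK_NEG_CLOSE (no matching open)\n",
                           (int)(p - text) + 1);
            }
        }
        if (depth > 0)
            printf("%d open paren(s) closed implicitly at end of construct\n",
                   depth);
    }

    int main(void)
    {
        classify_parens("COMPUTE X = (A + B)) * (C");
        return 0;
    }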

Nearly all of that concept, if useful, lands in the parser and the lexer.
However, when studied as a lexeme stream, COBOL basically has BEGIN tokens
(incidentally, and perhaps aggravatingly, it also has the _noise_ of optional
END tokens). This is distinct from C and Pascal, which have varieties of
begin/end sets that must marry, plus a scope-terminating semicolon that
differs slightly between the two: a separator in Pascal and a terminator in
C. You can almost, not too close but almost, just look for the BEGIN type
tokens in COBOL (that has been implied and stated many ways in the early work
of this project). But regrettably, you actually need the END type tokens
sometimes. Ah! But only for syntax delimitation, if you will allow that
phrase. The interface into semantics does not need END-x noise.

But by the same token :-) Really, I would not conceive of the semantics work
flow as having headers or leading tokens like verbs or clause tokens. In
effect, grammar rules not only catch things in sequence, they have gathered
things; very often you can wait until late to emit the action. Appropriately
so, IMHO.

This is very subjective, and I can't hand this to you, but I think we should
really generalize that interface and send major bundles all at once... not
paragraphs all at once or anything like that, but I am saying we should not
send VERB, CLAUSE, CLAUSE as a sequence. Instead, send a single, highly
decorated item. I think that the abstraction of this thing should come from a
complete review of the Procedure Division requirements. That will even
suggest some of the things we need for some of the Data Division (a small
part).
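
A sketch of what such a bundle might look like for PERFORM, with invented
names and fields:

    /* A sketch of "one highly decorated item" per statement instead of
     * a VERB, CLAUSE, CLAUSE stream: the grammar actions gather all the
     * clauses a PERFORM carries and hand semantics a single bundle,
     * late. */
    #include <stdio.h>

    typedef struct {
        const char *proc_name;      /* PERFORM <proc>                   */
        const char *thru_name;      /* THRU <proc>, or NULL             */
        const char *varying_item;   /* VARYING <data-name>, or NULL     */
        const char *until_text;     /* UNTIL condition (placeholder)    */
        int         had_errors;     /* error flags carried along        */
    } perform_bundle;

    /* The only call semantics sees for this statement: one bundle,
     * after the grammar has gathered every clause. */
    static void semant_emit_perform(const perform_bundle *b)
    {
        printf("PERFORM %s", b->proc_name);
        if (b->thru_name)    printf(" THRU %s", b->thru_name);
        if (b->varying_item) printf(" VARYING %s", b->varying_item);
        if (b->until_text)   printf(" UNTIL %s", b->until_text);
        printf("%s\n", b->had_errors ? "   <carrying error flags>" : "");
    }

    int main(void)
    {
        perform_bundle b = { "100-READ", "100-READ-EXIT", "WS-IDX",
                             "WS-IDX > 10", 0 };
        semant_emit_perform(&b);
        return 0;
    }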


Please tell us more of _your_ ideas though; you clearly have had some good
experience with the latter portions of compilers.

Best Wishes,
Bob Rayhawk
RKRayhawk@aol.com



