Re: gnubol: subsets
In a message dated 11/17/99 9:22:59 PM EST, mck@tivoli.mv.com writes
comments on semantics:
<< It validates and compiles pictures. >>
The first part of these comments relates to picture clauses as edit pictures.
That phrase 'compiles pictures' is instructive. Pictures are like regular
expressions, and some of the machines with EDIT instructions are regex
engines. Maybe we should call these 'pikex', 'pikexes' and 'pikex engines'
(the k to prevent sibilantization into something strange with pies). Although
a COBOLer might think pikex is a PIC X, so we should pronounce it
'pike ex'.
How many pikex engines do we need? Can runtime work with just one universal
pikex engine? Or will efficiency considerations lead to several speed-devils
that get harnessed for a spin on pikexes with subsets of picture characters?
Can anything about pikex for PICTURE clauses generalize to
UNSTRING ... DELIMITED BY, or to the INSPECT variations? If applicable to
procedure division concepts, can pikex engines be harnessed only for
literals known at compile time, or can we stretch it? I know what 'compiles'
means in your phrase, but is there any way to compile at runtime (for
data-named 'delimiters' or 'replacers')? Can we design that well?
I think that 'validate' may not be a semantics function, at least in the
exact sense of validating picture clauses. There even the lexer must be
strong, and I for one cannot imagine the parser being out of contact with SYMT.
Further, it might be an interesting technique to assume that the pikex
engines need a lot of generality. A compiled pikex becomes a kind of set of
commands that drives a pikex engine.
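To make that concrete, here is a minimal sketch in C of what 'a set of
commands that drives an engine' could look like. None of this is project
code; the opcode names, the command layout, and the hand-compiled PIC ZZ9.99
example are all my own invention, just to illustrate one generic engine
interpreting compiled pictures:

    /* Hypothetical sketch: a compiled pikex is an array of commands,
       and one generic engine interprets them at run time. */
    #include <stdio.h>

    enum op { OP_DIGIT,    /* take next source digit (picture '9') */
              OP_SUPPRESS, /* digit, but blank leading zeros ('Z') */
              OP_LITERAL   /* insert a literal character as-is     */ };

    struct cmd { enum op op; char lit; };

    /* The engine: one loop, driven entirely by the compiled commands. */
    static void edit(const struct cmd *pgm, int n,
                     const char *digits, char *out)
    {
        int leading = 1;
        for (int i = 0; i < n; i++) {
            switch (pgm[i].op) {
            case OP_DIGIT:
                *out++ = *digits++; leading = 0; break;
            case OP_SUPPRESS:
                if (leading && *digits == '0') { *out++ = ' '; digits++; }
                else { *out++ = *digits++; leading = 0; }
                break;
            case OP_LITERAL:
                *out++ = pgm[i].lit; break;
            }
        }
        *out = '\0';
    }

    int main(void)
    {
        /* PIC ZZ9.99, compiled by hand into engine commands. */
        struct cmd pic[] = { {OP_SUPPRESS, 0}, {OP_SUPPRESS, 0},
                             {OP_DIGIT, 0},    {OP_LITERAL, '.'},
                             {OP_DIGIT, 0},    {OP_DIGIT, 0} };
        char out[16];
        edit(pic, 6, "00342", out);
        printf("[%s]\n", out);   /* prints [  3.42] */
        return 0;
    }

Speed-devil subset engines would then just be specialized interpreters for
command streams that happen to use only a few of the opcodes.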
We need to leave room for COBOL2K or other future stuff here too. The key
seems to be to make the meaning of a PICTURE clause deterministic as of the
point it crosses from syntax to semantics. I am on really thin ice here with
this terminology, but the point is that rather than validate, the semantics
might be more interested in closing the signification of a pikex.
Actually, if semantics sees a thing in a pikex that is not right, then it is
a literal. If that looks like a bug in our compiler, either semantics _IS_
failing to see it as symbolic, or syntax must be strengthened to stop this
error from flowing through. See what I mean? The compilation of pikexes and
the functions of any pikex engines get a little easier if we clear the
drawing board of 'validation'. Because of the need for the compiled image to
be deterministic, it must be valid 1) to implement an early version that just
prints the original picture clause, and 2) in an increasingly mature
compiler/engine, when a specific position is unexpected, to treat it as a
literal.
On picture clauses that are not editing mechanisms: I think that validating
these might be considered wholly syntactic. Much of the picture is really
type information. And I feel that much of the management of type needs to be
near syntax, and some even just in time in front of syntax.
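As an illustration of 'the picture is really type information', a non-edited
numeric picture boils down to a few facts the compiler can extract right
where syntax sees it. The function and field names below are hypothetical,
a sketch rather than a proposal:

    /* Hypothetical sketch: reduce a picture like S9(5)V99 to type
       facts (digit count, implied decimal scale, signedness). */
    #include <stdlib.h>

    struct pic_type { int digits; int scale; int is_signed; };

    static struct pic_type type_of(const char *p)
    {
        struct pic_type t = {0, 0, 0};
        int after_v = 0;                 /* past the implied point? */
        for (; *p; p++) {
            if (*p == 'S') t.is_signed = 1;
            else if (*p == 'V') after_v = 1;
            else if (*p == '9') {
                int rep = 1;
                if (p[1] == '(') {       /* expand repeat factor */
                    rep = atoi(p + 2);
                    while (*p && *p != ')') p++;
                }
                t.digits += rep;
                if (after_v) t.scale += rep;
            }
        }
        return t;   /* S9(5)V99 -> digits 7, scale 2, signed */
    }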
Please take all of that as IMHO.
You mentioned so many good ideas. Wow! But I kind of think of syntax as
validation. I am not idealistic about it. I know that we need to see what
they mean by the code before we can see if it is valid in the sense of
doable (and there are a great number of such senses in the area of
semantics). But I think we should scrub things pretty well before semantics.
A notion that is implicit in some of my other posts is substituting for
certain nodes of structures that might be emitted by actions in procedure
division rules. If we have an otherwise valid PERFORM statement,
conceptually it might be worthwhile sending it to semantics even if the
UNTIL clause is junk. This is exactly because we can't sense certain classes
of errors until we get there, and maybe the named procedures or the VARYING
clause have something we want semantics to tell us about.
In the background is the belief that we should diagnose as much as possible
on every compile. As long as we would be willing to consider carrying the
necessary error flags, semantics _could_ be presented with constructs that
are a tad imperfect, but made processable by substituting certain nodes in
the construct (say, an UNTIL clause that looked just fine but was a
complete dummy).
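In code, the substitution might be as small as this. Again, every name here
is invented; it is only meant to show a flagged dummy node standing in for a
junk UNTIL clause so the rest of the PERFORM can still reach semantics:

    /* Hypothetical sketch of node substitution for error recovery. */
    #include <stdlib.h>

    enum kind { N_PERFORM, N_UNTIL, N_DUMMY };

    struct node {
        enum kind    kind;
        int          in_error;   /* error flag carried to semantics */
        struct node *child;
    };

    /* A minimal, always-valid stand-in for a clause that failed to
       parse: semantics can walk it, but generates no code for it and
       raises no follow-on diagnostics against it. */
    static struct node *dummy_until(void)
    {
        struct node *n = malloc(sizeof *n);
        n->kind = N_DUMMY;
        n->in_error = 1;
        n->child = NULL;
        return n;
    }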
I am not pushing this approach really. I am pushing full program compile and
as many diagnoses as possible. So I kind of see the validate issue
differently. I think one approach might be to say that syntax is obligated
to emit only constructs that are valid as far as it can tell: no partial,
chopped constructs, all complete. To complete them, add dummies for the
necessary portions where missing (always taking the minimum approach), and
substitute for parts that are detected as errors in certain productions. So
it is also in this exact sense that I inquire: what is the unit of measure
of the interface from syntax to semantics? It is like a multilegged creature.
You also comment about "otherwise useless END-x delimiters" in any possible
interface into semantics. With that I strongly agree, but perhaps for
different reasons. I have two. First, as stated, I think the interface is
structured. The structure of the interface implies the end of each unit of
work, maybe. My second reason actually relates back to paren balancing,
believe it or not.
I remain concerned about a parser diving into paren-unbalanced code. But I
think I see that an open paren and a close paren are like an IF and an
END-IF. And so if we are smart, and we might have to be very smart for this,
but if we are smart, then an open paren that lacks a matching close paren is
analogous to an IF that has no END-IF (which, notably, is optional). That
little intrigue may cut the unbalanced paren problem down. We most
definitely need to generalize our thinking about unbalanced BEGIN/END type
tokens, cause that's the biz we iz in. Combined with a transformation that
makes a negative close paren a distinct token from a close paren, the whole
thing might be in reach. Who knows, maybe we could code this thing!
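The 'negative close paren as a distinct token' idea could be as simple as a
depth counter in the lexer. A toy sketch, with invented token names:

    /* Hypothetical sketch: a ')' that would drive the paren depth
       negative is handed to the parser as its own token, much as an
       IF with no END-IF is still a legal shape. */
    enum tok { T_OPEN, T_CLOSE, T_STRAY_CLOSE };

    static int depth = 0;

    /* c is assumed to be '(' or ')'. */
    static enum tok classify_paren(char c)
    {
        if (c == '(') { depth++; return T_OPEN; }
        if (depth > 0) { depth--; return T_CLOSE; }
        return T_STRAY_CLOSE;   /* the "negative" close paren */
    }

The parser can then carry explicit, cheap error productions for
T_STRAY_CLOSE instead of diving into unbalanced code blind.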
Nearly all of that concept, if useful, lands in the parser and the lexer.
However, when studied as a lexeme stream, COBOL basically has BEGIN tokens
(incidentally, and perhaps aggravatingly, it also has the _noise_ of
optional END tokens). This is distinct from C and Pascal, which have
varieties of begin/end sets that must marry, and a universal
scope-terminating semicolon that differs slightly between the two, a
separator in Pascal and a terminator in C. You can almost, not too close but
almost, just look for the BEGIN type tokens in COBOL (that has been implied
and stated many ways in the early work of this project). But regrettably,
you actually need the END type tokens sometimes. Ah! But only for syntax
delimitation, if you will allow that phrase. The interface into semantics
does not need END-x noise.
But by the same token :-) Really, I would not conceive of the semantics work
flow as having headers or leading tokens like verbs or clause tokens. In
effect, grammar rules not only catch things in sequence, they have gathered
things; very often you can wait until late to emit the action.
Appropriately so, IMHO.
This is very subjective, and I can't hand this to you, but I think we should
really generalize that interface, and send major bundles all at once... not
paragraphs all at once or anything like that, but I am saying we should not
send VERB, CLAUSE, CLAUSE as a sequence. Instead, send a single item, highly
decorated. I think that the abstraction of this thing should come from a
complete review of the procedure division requirements. That will even
suggest some of the things we need for some of the data division (a small
part).
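For the flavor of 'a single item, highly decorated', here is what one bundle
might look like for PERFORM, with every field name invented for the sketch:

    /* Hypothetical sketch: semantics receives one decorated bundle
       per statement, not a stream of verb and clause tokens. */
    struct perform_stmt {
        struct node *procedures;  /* named procedure(s) performed    */
        struct node *varying;     /* VARYING clause, or NULL         */
        struct node *until;       /* UNTIL clause; a flagged dummy
                                     if the source clause was junk   */
        int          in_error;    /* whole-statement error flag      */
    };

The grammar action for the PERFORM rule would fire once, late, after
everything has been gathered, and emit one such item to the semantics
work flow.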
Please tell us more of _your_ ideas though; you clearly have had some good
experience with the latter portions of compilers.
Best Wishes,
Bob Rayhawk
RKRayhawk@aol.com