Re: gnubol: subsets
>>>>> "Bob" == RKRayhawk <RKRayhawk@aol.com>
>>>>> wrote the following on Thu, 18 Nov 1999 04:40:56 EST
Bob> In a message dated 11/17/99 9:22:59 PM EST, mck@tivoli.mv.com
Bob> writes comments on semantics:
Bob> << It validates and compiles pictures. >>
Bob> The first part of these comments relate to picture clauses as
Bob> edit pictures.
Bob> That phrase 'compiles pictures' is instructive. Pictures are
Bob> like regular expressions, and some of the machines with EDIT
Bob> instructions are regex engines. Maybe we should call these
Bob> 'pikex', 'pikexes' and 'pikex engines'. (the k to prevent
Bob> sibilantization into something strange with pies). Although a
Bob> COBOLer might think pikex is a PIC X, so we should pronounce it
Bob> 'pike ex'.
I think the similarity to regexes is pretty superficial, but the
reality is almost as interesting. I did come up with a short
procedure to both validate and compile pictures. I don't have it in
code, but I'm pretty sure I could reconstruct it. If anyone would
like to play, it was based on a bit matrix representation of the
precedence table from the standard. The table is probably in your
COBOL manual as well.
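To make the bit-matrix idea concrete, here is a toy sketch in C over a tiny, made-up subset of the picture symbols (the real precedence table in the standard covers many more symbols, and the real rules also depend on whether a symbol has already occurred; take the entries below as illustrative only):

```c
#include <assert.h>
#include <string.h>

/* Map a picture character to a row/column index, or -1 if unknown. */
static int idx(char c)
{
    switch (c) {
    case 'Z': return 0;
    case '9': return 1;
    case 'V': return 2;
    case '.': return 3;
    }
    return -1;
}

/* Validate a picture string against a toy precedence matrix:
   may_follow[prev][next] is 1 when `next` may follow `prev`. */
int pic_valid(const char *pic)
{
    static const unsigned char may_follow[4][4] = {
        /*        Z  9  V  .  */
        /* Z */ { 1, 1, 1, 1 },
        /* 9 */ { 0, 1, 1, 1 },   /* no Z after a 9 */
        /* V */ { 0, 1, 0, 0 },
        /* . */ { 0, 1, 0, 0 },
    };
    int prev = -1;
    for (; *pic; ++pic) {
        int cur = idx(*pic);
        if (cur < 0)
            return 0;                          /* invalid character */
        if (prev >= 0 && !may_follow[prev][cur])
            return 0;                          /* precedence violation */
        prev = cur;
    }
    return prev >= 0;                          /* reject empty pictures */
}
```

Walking the string once against the matrix gives both validation and, with a little more bookkeeping, the compiled form as a side effect.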
Bob> How many pikex engines do we need? Can runtime work with just
Bob> one universal pikex engine? Or will efficiency considerations
Bob> lead to several speed-devils that get harnessed for a spin on
Bob> pikexes with subsets of picture characters?
Probably one numeric and one alpha-numeric. Of more interest is your
reference to runtime. If you're referring to some program in the COBOL
runtime library, then you're making the same unwarranted assumption
that I did the last time out. Fortunately, I was working with a very
talented young lady who realized that it was not harder to generate
editing code than it was to write the subroutine. It comes out very
fast and not excessively large.
Bob> Can anything about pikex for PICTURE clauses relate as a
Bob> generalization for UNSTRING ... DELIMITED BY, or INSPECT
Bob> variations? If applicable to procedure division concepts can
Bob> pikex engines get harnessed potentially only for literals
Bob> known at compile time or can we stretch it? I know what
Bob> compiler means in your phrase, is there any way to compile in
Bob> runtime? (For data-named 'delimiters' or 'replacers'.) Can we
Bob> design that well?
Well, I think something could be done, but I'm not sure you've found
the right application for it. Try laying out C code for these verbs
to see if the apparent similarity persists.
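For what it's worth, here is a naive sketch of UNSTRING ... DELIMITED BY in C, assuming a single-character delimiter and fixed-size receiving fields (the function name and sizes are invented, not anyone's actual design):

```c
#include <assert.h>
#include <string.h>

/* Split src on delim into up to nfields fields of 15 chars each,
   roughly like UNSTRING src DELIMITED BY delim INTO f1 f2 ...
   Returns the number of fields filled (a TALLYING analogue). */
int unstring(const char *src, char delim, char out[][16], int nfields)
{
    int f = 0;
    while (f < nfields) {
        const char *end = strchr(src, delim);
        size_t len = end ? (size_t)(end - src) : strlen(src);
        if (len > 15)
            len = 15;                 /* truncate to the field size */
        memcpy(out[f], src, len);
        out[f][len] = '\0';
        ++f;
        if (!end)
            break;                    /* no more delimiters */
        src = end + 1;
    }
    return f;
}
```

Laid out this way it looks more like a scanner than an edit engine, which is the similarity question in a nutshell.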
Bob> I think that 'validate' may not be a semantics function, in
Bob> the exact sense of validating picture clauses. There even the
Bob> lexer must be strong, and I for one can not imagine the parser
Bob> being out of contact with SYMT.
I definitely think it is. Lexing may find obvious things like
invalid characters, but the picture syntax is everything in the
precedence table that I mentioned earlier. Nothing less will serve.
If we do it in semantics, we get compiled editing pictures as a side
effect.
Bob> Further it might be an interesting technique to assume that
Bob> the pikex engines need a lot of generality. A compiled pikex
Bob> becomes a kind of set of commands that drive a pikex engine.
Bob> We need to leave room for COBOL2K or other future stuff here
Bob> too. The key seems to be to make the meaning of a PICTURE
Bob> clause deterministic as of the point it crosses from syntax to
Bob> semantics. I am on really thin ice here with this terminology,
Bob> but the point is rather than validate, the semantics might be
Bob> more interested in closing the signification of a pikex.
Bob> Actually, if semantics sees a thing in a pikex that is not
Bob> right, then it is literal. If that looks like a bug in our
Bob> compiler either semantics _IS_ failing to see it as symbolic
Bob> or syntax must be strengthened to stop this error from flowing
Bob> through. See what I mean? The compilation of pikexes and the
Bob> functions of any pikex engines gets a little easy if we clear
Bob> the drawing board of 'validation'. Because of the need for
Bob> the compiled image to be determinisitic, it must be valid 1)
Bob> to implement an early version that just prints the original
Bob> picture clause, 2) in an increasingly mature compiler/engine
Bob> when a specific position is unexpected it must be a literal.
Let's see how close we are on this. Generally, the "engines" that I
have used are based on the mainframe instruction, which is in turn
based on the last generation of the punched card accounting machines.
The basic elements are a fill character, a digit select op and a
start significance op. The last two are characters which precede the
printables in the ASCII collating sequence. A start significance op
signals that succeeding digits from the source number are to be
printed even if we have not yet seen a non-zero digit. Digit selects
take the next digit from the source number and (1) print it if it is
not zero, (2) print it if it has been preceded by a non-zero digit
(3) print it if the start significance has been traversed (4)
otherwise insert the fill character and discard the digit.
Characters in the picture program, other than those cited above are
insertion characters and are inserted or suppressed by the same rules
that apply to digits. There's somewhat more to it, but that's the
general outline.
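A minimal C sketch of that outline, with DS and SS as placeholder control bytes (the real instruction and character codes differ, and this ignores sign handling and the other details alluded to above):

```c
#include <assert.h>
#include <string.h>

#define DS '\x01'   /* digit select op */
#define SS '\x02'   /* start significance op */

/* Run the "picture program" pic over the source digits src, writing
   the edited result to out, with fill as the fill character. */
void pic_edit(const char *pic, const char *src, char fill, char *out)
{
    int sig = 0;                       /* significance started? */
    for (; *pic; ++pic) {
        if (*pic == DS) {              /* digit select */
            char d = *src ? *src++ : '0';
            if (d != '0' || sig) {     /* rules (1)-(3) above */
                *out++ = d;
                sig = 1;
            } else {                   /* rule (4): suppress */
                *out++ = fill;
            }
        } else if (*pic == SS) {       /* start significance */
            sig = 1;
        } else {                       /* insertion character: printed
                                          or suppressed like a digit */
            *out++ = sig ? *pic : fill;
        }
    }
    *out = '\0';
}
```

The picture program for something like ZZ,ZZ9 would then be five digit selects with an inserted comma and a start-significance op before the last digit.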
Bob> On picture clauses that are not editing mechanisms: I think
Bob> that validating these might be considered wholly
Bob> syntactic. Much of the picture is really type information. And
Bob> I feel that much of the management of type needs to be near
Bob> syntax, and some even just in time in front of syntax.
Sure, but since it falls out of the compilation...
Bob> Please take all of that as IMHO.
Bob> You mentioned so many good ideas. Wow! But I kind of think of
Bob> syntax as validation. I am not idealistic about it. I know
Bob> that we need to see what they mean by code before we can see
Bob> if it is valid in the sense of doable (and there are a great
Bob> number of such senses in the area of semantics). But I think
Bob> we should scrub things pretty well before semantics.
The good ideas are the same ones you would get one by one if you
undertook the semantics phase. They are really the requirements.
Semantics is big and touches everything. I wouldn't strain too hard
to make things perfect, because even correct things need attention in
this phase. Sometimes it's the easiest place to do diagnoses.
Bob> A notion that is implicit in some of my other posts is
Bob> substitutes on certain nodes of structures that might be
Bob> emitted by actions in procedure division rules. If we have an
Bob> otherwise valid PERFORM statement, conceptually, it might be
Bob> worthwhile sending it to semantics even if the UNTIL clause is
Bob> junk. This exactly because we can't sense certain classes of
Bob> errors until we get there, and maybe the named procedures or
Bob> VARYING clause has something we want semantics to tell us
Bob> about.
Sure, you can mentally flip back and forth. Think about what
semantics processing would be for the obvious parsed output. If it
doesn't feel right, could the parser do anything differently?
Bob> In the background is the belief that we should diagnose as
Bob> much as possible on every compile. As long as we would be
Bob> willing to consider carrying the necessary error flags,
Bob> semantics _could_ be presented with constructs that are a tad
Bob> imperfect, but made processible by substituting certain nodes
Bob> in the construct (say maybe an UNTIL clause that just looked
Bob> fine but was a complete dummy).
Usually, I've done things like this because of parser limitations.
Semantics is flexible because it's just code.
Bob> I am not pushing this approach really. I am pushing full
Bob> program compile and as many diagnoses as possible. So I kind
Bob> of see the validate issue differently. I think one approach
Bob> might be to say that syntax is obligated to emit only
Bob> constructs that are valid as far as it can tell. No partial,
Bob> chopped constructs, all complete. So, to complete them, add
Bob> dummies for necessary portions where missing (always taking the
Bob> minimum approach), and substitute parts that are detected as errors in
Bob> certain productions. So it is also in this exact sense that I
Bob> enquire as to what is the unit of measure of the interface
Bob> from syntax to semantics? It is like it is a multilegged
Bob> creature.
Well, syntax is primarily responsible for syntax. Semantics is not
so robust that it can deal with arbitrary input. Things like invalid
operands are not going to bother it, though. This is not to say that
getting the parser to do a bit more is not a good idea, but we need
to do the tradeoffs properly. It doesn't make sense to add half a
dozen productions to the parser to save a couple of tests in
semantics.
Bob> You also comment about "otherwise useless END-x delimiters "
Bob> in any possible interface into semantics. With that I
Bob> strongly agree. But perhaps for different reasons. I have
Bob> two. First as stated, I think the interface is structured. The
Bob> structure of the interface implies the end of each unit of
Bob> work, maybe. My second reason actually relates back to paren
Bob> balancing, believe it or not.
Unfortunately, the way the language has grown, the END-x delims are
always permitted but only required in rather special circumstances,
so they do not constitute a general structuring mechanism. Can you
really make a case for paren balancing when the right paren is
optional?
Bob> I remain concerned about a parser diving into paren unbalanced
Bob> code. But I think I see that an open paren and a close paren
Bob> are like an IF and an END-IF. And so if we are smart, and we
Bob> might have to be very smart for this, but if we are smart then
Bob> an open paren that lacks a matching close paren is analogous
Bob> to an IF that has no END-IF (which notably is optional). That
Bob> little intrigue may cut the unbalanced paren problem down. We
Bob> most definitely need to generalize our thinking about
Bob> unbalanced BEGIN/END type tokens, cause that's the biz we iz
Bob> in. Combined with transformation of negative close parens as a
Bob> distinct token from close paren, the whole thing might be in
Bob> reach. Who knows maybe we could code this thing!
Sometimes they're legitimately missing, sometimes they're noise,
sometimes they're critical to determining control flow. I guess I'd
rather discard the ones that are noise instead of trying to add the
noise to the ones that are missing.
Bob> Nearly all of that concept, if useful, lands in the parser and
Bob> the lexer. However, when studied as a lexeme stream, COBOL
Bob> basically has BEGIN tokens (incidentally and perhaps
Bob> aggravatingly it also has the _noise_ of optional END
Bob> tokens). This is distinct from C and Pascal, which have
Bob> varieties of b/e sets that must marry and a universal scope
Bob> terminating semicolon that is slightly different as a
Bob> separator in Pascal and a terminator in C. You can almost, not
Bob> too close but almost, just look for the BEGIN type tokens in
Bob> COBOL (that has been implied and stated many ways in the early
Bob> work of this project). But regrettably, you actually need the
Bob> END type tokens sometimes. Ah! But only for syntax
Bob> delimitation, if you will allow that phrase. The interface
Bob> into semantics does not need END-x noise.
Got it!
Bob> But by the same token :-) Really, I would not conceive of the
Bob> semantics work flow as having headers or leading tokens like
Bob> verbs or clause tokens. In effect grammar rules not only catch
Bob> things in sequence, they have gathered things; very often you
Bob> can wait until late to emit the action. Appropriately so,
Bob> IMHO.
Bob> This is very subjective, and I can't hand this to you, but I
Bob> think we should really generalize that interface, and send
Bob> major bundles all at once... Not like paragraphs all at once
Bob> or anything like that but, I am saying we should not send
Bob> VERB, CLAUSE, CLAUSE, as a sequence. Instead a single item
Bob> highly decorated. I think that the abstraction of this thing
Bob> should come from a complete review of the procedure division
Bob> requirements. That will even suggest some of the things we
Bob> need for some of the data division (a small part).
In a sense, I think conventional parsing is doing that. What you'll
really see is OPERAND OPERAND VERB which semantics will synthesize
into a statement by building a tree bottom up. It may or may not try
to synthesize larger constructs from individual statements.
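A hedged sketch of that bottom-up synthesis, with invented names and a fixed-size pool, just to show the OPERAND OPERAND VERB shape:

```c
#include <assert.h>
#include <string.h>

/* Semantics receives the parser's postfix-ish output and builds a
   statement tree bottom up with a small operand stack. */
typedef struct Node {
    const char *tag;          /* verb or operand spelling */
    struct Node *kid[2];      /* at most two children in this toy */
} Node;

static Node  pool[64];        /* fixed-size pool, never freed */
static int   npool;
static Node *stack[16];       /* operand stack */
static int   sp;

static Node *mknode(const char *tag, Node *a, Node *b)
{
    Node *n = &pool[npool++];
    n->tag = tag;
    n->kid[0] = a;
    n->kid[1] = b;
    return n;
}

/* Feed one token: operands are pushed; a verb pops its operands
   and pushes the synthesized statement node. */
void token(const char *tag, int is_verb, int arity)
{
    if (!is_verb) {
        stack[sp++] = mknode(tag, NULL, NULL);
        return;
    }
    Node *b = arity > 1 ? stack[--sp] : NULL;
    Node *a = arity > 0 ? stack[--sp] : NULL;
    stack[sp++] = mknode(tag, a, b);
}
```

So A B MOVE comes off the stack as a MOVE node with A and B as its children; stringing statement nodes into larger constructs is then a separate decision.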
Bob> Please tell us more of _your_ ideas though; you clearly have
Bob> had some good experience with the latter portions of
Bob> compilers.
I think it's amusing that you credit me with these ideas. If anyone
can lay just claim to them, he or she probably served as part of
Grace Hopper's original team. I have been in the trenches, though,
and I know where some of the mines are.
Bob> Best Wishes, Bob Rayhawk RKRayhawk@aol.com
--
This message was sent through the gnu-cobol mailing list. To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body. For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.