
Re: gnubol: subsets



>>>>> "Bob" == RKRayhawk  <RKRayhawk@aol.com>
>>>>> wrote the following on Thu, 18 Nov 1999 04:40:56 EST

  Bob> In a message dated 11/17/99 9:22:59 PM EST, mck@tivoli.mv.com
  Bob> writes comments on semantics:

  Bob> << It validates and compiles pictures.  >>

  Bob> The first part of these comments relates to picture clauses
  Bob> as edit pictures.

  Bob>  That phrase 'compiles pictures' is instructive. Pictures are
  Bob> like regular expressions, and some of the machines with EDIT
  Bob> instructions are regex engines. Maybe we should call these
  Bob> 'pikex', 'pikexes' and 'pikex engines'.  (The k is to prevent
  Bob> sibilantization into something strange with pies.) Although a
  Bob> COBOLer might think pikex is a PIC X, so we should pronounce
  Bob> it 'pike ex'.

I think the similarity to regexes is pretty superficial, but the
reality is almost as interesting.  I did come up with a short
procedure to both validate and compile pictures.  I don't have it in
code, but I'm pretty sure I could reconstruct it.  If anyone would
like to play, it was based on a bit-matrix representation of the
precedence table from the standard.  The table is probably in your
COBOL manual as well.
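For anyone who wants to play, here is a minimal sketch of that
table-driven validation, in Python for brevity rather than the C the
compiler would use.  The symbol set and the rows filled in below are
a tiny illustrative subset, not the precedence table from the
standard (the real table also splits symbols seen before and after
the decimal point, so it rejects things like 9V9V9 that this toy
accepts):

```python
# FOLLOW[p] is the set of picture symbols permitted immediately
# after symbol p.  In a C implementation each set would be one row
# of a bit matrix.  Partial, illustrative table only.
FOLLOW = {
    '9': {'9', 'X', 'A', 'V'},
    'X': {'9', 'X', 'A'},
    'A': {'9', 'X', 'A'},
    'V': {'9'},
    'S': {'9', 'V'},
}
START = {'9', 'X', 'A', 'S'}   # symbols that may open a picture


def pic_valid(pic: str) -> bool:
    """Validate a picture string by walking the precedence table."""
    permitted = START
    for ch in pic:
        if ch not in FOLLOW or ch not in permitted:
            return False
        permitted = FOLLOW[ch]
    return bool(pic)            # empty pictures are invalid
```

The same walk that validates the picture is the natural place to
emit the compiled form, which is where the "validate and compile in
one procedure" observation comes from.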

  Bob> How many pikex engines do we need? Can runtime work with just
  Bob> one universal pikex engine? Or will efficiency considerations
  Bob> lead to several speed-devils that get harnessed for a spin on
  Bob> pikexes with subsets of picture characters?

Probably one numeric and one alpha-numeric.  Of more interest is your
reference to runtime.  If you're referring to some program in the COBOL
runtime library, then you're making the same unwarranted assumption
that I did the last time out.  Fortunately, I was working with a very
talented young lady who realized that it was not harder to generate
editing code than it was to write the subroutine.  It comes out very
fast and not excessively large.

  Bob> Can anything about pikex for PICTURE clauses relate as a
  Bob> generalization for UNSTRING ... DELIMITED BY, or INSPECT
  Bob> variations? If applicable to procedure division concepts, can
  Bob> pikex engines get harnessed potentially only for literals
  Bob> known at compile time, or can we stretch it?  I know what
  Bob> compile means in your phrase; is there any way to compile at
  Bob> runtime? (For delimiters or replacers given as data names.)
  Bob> Can we design that well?

Well, I think something could be done, but I'm not sure you've found
the right application for it.  Try laying out C code for these verbs
to see if the apparent similarity persists.
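To make the comparison concrete, here is roughly what UNSTRING ...
DELIMITED BY boils down to, sketched in Python rather than the C a
code generator would emit, and with the COUNT, POINTER and OVERFLOW
phrases omitted.  It is a plain scan for delimiter matches, nothing
like the character-by-character command interpretation that an edit
picture needs, which is why the similarity does not persist:

```python
def unstring(source, delimiters):
    """Split source at each match of any delimiter, in order,
    returning the resulting fields (the INTO targets)."""
    fields, start, i = [], 0, 0
    while i < len(source):
        for d in delimiters:
            if source.startswith(d, i):
                fields.append(source[start:i])
                i += len(d)          # consume the delimiter
                start = i
                break
        else:
            i += 1                   # no delimiter here; advance
    fields.append(source[start:])
    return fields
```

Delimiters known only at runtime just arrive as arguments here;
nothing needs to be "compiled at runtime" for this verb.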

  Bob> I think that 'validate' may not be a semantics function, in
  Bob> the exact sense of validating picture clauses.  There even the
  Bob> lexer must be strong, and I for one cannot imagine the parser
  Bob> being out of contact with SYMT.

I definitely think it is.  Lexing may find obvious things like
invalid characters, but the picture syntax is everything in the
precedence table that I mentioned earlier.  Nothing less will serve.
If we do it in semantics, we get compiled editing pictures as a side
effect.

  Bob> Further it might be an interesting technique to assume that
  Bob> the pikex engines need a lot of generality. A compiled pikex
  Bob> becomes a kind of set of commands that drive a pikex engine.
  Bob> We need to leave room for COBOL2K or other future stuff here
  Bob> too. The key seems to be to make the meaning of a PICTURE
  Bob> clause deterministic as of the point it crosses from syntax to
  Bob> semantics.  I am on really thin ice here with this
  Bob> terminology, but the point is rather than validate, the
  Bob> semantics might be more interested in closing the
  Bob> signification of a pikex.

  Bob> Actually, if semantics sees a thing in a pikex that is not
  Bob> right, then it is a literal. If that looks like a bug in our
  Bob> compiler, either semantics _IS_ failing to see it as symbolic
  Bob> or syntax must be strengthened to stop this error from flowing
  Bob> through.  See what I mean? The compilation of pikexes and the
  Bob> functions of any pikex engines get a little easy if we clear
  Bob> the drawing board of 'validation'.  Because of the need for
  Bob> the compiled image to be deterministic, it must be valid 1)
  Bob> to implement an early version that just prints the original
  Bob> picture clause, and 2) in an increasingly mature
  Bob> compiler/engine, when a specific position is unexpected it
  Bob> must be a literal.

Let's see how close we are on this.  Generally, the "engines" that I
have used are based on the mainframe instruction, which is in turn
based on the last generation of the punched card accounting machines.
The basic elements are a fill character, a digit select op and a
start significance op.  The last two are characters which precede the
printables in the ASCII collating sequence. A start significance op
signals that succeeding digits from the source number are to be
printed even if we have not yet seen a non-zero digit.  Digit selects
take the next digit from the source number and (1) print it if it is
not zero, (2) print it if it has been preceded by a non-zero digit,
(3) print it if the start significance op has been traversed, or (4)
otherwise insert the fill character and discard the digit.
Characters in the picture program other than those cited above are
insertion characters, and are inserted or suppressed by the same
rules that apply to digits.  There's somewhat more to it, but that's
the general outline.
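For concreteness, here is a toy interpreter for such a compiled
picture, in Python rather than the generated code discussed above.
The two op byte values are invented stand-ins for the unprintable
characters that precede the printables in the real encoding:

```python
DS = '\x01'   # digit select op (stand-in byte value)
SS = '\x02'   # start significance op (stand-in byte value)


def edit(program, digits, fill=' '):
    """Run a compiled picture program against the digit string of a
    source number, applying the rules outlined above."""
    out = []
    significant = False      # set by a non-zero digit or by SS
    src = iter(digits)
    for op in program:
        if op == SS:
            significant = True
        elif op == DS:
            d = next(src)
            if d != '0':
                significant = True
            out.append(d if significant else fill)
        else:                # insertion character: same rules
            out.append(op if significant else fill)
    return ''.join(out)
```

So a program like DS DS ',' DS DS DS turns "01234" into " 1,234",
and an asterisk fill gives check-protect behavior for free.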

  Bob> On picture clauses that are not editing mechanisms: I think
  Bob> that validating these might be considered wholly
  Bob> syntactic. Much of the picture is really type information. And
  Bob> I feel that much of the management of type needs to be near
  Bob> syntax, and some even just in time in front of syntax.

Sure, but since it falls out of the compilation...

  Bob> Please take all of that as IMHO.

  Bob> You mentioned so many good ideas. Wow!  But I kind of think of
  Bob> syntax as validation. I am not idealistic about it. I know
  Bob> that we need to see what they mean by code before we can see
  Bob> if it is valid in the sense of doable (and there are a great
  Bob> number of such senses in the area of semantics). But I think
  Bob> we should scrub things pretty well before semantics.

The good ideas are the same ones you would get one by one if you
undertook the semantics phase.  They are really the requirements.
Semantics is big and touches everything.  I wouldn't strain too hard
to make things perfect, because even correct things need attention in
this phase.  Sometimes it's the easiest place to do diagnoses.  

  Bob> A notion that is implicit in some of my other posts is
  Bob> substitutes on certain nodes of structures that might be
  Bob> emitted by actions in procedure division rules. If we have an
  Bob> otherwise valid PERFORM statement, conceptually, it might be
  Bob> worthwhile sending it to semantics even if the UNTIL clause is
  Bob> junk.  This is exactly because we can't sense certain classes
  Bob> of
  Bob> errors until we get there, and maybe the named procedures or
  Bob> VARYING clause has something we want semantics to tell us
  Bob> about.

Sure, you can mentally flip back and forth.  Think about what
semantics processing would be for the obvious parsed output.  If it
doesn't feel right, could the parser do anything differently?

  Bob> In the background is the belief that we should diagnose as
  Bob> much as possible on every compile.  As long as we would be
  Bob> willing to consider carrying the necessary error flags,
  Bob> semantics _could_ be presented with constructs that are a tad
  Bob> imperfect, but made processible by substituting certain nodes
  Bob> in the construct (say maybe an UNTIL clause that just looked
  Bob> fine but was a complete dummy).

Usually, I've done things like this because of parser limitations.
Semantics is flexible because it's just code.  

  Bob> I am not pushing this approach really. I am pushing full
  Bob> program compile and as many diagnoses as possible. So I kind
  Bob> of see the validate issue differently.  I think one approach
  Bob> might be to say that syntax is obligated to emit only
  Bob> constructs that are valid as far as it can tell.  No partial,
  Bob> chopped constructs, all complete.  So to complete them, add
  Bob> dummies for necessary portions where missing (always taking
  Bob> the minimum approach), and substitute parts that are detected
  Bob> as errors in certain productions. So it is also in this exact
  Bob> sense that I enquire: what is the unit of measure of the
  Bob> interface from syntax to semantics? It is like it is a
  Bob> multilegged creature.

Well, syntax is primarily responsible for syntax.  Semantics is not
so robust that it can deal with arbitrary input.  Things like invalid
operands are not going to bother it, though.  This is not to say that
getting the parser to do a bit more is not a good idea, but we need
to do the tradeoffs properly.  It doesn't make sense to add half a
dozen productions to the parser to save a couple of tests in
semantics. 

  Bob> You also comment about "otherwise useless END-x delimiters "
  Bob> in any possible interface into semantics.  With that I
  Bob> strongly agree. But perhaps for different reasons. I have
  Bob> two. First as stated, I think the interface is structured. The
  Bob> structure of the interface implies the end of each unit of
  Bob> work, maybe. My second reason actually relates back to paren
  Bob> balancing believe it or not.

Unfortunately, the way the language has grown, the END-x delims are
always permitted but only required in rather special circumstances,
so they do not constitute a general structuring mechanism.  Can you
really make a case for paren balancing when the right paren is
optional? 

  Bob> I remain concerned about a parser diving into paren unbalanced
  Bob> code. But I think I see that an open paren and a close paren
  Bob> are like an IF and an END-IF. And so if we are smart, and we
  Bob> might have to be very smart for this, but if we are smart then
  Bob> an open paren that lacks a matching close paren is analogous
  Bob> to an IF that has no END-IF (which notably is optional). That
  Bob> little intrigue may cut the unbalanced paren problem down.  We
  Bob> most definitely need to generalize our thinking about
  Bob> unbalanced BEGIN/END type tokens, cause that's the biz we iz
  Bob> in. Combined with transformation of negative close parens as a
  Bob> distinct token from close paren, the whole thing might be in
  Bob> reach. Who knows maybe we could code this thing!

Sometimes they're legitimately missing, sometimes they're noise,
sometimes they're critical to determining control flow.  I guess I'd
rather discard the ones that are noise instead of trying to add the
noise to the ones that are missing.

  Bob> Nearly all of that concept, if useful, lands in the parser and
  Bob> the lexer.  However, when studied as a lexeme stream, COBOL
  Bob> basically has BEGIN tokens (incidentally and perhaps
  Bob> aggravatingly it also has the _noise_ of optional END
  Bob> tokens). This is distinct from C and Pascal, which have
  Bob> varieties of b/e sets that must marry, and a universal
  Bob> scope-terminating semicolon that is slightly different: a
  Bob> separator in Pascal and a terminator in C. You can almost, not
  Bob> too close but almost, just look for the BEGIN type tokens in
  Bob> COBOL (that has been implied and stated many ways in the early
  Bob> work of this project). But regrettably, you actually need the
  Bob> END type tokens sometimes.  Ah! But only for syntax
  Bob> delimitation, if you will allow that phrase.  The interface
  Bob> into semantics does not need END-x noise.

Got it!

  Bob> But by the same token :-) Really, I would not conceive of the
  Bob> semantics work flow as having headers or leading tokens like
  Bob> verbs or clause tokens. In effect grammar rules not only catch
  Bob> things in sequence, they have gathered things; very often you
  Bob> can wait until late to emit the action. Appropriately so,
  Bob> IMHO.

  Bob> This is very subjective, and I can't hand this to you, but I
  Bob> think we should really generalize that interface, and send
  Bob> major bundles all at once... Not like paragraphs all at once
  Bob> or anything like that but, I am saying we should not send
  Bob> VERB, CLAUSE, CLAUSE, as a sequence. Instead a single item
  Bob> highly decorated.  I think that the abstraction of this thing
  Bob> should come from a complete review of the procedure division
  Bob> requirements. That will even suggest some of the things we
  Bob> need for some of the data division (a small part).

In a sense, I think conventional parsing is doing that.  What you'll
really see is OPERAND OPERAND VERB which semantics will synthesize
into a statement by building a tree bottom up.  It may or may not try
to synthesize larger constructs from individual statements.
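A toy of that bottom-up synthesis, with an invented token format and
a per-verb operand count standing in for whatever the parser actions
would really deliver:

```python
def build(tokens, arity):
    """tokens: ('OPERAND', name) and ('VERB', name) pairs in the
    order parser reductions deliver them; arity: operand count per
    verb.  Returns a list of statement trees built bottom-up."""
    stack, statements = [], []
    for kind, name in tokens:
        if kind == 'OPERAND':
            stack.append(name)       # operands arrive first
        else:                        # VERB: pop its operands
            n = arity[name]
            args = tuple(stack[len(stack) - n:])
            del stack[len(stack) - n:]
            statements.append((name, args))
    return statements
```

So OPERAND A, OPERAND B, VERB MOVE comes out as one MOVE node
carrying both operands, which is the "single item highly decorated"
shape, just built by semantics rather than shipped whole by syntax.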

  Bob> Please tell us more of _your_ ideas though, you clearly have
  Bob> had some good experience with the latter portions of
  Bob> compilers.

I think it's amusing that you credit me with these ideas.  If anyone
can lay just claim to them, he or she probably served as part of
Grace Hopper's original team.  I have been in the trenches, though,
and I know where some of the mines are.

  Bob> Best Wishes, Bob Rayhawk RKRayhawk@aol.com





--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.