
RE: gnubol: Record delimiter clause and parse order




Caveat: I'm not a Unix guy. I'm commenting only as a Cobol guy (IBM
Mainframe, HP 3000, TI 990, PC (Realia & Micro Focus, both under OS/2 and
Windows), System 34, System 36, System 38, Unisys, but no Unix). I'm a bit
obsessive/compulsive so I own personal copies of the 74 standard, the 85
standard and the proposed 2002 standard. Take all of these comments in that
light.

=====[begin]=====
I was toying with both ideas.  How portable would using a default
delimiter, say, '\n', be?  I mean, you conceptually expect records to be
on successive lines, anyhow.  But it may not be portable.  I can't
really find a definitive description of the structure of a sequential
file, besides Microfocus' specification for "line sequential"--which in
itself is not portable between Unix and Windows, for example.  MF says a
"line sequential" file is a file which could have been created with a
text editor, with each record on a successive line.  Now, if you create
the file in something like, oh, Notepad, the lines will be delimited by
"\r\n", whereas on Unix you will only have "\n".  I don't know how much
of an issue, though--guess it depends on how many "\r\n" systems that
this will be used on.

My personal vote is for reading up until a "\n".  I would guess that
this is pretty much implementation-dependent anyhow.  The question would
remain, however--what to do with all that MF code that uses "line
sequential"--treat that the same, or what?
=====[end]=====

My suggestion:

(1) Handle fixed length records, read by length, in addition to delimited
variable length records. That permits writing/reading files without delimiters.
(2) Handle the \r\n & \n as the same.  (My non-Unix background is showing
here, I'd call them CR-LF and LF). My recollection is that Micro Focus will
handle either.

In other words, my personal suggestion is to handle both. I think you need to
be able to handle either.
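
A minimal sketch of suggestion (2), written in Python purely for
illustration. The names read_line_record and read_all_records are made up
for this example and are not part of any existing runtime. It assumes the
file is opened in binary mode and that a final record with no trailing
delimiter should still be returned.

```python
import io


def read_line_record(stream):
    """Read one "line sequential" record; return None at end of file.

    Both LF and CR-LF are accepted as the record delimiter.
    """
    line = stream.readline()          # reads up to and including b"\n"
    if not line:
        return None                   # end of file
    if line.endswith(b"\n"):
        line = line[:-1]              # drop the LF
        if line.endswith(b"\r"):
            line = line[:-1]          # CR-LF: drop the CR as well
    return line


def read_all_records(stream):
    """Collect every record in the stream (hypothetical helper)."""
    records = []
    while True:
        record = read_line_record(stream)
        if record is None:
            return records
        records.append(record)


unix_records = read_all_records(io.BytesIO(b"ALPHA\nBETA\n"))
dos_records = read_all_records(io.BytesIO(b"ALPHA\r\nBETA\r\n"))
```

With this approach the program sees the same records whether the file came
from a CR-LF system or an LF system.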


=====[begin]=====
> Now consider a "variable length" file. I use the "record length varying
> . . . depending on" clause to identify a Working Storage variable to
> contain the record length. On a read, this variable should be set by the
> file system, after the read, to contain the length of the record actually
> read.


This is good, for *after* a read.  But how do you know how many bytes to
read *before* the read?  There must be some delimiter to mark the end of
a record, otherwise there's no way to know when you've read in the
totality of the record, until you reach the max. number of bytes for the
record definition.
=====[end]=====

The "count" approach is not really suitable for a PC/Unix environment. The
file systems in such environments treat a file as a plain stream of bytes
and keep no record lengths, so they cannot handle such issues. The delimiter
is much better.

For the PC, a "line sequential" fixed file and a "line sequential" variable
file look the same physically. The difference is what happens to the
"record" during the read. If I am reading a fixed length file of 80 bytes as
"line sequential" and a line has only 20 characters, I would expect the
remaining 60 bytes to be spaces in the record inside the program. If I am
reading a variable length record (of between 1 and 80 bytes) and I get only
20, I would expect the remaining 60 bytes to be unchanged from whatever they
were before the read. It is a subtle but important difference.
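
To make the difference concrete, here is a small sketch in Python (for
illustration only). The names move_fixed and move_variable are hypothetical,
the record area is modelled as a bytearray, and data stands for the bytes of
one line with the delimiter already removed. Truncating data that is longer
than the record area is an assumption of this sketch.

```python
def move_fixed(record_area, data):
    """Fixed length read: short data is padded with spaces to full size."""
    size = len(record_area)
    record_area[:] = data[:size].ljust(size, b" ")


def move_variable(record_area, data):
    """Variable length read: only the bytes read are replaced.

    Returns the number of bytes moved, which is what a DEPENDING ON
    item would receive after the read.
    """
    length = min(len(data), len(record_area))
    record_area[:length] = data[:length]
    return length


# An 80 byte record area holding leftovers from an earlier read.
fixed_area = bytearray(b"X" * 80)
variable_area = bytearray(b"X" * 80)

move_fixed(fixed_area, b"A" * 20)                 # 20 bytes, then 60 spaces
length_read = move_variable(variable_area, b"A" * 20)  # 20 bytes, tail untouched
```

After these calls fixed_area ends in 60 spaces, while variable_area still
ends in the 60 old bytes and length_read is 20.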

In summary, my personal suggestions would be to support a delimiter for
variable and both a delimiter and a record length for fixed.
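
For the fixed case without delimiters, reading by record length alone could
look something like the sketch below (Python, for illustration; the name
read_fixed_records is made up). Padding a short final record with spaces is
an assumption of this sketch, not something any standard requires.

```python
import io


def read_fixed_records(stream, record_length):
    """Yield consecutive records of record_length bytes, with no delimiters."""
    while True:
        chunk = stream.read(record_length)
        if not chunk:
            return                    # end of file
        # Assumption: a short final record is padded with spaces.
        yield chunk.ljust(record_length, b" ")


fixed_records = list(read_fixed_records(io.BytesIO(b"AAAABBBBCC"), 4))
```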

James S. Huggins


--
This message was sent through the gnu-cobol mailing list.  To remove yourself
from this mailing list, send a message to majordomo@lusars.net with the
words "unsubscribe gnu-cobol" in the message body.  For more information on
the GNU COBOL project, send mail to gnu-cobol-owner@lusars.net.