
gnubol: Sequential File I/O plans



Here's what I've put together, after some research into how others do
it and into the specification, both in the latest draft and in the
Micro Focus System Reference manuals.  I'd appreciate feedback before I
get too far into a coding frenzy. ;)

1) All file information will be parsed into a structure.

2) Said structure will contain the physical file name (or device), any
   flags to be associated with the file, and the read or write buffers
   (or rather, pointers to them).  The file descriptor (either int or
   FILE *, depending on which implementation works best) will be held in
   the structure as well.  Lastly, the structure will contain a pointer
   to the 01 level data group for the FD for the file.

3) OPENs will work on the file descriptor associated with the filename.
   It will (f?)open the file according to the flags in the struct.
   INPUT, OUTPUT, I-O, and EXTEND will map to the "r", "w", "r+", and
   "a" modes, respectively.  If we go with the low-level open() call,
   then the appropriate flags would be ORed together, of course.  When a
   file is opened, an 'open' flag should be set.  Multiple calls to OPEN
   without first CLOSEing the file will result in an error.  This is, I
   believe, the behavior of most, if not all, major COBOL compilers.

4) READs will read sequential records into the buffers; the default will
   be two records at a time.  This will be adjustable, of course.  A
   marker will keep track of which buffer is being accessed.  Successive
   READs will use the data in the read buffers until they are empty, at
   which time the buffers will be refilled and the marker reset to the
   first buffer.  I haven't worked out the best way to handle the group
   item defined in the FD: whether to make it a pointer to the current
   buffer, or to make it a separate area into which records are read.
   In either case, for READ INTO statements, the record will be moved
   from the FD description area to the WORKING-STORAGE area, being
   truncated or padded as necessary.  The mechanics of READs are, of
   course, open to suggestion.  I feel, however, that the above most
   closely represents the behavior of other mainstream COBOL
   implementations.

5) WRITEs will be handled in much the same way as READs.  Each record
   will be placed in the current buffer (for WRITE FROM, the data will
   first be moved into the buffer area), and the buffer marker will then
   be moved to the next empty buffer.  Once all the buffers are full,
   they will be flushed to disk and cleared, and the buffer marker set
   back to the first buffer.

6) CLOSEs will be just a regular close().  First, flush the buffers to
   disk, then close() the file handle, and unset the 'open' flag.
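
Points 2, 3, and 6 could be sketched roughly as follows.  This is only
an illustration, not settled code: every name in it (cob_file,
cob_open_mode, cob_fopen_mode, cob_open, cob_close) is made up for the
example, it assumes FILE * rather than an int descriptor, and it assumes
fixed-length records.

```c
#include <assert.h>
#include <stdio.h>

/* All names below are hypothetical; none of this is settled project API. */
enum cob_open_mode { COB_INPUT, COB_OUTPUT, COB_IO, COB_EXTEND };

struct cob_file {
    const char *phys_name;   /* physical file name or device */
    unsigned flags;          /* any other flags associated with the file */
    int is_open;             /* the 'open' flag from points 3 and 6 */
    FILE *fp;                /* could equally be an int descriptor */
    size_t rec_len;          /* record length in bytes (assumed fixed) */
    size_t pending;          /* buffered output records not yet written */
    unsigned char *bufs;     /* record buffers */
    unsigned char *record;   /* 01 level record area of the FD */
};

/* Map a COBOL open mode to an fopen() mode string. */
const char *cob_fopen_mode(enum cob_open_mode m)
{
    switch (m) {
    case COB_INPUT:  return "r";
    case COB_OUTPUT: return "w";
    case COB_IO:     return "r+";   /* read and update an existing file */
    case COB_EXTEND: return "a";
    }
    return NULL;
}

/* OPEN: returns 0 on success, -1 if already open, -2 if fopen() fails. */
int cob_open(struct cob_file *f, enum cob_open_mode m)
{
    const char *mode = cob_fopen_mode(m);

    assert(f != NULL);
    if (f->is_open)
        return -1;               /* OPEN without an intervening CLOSE */
    if (mode == NULL)
        return -2;
    f->fp = fopen(f->phys_name, mode);
    if (f->fp == NULL)
        return -2;
    f->pending = 0;
    f->is_open = 1;
    return 0;
}

/* CLOSE: flush pending records, close the handle, clear the flag.
 * Returns 0 on success, -1 if not open, -2 on a write or close error. */
int cob_close(struct cob_file *f)
{
    int rc = 0;

    assert(f != NULL);
    if (!f->is_open)
        return -1;
    if (f->pending > 0 &&
        fwrite(f->bufs, f->rec_len, f->pending, f->fp) != f->pending)
        rc = -2;
    f->pending = 0;
    if (fclose(f->fp) != 0)
        rc = -2;
    f->fp = NULL;
    f->is_open = 0;
    return rc;
}
```

With the low-level open() call instead, the switch would return ORed
O_* flags rather than a mode string; the rest would look much the same.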
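
The buffering in points 4 and 5 might look something like the sketch
below.  It makes several assumptions that are not decided above:
records are fixed length, the FD record area is a separate area rather
than a pointer into the buffers, padding is done with spaces, and a
trailing partial record is simply ignored.  The names (cob_buffered,
cob_move, cob_read, cob_flush, cob_write) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names; fixed-length records are assumed throughout. */
struct cob_buffered {
    FILE *fp;
    size_t rec_len;          /* bytes per record */
    size_t nbuf;             /* number of record buffers (default: 2) */
    size_t cur;              /* marker: buffer currently being accessed */
    size_t filled;           /* READ side: records held in the buffers */
    unsigned char *bufs;     /* nbuf * rec_len bytes */
    unsigned char *record;   /* FD record area, kept separate here */
};

/* Copy src to dst, truncating or space-padding to dst_len. */
void cob_move(unsigned char *dst, size_t dst_len,
              const void *src, size_t src_len)
{
    size_t n = src_len < dst_len ? src_len : dst_len;

    memcpy(dst, src, n);
    memset(dst + n, ' ', dst_len - n);
}

/* READ [INTO]: returns 0 on success, 1 at end of file.
 * Pass into == NULL for a plain READ. */
int cob_read(struct cob_buffered *f, unsigned char *into, size_t into_len)
{
    assert(f != NULL && f->fp != NULL);
    if (f->cur == f->filled) {           /* buffers empty: refill them */
        f->filled = fread(f->bufs, f->rec_len, f->nbuf, f->fp);
        f->cur = 0;
        if (f->filled == 0)
            return 1;
    }
    memcpy(f->record, f->bufs + f->cur * f->rec_len, f->rec_len);
    f->cur++;
    if (into != NULL)                    /* READ INTO: move to the target */
        cob_move(into, into_len, f->record, f->rec_len);
    return 0;
}

/* Write out whatever is buffered, clear the buffers, reset the marker.
 * Returns 0 on success, -1 on a write error. */
int cob_flush(struct cob_buffered *f)
{
    assert(f != NULL && f->fp != NULL);
    if (f->cur > 0 && fwrite(f->bufs, f->rec_len, f->cur, f->fp) != f->cur)
        return -1;
    memset(f->bufs, ' ', f->nbuf * f->rec_len);
    f->cur = 0;
    return 0;
}

/* WRITE FROM: move the data into the next empty buffer, then flush
 * once every buffer is full. */
int cob_write(struct cob_buffered *f, const void *from, size_t from_len)
{
    assert(f != NULL && f->cur < f->nbuf);
    cob_move(f->bufs + f->cur * f->rec_len, f->rec_len, from, from_len);
    f->cur++;
    return f->cur == f->nbuf ? cob_flush(f) : 0;
}
```

A plain WRITE would then be cob_write(f, f->record, f->rec_len), and
CLOSE would call cob_flush() before closing the handle.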

As far as parsing goes, when the SELECT clause is parsed, the structure
will be named from the identifier name.  A flag will *need* to be set,
because there must be a way to ensure that every SELECT has an
accompanying FD in the FILE SECTION.  Going the other way should be
fairly simple.  E.g., if I parse an FD on OUT-FILE, but no structure for
OUT-FILE exists, it's pretty easy to generate an appropriate error.  But
there also needs to be a mechanism, once the end of the FILE SECTION is
reached and all the FDs have been parsed, for checking that each SELECT
has a corresponding FD: if I have a SELECT for OUT-FILE, and reach the
WORKING-STORAGE SECTION without encountering an FD for OUT-FILE, a
syntax error needs to be generated.  Just food for thought, and not
immediately pressing, of course.
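
One possible shape for that SELECT/FD bookkeeping is sketched below.
The table layout, the size limits, and the function names (note_select,
note_fd, check_fds) are all assumptions made for the example, and names
are compared exactly, which a real parser would probably want to relax
to a case-insensitive comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILES 64    /* assumed limit, for the sketch only */
#define MAX_NAME  32    /* assumed upper bound on a name, including NUL */

struct file_entry { char name[MAX_NAME]; int has_fd; };
struct file_table { struct file_entry e[MAX_FILES]; size_t n; };

static struct file_entry *find_file(struct file_table *t, const char *name)
{
    for (size_t i = 0; i < t->n; i++)
        if (strcmp(t->e[i].name, name) == 0)
            return &t->e[i];
    return NULL;
}

/* Called when a SELECT is parsed.  Returns 0 on success, or -1 if the
 * name is a duplicate, is too long, or the table is full. */
int note_select(struct file_table *t, const char *name)
{
    assert(t != NULL && name != NULL);
    if (t->n == MAX_FILES || strlen(name) >= MAX_NAME || find_file(t, name))
        return -1;
    strcpy(t->e[t->n].name, name);
    t->e[t->n].has_fd = 0;          /* the flag: no FD seen yet */
    t->n++;
    return 0;
}

/* Called when an FD is parsed.  Returns -1 if there was no SELECT for
 * the name, which is the easy error to report. */
int note_fd(struct file_table *t, const char *name)
{
    struct file_entry *e = find_file(t, name);

    if (e == NULL)
        return -1;
    e->has_fd = 1;
    return 0;
}

/* Called at the end of the FILE SECTION.  Reports every SELECT that
 * never got an FD and returns how many there were. */
size_t check_fds(const struct file_table *t, FILE *err)
{
    size_t missing = 0;

    for (size_t i = 0; i < t->n; i++) {
        if (!t->e[i].has_fd) {
            fprintf(err, "SELECT %s has no FD in the FILE SECTION\n",
                    t->e[i].name);
            missing++;
        }
    }
    return missing;
}
```

The parser would call check_fds() from the action that recognizes the
end of the FILE SECTION and treat a nonzero result as a syntax error.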

Regarding relative and indexed files, I have a couple of suggestions.
My understanding of relative access is that you are just seeking to a
specific offset, calculated from some pre-determined value.  This means
it could probably just be a flat file, and we could fseek() to whatever
offset was calculated.  It's been a while since I've done anything at
all with relative files, so...  For indexed files, there seem to be a
couple of different methods.  Well, one, really: using db libraries.  My
suggestion here is to use gdbm, but there may be a way to specify at
runtime which library should be used to read an indexed file.  That
would be best, I think, if it could be done efficiently.  Structures
similar to the one I am planning to use for sequential file I/O could be
used for these file types.

Now, my last concern is how to get all this off into a runtime library,
supposing it needs to be.  Is this type of operation something that
should be in the RTL, or would it be better to just put it in the
binary?  OPENs, et al., are pretty common, so it might be better to put
them in their own library, along with other common COBOL functions.  I'm
open to guidance on that, and it's probably not a real pressing concern
until I actually have some working code.

Anyhow, those're my ideas, and, barring objections or a better, more
efficient way, the direction I intend to work in.  I'll put together the
additions for the parser, based on what's already in the grammar.  I'm
going to put all that I can out in secondary source files, making the
parser call the functions as needed.  This will keep the parser a lot
cleaner, and also help promote a better, more malleable interface,
without causing unnecessary changes to the parser.


-- 
Matthew Vanecek
Visit my Website at http://mysite.directlink.net/linuxguy
For answers type: perl -e 'print
$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
*****************************************************************
For 93 million miles, there is nothing between the sun and my shadow
except me. I'm always getting in the way of something...
