[DFDL-WG] interesting format: RINEX

Mike Beckerle mbeckerle at apache.org
Tue Dec 20 06:23:26 PST 2022


I ran into an interesting format today called RINEX that is problematic for
DFDL v1.0:

 http://acc.igs.org/misc/rinex304.pdf

This format has a multi-line header made up of various kinds of header
lines. Each line is 80 characters: characters 1 to 60 are data, and
characters 61 to 80 are the "label". You need the label in order to know
how to parse the data of each header line.
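
To make the layout concrete, here is a minimal Python sketch of the column
split described above. The function name and the padding of short lines are
my own assumptions for illustration, not something taken from the RINEX
spec or from DFDL.

```python
DATA_WIDTH = 60   # columns 1-60 hold the data
LINE_WIDTH = 80   # columns 61-80 hold the label


def split_header_line(line):
    """Split one header line into (data, label) by fixed column position.

    Hypothetical helper: lines shorter than 80 characters are padded with
    spaces first, which is an assumption about trimmed trailing blanks.
    """
    padded = line.rstrip("\r\n").ljust(LINE_WIDTH)
    data = padded[:DATA_WIDTH]
    label = padded[DATA_WIDTH:LINE_WIDTH].strip()
    return data, label
```

Note that the label can only be extracted once the whole line is in hand,
which is exactly what a strictly left-to-right parser does not have when it
reaches column 1.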

Because the label comes after the data it describes, this either requires
deep (and slow) speculation, or some sort of lookahead feature.

For an example, see page 69 of that spec document. The headers appear not
only at the start of the file, but also at the start of each data block.

We have run into the need for short, fixed-distance lookahead before in
other formats. In the case of RINEX the lookahead is exactly 60 characters.
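
As a rough illustration of what fixed-distance lookahead means here, the
following Python sketch peeks at the label of the next header line and then
restores the stream position. It is not DFDL and not how Daffodil works
internally; the function name is hypothetical, and it assumes a seekable
binary stream, ASCII text, and full 80-character lines.

```python
import io

LOOKAHEAD = 60     # characters to skip past (the data region)
LABEL_WIDTH = 20   # characters 61 to 80


def peek_label(stream):
    """Return the label of the line at the current position.

    The stream position is left unchanged, so the caller can then parse
    the 60 data characters knowing which kind of header line it is.
    """
    start = stream.tell()
    try:
        stream.seek(LOOKAHEAD, io.SEEK_CUR)   # skip the data region
        raw = stream.read(LABEL_WIDTH)        # read the label region
    finally:
        stream.seek(start)                    # undo the lookahead
    return raw.decode("ascii").strip()
```

A caller would use the returned label to pick a parser for the line and
then read the 60 data characters from the unchanged position.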

In other formats the distance is also very short, but it is tricky to
determine because the spec doesn't say exactly how big the region to be
looked past is. One would have to work it out from the data format spec.
It's always a constant, of course.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com