[DFDL-WG] interesting format: RINEX

Steve Hanson smh at uk.ibm.com
Wed Jan 4 03:39:32 PST 2023


Hi Mike

Happy New Year.

Perhaps I am missing something, but for this format I would use a discriminator with testKind="pattern" and a regex that skips past the first 60 bytes and looks at the next 20. A pattern discriminator is our way of peeking ahead at the data.
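A rough illustration of the idea, in Python rather than DFDL: the regular expression below plays the role of the discriminator's pattern for one header-line branch. This is a sketch under assumptions, not DFDL syntax (DFDL has its own regex dialect and semantics). The example label "MARKER NAME", the helper name `discriminator_holds`, and the sample lines are illustrative choices, not taken from this thread.

```python
import re

# Analogue of a pattern discriminator for one RINEX header-line branch.
# ".{60}" skips the 60 data columns ("." does not match a newline, so the skip
# stays on the current line); the label must then occupy columns 61-80,
# allowing trailing spaces. "MARKER NAME" is used here only as an example label.
MARKER_NAME_TEST = re.compile(r".{60}MARKER NAME *$", re.MULTILINE)

def discriminator_holds(test: re.Pattern, text: str, pos: int) -> bool:
    """Evaluate the pattern at pos. Like a pattern discriminator, this only
    inspects the data: the caller's position is not advanced."""
    return test.match(text, pos) is not None

# Two hypothetical 80-column header lines (second one starts at offset 81).
header = (
    "SITE01".ljust(60) + "MARKER NAME".ljust(20) + "\n"
    + "GEODETIC".ljust(60) + "MARKER TYPE".ljust(20) + "\n"
)
print(discriminator_holds(MARKER_NAME_TEST, header, 0))   # True
print(discriminator_holds(MARKER_NAME_TEST, header, 81))  # False
```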

Regards

Steve Hanson

IBM Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-7717-378890
Note: I work Tuesday to Friday

-----Original Message-----
From: Mike Beckerle <mbeckerle at apache.org>
Reply-To: mbeckerle at apache.org
To: DFDL-WG <dfdl-wg at ogf.org>
Subject: [EXTERNAL] [DFDL-WG] interesting format: RINEX
Date: Tue, 20 Dec 2022 09:23:26 -0500

I ran into an interesting format today called RINEX that is problematic for DFDL v1.0:

 http://acc.igs.org/misc/rinex304.pdf

This format has a multi-line header made up of various kinds of 80-character header lines. Characters 1 to 60 are data and characters 61 to 80 are the "label", and you need the label in order to know how to parse the data in each header line.
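To make the layout concrete, here is a small Python sketch that splits an 80-character header line at column 60 and uses the label to pick a parser for the data part. The parser functions, and the field positions they use, are hypothetical placeholders; the real per-label layouts are defined in the RINEX spec linked above. Note that the label has to be read before the data can be parsed, which is the crux of the problem.

```python
from typing import Callable, Dict, Tuple

LINE_WIDTH = 80
DATA_WIDTH = 60  # columns 1-60 carry data; columns 61-80 carry the label

def split_header_line(line: str) -> Tuple[str, str]:
    """Split one header line into (data, label) by fixed column positions."""
    line = line.rstrip("\n").ljust(LINE_WIDTH)  # tolerate trimmed trailing spaces
    return line[:DATA_WIDTH], line[DATA_WIDTH:LINE_WIDTH].strip()

# Hypothetical per-label parsers; real field layouts come from the RINEX spec.
def parse_version(data: str) -> dict:
    return {"version": data[:9].strip()}

def parse_marker_name(data: str) -> dict:
    return {"marker_name": data.strip()}

PARSERS: Dict[str, Callable[[str], dict]] = {
    "RINEX VERSION / TYPE": parse_version,
    "MARKER NAME": parse_marker_name,
}

def parse_header_line(line: str) -> Tuple[str, dict]:
    """Read the label first, then parse the data part with the matching parser."""
    data, label = split_header_line(line)
    parser = PARSERS.get(label)
    if parser is None:
        return label, {"raw": data}  # unknown label: keep the data unparsed
    return label, parser(data)

print(parse_header_line("SITE01".ljust(60) + "MARKER NAME"))
# ('MARKER NAME', {'marker_name': 'SITE01'})
```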

This requires either deep (and slow) speculation or some sort of lookahead feature.

For example, see page 69 of that spec document. The headers appear not only at the start of the file but also at the start of each data block.

We have run into the need for short, fixed-distance lookahead in other formats before.
In the case of RINEX the lookahead is exactly 60 characters.
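The difference between the two options can be sketched in Python on a seekable byte stream. `choose_by_lookahead` peeks 60 bytes ahead, reads the 20-byte label, and restores the position, so a single read decides the branch. `choose_by_speculation` instead tries each candidate branch in turn and rewinds after each failure, so with N candidate labels it may consume the data part up to N times. All names here, including the branch parsers built by `make_branch`, are hypothetical; this models the trade-off and is not how Daffodil or IBM DFDL implement either approach.

```python
import io
from typing import BinaryIO, Callable, List, Optional, Tuple

LABEL_OFFSET = 60  # RINEX: the label starts after 60 data characters
LABEL_WIDTH = 20   # and occupies columns 61-80

def peek_at(stream: BinaryIO, offset: int, length: int) -> bytes:
    """Read `length` bytes starting `offset` bytes past the current position,
    then restore the position so nothing is consumed."""
    start = stream.tell()
    try:
        stream.seek(offset, io.SEEK_CUR)
        return stream.read(length)
    finally:
        stream.seek(start)

def choose_by_lookahead(stream: BinaryIO, labels: List[str]) -> Optional[str]:
    """One fixed-distance peek decides the branch before any data is parsed."""
    label = peek_at(stream, LABEL_OFFSET, LABEL_WIDTH).decode("ascii").strip()
    return label if label in labels else None

def make_branch(label: str) -> Callable[[BinaryIO], bytes]:
    """Hypothetical branch parser: it consumes the 60 data bytes first (where
    label-specific parsing would happen) and only then learns from the label
    whether it was the right branch."""
    def parse(stream: BinaryIO) -> bytes:
        data = stream.read(LABEL_OFFSET)
        found = stream.read(LABEL_WIDTH).decode("ascii").strip()
        if found != label:
            raise ValueError(f"expected {label!r}, found {found!r}")
        return data
    return parse

def choose_by_speculation(
    stream: BinaryIO, branches: List[Tuple[str, Callable[[BinaryIO], bytes]]]
) -> Tuple[Optional[str], int]:
    """Try each branch in turn, backtracking on failure.
    Returns (chosen branch name or None, number of attempts)."""
    start = stream.tell()
    attempts = 0
    for name, parse in branches:
        attempts += 1
        try:
            parse(stream)
            return name, attempts
        except ValueError:
            stream.seek(start)  # backtrack and try the next branch
    return None, attempts
```

The distance is a plain constant (`LABEL_OFFSET`), since for RINEX it is fixed at 60; the lookahead version costs one 20-byte read regardless of how many labels are possible.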

In other formats it is also a very short distance, but one that is tricky to determine, because the spec doesn't state exactly how big the region to be looked past is; one has to work it out from the rest of the format description. It's always a constant, of course.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com



--

  dfdl-wg mailing list

  dfdl-wg at lists.ogf.org

  https://lists.ogf.org/mailman/listinfo/dfdl-wg

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU