[DFDL-WG] interesting format: RINEX

Mike Beckerle mbeckerle at apache.org
Wed Jan 4 06:22:14 PST 2023


Ah, the magic hammer: regex.

Of course I should have thought of that, but I'm always looking to avoid
things that involve backtracking.
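
For the record, here is a rough sketch of the peek Steve describes below, written in Python rather than DFDL just to show the shape of the regex. It assumes every header line is padded to the full 80 characters; the helper names (make_line, label_pattern, peek_label) and the sample data are made up for illustration, and only a few of the RINEX header labels are listed.

```python
import re

# Per the format description in this thread: characters 1-60 are data,
# characters 61-80 are the label.
DATA_WIDTH = 60
LABEL_WIDTH = 20

# A few RINEX header labels; not the complete list.
LABELS = ["RINEX VERSION / TYPE", "PGM / RUN BY / DATE", "END OF HEADER"]


def make_line(data: str, label: str) -> str:
    """Build an 80-character header line (hypothetical helper for the demo)."""
    return data.ljust(DATA_WIDTH)[:DATA_WIDTH] + label.ljust(LABEL_WIDTH)[:LABEL_WIDTH]


def label_pattern(label: str) -> "re.Pattern[str]":
    """Regex that skips exactly 60 characters, then requires the padded label."""
    padded = label.ljust(LABEL_WIDTH)
    return re.compile(r".{%d}%s" % (DATA_WIDTH, re.escape(padded)))


def peek_label(line: str, known_labels=LABELS):
    """Return the first label whose pattern matches the line, else None."""
    for label in known_labels:
        if label_pattern(label).match(line):
            return label
    return None


# Sample line with made-up data content.
sample = make_line("     3.04           OBSERVATION DATA", "RINEX VERSION / TYPE")
print(peek_label(sample))
```

Since the skip is a fixed-width .{60} followed by a literal label, the regex engine has only one way to match each pattern, so this particular use should not involve any meaningful backtracking.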

Thanks



On Wed, Jan 4, 2023 at 6:39 AM Steve Hanson <smh at uk.ibm.com> wrote:

> Hi Mike
>
> Happy New Year.
>
> Perhaps I am missing something, but for this format I would use a
> discriminator with testKind="pattern" and a regex that skips past the first
> 60 bytes and looks at the next 20. A pattern discriminator is our way of
> peeking ahead at the data.
>
> Regards
>
> Steve Hanson
>
> IBM Integration, Hursley, UK
> Architect, IBM DFDL
> Co-Chair, OGF DFDL Working Group
> smh at uk.ibm.com
> tel:+44-7717-378890
> Note: I work Tuesday to Friday
>
> -----Original Message-----
> *From*: Mike Beckerle <mbeckerle at apache.org>
> *Reply-To*: mbeckerle at apache.org
> *To*: DFDL-WG <dfdl-wg at ogf.org>
> *Subject*: [EXTERNAL] [DFDL-WG] interesting format: RINEX
> *Date*: Tue, 20 Dec 2022 09:23:26 -0500
>
> I ran into an interesting format today called RINEX that is problematic
> for DFDL v1.0:
>
>  http://acc.igs.org/misc/rinex304.pdf
>
> This format has a multi-line header with various kinds of header lines
> which are 80 characters, but characters 1 to 60 are data, and 61 to 80 are
> the "label", and you need the label in order to know how to parse the data
> of each header line.
>
> This either requires deep (and slow) speculation, or some sort of
> lookahead feature.
>
> E.g., see page 69 of that spec document for an example. The headers appear
> not only at the start of the file, but at the start of each data block.
>
> We have run into the need for short, fixed-distance lookahead before in
> other formats.
> In the case of RINEX the lookahead is exactly 60 characters.
>
> In other formats it is also a very short distance, but one which is tricky
> to figure out as the spec doesn't say exactly how big the region to be
> looked past is. One would have to figure it out from the data format spec.
> It's always a constant of course.
>
> Mike Beckerle
> Apache Daffodil PMC | daffodil.apache.org
> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> Owl Cyber Defense | www.owlcyberdefense.com
>
>
> --
>
>   dfdl-wg mailing list
>
>   dfdl-wg at lists.ogf.org
>
>   https://lists.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>