[DFDL-WG] Expressions too restrictive? Example from IPFIX format
Steve Hanson
smh at uk.ibm.com
Tue Nov 13 04:55:46 EST 2012
Mike
I think more complex predicates are something for the next release of DFDL.
For this particular format, you at least have the length of the record set
so you can parse the record fields as xs:hexBinary and then apply a schema
generated from the template in a post-parse step.
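As a rough illustration of that post-parse step (not a DFDL feature, just ordinary application code), the sketch below splits the opaque bytes of one record set into records and fields once the template's field lengths are known. The function name split_records and the sample bytes are made up for illustration, and it assumes fixed-length fields only and treats any trailing bytes shorter than one record as padding.

```python
from typing import List


def split_records(raw: bytes, field_lengths: List[int]) -> List[List[bytes]]:
    """Split the raw body of a record set into records of fixed-length fields.

    raw           -- the bytes the DFDL parse captured as xs:hexBinary
    field_lengths -- field lengths in octets, taken from the matching template
    """
    record_len = sum(field_lengths)
    if record_len == 0:
        raise ValueError("template describes a zero-length record")

    records = []
    offset = 0
    # Stop when fewer bytes remain than one whole record (assumed padding).
    while offset + record_len <= len(raw):
        fields = []
        for n in field_lengths:
            fields.append(raw[offset:offset + n])
            offset += n
        records.append(fields)
    return records


# Two records, each with a 4-octet field followed by a 2-octet field.
sample = bytes.fromhex("0a0000010050" "0a00000201bb")
for rec in split_records(sample, [4, 2]):
    print([f.hex() for f in rec])
```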
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: dfdl-wg at ogf.org,
Date: 12/11/2012 21:07
Subject: [DFDL-WG] Expressions too restrictive? Example from IPFIX format
Sent by: dfdl-wg-bounces at ogf.org
I'm looking at RFC 5101 and RFC 5102.
These describe a dense binary file format for observing network flows. The
application is related to network security.
This format has a 'meta' structure to it that I'm not sure how to deal
with in DFDL currently.
Here's the problem as succinctly as I can make it.
One configures some network information capture tools to capture some
information. This is flexible, so what is captured can be quite variable.
The resulting data stream contains first sets of templates, then sets of
data records, then more templates, then more records, and so on.
The templates each are identified by an ID number which is an integer from
256 to 65534. A template then contains a count of how many fields are in
the template, and then a field descriptor for each field, which includes
the length of the field in octets.
Each data record set begins with the ID of its template, then a total
length in octets for the set, then the records, which are just field,
field, field, each as described by the associated template with no record
separators or anything.
Now, I can model a template in DFDL as an array of field descriptors.
I can also model a data record set as a templateID and an array of data
records, and each data record as an array of fields, where the number of
field occurrences is given in the template, and each field is a byte array
whose length is given by the corresponding template's field descriptor for
that occursIndex.
The one thing I can't figure out how to do is to create an XPath to the
right template given the template ID in the data record set header.
The problem is that the template ID is not an array index. The templates
might have arbitrary ID numbers. They might not be, say, 256, 257, etc. in
order; they could be scattered. The standard only requires that the IDs
are unique. So the set of templates is truly a set with these
identifiers.
So I need a way to write an XPath that would choose the template from the
set, whose ID matches a particular integer value from the data record
set's header.
Right now I think our XPath subset doesn't allow this. We can only index
arrays with integers, and we have no searching capability that processes a
set of nodes to identify a node having any particular value or
characteristic.
So: are we being too strict here? Should we allow somewhat more complex
predicates, such as { ...../template[idField eq $templateID] }?
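To make the kind of lookup concrete outside DFDL, here is a small sketch using the limited XPath support in Python's xml.etree.ElementTree. The element names (message, template, idField, fieldCount) and the ID values are invented for illustration and do not come from any real schema. Note that ElementTree's predicate compares the child's text as a string, which is only an approximation of a typed "eq" comparison.

```python
import xml.etree.ElementTree as ET

# A toy infoset: templates appear in arbitrary order with scattered IDs.
doc = ET.fromstring("""
<message>
  <template><idField>300</idField><fieldCount>2</fieldCount></template>
  <template><idField>257</idField><fieldCount>5</fieldCount></template>
  <template><idField>4096</idField><fieldCount>1</fieldCount></template>
</message>
""")


def find_template(root, template_id):
    """Return the template whose idField matches template_id, or None."""
    # Value predicate: select by the content of a child, not by position.
    return root.find(f"template[idField='{template_id}']")


match = find_template(doc, 257)
print(match.findtext("fieldCount"))
```

A positional index such as template[2] would pick the second template regardless of its ID, which is why an integer-only index cannot express this selection.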
(For speed reasons, I might not do that lookup each time I parse a data
record. I might pre-fill an array of 65535 structures by way of
inputValueCalc so that I can actually use the template IDs as indexes into
that array. But either way I need the capability to select a matching
member from a set.)
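In ordinary code, the pre-filled table idea might look like the sketch below. The dictionary layout of a template ("id", "lengths") is an assumption made for the example. Building the table still means visiting each template and reading its ID, which is the same selection by value discussed above; only the later lookups become plain indexing.

```python
def build_index(templates):
    """Return a 65536-slot table where slot N holds the template with ID N."""
    table = [None] * 65536
    for t in templates:
        # The ID is used directly as the array position.
        table[t["id"]] = t
    return table


# Hypothetical templates with scattered, unordered IDs.
templates = [
    {"id": 300, "lengths": [4, 2]},
    {"id": 257, "lengths": [1]},
]
index = build_index(templates)
print(index[300]["lengths"])
```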
In general, templates can be interleaved in IPFIX in that the requirement
is just that they are transmitted before data record sets that reference
them. So in general, an application that reads IPFIX data cannot, say,
first scrape off all the templates and generate a schema for them, and
then use that schema to parse. The late availability of the templates is
something that is inherent in the format.
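For comparison, this is roughly how a hand-written reader copes with the interleaving: it keeps a table of templates seen so far and consults it for each data record set. The sketch is heavily simplified and rests on several assumptions: it skips the message header, handles only fixed-length fields with no enterprise-specific field specifiers, ignores options templates and padding, and takes set ID 2 to mean a template set and IDs of 256 and above to mean data sets. The function name parse_sets is made up.

```python
import struct

TEMPLATE_SET_ID = 2  # assumed set ID for template sets


def parse_sets(buf):
    """Walk a sequence of sets, learning templates as they arrive."""
    templates = {}  # template ID -> list of field lengths in octets
    records = []    # (template ID, [field bytes, ...]) in stream order
    pos = 0
    while pos + 4 <= len(buf):
        # Set header: set ID and total set length (header included).
        set_id, set_len = struct.unpack_from("!HH", buf, pos)
        if set_len < 4:
            raise ValueError("set length shorter than its own header")
        body = buf[pos + 4:pos + set_len]

        if set_id == TEMPLATE_SET_ID:
            p = 0
            while p + 4 <= len(body):
                tid, count = struct.unpack_from("!HH", body, p)
                p += 4
                lengths = []
                for _ in range(count):
                    # Field specifier: element ID (ignored here) and length.
                    _element_id, flen = struct.unpack_from("!HH", body, p)
                    lengths.append(flen)
                    p += 4
                templates[tid] = lengths
        elif set_id >= 256:
            # The set ID names the template; it must already have been seen.
            if set_id not in templates:
                raise KeyError(f"data set references unknown template {set_id}")
            lengths = templates[set_id]
            rec_len = sum(lengths)
            p = 0
            while rec_len and p + rec_len <= len(body):
                fields = []
                for n in lengths:
                    fields.append(body[p:p + n])
                    p += n
                records.append((set_id, fields))
        pos += set_len
    return records
```

The templates dictionary here plays the role that a value-based predicate would play in a DFDL expression: it maps an arbitrary ID to the template that was transmitted earlier in the same stream.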
...mikeb
--
Mike Beckerle | OGF DFDL WG Co-Chair
Tel: 781-330-0412