[DFDL-WG] Expressions too restrictive? Example from IPFIX format

Mike Beckerle mbeckerle.dfdl at gmail.com
Mon Nov 12 15:39:53 EST 2012


I'm looking at RFC 5101 and RFC 5102, which define IPFIX.

These describe a dense binary file format for observing network flows. The
application is related to network security.

This format has a 'meta' structure to it that I'm not sure how to deal with
in DFDL currently.

Here's the problem as succinctly as I can make it.

One configures some network information capture tools to capture some
information. This is flexible, so what is captured can be quite variable.

The resulting data stream contains first sets of templates, then sets of
data records, then more templates, then more records, and so on.

Each template is identified by an ID number, which is an integer from
256 to 65534. A template contains a count of how many fields it has,
followed by a field descriptor for each field. Each descriptor includes
the length of that field in octets.

Each data record set begins with the ID of its template, then a total
length in octets for the set, then the records, which are just field,
field, field, each as described by the associated template with no record
separators or anything.
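
To make the layout above concrete, here is a minimal Python sketch of
decoding one template and then slicing a record-set body with it. It is a
simplification, not the full wire format from the RFCs: it assumes every
value is a 16-bit big-endian integer, that a field descriptor is just a
(field ID, length) pair, and that all fields are fixed length. The function
names decode_template and decode_records are made up for illustration.

```python
import struct

def decode_template(buf, pos=0):
    """Decode one template: ID, field count, then (field ID, length) pairs.

    Assumed layout (simplified): each value is an unsigned 16-bit
    big-endian integer.
    """
    template_id, field_count = struct.unpack_from("!HH", buf, pos)
    pos += 4
    field_lengths = []
    for _ in range(field_count):
        _field_id, length = struct.unpack_from("!HH", buf, pos)
        field_lengths.append(length)
        pos += 4
    # Return the new position so a caller can decode the next template.
    return template_id, field_lengths, pos

def decode_records(body, field_lengths):
    """Split a record-set body into records, each a list of raw field bytes.

    There are no separators: the template's field lengths are the only
    thing that says where one field ends and the next begins.
    """
    record_len = sum(field_lengths)
    if record_len == 0:
        raise ValueError("template describes zero-length records")
    records = []
    pos = 0
    while pos + record_len <= len(body):
        fields = []
        for length in field_lengths:
            fields.append(body[pos:pos + length])
            pos += length
        records.append(fields)
    return records

# Example: template 300 has two fields, 4 octets and 2 octets long.
template_bytes = struct.pack("!HHHHHH", 300, 2, 8, 4, 7, 2)
template_id, field_lengths, _ = decode_template(template_bytes)
body = bytes([10, 0, 0, 1, 0, 80, 10, 0, 0, 2, 1, 187])
records = decode_records(body, field_lengths)
```

Note that decode_records cannot be called until the right field_lengths
have been found, which is exactly the lookup-by-ID step.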

Now, I can model a template in DFDL as an array of field descriptors.

I can also model a data record set as a templateID and an array of data
records, and each data record as an array of fields. The number of field
occurrences is given in the template, and each field is a byte array whose
occurs count is the length given in the template's field descriptor at the
same occursIndex.

The one thing I can't figure out how to do is to create an XPath to the
right template given the template ID in the data record set header.

The problem is that the TemplateID is not an array index. The templates
might have arbitrary ID numbers. They need not be, say, 256, 257, etc. in
order; they could be scattered. The standard only requires that the IDs
are unique. So the set of templates is truly a set keyed by these
identifiers.

So I need a way to write an XPath that would choose the template from the
set, whose ID matches a particular integer value from the data record set's
header.

Right now I think our XPath subset doesn't allow this. We can only index
arrays with integers, and we have no searching capability that processes a
set of nodes to identify a node having any particular value or
characteristic.

So: are we being too strict here? Should we allow somewhat more complex
predicates, such as { ...../template[idField eq $templateID]  }?
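
As an illustration of the difference between positional indexing and a
value predicate, here is a small Python sketch using the standard library's
xml.etree.ElementTree, which supports a limited XPath subset. The infoset
shape (a stream element holding template children with idField and
fieldLength) is invented for this example and is not taken from any DFDL
schema. ElementTree compares the child's text as a string, which is not the
same as an XPath 2.0 "eq" value comparison, but the selection idea is the
same.

```python
import xml.etree.ElementTree as ET

# Hypothetical infoset: three templates whose IDs are neither ordered
# nor contiguous.
infoset = ET.fromstring("""
<stream>
  <template><idField>300</idField><fieldLength>4</fieldLength><fieldLength>2</fieldLength></template>
  <template><idField>1027</idField><fieldLength>8</fieldLength></template>
  <template><idField>256</idField><fieldLength>1</fieldLength></template>
</stream>
""")

def template_by_position(root, index):
    """Positional lookup (1-based): all that integer-only indexing allows."""
    return root.find(f"template[{index}]")

def template_by_id(root, template_id):
    """Value-based lookup: the template whose idField child matches."""
    return root.find(f"template[idField='{template_id}']")

second = template_by_position(infoset, 2)   # found by array position
wanted = template_by_id(infoset, 300)       # found by its ID value
lengths = [int(e.text) for e in wanted.findall("fieldLength")]
```

The positional form only helps if the position is already known. The
predicate form is the one a data record set header needs, since the header
carries an ID rather than a position.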

(For speed reasons, I might not do that lookup each time I parse a data
record. I might pre-fill an array of 65535 structures by way of
inputValueCalc so that I can actually use the template IDs as indexes into
that array. But either way I need the ability to select the matching
member of a set.)
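
A rough Python sketch of that pre-fill idea, under the assumption that the
templates are already available as (ID, field lengths) pairs. The table has
65536 slots here so that any 16-bit ID can be used directly as an index.
That size, and the names find_template and build_table, are illustrative
choices.

```python
TABLE_SIZE = 65536  # assumption: one slot per possible 16-bit ID

def find_template(templates, template_id):
    """Linear search of the template set: the select-by-value step."""
    for tid, field_lengths in templates:
        if tid == template_id:
            return field_lengths
    return None  # no template with this ID

def build_table(templates):
    """Run the search once per slot; later lookups are plain indexing."""
    return [find_template(templates, i) for i in range(TABLE_SIZE)]

templates = [(300, [4, 2]), (1027, [8]), (256, [1])]
table = build_table(templates)

# After the pre-fill, a template ID works as an ordinary array index.
field_lengths = table[300]
```

Filling each slot still calls find_template, so the table moves the search
to a one-time step but does not remove the need for it.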

In general, templates can be interleaved in IPFIX: the only requirement
is that they are transmitted before the data record sets that reference
them. So an application that reads IPFIX data cannot, say, first scrape
off all the templates, generate a schema from them, and then use that
schema to parse. The late availability of the templates is inherent in
the format.
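
The interleaving can be sketched in Python as a single pass that registers
templates as they arrive and resolves each data record set against whatever
has been seen so far. The input here is a list of already-separated items,
tagged "template" or "data", which is an assumption made to keep the example
short. The names parse_stream and split_record_set are hypothetical.

```python
def split_record_set(body, field_lengths):
    """Cut a record-set body into records of raw field bytes."""
    record_len = sum(field_lengths)
    records = []
    pos = 0
    while record_len > 0 and pos + record_len <= len(body):
        record = []
        for length in field_lengths:
            record.append(body[pos:pos + length])
            pos += length
        records.append(record)
    return records

def parse_stream(items):
    """Yield (template ID, records) for each data item, in stream order.

    items: iterable of ("template", id, field_lengths) or
    ("data", id, body_bytes) tuples.
    """
    templates = {}  # grows while parsing; never complete up front
    for kind, template_id, payload in items:
        if kind == "template":
            templates[template_id] = payload
        else:
            if template_id not in templates:
                raise LookupError(f"no template {template_id} seen yet")
            yield template_id, split_record_set(payload, templates[template_id])

stream = [
    ("template", 300, [1, 2]),
    ("data", 300, b"abcdef"),
    ("template", 256, [3]),   # arrives after data has already been parsed
    ("data", 256, b"xyz"),
    ("data", 300, b"ghi"),
]
parsed = list(parse_stream(stream))
```

The dictionary plays the role of the template set. It is built during the
parse itself, so there is no point before parsing at which a fixed schema
could be generated.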

...mikeb

-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412