[DFDL-WG] about recursive data structures

Mike Beckerle mbeckerle.dfdl at gmail.com
Thu Jul 11 21:35:37 EDT 2013


Thanks for your note Sampo,

I am pretty sure DFDL cannot help you today. DFDL v1.0 has not grown the
ability to handle recursive structures.

You asked why:

DFDL v1.0 is a standard formed by taking existing industry data-handling
tools, finding the union of their functionality, and standardizing that.

Topics that advance the state of the art beyond that of any existing
commercial data-handling software have been postponed to what we're
informally calling DFDL v2.0.

We have found that research advancing the state of the art doesn't work in
the context of a standards process. The two are a terrible mismatch, as
"creative process" and "committee compromise" don't usually go together
smoothly.

My rejoinder to anyone asking "can DFDL do X?" has been "Are there any
products in the marketplace that do that, so that we can derive a standard
from what they have done?" I am not aware of any commercial software
anywhere that creates a declarative description of rich pointer-based graph
data, and writes it to a file or parses it from a file. To my knowledge,
such formats are always produced by hand-written programs, not from
declarative descriptions. Perhaps companies or people have created such
things but keep them only for internal use at their project or company. I
know of none that are published. I would love to hear otherwise.

All that said, we do realize there is a large community of people hoping
to take the central DFDL "idea", which is declarative description of data
formats, and apply it to new and richer problems like the graph problem you
have described. So far the big features people want are:

* recursion - needed to declaratively describe binary document formats and
"container" file formats that have arbitrarily deep nesting.
* layering - also known as multi-pass. Needed because many formats are
conceptually layered.
* transformation - some kinds of transforms want to go right on the data
format schema because they don't change the 'shape' of the data.
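
To make the recursion item concrete, here is a minimal sketch in Python of
the kind of arbitrarily deep "container" nesting meant above. The format is
hypothetical and invented only for illustration (a 1-byte tag, a 2-byte
big-endian length, then the payload; the tag values and function names are
assumptions, not part of DFDL or any real format). The parser has to call
itself for nested containers, which is exactly what DFDL v1.0 has no way to
express declaratively.

```python
import struct

# Hypothetical tags, chosen only for this illustration.
CONTAINER, LEAF = 0x01, 0x02


def encode(node):
    """Encode bytes as a leaf record and a list as a container record."""
    if isinstance(node, bytes):
        return struct.pack(">BH", LEAF, len(node)) + node
    payload = b"".join(encode(child) for child in node)
    return struct.pack(">BH", CONTAINER, len(payload)) + payload


def parse(data, pos=0, limit=None):
    """Parse one record starting at pos; return (value, next_position)."""
    limit = len(data) if limit is None else limit
    if pos + 3 > limit:
        raise ValueError("truncated record header")
    tag, length = struct.unpack_from(">BH", data, pos)
    start = pos + 3
    end = start + length
    if end > limit:
        raise ValueError("record runs past the end of its container")
    if tag == LEAF:
        return data[start:end], end
    if tag != CONTAINER:
        raise ValueError(f"unknown tag {tag:#x}")
    children = []
    cursor = start
    while cursor < end:
        # A container's payload is itself a sequence of records,
        # so the parser recurses to whatever depth the data requires.
        child, cursor = parse(data, cursor, end)
        children.append(child)
    return children, end
```

A nested value such as [b"a", [b"bc", [b""]], b"d"] should survive
parse(encode(value)) unchanged, however deep the nesting goes.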

I would add your graph problem to this list, as it adds complexity beyond
recursion due to node sharing and cycles.
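
A small Python sketch may help show why sharing and cycles go beyond plain
recursion. Everything here is an assumption made for illustration: the Node
class, the dump/load names, and the use of JSON as a stand-in for a file
format. A purely recursive writer would duplicate a shared node (much like
copying a hard-linked file twice) and would never terminate on a cycle, so
the writer assigns each node a table index and stores links as indices,
and the reader resolves those indices in a second pass.

```python
import json


class Node:
    """Hypothetical graph node: a name plus links to other nodes."""

    def __init__(self, name):
        self.name = name
        self.links = []


def dump(root):
    """Flatten the graph reachable from root into a table of records."""
    index = {}   # id(node) -> position in the table
    order = []   # nodes in table order; root is always entry 0
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in index:
            continue  # already emitted: shared node or cycle
        index[id(node)] = len(order)
        order.append(node)
        stack.extend(node.links)
    table = [
        {"name": n.name, "links": [index[id(t)] for t in n.links]}
        for n in order
    ]
    return json.dumps(table)


def load(text):
    """Rebuild the graph: create all nodes first, then resolve links."""
    table = json.loads(text)
    nodes = [Node(entry["name"]) for entry in table]
    for node, entry in zip(nodes, table):
        node.links = [nodes[i] for i in entry["links"]]
    return nodes[0]
```

The two-pass load is the part that has no counterpart in a tree-shaped
parse: a link can point at a node that has not been read yet, or back at
one of its own ancestors.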

Since the Daffodil implementation of DFDL is open-source, we're hoping to
use that as a research/investigation vehicle to try out approaches to many
of these features that advance the state of the art. Once we have created
such a feature and we believe it works and is useful, we can have it in
Daffodil as a way to provide some de-facto energy behind it, and propose it
for DFDL v2.0 standardization. That's the idea anyway.

This is not years out, as the existing sponsors of my work on Daffodil at
Tresys are interested in these DFDL v2.0 extensions as well, and they need
them in near-term products. But the priority has been on finishing DFDL
v1.0: the specification and the implementations. Of course, as Daffodil
solidifies, it's open source and anyone can grab it and start running on
these research topics.

I recently created a wiki page within the Daffodil open-source project
to serve as a parking lot for the DFDL v2.0 wish list. The page is at:
https://opensource.ncsa.illinois.edu/confluence/display/DFDL/DFDL+2.0+Wishlist.
I have just added a section at the end about pointer-linked graph
structures.

Best regards and I hope you will keep "watching this space".

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com


On Thu, Jul 11, 2013 at 8:37 PM, Sampo Syreeni <decoy at iki.fi> wrote:

> The DFDL spec has been growing quite a bit over the past two years or so.
> Mostly because of the handling of arcane details. So...
>
> Has it also grown in the wider, descriptive (complexity) sense? Can it now
> also describe e.g. the arbitrary link structure a typical *NIX file system
> image can contain, with its hard links?
>
> If so, I might just now have an application for that. If not, why not? It
> isn't as though you can't develop clean semantics for that, if only as an
> option. And it's clearly warranted because formats utilizing such
> constructs constitute a sizable proportion of data both stored and actively
> passed around.
> --
> Sampo Syreeni, aka decoy - decoy at iki.fi, http://decoy.iki.fi/front
> +358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
> --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>