[DFDL-WG] Call For Abstract: NIST Data Science Symposium

Steve Lawrence slawrence at tresys.com
Fri Oct 4 11:38:32 EDT 2013


I'm just letting the working group know that we are submitting an 
abstract today to give a presentation on DFDL at the NIST Data Science 
Symposium. Below is what we plan to submit.

- Steve

----------------------------------------------

Title: Stop Writing Custom Data Parsers -- Write DFDL Instead!

This talk gives an introduction to the Data Format Description
Language (DFDL), how it can be used to parse both textual and binary
data in a standardized way, and how this leads to less time spent on
custom data parser development and, consequently, more time spent on
data processing and analysis. The talk will then describe the
current DFDL implementations, with a focus on the open-source Daffodil
project and its design. It will conclude with a brief walkthrough of
real DFDL examples, including commercial and scientific formats, and
a demonstration of the parsing capabilities of Daffodil.

The DFDL specification, which has completed a second round of public
comments as part of the Open Grid Forum (OGF), is a modeling
language for describing general text and binary data using a subset
of XML Schema augmented with data format annotations. DFDL allows
data to be read from its native format and presented as an instance
of an information set or an XML document. DFDL also allows the
reverse, through conversion of an information set back to its native
format. Because it works through the information set, DFDL integrates
cleanly with common XML utilities (e.g. XProc, XSLT, XQuery) for data
processing and analysis, regardless of the format of the native data.
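
As a rough illustration of the idea in the paragraph above (a format
description drives a generic parser, which presents native data as
XML and can also write it back), here is a minimal Python sketch. It
is not DFDL and not the Daffodil or IBM API: the RECORD_FORMAT list,
the field names, the byte order, and the parse/unparse helpers are
all hypothetical, invented for this example. Real DFDL expresses the
description as annotated XML Schema.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical toy description: (element name, struct format code).
# Real DFDL would express this as annotated XML Schema instead.
RECORD_FORMAT = [("id", "H"), ("temperature", "h"), ("flags", "B")]
BYTE_ORDER = ">"  # big-endian, an assumption made for this example


def parse(data: bytes, root_name: str = "record") -> ET.Element:
    """Read native bytes and present them as an XML element tree."""
    root = ET.Element(root_name)
    offset = 0
    for name, code in RECORD_FORMAT:
        fmt = BYTE_ORDER + code
        (value,) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
        ET.SubElement(root, name).text = str(value)
    return root


def unparse(root: ET.Element) -> bytes:
    """Convert the element tree back to its native byte layout."""
    out = b""
    for name, code in RECORD_FORMAT:
        out += struct.pack(BYTE_ORDER + code, int(root.find(name).text))
    return out


# Five bytes of made-up binary data: two 16-bit fields and one byte.
native = bytes([0x00, 0x2A, 0xFF, 0xF6, 0x01])
infoset = parse(native)
xml_text = ET.tostring(infoset, encoding="unicode")
round_trip = unparse(infoset)
```

Once the data is in XML form, ordinary XML tooling can process it
without knowing the native layout, and unparse() restores the
original bytes from the same description.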

Two implementations of DFDL exist, as the OGF requires before a
specification can become a standard. The first, created by IBM as
part of IBM WebSphere V8, is written in both Java and C and includes
graphical tools for modeling, running, and debugging DFDL schemas.
The second
implementation, Daffodil, is an open-source project written in
Scala, with a design focused on speed and correctness. With both
implementations advancing and the DFDL specification nearing
standardization, DFDL is becoming a promising tool for easing data
parsing, processing, and analysis.


Biography:

Stephen Lawrence has worked as a software engineer at Tresys
Technology since 2007 and has contributed to the open-source
Daffodil project as a core maintainer for almost two years. He works
alongside Michael Beckerle, the co-chair of the DFDL Working Group,
to develop Daffodil and improve the DFDL specification. Outside of
Daffodil, he focuses on computer security applications, including
file inspection and sanitization, Security-Enhanced Linux (SELinux),
and cross-domain solutions.

