[DFDL-WG] Call For Abstract: NIST Data Science Symposium

Steve Hanson smh at uk.ibm.com
Fri Oct 4 11:47:57 EDT 2013


Steve - a correction below for the IBM implementation

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Steve Lawrence <slawrence at tresys.com>
To:     DFDL-WG <dfdl-wg at ogf.org>, 
Date:   04/10/2013 16:40
Subject:        [DFDL-WG] Call For Abstract: NIST Data Science Symposium
Sent by:        dfdl-wg-bounces at ogf.org



I'm just letting the working group know that we are submitting an 
abstract today to give a presentation on DFDL at the NIST Data Science 
Symposium. Below is what we plan to submit.

- Steve

----------------------------------------------

Title: Stop Writing Custom Data Parsers -- Write DFDL Instead!

This talk gives an introduction to the Data Format Description
Language (DFDL), how it can be used to parse both textual and binary
data in a standardized way, and how this leads to less time spent on
custom data parser development and, consequently, more time spent on
data processing and analysis. The talk will then describe the
current DFDL implementations, with focus on the open-source Daffodil
project and its design. It will conclude with a brief walkthrough of
real DFDL examples, including commercial and scientific formats, and
a demonstration of the parsing capabilities of Daffodil.

The DFDL specification, which has completed a second round of public
comments as part of the Open Grid Forum (OGF), is a modeling
language for describing general text and binary data using a subset
of XML Schema augmented with data format annotations. DFDL allows
data to be read from its native format and presented as an instance
of an information set or an XML document. DFDL also allows the
reverse, through conversion of an information set back to its native
format. Because it works through the information set, DFDL integrates
cleanly with common XML utilities (e.g. XProc, XSLT, XQuery) for data
processing and analysis, regardless of the format of the native data.
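
As a rough illustration of the parse/unparse round trip described
above, here is a minimal Python sketch. It is not DFDL, and it is not
how Daffodil or the IBM implementation work: the record layout, the
element names, and the parse/unparse helpers are all hypothetical, and
the layout is hard-coded by hand. With DFDL, that layout would instead
be declared in an annotated XML Schema and handed to a DFDL processor.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record layout (an assumption for this sketch only):
# a 2-byte big-endian unsigned id followed by a 4-byte big-endian float.
RECORD = struct.Struct(">Hf")


def parse(data: bytes) -> ET.Element:
    """Native bytes -> XML element tree (the 'parse' direction)."""
    sensor_id, reading = RECORD.unpack(data)
    root = ET.Element("record")
    ET.SubElement(root, "id").text = str(sensor_id)
    ET.SubElement(root, "reading").text = repr(reading)
    return root


def unparse(root: ET.Element) -> bytes:
    """XML element tree -> native bytes (the 'unparse' direction)."""
    return RECORD.pack(int(root.findtext("id")),
                       float(root.findtext("reading")))


raw = struct.pack(">Hf", 7, 1.5)          # sample native data
tree = parse(raw)                          # bytes -> tree
xml_text = ET.tostring(tree, encoding="unicode")
round_trip = unparse(tree)                 # tree -> bytes
```

Once the data is in tree form, any XML tooling can process it without
knowing the original binary layout, which is the integration point the
paragraph above describes.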

Two implementations of DFDL exist, as the OGF requires before a
specification can become a standard. The first, created by IBM and
already shipped in several IBM products (such as IBM Integration Bus
v9), is written in both Java and C and includes graphical tools for
modeling, running, and debugging DFDL schemas.
The second implementation, Daffodil, is an open-source project written in
Scala, with a design focused on speed and correctness. With the two
implementations making great strides, and the DFDL specification
nearing standardization, DFDL is becoming a promising tool that will
ease data parsing, processing, and analysis.


Biography:

Stephen Lawrence has worked as a software engineer at Tresys
Technology since 2007, while contributing to the open-source
Daffodil project as a core maintainer for almost two years. He works
alongside Michael Beckerle, the co-chair of the DFDL Working Group,
to develop Daffodil and improve the DFDL specification. Outside of
Daffodil, he focuses on computer security applications, including
file inspection and sanitization, Security Enhanced Linux (SELinux),
and cross domain solutions.
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg


