[dfdl-wg] Usage scenarios

Steve Hanson smh at uk.ibm.com
Thu Sep 29 07:42:42 CDT 2005


Attached is Robert's use-case document with two new use cases added: a simple,
generic data-federation one and a more concrete retail one.

(See attached file: dfdl-uses-discussion.doc)

Regards, Steve

Steve Hanson
WebSphere Business Integration Brokers,
IBM Hursley, England
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848


From: "Robert E. McGrath" <mcgrath at ncsa.uiuc.edu>
Sent by: owner-dfdl-wg at ggf.org
To: dfdl-wg at gridforum.org
Subject: Re: [dfdl-wg] Usage scenarios
Date: 28/09/2005 20:27

Thanks, Martin.

Do we want to consider a new document devoted solely to example uses?

This could be sorted from simple to complex.

Then the primer and spec. and other docs could reference a single,
consistent set of sample uses.

On Wednesday 28 September 2005 13:18, Martin Westhead wrote:
> Hi Bob,
>
> Here is my list of ways in which DFDL could be used - I have probably
> missed some but here's enough to kick off with. If you need more details
> on any of the real-world examples let me know.
>
> Cheers,
>
> Martin
>
>
> Description
> -----------
>
> In QCD physics, independent research groups from all over the world have
> data that is always a 4-D array of floating-point values. However,
> different groups have different standards for precision, dimension
> order and byte order. It would be useful for them to have a simple,
> canonical XML language for describing the format of a file. In the first
> instance this need only be human-readable.
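As a rough illustration of what such a human-readable description might carry, the sketch below uses Python's standard `xml.etree.ElementTree` to emit one. The element and attribute names (`latticeFormat`, `precision`, `byteOrder`, `dimensionOrder`, `axis`) and the function `describe_lattice` are hypothetical and are not DFDL syntax; they only show the three properties the groups disagree on.

```python
import xml.etree.ElementTree as ET

def describe_lattice(precision_bits, byte_order, dimension_order, shape):
    """Build a hypothetical XML description of a 4-D float array file.

    Element and attribute names are illustrative only, not DFDL syntax.
    """
    root = ET.Element("latticeFormat")
    ET.SubElement(root, "precision", bits=str(precision_bits))
    ET.SubElement(root, "byteOrder", value=byte_order)
    dims = ET.SubElement(root, "dimensionOrder")
    # Axes are listed slowest-varying first, as stored in the file.
    for axis in dimension_order:
        ET.SubElement(dims, "axis", name=axis, extent=str(shape[axis]))
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(describe_lattice(64, "big", ["t", "z", "y", "x"],
                           {"x": 8, "y": 8, "z": 8, "t": 16}))
```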
>
> Archiving
> ---------
>
> Data needs to be stored, but the programs and systems for reading it
> become obsolete. DFDL provides a valuable possibility of describing all
> the details of a particular format so that even if there were no
> programs able to read a format, the description (and the standard) would
> provide sufficient information to access archived data. (There are many
> examples of this type; they might include atmospheric measurements.)
>
> A more sophisticated version of this is that archived data may need to be
> transformed as it is moved to up-to-date physical media (e.g. with changes
> of precision). It would be nice if DFDL could (a) annotate these
> changes and (b) perhaps be used to ensure that the changes did not result
> in data loss.
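For point (b), one simple check, assuming the archive stores IEEE 754 floating-point values, is to round-trip each value through the narrower format and compare. This is only a sketch of the idea using Python's `struct` module; the function name `survives_narrowing` is hypothetical and not part of any DFDL proposal.

```python
import math
import struct

def survives_narrowing(values, target="f"):
    """Return True if every value is unchanged after conversion to a
    narrower IEEE format (default: 32-bit single precision, struct code 'f')."""
    fmt = "<" + target
    for v in values:
        try:
            (back,) = struct.unpack(fmt, struct.pack(fmt, v))
        except OverflowError:
            # Value lies outside the range of the target format.
            return False
        # NaN never compares equal to itself, so treat NaN -> NaN as preserved.
        if not (back == v or (math.isnan(back) and math.isnan(v))):
            return False
    return True
```

For example, `0.5` survives conversion to single precision exactly, while `0.1` does not, because its nearest 64-bit and 32-bit approximations differ.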
>
> Format abstraction
> ------------------
>
> At the simplest level, the QCD physicists (described above) would like
> a single API that would allow them to read any described piece of data
> and carry out all the transformations required to ensure that they get
> the correct array in memory.
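A minimal sketch of such an API, assuming the description has already been parsed into a small record. `LatticeFormat`, `read_lattice` and the canonical `(x, y, z, t)` axis order are hypothetical choices for illustration. The code applies the described byte order and precision with Python's `struct` module, then re-indexes values from the stored dimension order into the canonical one.

```python
import struct
from dataclasses import dataclass
from itertools import product

# Hypothetical canonical view that every group's data is mapped onto.
CANONICAL_ORDER = ("x", "y", "z", "t")

@dataclass
class LatticeFormat:
    """Hypothetical parsed description; field names are not DFDL syntax."""
    byte_order: str         # "little" or "big"
    precision: int          # 32 or 64 bits per value
    dimension_order: tuple  # stored axis order, slowest-varying first
    shape: dict             # axis name -> extent

def read_lattice(raw: bytes, fmt: LatticeFormat) -> dict:
    """Decode raw bytes into {(x, y, z, t): value} in the canonical order."""
    prefix = "<" if fmt.byte_order == "little" else ">"
    code = "f" if fmt.precision == 32 else "d"
    ranges = [range(fmt.shape[axis]) for axis in fmt.dimension_order]
    count = 1
    for r in ranges:
        count *= len(r)
    values = struct.unpack(f"{prefix}{count}{code}", raw)
    result = {}
    # product() varies the last axis fastest, matching row-major storage.
    for value, index in zip(values, product(*ranges)):
        coord = dict(zip(fmt.dimension_order, index))
        result[tuple(coord[a] for a in CANONICAL_ORDER)] = value
    return result
```

With this design, the same physical values written by two groups using different byte orders, precisions and dimension orders compare equal once read, which is the property the physicists are after.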
>
> I have examples of potential users who are interested only in describing
> byte order in a standard way.
>
> At the next level we would like to supply a high level DFDL description
> that captures a standard view of the data, and have generic DFDL logic
> that can transform an existing DFDL-described format into this generic
> view. This is one of the primary motivations for "layers" in the
> standard. It is a very powerful feature, but it introduces scoping
> issues: which transformations can DFDL not describe, and which can it
> not describe efficiently?
>
>
> Generic data access
> -------------------
>
> A DFDL library should provide the ability to interrogate a data
> description and read all aspects of the data into memory. An example of
> a generic tool is a browser that will allow arbitrary DFDL-described
> data to be displayed in some sensible human-readable form. This case
> requires the standard to specify an API for reading and interrogating
> the data. The favoured suggestion for this is to extend DOM/SAX to allow
> the reading of data fields directly into in-memory types (float, int,
> char etc.)
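To make the typed-reading idea concrete, here is a minimal sketch of a SAX-style push interface in which field values arrive as native `int`, `float` and `str` objects rather than character data. `TypedHandler`, `typed_value`, `parse_record` and `Collector` are hypothetical names, not an existing DOM/SAX extension, and the flat `(field, struct code)` layout (big-endian assumed) stands in for a real DFDL description.

```python
import struct

class TypedHandler:
    """Hypothetical SAX-like callback interface receiving native values."""
    def start_element(self, name): pass
    def end_element(self, name): pass
    def typed_value(self, name, value): pass

def parse_record(raw: bytes, layout, handler: TypedHandler, name="record"):
    """Walk a flat record described by `layout`, a list of
    (field name, struct code) pairs, and report each field as a typed event."""
    handler.start_element(name)
    offset = 0
    for field, code in layout:
        fmt = ">" + code  # big-endian, no padding (an assumption)
        (value,) = struct.unpack_from(fmt, raw, offset)
        offset += struct.calcsize(fmt)
        if isinstance(value, bytes):
            value = value.decode("ascii")
        handler.typed_value(field, value)
    handler.end_element(name)

class Collector(TypedHandler):
    """Example handler that simply keeps the last value seen per field."""
    def __init__(self):
        self.fields = {}
    def typed_value(self, name, value):
        self.fields[name] = value
```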
>
>
> Data queries
> ------------
>
> The DFDL description implies an associated XML document. This document
> can be queried using XPath/XQuery to extract pieces of data.
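As a sketch of what querying the implied document could look like, the example below hard-codes a small implied XML document (the element names `series`, `sample` and `value`, and the function `values_at`, are hypothetical) and queries it with the limited XPath subset supported by Python's `xml.etree.ElementTree`. Full XPath or XQuery would need a different library; this only shows the shape of the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical implied document for a small time series. A DFDL processor
# would derive this view from the description rather than from a string.
IMPLIED = """<series>
  <sample t="0"><value>1.5</value></sample>
  <sample t="1"><value>2.0</value></sample>
  <sample t="2"><value>2.5</value></sample>
</series>"""

def values_at(doc: str, t: str):
    """Return the numeric values of all samples whose t attribute equals t."""
    root = ET.fromstring(doc)
    # ElementTree supports attribute predicates such as [@t='1'].
    return [float(v.text) for v in root.findall(f"./sample[@t='{t}']/value")]
```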
>
> [Note: If the data comes back as an XML-XPath result then this process
> is straightforward. With BinX we tried to return the data in a similar
> format to the one it is represented in with an accompanying description.
> We found a number of issues arose in this case that may or may not also
> arise for DFDL].
>
> Data annotations
> ----------------
>
> The same XPath/XQuery expressions that can be used to query a document
> can provide external (format-independent) annotations. For example, NASA
> stores photographic images of hurricanes. A scientist can identify a
> blob of pixels that corresponds to the hurricane in an image. They would
> like to store this annotation in such a way that it will be preserved
> through future transformations (e.g. a new image format, a different
> pixel depth, or a different compression level). The point here is that a
> byte offset into the image data cannot do this.
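The sketch below illustrates the point with a hypothetical `PixelRegion` annotation stored as logical row/column coordinates, plus a hypothetical XPath rendering into an implied `<image><row><pixel/>` document. It assumes an uncompressed, row-major image with no header or row padding; under that assumption the same corner pixel moves to a different byte offset when the pixel depth changes from 8 to 16 bits, while the logical annotation needs no update. With compression, a byte offset would not identify the pixel at all.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PixelRegion:
    """Hypothetical format-independent annotation: a rectangle of pixels
    identified by logical (row, column) coordinates, not by byte position."""
    row0: int
    col0: int
    row1: int
    col1: int

    def xpath(self) -> str:
        # Hypothetical path into an implied <image><row><pixel/>... document.
        # XPath positions are 1-based, so add 1 to the 0-based coordinates.
        return (f"/image/row[position() >= {self.row0 + 1} and "
                f"position() <= {self.row1 + 1}]"
                f"/pixel[position() >= {self.col0 + 1} and "
                f"position() <= {self.col1 + 1}]")

def byte_offset(row, col, width, bits_per_pixel):
    """Offset of a pixel in an uncompressed, row-major image with no
    header or row padding (a simplifying assumption)."""
    return (row * width + col) * bits_per_pixel // 8

hurricane = PixelRegion(row0=120, col0=340, row1=180, col1=410)
```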
>
> XML without the tags
> --------------------
>
> There are groups who would like to use DFDL as a sort of cheap data
> compression technique. An example here is particle physics collision
> data. This is stored as a set of sparse (hence variable-sized) trees of
> results. The data consists of richly structured trees, and they would
> like to access it and talk about it as if it were XML, but they don't
> want to (cannot afford to) represent it using XML markup or use
> conventional XML tools to parse it.
>
> The idea is that such a group would design a new binary format that
> could be described in XML and then they would work with the implied XML
> data. Note: naturally these folks do not want to access their
> floating-point values as strings, so they would want the sort of DOM
> extensions that we alluded to earlier. For the same reason, things like
> Binary XML do not solve their problem.
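A rough sketch of the trade-off, assuming a hypothetical collision-event tree in which each node holds one 64-bit energy value and a variable number of children. The compact pre-order binary layout below (value, then a 16-bit child count) is the kind of format that could be described once and then treated as implied XML; the XML rendering is included only to show the markup overhead. The tree, the layout and the function names are illustrative and are not taken from a real experiment.

```python
import struct
import xml.etree.ElementTree as ET

NODE = "<dH"  # little-endian float64 value + uint16 child count, no padding

# Hypothetical event tree: (energy, [children...]).
EVENT = (91.2, [(45.1, []), (46.0, [(12.5, []), (33.4, [])])])

def encode_binary(node) -> bytes:
    """Pre-order encoding: value, child count, then each child in turn."""
    value, children = node
    out = struct.pack(NODE, value, len(children))
    for child in children:
        out += encode_binary(child)
    return out

def decode_binary(data: bytes, offset: int = 0):
    """Inverse of encode_binary; returns (node, next_offset) with native floats."""
    value, count = struct.unpack_from(NODE, data, offset)
    offset += struct.calcsize(NODE)
    children = []
    for _ in range(count):
        child, offset = decode_binary(data, offset)
        children.append(child)
    return (value, children), offset

def encode_xml(node) -> bytes:
    """The same tree as tagged XML, for a size comparison only."""
    def build(n):
        value, children = n
        elem = ET.Element("particle", energy=repr(value))
        for c in children:
            elem.append(build(c))
        return elem
    return ET.tostring(build(node))
```

Decoding yields Python floats directly, so nothing passes through a string representation, which matches the group's requirement above.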
>
> Another example comes from the astronomy community, which has recently
> moved from a long-standing binary data format (FITS) to an XML version
> (VOTable). FITS was very rich in metadata but also included binary
> images and large tables of observational data. VOTable is great for
> capturing the metadata in a standard way but leads to excessive bloat
> for images and large tables. The community has ended up with a
> complicated compromise in which they allow raw binary data at the
> bottom of the XML file. A DFDL-described format could provide a
> cleaner solution.

--
---
Robert E. McGrath, Ph.D.
National Center for Supercomputing Applications
University of Illinois, Urbana-Champaign
1205 West Clark
Urbana, Illinois 61820
(217)-333-6549

mcgrath at ncsa.uiuc.edu

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dfdl-uses-discussion.doc
Type: application/msword
Size: 585216 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/dfdl-wg/attachments/20050929/c95acc37/attachment.doc 

