Arrays issue - Re: [dfdl-wg] Issues: additional data types

Mike Beckerle beckerle at us.ibm.com
Tue Sep 6 11:57:00 CDT 2005


re: Space - a space penalty only occurs if your DFDL implementation 
actually converts the data into XML. My personal plans for DFDL would do 
none of that, so you would incur zero space penalty. I want to reemphasize 
here that the "index attributes" x and y in my example would take up 
exactly zero space. They have no representation. Their values are inferred 
from the position of the elements in the array. 
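
To make the "zero space" point concrete, here is a minimal sketch of how 
index values can be recovered purely from position. It is not anything DFDL 
defines: the class name, the row-major layout, and the explicit lower bounds 
are all assumptions made for illustration.

```java
// Hypothetical sketch: the class name, row-major layout, and lower bounds
// are assumptions for illustration, not part of DFDL.
final class ImplicitIndexArray {
    private final int[] data;          // only the values are stored
    private final int xLow, yLow;      // lowest valid index per dimension
    private final int xCount, yCount;  // number of indices per dimension

    ImplicitIndexArray(int[] data, int xLow, int xCount, int yLow, int yCount) {
        if (xCount <= 0 || yCount <= 0 || data.length != xCount * yCount) {
            throw new IllegalArgumentException("data length does not match extents");
        }
        this.data = data;
        this.xLow = xLow;
        this.xCount = xCount;
        this.yLow = yLow;
        this.yCount = yCount;
    }

    /** Position of element (x, y) in the flat data, assuming row-major order. */
    int positionOf(int x, int y) {
        if (x < xLow || x >= xLow + xCount || y < yLow || y >= yLow + yCount) {
            throw new IndexOutOfBoundsException("(" + x + ", " + y + ")");
        }
        return (x - xLow) * yCount + (y - yLow);
    }

    /** The x index is inferred from the position; it is never stored. */
    int xAt(int position) {
        return xLow + position / yCount;
    }

    /** The y index is inferred from the position; it is never stored. */
    int yAt(int position) {
        return yLow + position % yCount;
    }

    int get(int x, int y) {
        return data[positionOf(x, y)];
    }
}
```

The only storage here is the flat `data` array plus a handful of bounds for 
the whole array, so the per-element cost of x and y is nothing.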

re: algorithms - DFDL doesn't address APIs for access to data at all. 
There's nothing stopping someone from making array access appear in a 
programming language exactly the way it appears in C, Fortran, or Java or 
any other language today. E.g.,

      // Establish correspondence between Java array 'a' and the
      // DFDL-described array reachable via path '.../a'.
      Array a = ...getArrayFromDFDL(".../a");
      // Retrieve the element at these index locations.
      int value = a(5, -2);

If you really want to express transformations "in this markup", i.e., as 
if the data had been converted to XML, then I'm unclear why XPath/XQuery 
would make the algorithms particularly ugly. Use of XPath/XQuery to 
address elements would be very similar to basic index-oriented access in a 
programming language.
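
As a rough illustration of that similarity, the sketch below looks up one 
element by its x and y attributes using the standard Java XPath API. The 
element names `a` and `e`, and the idea that each element carries its value 
as text, are assumptions for the example only; the XML would exist only if 
the data had actually been converted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: element names 'a' and 'e' are assumed for illustration.
final class XPathIndexAccess {

    /** Parse an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Index-style access: select the element whose attributes match (x, y). */
    static int elementAt(Document doc, int x, int y) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "/a/e[@x='" + x + "'][@y='" + y + "']";
        try {
            // evaluate() returns the string value of the first matching node,
            // or an empty string when nothing matches.
            String text = xpath.evaluate(expr, doc).trim();
            if (text.isEmpty()) {
                throw new IndexOutOfBoundsException("(" + x + ", " + y + ")");
            }
            return Integer.parseInt(text);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath: " + expr, e);
        }
    }
}
```

A call such as `XPathIndexAccess.elementAt(doc, 5, -2)` then reads much like 
the indexed access shown earlier, with the path expression playing the role 
of the subscript.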

...mike

Mike Beckerle
Architect, Scalable Computing
IBM Software Group
Information Integration Solutions
Westborough, MA



"Robert E. McGrath" <mcgrath at ncsa.uiuc.edu> 
Sent by: owner-dfdl-wg at ggf.org
09/06/2005 12:31 PM

To
Mike Beckerle/Worcester/IBM at IBMUS
cc
dfdl-wg at gridforum.org
Subject
Re: Arrays issue - Re: [dfdl-wg] Issues: additional data types






Yes, this is one way to do arrays.

This approach emphasizes the use case where it is important to
access individual elements via XML.

There are two obvious downsides:

   1. space:  this will take >10 times the storage of the actual numbers,
      a big problem in many cases.
   2. array algorithms (e.g., scatter-gather, transpose) do
      block operations, which are totally ugly in this markup.

A variant of this might mark up parts of the array, e.g., each row.


Two other general approaches can be considered:

Array as blob:  markup says 'this is an array, laid out like so',
data is a big blob. (Probably this is what Jim is talking about)

Array as external blob:  same as above, except payload is a URL,
e.g., to OpenDAP server where the data is. (Ideal for "virtual datasets")


The memo I was working on tries to lay these options out with the
advantages and disadvantages.

---
Robert E. McGrath
National Center for Supercomputing Applications
University of Illinois, Urbana-Champaign
Champaign, Illinois 61820
(217)-333-6549

mcgrath at ncsa.uiuc.edu

