Arrays issue - Re: [dfdl-wg] Issues: additional data types
Mike Beckerle
beckerle at us.ibm.com
Tue Sep 6 11:57:00 CDT 2005
re: Space - a space penalty only occurs if your DFDL implementation
actually converts the data into XML. My personal plans for DFDL would do
none of that, so you would incur zero space penalty. I want to reemphasize
here that the "index attributes" x and y in my example would take up
exactly zero space. They have no representation; their values are inferred
from the position of the elements in the array.
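A minimal Java sketch of the point about index attributes. It assumes a 2-D array of doubles stored row-major as packed 8-byte values. The layout, the class name PositionalIndex, and the XML element and attribute names (a, e, x, y) are illustrative assumptions, not anything DFDL specifies. The indices are recomputed from an element's position, so the packed form costs exactly 8 bytes per element. The naive XML rendering is included only to show where a space penalty would come from if the data really were converted to XML.

```java
// Illustrative sketch only: a rows x cols array of doubles kept as a packed,
// row-major payload. The (x, y) "index attributes" are never stored; they are
// recomputed from each element's position k in the payload.
final class PositionalIndex {
    private PositionalIndex() {}

    /** Bytes needed to store rows*cols IEEE-754 doubles with no markup at all. */
    static long packedBytes(int rows, int cols) {
        return (long) rows * cols * Double.BYTES;
    }

    /** Recovers the (x, y) index of the element at position k (row-major). */
    static int[] indexOf(int k, int cols) {
        return new int[] { k / cols, k % cols };
    }

    /** Character count of one naive XML rendering (assumed names), for comparison only. */
    static long xmlChars(double[] values, int cols) {
        StringBuilder sb = new StringBuilder("<a>");
        for (int k = 0; k < values.length; k++) {
            int[] xy = indexOf(k, cols);   // indices derived, not read from storage
            sb.append("<e x=\"").append(xy[0])
              .append("\" y=\"").append(xy[1]).append("\">")
              .append(values[k]).append("</e>");
        }
        return sb.append("</a>").length();
    }
}
```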
re: algorithms - DFDL doesn't address APIs for access to data at all.
There's nothing stopping someone from making array access appear in a
programming language exactly the way it appears in C, Fortran, Java, or
any other language today. E.g.,
    // establish correspondence between Java array 'a' and the
    // DFDL-described array reachable via path '.../a'
    Array a = ...getArrayFromDFDL(".../a");
    // retrieve the element at these index locations
    int value = a(5, -2);
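DFDL itself defines no such API, so what follows is a hypothetical Java sketch of what a library-side view like `a` above might look like. It assumes a 2-D array of doubles in a flat row-major buffer with Fortran-style lower bounds per dimension, which is one way an index such as (5, -2) can be meaningful. The class BoundedArray2D and its constructor arguments are invented for illustration.

```java
// Hypothetical sketch: not a DFDL API. A 2-D view over a flat row-major
// buffer where each dimension has its own lower bound (possibly negative).
final class BoundedArray2D {
    private final double[] data;   // row-major payload
    private final int lo1, n1;     // lower bound and extent of dimension 1
    private final int lo2, n2;     // lower bound and extent of dimension 2

    BoundedArray2D(double[] data, int lo1, int n1, int lo2, int n2) {
        if (data.length != n1 * n2) {
            throw new IllegalArgumentException("payload size does not match extents");
        }
        this.data = data;
        this.lo1 = lo1;
        this.n1 = n1;
        this.lo2 = lo2;
        this.n2 = n2;
    }

    /** Returns the element at index (i, j), honoring the lower bounds. */
    double get(int i, int j) {
        int r = i - lo1;   // zero-based row
        int c = j - lo2;   // zero-based column
        if (r < 0 || r >= n1 || c < 0 || c >= n2) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        }
        return data[r * n2 + c];
    }
}
```

For example, with the first index running from 1 to 6 and the second from -3 to 0, `get(5, -2)` reads buffer offset (5 - 1) * 4 + (-2 + 3) = 17.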
If you really want to express transformations "in this markup", i.e., as
if the data had been converted to XML, then it's unclear to me why XPath/XQuery
would make the algorithms particularly ugly. Using XPath/XQuery to
address elements would be very similar to basic index-oriented access in a
programming language.
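For comparison, here is a sketch of the XPath route using the JDK's javax.xml.xpath API. The element names a, row, and e are assumptions made for this example. The point is only that a positional predicate such as /a/row[2]/e[1] selects an element much as a(2, 1) would, except that XPath positions start at 1.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sketch: index-style access into an array that has been materialized as XML.
// Element names <a>, <row>, <e> are illustrative assumptions.
final class XPathIndex {
    private XPathIndex() {}

    /** Returns the text of element (row, col); both positions are 1-based, as in XPath. */
    static String element(String xml, int row, int col) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        String expr = "/a/row[" + row + "]/e[" + col + "]";
        return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
    }
}
```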
...mike
Mike Beckerle
Architect, Scalable Computing
IBM Software Group
Information Integration Solutions
Westborough, MA
"Robert E. McGrath" <mcgrath at ncsa.uiuc.edu>
Sent by: owner-dfdl-wg at ggf.org
09/06/2005 12:31 PM
To: Mike Beckerle/Worcester/IBM at IBMUS
cc: dfdl-wg at gridforum.org
Subject: Re: Arrays issue - Re: [dfdl-wg] Issues: additional data types
Yes, this is one way to do arrays.
This approach emphasizes the use case where it is important to
access individual elements via XML.
There are two obvious downsides:
1. Space: this will take more than 10 times the storage of the actual
   numbers. A big problem in many cases.
2. Array algorithms (e.g., scatter-gather, transpose) do block operations,
   which are totally ugly in this markup.
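To make the second point concrete, here is a minimal Java sketch of a transpose performed on an array held as one packed row-major buffer (an assumed layout, in the spirit of the "array as blob" approach). The whole operation is a single index remapping, dst[c * rows + r] = src[r * cols + c], with no per-element markup to walk. The class name Transpose is illustrative.

```java
// Sketch of a block operation on a packed row-major buffer (assumed layout).
final class Transpose {
    private Transpose() {}

    /** Returns the cols x rows transpose of a rows x cols row-major array. */
    static double[] transpose(double[] src, int rows, int cols) {
        if (src.length != rows * cols) {
            throw new IllegalArgumentException("size mismatch");
        }
        double[] dst = new double[src.length];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // element (r, c) of src becomes element (c, r) of dst
                dst[c * rows + r] = src[r * cols + c];
            }
        }
        return dst;
    }
}
```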
A variant of this might mark up parts of the array, e.g., each row.
Two other general approaches can be considered:
Array as blob: markup says 'this is an array, laid out like so',
and the data is a big blob. (This is probably what Jim is talking about.)
Array as external blob: same as above, except the payload is a URL,
e.g., to an OpenDAP server where the data is. (Ideal for "virtual datasets".)
The memo I was working on tries to lay out these options with their
advantages and disadvantages.
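A minimal Java sketch of the "array as blob" option. It assumes the markup supplies the layout (row count, column count, byte order, 8-byte IEEE doubles in row-major order) and that the payload is an opaque byte array. BlobArray and its parameters are hypothetical names. For the external-blob variant, the same reader could in principle be applied to bytes fetched from the URL.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: layout comes from the markup, data is an opaque blob.
final class BlobArray {
    private final ByteBuffer blob;
    private final int rows, cols;

    BlobArray(byte[] payload, int rows, int cols, ByteOrder order) {
        if (payload.length != rows * cols * Double.BYTES) {
            throw new IllegalArgumentException("payload size does not match layout");
        }
        this.blob = ByteBuffer.wrap(payload).order(order); // byte order from the layout
        this.rows = rows;
        this.cols = cols;
    }

    /** Reads element (i, j), zero-based, directly from the blob. */
    double get(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        }
        return blob.getDouble((i * cols + j) * Double.BYTES); // absolute read
    }
}
```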
---
Robert E. McGrath
National Center for Supercomputing Applications
University of Illinois, Urbana-Champaign
Champaign, Illinois 61820
(217)-333-6549
mcgrath at ncsa.uiuc.edu