[dfdl-wg] Fw: TAG opinion on XML Binary Format

Reagan Moore moore at sdsc.edu
Wed May 25 15:11:00 CDT 2005


A binary XML format is needed for scientific data.  We are assembling 
collections of 10-100 terabytes in aggregate size.  We plan to rely 
on an XML binary format description to automate data handling.  We 
have no plans to move 100 terabytes of data through web services.

Reagan Moore

>Mike Beckerle
>Architect, Scalable Computing
>IBM Software Group
>Information Integration Solutions
>Westborough, MA
>----- Forwarded by Mike Beckerle/Worcester/IBM on 05/24/2005 01:44 PM -----
>From: ed.rice at hp.com
>Date: 05/24/2005 01:26 PM
>To: www-tag at w3.org, public-xml-binary at w3.org
>Subject: TAG opinion on XML Binary Format
>
>TAG opinion on XML Binary Format
>
>The TAG has reviewed in detail the documents [1,2,3,4] prepared by the XBC
>workgroup [5].  While we very much appreciate the significant progress that
>these notes represent, the TAG believes that more detailed analysis is
>needed before a W3C Binary XML Recommendation is sufficiently justified.  We
>are taking no position at this time as to whether Binary XML will prove to
>be warranted, as there seem to be good arguments on both sides of that
>question.  Rather, we are suggesting that further careful analysis is needed
>before the W3C commits to a direction.
>
>The TAG believes there are disadvantages as well as potential advantages
>that will result from even a well crafted Binary XML Recommendation.  The
>advantages are clear: a successful binary format is likely to provide speed
>gains or size reductions, at least for certain use cases.  The drawbacks are
>likely to include reduced interoperability with XML 1.0 and XML 1.1
>software, and an inability to leverage the benefits of text-based formats.
>These are important concerns.  Quoting from the Web Architecture
>document [6]:
>
>   "The trade-offs between binary and textual data
>   formats are complex and application-
>   dependent. Binary formats can be substantially
>   more compact, particularly for complex
>   pointer-rich data structures. Also, they can be
>   consumed more rapidly by agents in those cases
>   where they can be loaded into memory and used
>   with little or no conversion. Note, however,
>   that such cases are relatively uncommon as such
>   direct use may open the door to security issues
>   that can only practically be addressed by
>   examining every aspect of the data structure in
>   detail.
>
>   "Textual formats are usually more portable and
>   interoperable. Textual formats also have the
>   considerable advantage that they can be
>   directly read by human beings (and understood,
>   given sufficient documentation). This can
>   simplify the tasks of creating and maintaining
>   software, and allow the direct intervention of
>   humans in the processing chain without recourse
>   to tools more complex than the ubiquitous text
>   editor. Finally, it simplifies the necessary
>   human task of learning about new data formats;
>   this is called the "view source" effect."
>
>We therefore believe that the benefits of a Binary XML format must be predictable
>and compelling in order to justify development of a Recommendation.
>
>In particular, we suggest that a quantitative analysis is necessary.  For at
>least a few key use cases, concrete targets should be set for the size
>and/or speed gains that would be needed to justify the disruption introduced
>by a new format.  For example, a target might be that "in typical web
>services scenarios, median speed gains on the order of 3x in combined
>parsing and deserialization are deemed sufficient to justify a new format."
>We further suggest that representative binary technologies be benchmarked
>and analyzed to a sufficient degree that such speed or size improvements can
>be predicted with reasonable reliability before we commit to a Recommendation.  No
>doubt, any given set of goals or benchmarks will suffer from some degree of
>imprecision, but if the gains are sufficiently compelling to justify a new
>format, then they should be relatively easy to demonstrate.  In short,
>actual measurements should be a prerequisite to preparing a Recommendation.
>
>In doing such measurements, we believe it is essential that comparisons be
>done to the best possible text-based XML 1.x implementations, which are not
>necessarily those that are most widely deployed.  Stated differently:
>if XML 1.x is inherently capable of meeting the needs of users, then our
>efforts should go into tuning our XML implementations, not designing new
>formats.  Benchmark environments should be as representative as possible of
>fully optimized implementations, not just of the XML parser, but of the
>surrounding application or middleware stack.  We note that different
>application-level optimizations may be necessary to maximize the performance
>of the binary or text cases respectively.  Care should especially be taken
>to ensure that the performance of particular APIs such as DOM or SAX does
>not obscure the performance possible with either option (e.g., both SAX and
>DOM can easily incur high-overhead string conversions when UTF-8 is
>used).
>
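The measurement described above (a concrete target such as a 3x median
gain in combined parsing and deserialization, checked against a text
baseline) could be sketched roughly as follows.  This is only an
illustration under stated assumptions: the helper names, the toy corpus,
and especially the stand-in "binary" layout (a count followed by packed
32-bit integers) are hypothetical and do not represent any proposed
Binary XML format or the XBC measurement methodology.

```python
import statistics
import struct
import time
import xml.etree.ElementTree as ET


def best_time(fn, payload, repeats=5):
    """Return the fastest of `repeats` wall-clock timings of fn(payload)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(payload)
        timings.append(time.perf_counter() - start)
    return min(timings)


def median_speedup(text_times, binary_times):
    """Median of the per-document ratios text_time / binary_time."""
    ratios = [t / b for t, b in zip(text_times, binary_times)]
    return statistics.median(ratios)


def meets_target(speedup, target=3.0):
    """True if the measured median speedup reaches the agreed target."""
    return speedup >= target


def parse_text(doc):
    # Text case: parse the XML and deserialize element text to integers.
    return [int(e.text) for e in ET.fromstring(doc)]


def parse_binary(blob):
    # Stand-in binary case (hypothetical layout): a 32-bit count followed
    # by that many little-endian 32-bit signed integers.
    (count,) = struct.unpack_from("<I", blob, 0)
    return list(struct.unpack_from(f"<{count}i", blob, 4))


def make_corpus(sizes):
    """Build (xml_doc, blob) pairs that encode the same integer lists."""
    corpus = []
    for n in sizes:
        values = list(range(n))
        xml_doc = "<r>" + "".join(f"<v>{v}</v>" for v in values) + "</r>"
        blob = struct.pack(f"<I{n}i", n, *values)
        corpus.append((xml_doc, blob))
    return corpus


if __name__ == "__main__":
    corpus = make_corpus([1000, 5000, 20000])
    text_times = [best_time(parse_text, x) for x, _ in corpus]
    binary_times = [best_time(parse_binary, b) for _, b in corpus]
    speedup = median_speedup(text_times, binary_times)
    print(f"median speedup: {speedup:.1f}x, target met: {meets_target(speedup)}")
```

Taking the fastest of several runs reduces timing noise, and reporting the
median across documents matches the "median speed gains" wording of the
example target.  A real comparison would also need the best available
text parser and a representative application stack on both sides.
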
>The TAG would also appreciate clarification as to how many formats are
>likely to be included in a Recommendation; it's not clear whether the
>proposal is for one Binary XML format for all cases, or if multiple formats
>are to be endorsed.  The use of multiple formats is likely to further reduce
>interoperability.
>
>We feel that introduction of a binary format would be an important
>development for those who might benefit from its size or speed, but also for
>those who might be affected by its effect on interoperability and
>perspicuity.  Therefore, in order to justify a potential new format, the TAG
>would like to see the above issues addressed.  As stated above, we make no
>prediction as to whether such an analysis will ultimately confirm the need
>for Binary XML;  if it does, we will be glad to support development of a
>Recommendation at the W3C.
>
>
>[1]  http://www.w3.org/TR/xbc-use-cases/
>[2]  http://www.w3.org/TR/xbc-properties/
>[3]  http://www.w3.org/TR/xbc-measurement/
>[4]  http://www.w3.org/TR/xbc-characterization/
>[5]  http://www.w3.org/XML/Binary/
>[6]  http://www.w3.org/TR/webarch/#binary