[DFDL-WG] DFDL: Minutes from OGF WG call, 21 Nov 2007
Ian W Parkinson
PARKIW at uk.ibm.com
Thu Nov 22 09:13:50 CST 2007
Open Grid Forum: Data Format Description Language Working Group
Weekly Working Group Conference Call
17:00 GMT, 21 Nov 2007
Attendees
Mike Beckerle (IBM)
Geoff Judd (IBM)
Steve Hanson (IBM)
Suman Kalia (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
1. Introduction
Mike would like, in this meeting, to cover the hexBinary and base64Binary
debate, and also to discuss the use of 'any' wildcards with minOccurs and
maxOccurs.
2. 'Any' wildcards with minOccurs and maxOccurs
The DFDL specification presently disallows the use of minOccurs and
maxOccurs on 'any' wildcards, in contrast to XML schema. Simon stated that
he sees no reason to forbid this, and that these properties might be
useful, for example when structures could be followed by arbitrary
extensions.
Suman felt that the most common use case would be minOccurs="0" and
maxOccurs="1". Mike wondered whether this would be useful within unordered
groups, and thought it would be a good way to model an 'all' group
containing some known fields and a number of unknown fields.
Mike took an action item to investigate this further.
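For reference, the XML Schema construct under discussion can be sketched as
the fragment embedded below. This is an illustration only: the element
names `Record` and `header` are hypothetical, and the Python wrapper merely
reads back the occurrence bounds that Suman described as the most common
case.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment: one known field followed by an optional
# extension, modelled as an 'any' wildcard with minOccurs="0" maxOccurs="1".
SCHEMA = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:element name="Record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="header" type="xs:string"/>
        <xs:any minOccurs="0" maxOccurs="1" processContents="lax"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def wildcard_occurs(schema_text):
    """Return (minOccurs, maxOccurs) for the first xs:any in the schema."""
    root = ET.fromstring(schema_text)
    wildcard = root.find(f".//{{{XS}}}any")
    # XML Schema defaults both bounds to 1 when the attributes are absent.
    return wildcard.get("minOccurs", "1"), wildcard.get("maxOccurs", "1")

print(wildcard_occurs(SCHEMA))
```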
3. hexBinary and base64Binary
The working group has been discussing, via email, the use of hexBinary and
base64Binary types, along with 'enumeration' and 'pattern' properties.
Mike asked the group whether we should disallow the 'enumeration' and
'pattern' properties on binary types (as they are difficult to use, in
particular with base64Binary), and whether we should remove support for
base64Binary (as it shares a value space with hexBinary, and is therefore
a synonym of hexBinary in DFDL).
Simon felt that this distinction remained a useful hint to any component
emitting XML based on a DFDL infoset. He also observed that base64Binary
is more commonly used than hexBinary, and is preferred. Mike argued that
while base 64 might be preferred to hexadecimal in terms of space,
hexadecimal is more readable. Steve observed that hexBinary is commonly
used.
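The shared value space and the space/readability trade-off can be
illustrated with a short sketch. The octets below are an arbitrary example,
not taken from the discussion; the point is only that both lexical forms
decode to the same value, and that the base 64 form is shorter.

```python
import base64
import binascii

# Arbitrary example octets; not taken from the minutes.
raw = bytes([0xDE, 0xAD, 0xBE, 0xEF, 0x00, 0x01])

hex_form = binascii.hexlify(raw).decode("ascii").upper()
b64_form = base64.b64encode(raw).decode("ascii")

# Both lexical forms decode to the same sequence of octets.
assert binascii.unhexlify(hex_form) == raw
assert base64.b64decode(b64_form) == raw

# Hexadecimal uses two characters per octet; base 64 uses four per three.
print(hex_form, len(hex_form))
print(b64_form, len(b64_form))
```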
While Mike felt that supporting fixed and default binary values might lead
to requiring support for patterns and enumeration, the meeting agreed that
there are use cases for default and fixed values - for example, some file
formats use "eyecatchers" which are best expressed in hexadecimal. Mike
pointed out that this could be achieved using a string type, but suggested
allowing 'default' and 'fixed' for hexBinary. Simon suggested that
something similar would be necessary for base64Binary, as some values
(such as passwords and identifiers) are frequently expressed in base 64.
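As a rough sketch of the eyecatcher use case, the check below compares the
start of some data against a fixed value written in hexadecimal. The
8-byte PNG file signature is an assumed example chosen because it is widely
known; it was not mentioned in the meeting, and the function name is
hypothetical.

```python
import binascii

# Assumed example: the 8-byte PNG file signature, used here only as a
# familiar eyecatcher. It is not mentioned in the minutes.
FIXED_EYECATCHER_HEX = "89504E470D0A1A0A"

def has_eyecatcher(data: bytes, fixed_hex: str = FIXED_EYECATCHER_HEX) -> bool:
    """Check that data begins with the octets given in hexadecimal form."""
    expected = binascii.unhexlify(fixed_hex)
    return data[:len(expected)] == expected

print(has_eyecatcher(b"\x89PNG\r\n\x1a\n" + b"rest of file"))
print(has_eyecatcher(b"GIF89a"))
```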
Sandy Gao (IBM) has been asked to comment on whether there are any use
cases where patterns are used with hexBinary or base64Binary.
To conclude this discussion, Mike proposed the following: to retain
support for both base64Binary and hexBinary, with identical content in
DFDL; to allow both 'fixed' and 'default' for both base64Binary and
hexBinary, but (pending further information from Sandy) to disallow
'pattern' and 'enumeration'.
[Simon and Suman left the meeting]
4. Array Prefixes and Suffixes
Mike asked whether the group was happy with the omission of array prefixes
and suffixes. We know how to add these back should we ever need to, and
there is a concern that including them would lead to many more array
properties being necessary. Steve was happy with the present proposal.
5. Choice type and Length properties on Choice
Mike observed that there is a need to distinguish between choice groups
which are of constant length, and choice groups where the length is
determined by the relevant subelement. Where the choice is unresolvable,
it is not possible to have a choice of variable length.
Steve felt that as we are able to make assertions, there would be very few
cases where a choice is unresolvable. Mike pointed out that in an
unresolvable choice group, each arm would need consistent enough syntax
for a parser to be able to determine the end; and that this could be
modelled as arms with enough information to discriminate. Geoff suggested
that experience with IBM's MRM technology shows this to be unusual.
The meeting considered two options. We could specify two properties, one
to select between constant length and variable length; and one to select
between resolvable and unresolvable. In this option, the combination
variable-length/unresolvable would be disallowed. In the second option, we
would have a single property with three possible values: constant length,
variable length or unresolvable. The meeting agreed upon the second
option.
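The relationship between the two options can be sketched as follows. This
is one reading of the minutes, not agreed syntax: the names `ChoiceKind`
and `to_single_property` are hypothetical, and no property name was
recorded in the meeting. The sketch maps each allowed combination of the
two-property option onto the single three-valued property, and rejects the
disallowed combination.

```python
from enum import Enum
from itertools import product

class ChoiceKind(Enum):  # hypothetical name; the minutes name no property
    CONSTANT_LENGTH = "constant length"
    VARIABLE_LENGTH = "variable length"
    UNRESOLVABLE = "unresolvable"

def to_single_property(variable_length: bool, resolvable: bool) -> ChoiceKind:
    """Map the two-property option onto the single three-valued property."""
    if not resolvable:
        if variable_length:
            raise ValueError("variable-length/unresolvable is disallowed")
        return ChoiceKind.UNRESOLVABLE
    if variable_length:
        return ChoiceKind.VARIABLE_LENGTH
    return ChoiceKind.CONSTANT_LENGTH

# Enumerate all four combinations of the two-property option.
for variable_length, resolvable in product([False, True], repeat=2):
    try:
        kind = to_single_property(variable_length, resolvable).value
    except ValueError as err:
        kind = f"rejected ({err})"
    print(variable_length, resolvable, kind)
```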
When experimenting with the DFDL language recently, Steve found specifying
length on structures to be awkward. He proposed removing 'lengthKind' on
choice elements, and Mike added that we would also wish to remove other
associated properties such as 'initiator' and 'terminator'. On reflection,
the meeting decided to keep these properties, noting that using these on a
choice element is equivalent to wrapping the choice element in a sequence
element with the same properties.
6. Other business
There has been a discussion within IBM regarding length prefixes on
strings. Mike will circulate a proposal to the working group, to allow
prefix formats to be described through annotations on simpleType
definitions.
Meeting closed, 17:55 GMT
Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK