[DFDL-WG] DFDL: Minutes from OGF WG call, 21 Nov 2007

Ian W Parkinson PARKIW at uk.ibm.com
Thu Nov 22 09:13:50 CST 2007


Open Grid Forum: Data Format Description Language Working Group

Weekly Working Group Conference Call
17:00 GMT, 21 Nov 2007

Attendees
Mike Beckerle (IBM)
Geoff Judd (IBM)
Steve Hanson (IBM)
Suman Kalia (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)


1. Introduction
Mike would like, in this meeting, to cover the hexBinary and base64Binary 
debate, and also to discuss the use of 'any' wildcards with minOccurs and 
maxOccurs.

2. 'Any' wildcards with minOccurs and maxOccurs
The DFDL specification presently disallows the use of minOccurs and 
maxOccurs on 'any' wildcards, in contrast to XML Schema. Simon stated that 
he sees no reason to forbid this, and that these properties might be 
useful, for example when structures could be followed by arbitrary 
extensions.
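
For reference, the XML Schema form under discussion can be sketched as follows. This is only an illustration: the 'record' and 'id' names are hypothetical, and the Python wrapper merely parses the schema text with the standard library to read back the occurrence constraints on the wildcard. It says nothing about how a DFDL processor would treat them.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: one known field followed by arbitrary extensions.
schema_text = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:int"/>
        <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

root = ET.fromstring(schema_text)

# Locate the 'any' wildcard and read its occurrence constraints.
wildcard = root.find(f".//{{{XS}}}any")
occurs = (wildcard.get("minOccurs"), wildcard.get("maxOccurs"))
```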

Suman felt that the most common use case would be minOccurs="0" and 
maxOccurs="1". Mike wondered whether this would be useful within unordered 
groups, and thought it would be a good way to model an 'all' group 
containing some known fields and a number of unknown fields.

Mike took an action item to investigate this further.

3. hexBinary and base64Binary
The working group has been discussing, via email, the use of hexBinary and 
base64Binary types, along with 'enumeration' and 'pattern' properties. 
Mike asked the group whether we should disallow the 'enumeration' and 
'pattern' properties on binary types (as they are difficult to use, in 
particular with base64Binary), and whether we should remove support for 
base64Binary (as it shares a value space with hexBinary, and is therefore 
a synonym of hexBinary in DFDL).
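
The shared value space can be illustrated outside DFDL. The sketch below is plain Python using only the standard library, with arbitrarily chosen sample octets; it shows one octet sequence written in a hexBinary-style and a base64Binary-style lexical form, both of which decode to the same value.

```python
import base64

# Arbitrary sample octets, chosen only for illustration.
raw = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# Two lexical forms of the same value.
hex_form = raw.hex().upper()                       # hexBinary-style text
b64_form = base64.b64encode(raw).decode("ascii")   # base64Binary-style text

# Decoding either form yields the identical octet sequence.
from_hex = bytes.fromhex(hex_form)
from_b64 = base64.b64decode(b64_form)
same_value = from_hex == from_b64 == raw
```

Because the two texts look nothing alike, a 'pattern' written against one lexical form does not carry over to the other in any obvious way, which is one reading of why such facets are awkward on binary types.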

Simon felt that this distinction remained a useful hint to any component 
emitting XML based on a DFDL infoset. He also observed that base64Binary 
is more commonly used than hexBinary, and is preferred. Mike argued that 
while base 64 might be preferred to hexadecimal in terms of space, 
hexadecimal is more readable. Steve observed that hexBinary is commonly 
used.

While Mike felt that supporting fixed and default binary values might lead 
to requiring support for patterns and enumeration, the meeting agreed that 
there are use cases for default and fixed values - for example, some file 
formats use "eyecatchers" which are best expressed in hexadecimal. Mike 
pointed out that this could be achieved using a string type, but suggested 
allowing 'default' and 'fixed' for hexBinary. Simon suggested that 
something similar would be necessary for base64Binary, as some values 
(such as passwords and identifiers) are frequently expressed in base 64.
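
As a rough illustration of the eyecatcher use case, the Python sketch below checks whether data begins with a fixed value written in hexadecimal. The helper name has_eyecatcher is hypothetical, and the PNG file signature is used only as a familiar example of a marker containing non-printable octets; neither was discussed in the meeting.

```python
def has_eyecatcher(data: bytes, fixed_hex: str) -> bool:
    """Return True if data starts with the octets named by fixed_hex."""
    return data.startswith(bytes.fromhex(fixed_hex))

# The eight-octet PNG signature, written as hexadecimal text.
PNG_SIGNATURE_HEX = "89504E470D0A1A0A"

# Hypothetical sample inputs: one with the marker, one without.
good = bytes.fromhex(PNG_SIGNATURE_HEX) + b"rest of file"
bad = b"GIF89a" + b"rest of file"

good_matches = has_eyecatcher(good, PNG_SIGNATURE_HEX)
bad_matches = has_eyecatcher(bad, PNG_SIGNATURE_HEX)
```

The signature includes octets such as 0x89 and 0x1A that have no convenient printable form, which is the kind of value that is easier to state in hexadecimal than as a string.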

Sandy Gao (IBM) has been asked to comment on whether there are any use 
cases where patterns are used with hexBinary or base64Binary.

To conclude this discussion, Mike proposed the following: to retain 
support for both base64Binary and hexBinary, with identical content in 
DFDL; to allow both 'fixed' and 'default' for both base64Binary and 
hexBinary, but (pending further information from Sandy) to disallow 
'pattern' and 'enumeration'.

[Simon and Suman left the meeting]

4. Array Prefixes and Suffixes
Mike asked whether the group was happy with the omission of array prefixes 
and suffixes. We know how to add these back should we ever need to, and 
there is a concern that including them would lead to many more array 
properties being necessary. Steve was happy with the present proposal.

5. Choice type and Length properties on Choice
Mike observed that there is a need to distinguish between choice groups 
which are of constant length, and choice groups where the length is 
determined by the relevant subelement. Where the choice is unresolvable, 
it is not possible to have a choice of variable length.

Steve felt that as we are able to make assertions, there would be very few 
cases where a choice is unresolvable. Mike pointed out that in an 
unresolvable choice group, each arm would need consistent enough syntax 
for a parser to be able to determine the end; and that this could be 
modelled as arms with enough information to discriminate. Geoff suggested 
that experience with IBM's MRM technology shows this to be unusual.

The meeting considered two options. We could specify two properties, one 
to select between constant length and variable length; and one to select 
between resolvable and unresolvable. In this option, the combination 
variable-length/unresolvable would be disallowed. In the second option, we 
would have a single property with three possible values: constant length, 
variable length or unresolvable. The meeting agreed upon the second 
option.
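
The difference between the two options can be sketched by counting states. In the Python below, the property and value names are hypothetical (the meeting did not settle on spellings). The two-property form has four combinations, one of which must be rejected by a separate rule, while the single-property form can only express the three permitted states.

```python
from enum import Enum
from itertools import product

# Option 1: two independent two-valued properties (hypothetical names).
length_values = ("constant", "variable")
resolution_values = ("resolvable", "unresolvable")
option1_all = list(product(length_values, resolution_values))

# The combination the meeting said would be disallowed.
DISALLOWED = ("variable", "unresolvable")
option1_valid = [combo for combo in option1_all if combo != DISALLOWED]

# Option 2: one property with three values (hypothetical names).
class ChoiceKind(Enum):
    CONSTANT_LENGTH = "constantLength"
    VARIABLE_LENGTH = "variableLength"
    UNRESOLVABLE = "unresolvable"

option2_all = list(ChoiceKind)
```

With a single property there is no invalid combination to check for, since every value that can be written is a permitted one.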

When experimenting with the DFDL language recently, Steve found specifying 
length on structures to be awkward. He proposed removing 'lengthKind' on 
choice elements, and Mike added that we would also wish to remove other 
associated properties such as 'initiator' and 'terminator'. On reflection, 
the meeting decided to keep these properties, noting that using these on a 
choice element is identical to wrapping the choice element in a sequence 
element with the same properties.

6. Other business
There has, internally within IBM, been a discussion regarding length 
prefixes on strings. Mike will circulate a proposal to the working group, 
to allow prefix formats to be described through annotations on simpleType 
definitions.

Meeting closed, 17:55 GMT


Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK
