[DFDL-WG] DFDL: Minutes from OGF WG call, 30 Jan 2008

Fri Feb 1 08:19:49 CST 2008

All,

Here are the minutes from Wednesday's meeting - apologies for the delay. 
As ever please let me know of any corrections.

Cheers,

Ian

Open Grid Forum: Data Format Description Language Working Group

Weekly Working Group Conference Call
17:00 GMT, 30 Jan 2008

Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Geoff Judd (IBM)
Suman Kalia (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
Alan Powell (IBM)

1. Specification drafts
Alan has not yet received any updates for the next draft of the 
specification, which is due soon. As Alan is on vacation from the end of 
next week, he is looking for input as soon as possible. The group recorded 
the following status for items targeted for the next specification 
draft:

Nulls/default/optionals - Mike and Steve will collaborate on this and hope 
to have a draft ready for the end of next week.
Description of schema components - Simon expects to have this ready by the 
end of the week.
Regular expressions for lengths - This is now targeted for the following 
specification draft ("vX+2")
Expression language - Alan has distributed a new draft for review and has 
asked for comments. Mike will review this next week.
valueCalc - Mike will write up at least a first draft, and aims to have 
this ready for next week.
Property precedence - Steve is looking for review comments on the 
previously distributed mindmap. The specification can include this 
information simply as a list, not as a "mindmap" tree. It may be useful to 
combine this information with the schema components diagram. The present 
proposal only handles parsing; Steve will extend this to cover unparsing.
Entities - The group has been having good discussions regarding the 
entities proposal.
Whitespace handling - Whitespace seems to be largely handled by the 
entities proposal. Steve has been thinking about a way to introduce 
variable terminators to DFDL, which is discussed later in the meeting.

2. Expression Language
Alan distributed a new draft of the expression language proposal, and 
thanked Simon for his comments. As noted above, Mike will send review 
comments next week. The second part of the proposal document is text from 
the current specification draft, which will be replaced by the new 
proposal. This is included for "history" and is not intended to be 
reviewed.

3. Whitespace
The MIME header format allows for optional whitespace either side of a 
colon used to delimit the header name and its value. Steve felt that this 
justified the need for the proposed %OWSP; entity (along with the %WSP; 
entity). Mike wondered whether this would lead to a slippery slope where we try 
to handle other complex delimiters similarly, for example, 
case-insensitive delimiters. Steve would like to propose a more general 
way to handle complex delimiters in DFDL. Mike would have no objection to 
having both %WSP; and %OWSP; in the language if these could be described 
in terms of Steve's more general approach.

In PolarLake, the MIME header use case would be handled using a name field 
(terminated by a colon), with an optional field between the colon and the 
value which would consume any whitespace.
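
As a rough illustration of the use case (this is neither DFDL syntax nor 
PolarLake's implementation), the optional whitespace either side of the 
colon can be sketched in Python with a regular expression. The function 
name parse_header and the pattern are assumptions made for this example, 
and header folding is ignored.

```python
import re

# Hypothetical pattern: header name, optional whitespace, colon,
# optional whitespace, then the value up to the end of the line.
HEADER = re.compile(r"^([^\s:]+)[ \t]*:[ \t]*(.*)$")


def parse_header(line):
    """Split a header line into (name, value), skipping whitespace around the colon."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError("not a header line: %r" % line)
    return match.group(1), match.group(2)
```

Here the two "[ \t]*" runs play the role that %OWSP; would play in a 
delimiter definition: they match zero or more whitespace characters and 
contribute nothing to the name or the value.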

4. Recursive definitions for delimiters
Steve has proposed describing initiators in terms of named types. For 
example, an initiator could be defined using a simple enumeration type that 
lists its possible values, or using a pattern facet or assertions.

This would extend DFDL's current use of such facets, which are presently 
used only for validation. Mike distinguished between 'format' and 
'content' data, and suggested that this conceptually means that we can 
interpret facets during parsing for format data, but only at validation 
for content. Simon suggested a similar concept of "system data" vs. "user 
data". We would need to revisit speculative parsing to deal with this 
issue.

An alternative, suggested by Simon, would be to relax the present 
restriction that an expression may return only one value. If an 
expression could return a sequence of values, as is allowed by XPath, then 
we could use expressions to describe delimiters with multiple possible 
values. On output, a DFDL unparser would write out the first value in the 
sequence. Mike observed that such a scheme would solve the 'quoting hell' 
problem present with simple, space-delimited, XML lists as presently 
allowed for the nullValues property.
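
A minimal sketch of this idea in Python, assuming the expression has 
already been evaluated to a list of candidate terminator strings. The 
function names are hypothetical, and the tie-breaking rule (the earliest 
position wins, then the earlier entry in the list) is an assumption of the 
sketch, not something the group discussed.

```python
def find_terminator(data, start, terminators):
    """Return (position, terminator) for the earliest match at or after start, or None."""
    best = None
    for term in terminators:
        pos = data.find(term, start)
        # Keep the match that occurs earliest; on ties the earlier list entry wins.
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, term)
    return best


def parse_field(data, start, terminators):
    """Parse one field: accept any terminator in the sequence."""
    found = find_terminator(data, start, terminators)
    if found is None:
        raise ValueError("no terminator found")
    pos, term = found
    # Return the field content and the offset just past the terminator.
    return data[start:pos], pos + len(term)


def unparse_field(value, terminators):
    """Unparse one field: always write the first terminator in the sequence."""
    return value + terminators[0]
```

Because the candidates are held as separate items in a sequence, a 
terminator that itself contains a space needs no quoting, which is the 
problem Mike noted with space-delimited lists.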

If we adopt Steve's approach, we may not be able to access DFDL constructs 
(variables, expressions or entities) in facets. Steve pointed out that XSD 
processors would typically reject an enumerated type, restricted to length 
1, where the enumeration includes a DFDL entity - the XSD processor will 
not recognise the DFDL entity and instead treat this as a string of length 
greater than 1.
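
The mismatch can be sketched in Python. The helper below is hypothetical 
and only mimics the character count that a length facet implies; it does 
not call a real XSD processor.

```python
def satisfies_length_facet(literal, length):
    """Mimic an XSD length facet, which counts the characters of the literal as written."""
    return len(literal) == length


# A DFDL entity as it would appear, unexpanded, in an enumeration value.
# A DFDL-unaware XSD processor sees five ordinary characters.
ENTITY = "%WSP;"
```

Under these assumptions, satisfies_length_facet(",", 1) holds, but 
satisfies_length_facet(ENTITY, 1) does not, even though a DFDL processor 
would read the entity as standing for a single character.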

Alan asked whether there are any dynamic cases, where the set of 
terminators is obtained from the document itself. Mike felt that this 
could be modelled using assertions, though this would leave the terminator 
on output undefined; and that a solution where complex types may be used as 
terminators would probably allow us to handle most of these cases. Steve 
mentioned the EDI format, which allows a document-specified delimiter to 
be used with optional whitespace.

How, asked Simon, should we handle the output value of a complex 
delimiter? This must come either from the DFDL schema or from the infoset. 
Steve suggested we use default properties in the subelements of the type, 
and Mike suggested we could similarly use outputValueCalc. Steve and Mike 
agreed that a terminator wouldn't be present in the infoset, by analogy to 
the similar mechanism used for length prefixes. Further, this mechanism 
might allow us to remove a number of properties related to terminators.

Although Steve had intended this mechanism to be used with simple types or 
elements, Mike and Suman thought it would be appropriate to allow complex 
types. Elements would allow the 'default' attribute to be used on 
output. Mike contrasted this with the prefix length solution, where a simple 
type is used: the value under prefix length is treated as an integer, so 
it is appropriate to handle it as such. Here, however, we are modelling 
syntactic constructs. Simon felt that users will think in terms of 
elements.

Steve will prepare some examples and a proposal for inclusion in the 
"vX+2" draft.

5. Other business
Simon will email the group with some questions about the UML schema 
components description.

Meeting closed, 18:10 GMT

Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK

