[DFDL-WG] DFDL: Minutes from OGF WG call, 30 Jan 2008
Ian W Parkinson
PARKIW at uk.ibm.com
Fri Feb 1 08:19:49 CST 2008
All,
Here are the minutes from Wednesday's meeting - apologies for the delay.
As ever please let me know of any corrections.
Cheers,
Ian
Open Grid Forum: Data Format Description Language Working Group
Weekly Working Group Conference Call
17:00 GMT, 30 Jan 2008
Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Geoff Judd (IBM)
Suman Kalia (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
Alan Powell (IBM)
1. Specification drafts
Alan has not yet received any updates for the next draft of the
specification, which is due soon. As Alan is on vacation from the end of
next week, he is looking for input as soon as possible. The group recorded
the following status for items targeted for the next specification
draft:
Nulls/default/optionals - Mike and Steve will collaborate on this and hope
to have a draft ready for the end of next week.
Description of schema components - Simon expects to have this ready by the
end of the week.
Regular expressions for lengths - This is now targeted for the following
specification draft ("vX+2").
Expression language - Alan has distributed a new draft for review and has
asked for comments. Mike will review this next week.
valueCalc - Mike will write up at least a first draft, and aims to have
this ready for next week.
Property precedence - Steve is looking for review comments on the
previously distributed mindmap. The specification can include this
information simply as a list, not as a "mindmap" tree. It may be useful to
combine this information with the schema components diagram. The present
proposal only handles parsing; Steve will extend this to cover unparsing.
Entities - The group has been having good discussions regarding the
entities proposal.
Whitespace handling - Whitespace seems to be largely handled by the
entities proposal. Steve has been thinking about a way to introduce
variable terminators to DFDL, which is discussed later in the meeting.
2. Expression Language
Alan distributed a new draft of the expression language proposal, and
thanks Simon for his comments. As noted above, Mike will send review
comments next week. The second part of the proposal document is text from
the current specification draft, which will be replaced by the new
proposal. This is included for "history" and is not intended to be
reviewed.
3. Whitespace
The MIME header format allows for optional whitespace either side of a
colon used to delimit the header name and its value. Steve felt that this
justified the need for the proposed %OWSP; entity (along with the %WSP;
entity). Mike wondered whether this led to a slippery slope where we try
to handle other complex delimiters similarly, for example,
case-insensitive delimiters. Steve would like to propose a more general
way to handle complex delimiters in DFDL. Mike would have no objection to
having both %WSP; and %OWSP; in the language if these could be described
in terms of Steve's more general approach.
In PolarLake, the MIME header use case would be handled using a name field
(terminated by a colon), with an optional field between the colon and the
value which would consume any whitespace.
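
As a rough illustration of the MIME header case, the following Python
sketch splits a header line at a colon with optional whitespace on either
side. The function name and the regular expression are assumptions made
for this example; they are not DFDL syntax, nor part of any proposal
discussed on the call.

```python
import re

# Hypothetical stand-in for "%OWSP; ':' %OWSP;": a colon with optional
# spaces or tabs on either side. Illustration only, not DFDL syntax.
HEADER_DELIMITER = re.compile(r"[ \t]*:[ \t]*")


def split_header(line):
    """Split 'Name : value' into (name, value) at the first colon."""
    parts = HEADER_DELIMITER.split(line, maxsplit=1)
    if len(parts) != 2:
        raise ValueError("no colon delimiter found: %r" % line)
    name, value = parts
    return name, value
```

Only the first colon is treated as the delimiter, so a value that itself
contains a colon is left intact.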
4. Recursive definitions for delimiters
Steve has proposed describing initiators in terms of named types. For
example, an initiator could be defined using a simple enumeration type to
list its possible values, or one using a pattern facet or assertions.
This would extend DFDL's current use of such facets, which are presently
used only for validation. Mike distinguished between 'format' and
'content' data, and suggested that this conceptually means that we can
interpret facets during parsing for format data, but only at validation
for content. Simon suggested a similar concept of "system data" vs. "user
data". We would need to revisit speculative parsing to deal with this
issue.
An alternative, suggested by Simon, would be to relax the present
restriction which allows an expression to only return one value. If an
expression could return a sequence of values, as is allowed by XPath, then
we could use expressions to describe delimiters with multiple possible
values. On output, a DFDL unparser would write out the first value in the
sequence. Mike observed that such a scheme would solve the 'quoting hell'
problem present with simple, space-delimited, XML lists as presently
allowed for the nullValues property.
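
A minimal Python sketch of the sequence-valued idea follows. The function
names are hypothetical, and the longest-match rule used when parsing is an
assumption of this example rather than something the group agreed; only
the "write the first value on output" behaviour comes from the discussion.

```python
def parse_delimiter(data, pos, candidates):
    """Return (matched, new_pos) for the longest candidate found at pos.

    Longest-match is an assumption made for this sketch.
    """
    for candidate in sorted(candidates, key=len, reverse=True):
        if data.startswith(candidate, pos):
            return candidate, pos + len(candidate)
    raise ValueError("no delimiter at offset %d" % pos)


def unparse_delimiter(candidates):
    """On output, write the first value in the sequence."""
    return candidates[0]
```

Because the candidates are held as a real sequence, a value such as a
single space can be listed directly, which a space-delimited list cannot
express without some form of quoting.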
If we adopt Steve's approach, we may not be able to access DFDL constructs
(variables, expressions or entities) in facets. Steve pointed out that XSD
processors would typically reject an enumerated type, restricted to length
1, where the enumeration includes a DFDL entity - the XSD processor will
not recognise the DFDL entity and instead treat this as a string of length
greater than 1.
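
The length problem can be seen in a small Python sketch. The helper name
is hypothetical and the check is a deliberately simplified stand-in for
what an XSD processor does with a length facet.

```python
def violates_length(enumeration, length):
    """Return the enumeration values whose literal text breaks the length facet."""
    return [value for value in enumeration if len(value) != length]


# "%WSP;" is meant as a DFDL entity, but a processor that does not know
# about DFDL sees only its five literal characters.
candidates = [",", ";", "%WSP;"]
offending = violates_length(candidates, 1)
```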
Alan asked whether there are any dynamic cases, where the set of
terminators is obtained from the document itself. Mike felt that this
could be modelled using assertions, though this would leave the terminator on
output undefined; and that a solution where complex types may be used as
terminators would probably allow us to handle most of these cases. Steve
mentioned the EDI format, which allows a document-specified delimiter to
be used with optional whitespace.
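
To make the dynamic case concrete, here is a Python sketch of a toy format
in which the first character of the document declares the delimiter used
in the rest of it. This format is invented for the example and is not EDI
itself; the function name is likewise hypothetical.

```python
import re


def split_with_declared_delimiter(document):
    """Treat the first character as the delimiter for the rest of the text."""
    if not document:
        raise ValueError("empty document")
    delimiter, body = document[0], document[1:]
    # Allow optional spaces or tabs on either side of the declared delimiter.
    pattern = re.compile(r"[ \t]*" + re.escape(delimiter) + r"[ \t]*")
    return pattern.split(body)
```

The set of delimiters is not known until the document is read, which is
why a fixed enumeration in the schema cannot describe this case by itself.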
How, asked Simon, should we handle the output value of a complex
delimiter? This must come either from the DFDL schema or from the infoset.
Steve suggested we use default properties in the subelements of the type,
and Mike suggested we could similarly use outputValueCalc. Steve and Mike
agreed that a terminator wouldn't be present in the infoset, by analogy to
the similar mechanism used for length prefixes. Further, this mechanism
might allow us to remove a number of properties related to terminators.
Although Steve had intended this mechanism to be used with simple types or
elements, Mike and Suman thought it would be appropriate to allow complex
types. Elements would allow the use of the 'default' attribute for use on
output. Mike contrasted this to the prefix length solution, where a simple
type is used: the value under prefix length is treated as an integer, so
it is appropriate to handle it as such. Here, however, we are modelling
syntactic constructs. Simon felt that users will think in terms of
elements.
Steve will prepare some examples and a proposal for inclusion in the
"vX+2" draft.
5. Other business
Simon will email the group with some questions about the UML schema
components description.
Meeting closed, 18:10 GMT
Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU