[DFDL-WG] DFDL: Minutes from OGF WG call, 09 Jan 2008
Ian W Parkinson
PARKIW at uk.ibm.com
Thu Jan 10 06:43:49 CST 2008
Open Grid Forum: Data Format Description Language Working Group
Weekly Working Group Conference Call
17:00 GMT, 09 Jan 2008
Attendees
Mike Beckerle (Oco)
Geoff Judd (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
Alan Powell (IBM)
Steve Hanson (IBM)
Suman Kalia (IBM)
Agenda
1. OGF 22 in Cambridge, MA
2. Level set on specification drafts
3. Expression Language
4. Nulls and defaults - can we drop useNullForDefault?
5. Other business
1. OGF22
The next OGF conference will be held February 25-29 in Cambridge, MA. As
he is local, Mike is planning to attend to represent DFDL. The working
group should decide what we would like to present at the conference, if
anything, and Mike will enquire about the closing date for submissions,
which may be Jan 11th.
2. Specification Drafts
Mike circulated draft 30 of the DFDL specification before Christmas, and
had prepared a plan covering the contents of the next three drafts. The
objective of the plan was to guide the group to the stage where the
specification was no longer a limiting factor to progress and
implementations could proceed with a reasonable expectation that the
specification would not change significantly. Steve mentioned that IBM are
attempting to assign the remaining work items internally, and wanted to
coordinate this with the other working group members to avoid duplication
of effort.
Due to the demands of his new role, Mike will need to pass some items that
he had been hoping to tackle on to other people. He suggested that
editorship of the specification should pass around the group with each
draft, ideally to whoever would be making the most significant changes in
that draft.
For the next draft, number 31, Steve suggested that Alan might be an
appropriate editor as he is working on the expression language, which is a
key subject for the next draft. Simon would also like to own a draft and
would consider this, but could not commit during the meeting.
3. Expression Language
The group has previously discussed difficulties with forward/backward
references in expressions. Mike observed that forward-referencing
expressions can occur in a DFDL schema but could only be used during
unparse. Discussing whether it is feasible to police this statically, Mike
reckoned that while it may be difficult to analyze an expression to see
whether it referred forward or not, this would probably be a decidable
problem (eg, follow the dfdl:outputValue chain).
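As a rough illustration of the kind of static check discussed here, the
sketch below flags forward references by comparing positions in document
order. The flat element list, the refs mapping and the function name are
all hypothetical; a real check would have to analyze the schema's
expression syntax and follow the dfdl:outputValue chain.

```python
def forward_references(order, refs):
    """Return (element, target) pairs where target comes after element.

    order: element names in document order (hypothetical flat schema).
    refs:  mapping from an element name to the names its expression uses.
    """
    position = {name: i for i, name in enumerate(order)}
    found = []
    for element in order:
        for target in refs.get(element, ()):
            if position[target] > position[element]:
                found.append((element, target))
    return found


order = ["length", "payload", "checksum"]
refs = {
    "payload": ["length"],   # backward reference: available during parse
    "length": ["payload"],   # forward reference: only usable on unparse
}
print(forward_references(order, refs))
```

Under these assumptions only the length-to-payload reference is reported,
which is the case a processor would have to restrict to unparse.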
Steve asked how we should specify the data type to be returned from an
expression, there being two candidates:
a) the XML Schema type of the DFDL property
b) the 'resolved' data type of the DFDL property as needed by the parser
Take dfdl:length as an example. The XML Schema type is 'string' because
the field can accept numeric literals, expressions, regular expressions,
etc. But the parser will always want an integer.
Agreed that an expression should return the 'resolved' data type.
Steve asked whether, in the property descriptions, we should include the
allowable return type from an expression. Mike believed that we should, as
it may be distinct from the DFDL type for that field.
So, the dfdl:length property description in the spec needs to say exactly
what the options are - eg, "a literal integer, or an expression that
resolves to an integer, or a regular expression that resolves to an
integer".
Using the XSD "maxOccurs" field as an example, which is normally an
integer but may also be the token 'unbounded', Simon suggested that simply
using the 'resolved' type may not be sufficient and that a processor will
need to be aware that, in some cases, the result of an expression may not
be the natural type. Mike concluded that we would need to specify both
types as above and also any 'distinguished tokens'.
Finally, should a DFDL engine automatically cast an expression result to
the 'resolved' type, or instead strictly enforce the return type of the
expression? The group felt the latter option to be preferable.
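A minimal sketch of the strict option, assuming Python types stand in for
the 'resolved' DFDL types. The function name check_result and the tokens
parameter are hypothetical and not taken from the specification.

```python
UNBOUNDED = "unbounded"  # example distinguished token, as with XSD maxOccurs


def check_result(value, resolved_type, tokens=()):
    """Validate an expression result strictly; never cast it.

    resolved_type: Python type standing in for the 'resolved' type.
    tokens: distinguished tokens accepted in place of that type.
    """
    if isinstance(value, str) and value in tokens:
        return value
    # Exact type match: "12" is not silently converted to 12.
    if type(value) is not resolved_type:
        raise TypeError(
            f"expected {resolved_type.__name__}, got {type(value).__name__}"
        )
    return value


check_result(12, int)                               # accepted as-is
check_result("unbounded", int, tokens=(UNBOUNDED,)) # distinguished token
# check_result("12", int) would raise TypeError rather than cast
```

The automatic-cast alternative would instead call something like
resolved_type(value) and accept "12" for an integer property.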
(Alan Powell joined the meeting)
4. Nulls and defaults
Steve would like to review his previous correspondence with Mike before
discussing this further. It will be included in the agenda for next week's
meeting.
5. Property Precedence
Geoff and Steve have been preparing a proposal for precedence using a mind
map. Steve will distribute this initial proposal for wider review.
(Mike Beckerle and Suman Kalia left the meeting)
6. Entity references
Alan has been looking at the use of XML entity references to more easily
allow non-printable characters to be written into DFDL documents, and has
distributed a proposal within IBM. There are some issues around this at
the moment (need DTD to define entities, allowable characters in XML 1.0
docs). Alan is looking at these.
This discussion in IBM had led to the concept of a mechanism to easily
represent arbitrary whitespace, which is a common feature of text formats
but which causes problems when modelling. Simon has experience with this
concept and will send Steve a description of how PolarLake handle this.
Steve suggested we could handle this by allowing delimiters to be a list
of allowable values, with the first used as a default on unparse. (We
already have this idea for dfdl:nullValue). Simon observed that this could
not handle arbitrary length whitespace. Steve said that we should have
entities that cover that - like <WSP> and <OWSP> in IBM's WTX parser (the
O meaning optional) - these are extremely useful. So then you could say
things like (ignore incorrect entity syntax):
dfdl:separator ="x0Dx0A x0D"
meaning allow the separator to default to CRLF but allow CR on its own.
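A sketch of how a processor might treat such a list, assuming the
alternatives have already been decoded into characters (x0Dx0A is CR LF,
x0D is CR). The function names are hypothetical, and trying longer
alternatives first is a design choice of this sketch, not something the
group agreed.

```python
def match_separator(data, pos, alternatives):
    """Return the alternative found at data[pos:], or None.

    Longer alternatives are tried first so that CRLF wins over a bare CR.
    """
    for sep in sorted(alternatives, key=len, reverse=True):
        if data.startswith(sep, pos):
            return sep
    return None


def default_separator(alternatives):
    """On unparse, write the first listed alternative."""
    return alternatives[0]


separators = ["\r\n", "\r"]  # decoded form of "x0Dx0A x0D"
print(repr(match_separator("a\r\nb", 1, separators)))
print(repr(default_separator(separators)))
```

A fixed list like this cannot express whitespace of arbitrary length,
which is the limitation Simon raised.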
However, Steve also pointed out that in the EDI data format the choice of
delimiter comes from an expression, adding to the complexity, because the
allowable value of the delimiter is then <value from expression>
concatenated with <entity>. Is that supported by the current spec? Eg:
dfdl:separator ="{..\delimiter} {..\delimiter}x0Dx0A {..\delimiter}x0D"
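To make the concatenation concrete, the sketch below expands a list
written in the informal syntax above once the expression value is known.
The placeholder handling, the entity table and the function name are
assumptions for illustration only; the minutes themselves note that the
entity syntax shown is not correct.

```python
PLACEHOLDER = r"{..\delimiter}"
ENTITIES = {"x0D": "\r", "x0A": "\n"}  # stand-ins for the informal syntax


def expand_separators(spec, delimiter):
    """Expand a space-separated list of alternatives into literal strings."""
    alternatives = []
    for item in spec.split(" "):
        for name, char in ENTITIES.items():
            item = item.replace(name, char)
        # Substitute the expression result last, so that its characters
        # are never mistaken for entity names.
        alternatives.append(item.replace(PLACEHOLDER, delimiter))
    return alternatives


spec = r"{..\delimiter} {..\delimiter}x0Dx0A {..\delimiter}x0D"
print(expand_separators(spec, "'"))
```

Here the apostrophe is just an example value for the expression result;
the expanded list could then be matched like any fixed list of
delimiters.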
Simon wondered if we could deal with this situation in a different way by
perhaps handling it as 'delimiter padding' and having a DFDL option to
allow/trim it. But he cautioned that we must avoid ambiguity - for
example, to handle whitespace at the end of a delimiter which is followed
by data which allows whitespace. Steve said that in that situation you
have no choice but to explicitly model the whitespace and not use the
arbitrary entities.
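The ambiguity can be shown with a small sketch. It assumes <WSP> means
one or more spaces or tabs and <OWSP> means zero or more; those meanings
are inferred from "O meaning optional" and are not taken from WTX
documentation. The function name is hypothetical.

```python
import re

# Assumed meanings of the two entities (see the note above).
WSP = r"[ \t]+"    # one or more whitespace characters
OWSP = r"[ \t]*"   # zero or more whitespace characters


def split_after_delimiter(data, delimiter, ws_pattern):
    """Match delimiter plus whitespace at the start of data.

    Returns (markup, rest), or None if the delimiter does not match.
    """
    m = re.match(re.escape(delimiter) + ws_pattern, data)
    if m is None:
        return None
    return m.group(0), data[m.end():]


print(split_after_delimiter(",   value", ",", OWSP))
```

The three spaces are consumed as part of the delimiter, so if the
following field could legitimately begin with whitespace, the processor
cannot tell markup from data.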
Geoff thought that if we did go for the trimming approach we may need to
describe separate sets of rules for whitespace handling for the markup
region and for the data region.
Steve will take an action to come up with a proposal.
7. Other business
Steve would like to discuss a model of ACORD AL3 length-prefixed data on
the working group call, and will add an item to next week's agenda. Mike
and Geoff have been corresponding on that.
Within IBM, some changes have been proposed to Mike's UML model of DFDL.
This will be circulated to the working group when IBM comments are
complete.
Meeting closed, 17:45 GMT
Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU