[DFDL-WG] DFDL: Minutes from OGF WG call, 09 Jan 2008

Thu Jan 10 06:43:49 CST 2008

Open Grid Forum: Data Format Description Language Working Group

Weekly Working Group Conference Call
17:00 GMT, 09 Jan 2008

Attendees
Mike Beckerle (Oco)
Geoff Judd (IBM)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
Alan Powell (IBM)
Steve Hanson (IBM)
Suman Kalia (IBM)

Agenda
1. OGF 22 in Cambridge, MA
2. Level set on specification drafts
3. Expression Language
4. Nulls and defaults - can we drop useNullForDefault?
5. Other business

1. OGF22
The next OGF conference will be held February 25-29 in Cambridge, MA. As
he is local, Mike is planning to attend to represent DFDL. The working
group should decide what, if anything, we would like to present at the
conference. Mike will enquire about the closing date for submissions,
which may be Jan 11th.

2. Specification Drafts
Mike circulated draft 30 of the DFDL specification before Christmas, and 
had prepared a plan covering the contents of the next three drafts. The
objective of the plan was to guide the group to the stage where the
specification is no longer a limiting factor to progress and
implementations can proceed with a reasonable expectation that the
specification will not change significantly. Steve mentioned that IBM are
attempting to assign the remaining work items internally, and wanted to
coordinate this with the other working group members to avoid duplication
of effort.

Due to the demands of his new role, Mike will need to pass some items that 
he had been hoping to tackle on to other people. He suggested that 
editorship of the specification should pass around the group with each 
draft, ideally to whoever would be making the most significant changes in 
that draft.

For the next draft, number 31, Steve suggested that Alan might be an 
appropriate editor as he is working on the expression language, which is a 
key subject for the next draft. Simon would also like to own a draft and
would consider this one, but could not commit during the meeting.

3. Expression Language
The group has previously discussed difficulties with forward/backward 
references in expressions. Mike observed that forward-referencing 
expressions can occur in a DFDL schema but could only be used during 
unparse. Discussing whether it is feasible to police this statically, Mike 
reckoned that while it may be difficult to analyze an expression to see 
whether it referred forward or not, this would probably be a decidable 
problem (e.g., follow the dfdl:outputValue chain).
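
As an illustration of the static check discussed above, the following is a
minimal sketch only. It assumes a hypothetical model in which the schema is
reduced to element names in document order plus, for each element, the names
its expression refers to. The function name and the data layout are
illustrative and are not taken from the DFDL specification.

```python
def forward_references(order, refs):
    """Return sorted (element, target) pairs where target is declared later.

    order: element names in schema (document) order.
    refs:  element name -> names referenced by that element's expression.
    """
    position = {name: i for i, name in enumerate(order)}
    found = set()
    for element in refs:
        # Follow the chain of references transitively, guarding against cycles.
        stack, seen = list(refs[element]), set()
        while stack:
            target = stack.pop()
            if target in seen:
                continue
            seen.add(target)
            if position[target] > position[element]:
                found.add((element, target))
            stack.extend(refs.get(target, ()))
    return sorted(found)


order = ["length", "payload", "checksum"]
refs = {
    "payload": ["length"],   # backward reference: usable when parsing
    "length": ["payload"],   # forward reference: only resolvable on unparse
}
forward = forward_references(order, refs)
```

Here `forward` lists the references a parser could not resolve, so a schema
checker could restrict them to unparse-only properties.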

Steve asked how we should specify the data type to be returned from an 
expression, there being two candidates:
a) the XML Schema type of the DFDL property
b) the 'resolved' data type of the DFDL property as needed by the parser
Take dfdl:length as an example. The XML Schema type is 'string' because 
the field can accept numeric literals, expressions, regular expressions,
etc. But the parser will always want an integer.
Agreed that an expression should return the 'resolved' data type.

Steve asked whether, in the property descriptions, we should include the 
allowable return type from an expression. Mike believed that we should, as 
it may be distinct from the DFDL type for that field.
So, the dfdl:length property description in the spec needs to say exactly 
what the options are - e.g., "a literal integer, or an expression that
resolves to an integer, or a regular expression that resolves to an 
integer".

Using the XSD "maxOccurs" field as an example, which is normally an 
integer but may also be the token 'unbounded', Simon suggested that simply 
using the 'resolved' type may not be sufficient and that a processor will 
need to be aware that, in some cases, the result of an expression may not 
be the natural type. Mike concluded that we would need to specify both 
types as above and also any 'distinguished tokens'.

Finally, should a DFDL engine automatically cast an expression result to
the 'resolved' type, or instead strictly enforce the return type of the
expression? The group felt the latter option to be preferable.
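
The conclusions of this item can be sketched as follows. This is an
illustration under assumptions, not a proposed implementation: the
PropertySpec class, its field names and the check_result function are
hypothetical. It records both types for a property plus any distinguished
tokens, and enforces the expression's return type strictly rather than
casting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySpec:
    name: str
    xsd_type: str                     # type of the property in the XML Schema
    resolved_type: type               # type the parser ultimately needs
    tokens: frozenset = frozenset()   # distinguished tokens, e.g. 'unbounded'


def check_result(spec, value):
    """Strictly enforce the expression's return type; never cast."""
    if isinstance(value, str) and value in spec.tokens:
        return value
    # Exact type match, so "12" is not silently converted to 12.
    if type(value) is not spec.resolved_type:
        raise TypeError(
            f"{spec.name}: expected {spec.resolved_type.__name__}, "
            f"got {type(value).__name__}"
        )
    return value


length = PropertySpec("dfdl:length", "string", int)
max_occurs = PropertySpec("maxOccurs", "integer or token", int,
                          frozenset({"unbounded"}))

ok_length = check_result(length, 12)
ok_token = check_result(max_occurs, "unbounded")
try:
    check_result(length, "12")
    rejected = False
except TypeError:
    rejected = True
```

With automatic casting the string "12" would have been accepted for
dfdl:length; under strict enforcement it is reported as a schema error.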

(Alan Powell joined the meeting)

4. Nulls and defaults
Steve would like to review his previous correspondence with Mike before
discussing this further. It will be included in the agenda for next week's 
meeting.

5. Property Precedence
Geoff and Steve have been preparing a proposal for precedence using a mind 
map. Steve will distribute this initial proposal for wider review.

(Mike Beckerle and Suman Kalia left the meeting)

6. Entity references
Alan has been looking at the use of XML entity references to more easily 
allow non-printable characters to be written into DFDL documents, and has 
distributed a proposal within IBM. There are some issues around this at 
the moment (need DTD to define entities, allowable characters in XML 1.0 
docs).  Alan is looking at these.

This discussion in IBM had led to the concept of a mechanism to easily 
represent arbitrary whitespace, which is a common feature of text formats 
but which causes problems when modelling. Simon has experience with this 
concept and will send Steve a description of how PolarLake handle this.

Steve suggested we could handle this by allowing delimiters to be a list 
of allowable values, with the first used as a default on unparse. (We 
already have this idea for dfdl:nullValue). Simon observed that this could 
not handle arbitrary length whitespace. Steve said that we should have 
entities that cover that - like <WSP> and <OWSP> in IBM's WTX parser (the 
O meaning optional) - these are extremely useful.  So then you could say 
things like (ignore incorrect entity syntax):

   dfdl:separator ="x0Dx0A  x0D" 

meaning allow the separator to default to CRLF but allow CR (x0D) on its own.
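
A minimal sketch of this idea follows, under stated assumptions. The
Separator class, the ENTITIES table and the regex expansions chosen for
<WSP> and <OWSP> are all hypothetical, and the first alternative is assumed
to be a literal so that it can be written on unparse. The alternatives used
mirror the x0Dx0A and x0D values in the example above.

```python
import re

# Hypothetical expansion of WTX-style whitespace entities into regex fragments.
ENTITIES = {"<WSP>": r"[ \t]+", "<OWSP>": r"[ \t]*"}


def compile_alternative(alt):
    """Turn one separator alternative into a regex, expanding the entities."""
    parts = re.split(r"(<WSP>|<OWSP>)", alt)
    return "".join(ENTITIES.get(p, re.escape(p)) for p in parts)


class Separator:
    def __init__(self, alternatives):
        self.default = alternatives[0]   # the value written on unparse
        # Try longer alternatives first so "\r\n" wins over "\r" on parse.
        ordered = sorted(alternatives, key=len, reverse=True)
        self.pattern = re.compile(
            "|".join(f"(?:{compile_alternative(a)})" for a in ordered)
        )

    def split(self, text):
        """Parse: accept any of the allowed alternatives."""
        return self.pattern.split(text)

    def join(self, fields):
        """Unparse: always write the first (default) alternative."""
        return self.default.join(fields)


line_sep = Separator(["\r\n", "\r"])
fields = line_sep.split("a\r\nb\rc")
written = line_sep.join(fields)

comma_sep = Separator([",", "<OWSP>,<OWSP>"])
padded = comma_sep.split("a , b,c")
```

The second separator shows how an optional-whitespace entity covers the
arbitrary-length whitespace that a fixed list of literals cannot.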

However, Steve also pointed out that in the EDI data format the choice of 
delimiter comes from an expression, adding to the complexity, because the 
allowable value of the delimiter is then <value from expression> 
concatenated with <entity>. Is that supported by the current spec? E.g.:

   dfdl:separator ="{..\delimiter} {..\delimiter}x0Dx0A {..\delimiter}x0D" 
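
To make the question concrete, the sketch below shows one way such a value
could be resolved before matching. It is an assumption for illustration, not
behaviour defined by the specification: resolve_alternatives is a
hypothetical helper, the expression evaluator is replaced by a simple lookup,
and expressions are assumed to contain no spaces.

```python
import re


def resolve_alternatives(spec, evaluate):
    """Expand each {expression} in a space-separated list of alternatives."""
    def substitute(match):
        return evaluate(match.group(1))
    return [re.sub(r"\{([^}]*)\}", substitute, alt) for alt in spec.split()]


# Stand-in for a real evaluator: the EDI delimiter was read earlier as "*".
context = {r"..\delimiter": "*"}
alts = resolve_alternatives(
    r"{..\delimiter} {..\delimiter}x0Dx0A {..\delimiter}x0D",
    context.__getitem__,
)
```

Each resolved alternative is the expression value concatenated with the
literal or entity that follows it, which is the combination Steve asked
about.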

Simon wondered if we could deal with this situation in a different way by 
perhaps handling it as 'delimiter padding' and having a DFDL option to 
allow/trim it. But he cautioned that we must avoid ambiguity - for 
example, to handle whitespace at the end of a delimiter which is followed 
by data which allows whitespace. Steve said that in that situation you 
have no choice but to explicitly model the whitespace and not use the
arbitrary entities. 
Geoff thought that if we did go for the trimming approach we may need to 
describe separate sets of rules for whitespace handling for the markup 
region and for the data region.

Steve will take an action to come up with a proposal.

7. Other business
Steve would like to discuss a model of ACORD AL3 length-prefixed data on 
the working group call, and will add an item to next week's agenda. Mike 
and Geoff have been corresponding on that.
Within IBM, some changes have been proposed to Mike's UML model of DFDL.
These will be circulated to the working group when IBM comments are
complete.

Meeting closed, 17:45 GMT

Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK

