[DFDL-WG] DFDL: Minutes from OGF WG call, 23 Jan 2008 *CORRECTED*
Ian W Parkinson
PARKIW at uk.ibm.com
Thu Jan 24 10:19:25 CST 2008
A small correction, with thanks to Simon: it was Steve (rather than
Simon) who had previously attracted a reasonable audience at the OGF
conference.
Ian
Open Grid Forum: Data Format Description Language Working Group
Weekly Working Group Conference Call
17:00 GMT, 23 Jan 2008
Attendees
Mike Beckerle (Oco)
Simon Parker (PolarLake)
Ian Parkinson (IBM)
Alan Powell (IBM)
Apologies
Steve Hanson (IBM), Suman Kalia (IBM)
1. OGF22
The DFDL session at OGF22 is now booked for the Monday afternoon, and Mike
has registered to attend. Mike will present our updated status, and Alan
promised to upload the last set of presented slides to GridForge so that
Mike can update them. Alan asked whether we should attempt to drum up
interest in the DFDL session to encourage attendance; Simon thought that
advertising may not make much difference and that Steve had a reasonable
audience when he presented.
2. Specification drafts
Steve and Alan had previously assigned ownership of individual items from
Mike's plan of contents for the next few drafts. Alan will assemble the
next draft, due at the end of the month, and asked for input as soon as
possible.
Looking at the plan for the next, "vX+1", draft, the group reported the
following status:
Nulls/default/optionals - Mike reported no update.
Description of schema components - Simon is still working on this.
Regular expressions for lengths - Alan reported no progress.
Expression language - Alan will shortly distribute a new version of the
proposal for review.
valueCalc - Mike is still to write this.
Property precedence - Following a discussion on the call last week, please
provide review comments. Mike will add this to the agenda for next week.
Entities - Alan's recent proposal is to be discussed on the current call.
White space handling - Discussion is ongoing, and Steve is to make a
proposal.
The plan calls for subsequent versions of the specification, including the
following items with status:
Supplements - Steve is working to update the supplements.
Speculative parsing - IBM has internally been discussing and reviewing WTX
function, though no documentation presently exists covering this.
3. UML diagrams
Simon is revising the UML diagrams which describe the DFDL schema
components. The previous meeting minutes included a number of comments on
these diagrams, and the group took this opportunity to look at some of
those comments:
"...I think it would be better to use the open source XML schema model as
source model and show relationship of DFDL Annotations attached to the XSD
schema model" - Mike noted that DFDL makes use of annotations on objects
which are absent from the XSD schema model, and hence that it may be
unnatural to base the DFDL schema model directly on the XSD model. Simon
suggested that it would be cleanest to describe a modified version of the XSD
model including those XSD elements that we need to annotate, and use this
as a basis for the DFDL model.
"The current diagram suggests that 'variable definition' can both be part
of a format base or as a standalone annotation (outside of a format). Is
this true?" - Mike suggested that variable definitions don't have to be
part of a format block: so, yes, this is true.
Mike agreed to respond further to the set of comments by email.
4. Review of Entities proposal
Alan has distributed a proposal covering entities in DFDL, intended to
allow characters which are disallowed by XML 1.0 (or XML 1.1) to be included
in DFDL schemas. These follow a similar syntax to XML, using % instead of
& as an escape, with an additional mechanism for specifying raw data. This
latter is intended to supplant the escaping mechanism described in current
versions of the specification (which also uses % as an escape).
The group felt that the description of the raw data entities should not be
cast in terms of characters and character sets, but rather in terms of
bytes. If treated as characters, schemas may need to be rewritten when
moving from single-byte to double-byte character sets; further, this
incorrectly implies some codepage conversion is involved.
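The point about bytes versus characters can be illustrated with a small sketch. It is not part of the proposal; the helper name encoded_bytes and the choice of encodings are assumptions made only for this example. It shows that one character maps to different byte sequences depending on the character set, which is why a raw data entity described as a character would change meaning when the encoding changes.

```python
def encoded_bytes(character, encoding):
    """Return the bytes that represent a character in a given encoding."""
    return character.encode(encoding)

# The same character yields different raw bytes in each encoding
# (encodings chosen purely for illustration).
for name in ("ascii", "cp037", "utf-16-be"):
    print(name, encoded_bytes("A", name).hex())
```

A raw data entity defined directly as a byte value sidesteps this, since no encoding step is involved.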
The proposal also introduces a list of predefined names for certain common
control characters. Mike asked whether these are the existing XML names -
Alan replied that XML does not define names for control characters.
Ian asked how we should represent the literal % character in strings given
this form of escaping. The present draft of the specification uses "%%" to
handle this; Simon suggested a string like "%pc;". The meeting felt that
%% might be marginally preferable.
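As a rough sketch of the "%%" option, the following shows how a doubled percent and a named entity might be expanded together. Only the general %NAME; shape and the use of "%%" come from the discussion; the entity names TAB, CR and LF, the table ENTITIES and the function expand_entities are hypothetical and are not taken from Alan's proposal.

```python
import re

# Hypothetical entity table: the names and values are assumptions for
# illustration, not the predefined list in the proposal.
ENTITIES = {"TAB": "\t", "CR": "\r", "LF": "\n"}

# Match either a doubled percent or a named entity of the form %NAME;
_TOKEN = re.compile(r"%%|%([A-Za-z][A-Za-z0-9]*);")

def expand_entities(text, entities=ENTITIES):
    """Replace %% with a literal % and %NAME; with its mapped value."""
    def substitute(match):
        name = match.group(1)
        if name is None:  # the doubled "%%" matched
            return "%"
        if name not in entities:
            raise ValueError(f"unknown entity %{name};")
        return entities[name]
    return _TOKEN.sub(substitute, text)
```

Because "%%" is tried first at each position, a string such as "%%TAB;" expands to the literal text "%TAB;" rather than to a tab character.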
Finally, the proposal defines some labels which aim to reduce the
complexity of dealing with whitespace and newlines. The %NL; entity
represents a newline on "the target platform" - Mike observed that DFDL
presently does not have a concept of a target platform. Alan felt it
important that a single DFDL schema be able to generate output documents
targeted at different platforms. Mike proposed that we introduce a new
property, "generatedNewLine", which describes the meaning of %NL; during
unparse, and that %NL; should be tolerant of any common new line
representation during parse. The group discussed whether this could
instead be handled using a list of optional new line values; however, this
would not support schema portability. Simon suggested we introduce another
new property to mean that %NL; should be the conventional new line
representation on the platform on which an engine is running; however, Mike
pointed out that this simply requires appropriate configuration of the
generatedNewLine property.
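The asymmetry Mike proposed (tolerant on parse, a single configured value on unparse) might look something like the sketch below. The function names, the set of newline forms accepted on parse, and the generated_new_line parameter standing in for the generatedNewLine property are all assumptions for illustration.

```python
import re

# Newline forms accepted on parse (an assumed list). CRLF is listed
# first so that it is consumed as one newline rather than two.
_ANY_NEWLINE = re.compile(r"\r\n|\r|\n")

def parse_lines(data):
    """Split input on any common newline, as a tolerant %NL; might."""
    return _ANY_NEWLINE.split(data)

def unparse_lines(lines, generated_new_line="\n"):
    """Join records using the one newline chosen for output."""
    return generated_new_line.join(lines)
```

Under this reading, targeting a different platform is a matter of passing a different generated_new_line value while the schema itself stays unchanged.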
%WSP; and %OWSP; are introduced to mean any whitespace, and optional
whitespace. This will be useful in describing some formats which allow
arbitrary whitespace, such as MIME. Mike pointed out that we could model
such whitespace using hidden fields, but that these entities may make a
schema clearer. PolarLake have found that only one such label is
necessary, which means, "one or more whitespace characters", and that this
needs only to be made available as a delimiter - Mike agreed that this
label may represent a special type of delimiter rather than a general
purpose entity. Alan would like to work through the potential use cases to
see if we can restrict it in this fashion, and will update the proposal to
specify that these relate to just one character. Simon suggested we could
introduce an extra label, perhaps %WSP*;, to match multiple whitespace
characters.
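To make the difference between a single-character whitespace delimiter and a "one or more whitespace characters" delimiter concrete, here is a minimal sketch. The function split_fields, its one_or_more flag and the set of characters treated as whitespace are assumptions for this example only.

```python
import re

def split_fields(text, one_or_more=True):
    """Split text on a whitespace delimiter.

    With one_or_more=True a run of whitespace is a single delimiter;
    otherwise every whitespace character is a delimiter on its own.
    """
    pattern = r"[ \t\r\n]+" if one_or_more else r"[ \t\r\n]"
    return re.split(pattern, text)
```

With input "a  b" (two spaces), the one-or-more form yields two fields, while the single-character form yields an empty field between them.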
Meeting closed, 18:15
Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU