[DFDL-WG] Agenda for DFDL WG call - 2008-01-30

Wed Jan 30 10:27:47 CST 2008

Hi Mike

My specific item is:

Ways of handling variable markup

These thoughts came out of observations of WTX and how it handles variable 
terminators: 

1) One way to handle the situation where the terminator can vary is to 
allow the DFDL markup properties (dfdl:terminator, dfdl:separator, etc) to 
be lists, just like we already do for dfdl:nullValues. 

2) We've allowed the prefix of a prefixed length to be explicitly 
described as a non-event field using dfdl:lengthPrefixType. Should we 
permit this for markup properties?  Instead of supplying a list of 
possible values, you supply a simple type with enums for the values. This 
could be viewed as an alternative or a complement to 1). (WTX has this). 

3) Extending 2) you could say that elements as well as simple types are 
available to model markup, so you have the full power of DFDL available.
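
As a rough illustration of option 1), here is a minimal Python sketch. It is not DFDL syntax, and the helper name scan_delimited and its return shape are invented for this note. It treats the terminator as a list of candidate values and ends the field at whichever candidate occurs first in the data:

```python
from typing import List, Optional, Tuple


def scan_delimited(data: str, terminators: List[str]) -> Optional[Tuple[str, str, str]]:
    """Return (field, matched_terminator, rest), or None if no terminator occurs.

    Hypothetical helper: the earliest terminator in the data wins; if two
    candidates start at the same position, the longer one is preferred.
    """
    best = None  # (position, terminator) of the best candidate so far
    for term in terminators:
        if not term:
            continue  # an empty terminator would match everywhere
        pos = data.find(term)
        if pos < 0:
            continue
        if best is None or pos < best[0] or (pos == best[0] and len(term) > len(best[1])):
            best = (pos, term)
    if best is None:
        return None
    pos, term = best
    return data[:pos], term, data[pos + len(term):]


result = scan_delimited("abc;def:ghi", [":", ";"])
```

The order of the list does not matter in this sketch; whether DFDL would want list order to be significant is a separate question.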

Now 2) and 3) are very powerful, but there are issues. 
- If using the XSDL enumeration facet, we are constrained by its syntax, so 
I don't see how we could use our own entity scheme or expressions.
- What to do on output? Use an element of simple type, as then you can use 
the XSDL default attribute. 
- If the markup variations are of different lengths, how do we handle 
that? 
        - Take enums into account when parsing. 
        - But that breaks a rule we had that enum facets are not used 
when parsing.
        - Maybe we need that anyway to help with speculative parsing?
- Is such an element hidden by implication?
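
To make the length and output questions concrete, here is a small Python sketch under stated assumptions. The class EnumMarkup is hypothetical and simply stands in for a simple type whose enumerations list the allowed markup values, with an XSDL-style default. Note that it does consult the enumerations while parsing, which is exactly the rule the bullets above say we would be breaking:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EnumMarkup:
    """Hypothetical model of markup described by enumerated values plus a default."""
    values: Tuple[str, ...]
    default: str

    def match_at(self, data: str, pos: int) -> Optional[str]:
        # Try longer values first so that ";;" is not mistaken for ";".
        for value in sorted(self.values, key=len, reverse=True):
            if data.startswith(value, pos):
                return value
        return None

    def for_output(self) -> str:
        # Output uses the default rather than relying on enumeration order.
        return self.default


terminator = EnumMarkup(values=(";", ";;", ":"), default=";")
```

With these values, terminator.match_at("abc;;def", 3) picks the two-character variant, while output always writes the default.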

Clearly, we should not force a user to model markup as an element/type - 
most users just see it as a piece of text so just entering the value must 
still be allowed. 

Example: 

<xsd:element name="terminatedfield" type="xsd:string" 
dfdl:lengthKind="delimited" dfdl:terminator="urn:terminator" />

<xsd:element name="terminator" dfdl:lengthKind="implicit" default=";">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value=":"/>
      <xsd:enumeration value=";"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>

(Note: proper syntax needs defining)

Last question: Let's say my delimiter is dynamically defined at the start 
of the data, like EDI allows. We would handle that in DFDL using an 
expression or variable. However, EDI also allows random white space to 
appear after the delimiter. Can our expression/entity syntaxes handle 
this?  Does this preclude use of 1) or 2) or 3) above?   
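
For discussion, the EDI case can be sketched in Python as follows. This is a simplification under loud assumptions: the delimiter is taken to be declared by the first character of the data (real EDI headers are more elaborate), and the function name is invented for this note. White space following each delimiter is skipped:

```python
import re
from typing import List


def split_with_declared_delimiter(data: str) -> List[str]:
    """Split fields using a delimiter declared as the first character of the data.

    Assumption for illustration only: real EDI declares its delimiters in a
    more elaborate header. White space after each delimiter is skipped.
    """
    if not data:
        return []
    delimiter, body = data[0], data[1:]
    # The delimiter is data-driven, so escape it before building the pattern.
    pattern = re.compile(re.escape(delimiter) + r"\s*")
    return pattern.split(body)


fields = split_with_declared_delimiter("+AAA+ \n BBB+CCC")
```

The sketch quietly assumes that a field never begins with significant white space, which is one of the things an expression or entity syntax would have to pin down.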

Regards, Steve

Steve Hanson
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848

"Mike Beckerle" <mbeckerle.dfdl at gmail.com> 
Sent by: dfdl-wg-bounces at ogf.org
30/01/2008 02:05

To
<dfdl-wg at ogf.org>
cc

Subject
[DFDL-WG] Agenda for DFDL WG call - 2008-01-30

Thanks Alan and Steve for some agenda topics:

Follow up the items from last week 

- Specification drafts - I need updates from everyone to produce the 
next spec draft 

- Expression language - Comments from only Steve H. so far 

- Property precedence - Any more comments/discussion? 

- UML for DFDL schema - status update 

- Entity proposal updates?

Discussion for this call 

- White space 

- Steve's items (??) 

- OGF presentation

Other Topics?

From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: Tuesday, January 29, 2008 1:10 PM
To: Mike Beckerle
Cc: Alan Powell
Subject: Agenda for DFDL WG call

Hi Mike - possible agenda items for tomorrow. 

Regards, Steve

Steve Hanson
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 29/01/2008 17:55 ----- 

Alan Powell/UK/IBM 
29/01/2008 17:25 

To
Steve Hanson/UK/IBM at IBMGB 
cc

Subject
Fw: [DFDL-WG] DFDL: Minutes from OGF WG call, 23 Jan 2007 *CORRECTED*

Steve 

I will try to make the WG call tomorrow but may still be on a course. 

We need to follow up the items from last week 

- Specification drafts - I need updates from everyone to produce the 
next spec draft 

- Expression language - Comments from only you so far 

- Property precedence - Any more comments/discussion? 

- UML for DFDL schema - status update 

- Entity proposal - I should have updated this as a result of last week's 
discussion but haven't had time 

Discussion for this call 

- White space 

- Your items.   We did discuss them a bit but mostly in the context of 
white space. 

Alan Powell

MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com 
Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898

----- Forwarded by Alan Powell/UK/IBM on 29/01/2008 17:19 ----- 

Steve Hanson/UK/IBM at IBMGB 
Sent by: dfdl-wg-bounces at ogf.org 
28/01/2008 08:51 

To
dfdl-wg at ogf.org 
cc

Subject
Re: [DFDL-WG] DFDL: Minutes from OGF WG call, 23 Jan 2007 *CORRECTED*

Sorry I couldn't make the call.  Some comments: 

a) we need both WSP and OWSP if DFDL delimiter properties can only specify 
a single value. If they can specify a list of values then you can get away 
with only needing WSP, 
      e.g. dfdl:terminator="@ @%WSP;" 
b) if we make WSP mean a single white space character, we need a second 
entity for multiple white space characters. 
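
Point a) can be checked with a small Python sketch. The meanings given to the entities here are assumptions made for illustration (the proposal is still being revised), as is the idea that a list-valued property is space-separated; the function name is invented. Under those assumptions, the list "@ @%WSP;" and the single value "@%OWSP;" accept the same data:

```python
import re

# Assumed meanings, for illustration only:
#   %WSP;  -> exactly one white space character
#   %OWSP; -> zero or one white space character
ENTITY_PATTERNS = {"%WSP;": r"\s", "%OWSP;": r"\s?"}


def delimiter_regex(property_value: str) -> "re.Pattern[str]":
    """Compile a space-separated list of delimiter values into one regex."""
    alternatives = []
    for value in property_value.split():
        # Split out entities, escape literal text, and translate each entity.
        parts = re.split(r"(%O?WSP;)", value)
        alternatives.append("".join(ENTITY_PATTERNS.get(p, re.escape(p)) for p in parts))
    # Longer patterns first so that "@%WSP;" is tried before plain "@".
    alternatives.sort(key=len, reverse=True)
    return re.compile("|".join(alternatives))


listed = delimiter_regex("@ @%WSP;")
single = delimiter_regex("@%OWSP;")
```

Both patterns split "a@b", "a@ b" and "a@<tab>b" into the same two fields.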

It doesn't look like you got round to discussing the other items I sent in 
(below)? Let's do that next call. 

1) One way to handle the situation where the terminator can vary is to 
allow the DFDL markup properties (dfdl:terminator, dfdl:separator, etc) to 
be lists, just like we already do for dfdl:nullValues. (IBM's WTX has this 
capability). 

2) We've allowed the prefix of a prefixed length to be explicitly 
described as a non-event field using dfdl:lengthPrefixType. Should we 
permit this for markup properties?  Instead of supplying a list of 
possible values, you supply a simple type with enums for the values. This 
could be viewed as an alternative or a complement to 1). There is a 
limitation: because we are using the XSDL enumeration facet, we are 
constrained by its syntax so I don't see how we could use our own entity 
scheme or expressions. Also, I suspect that enums are inherently unordered 
so we'd need a way of saying which to use on output (use an element of 
simple type and use XSDL default attribute?).  Lastly, we should not force 
a user to model an initiator as an element/type - most users just see it 
as a piece of text so just entering the value must still be allowed. 

3) Let's say my delimiter is dynamically defined at the start of the data, 
like EDI allows. We would handle that in DFDL using an expression or 
variable. However, EDI also allows random white space to appear after the 
delimiter. Can our expression/entity syntaxes handle this?  Does this 
preclude use of 1) or 2)?   

Regards, Steve

Steve Hanson
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848 

Ian W Parkinson/UK/IBM at IBMGB 
Sent by: dfdl-wg-bounces at ogf.org 
24/01/2008 16:19 

To
dfdl-wg at ogf.org 
cc

Subject
[DFDL-WG] DFDL: Minutes from OGF WG call, 23 Jan 2007 *CORRECTED*

A small correction, with thanks to Simon: it was Steve (rather than 
Simon) who had previously attracted a reasonable audience at the OGF 
conference. 

Ian 

Open Grid Forum: Data Format Description Language Working Group 

Weekly Working Group Conference Call 
17:00 GMT, 23 Jan 2008 

Attendees 
Mike Beckerle (Oco) 
Simon Parker (PolarLake) 
Ian Parkinson (IBM) 
Alan Powell (IBM) 

Apologies 
Steve Hanson (IBM), Suman Kalia (IBM) 

1. OGF22 
The DFDL session at OGF22 is now booked for the Monday afternoon, and Mike 
has registered to attend. Mike will present our updated status, and Alan 
promised to upload the last set of presented slides to GridForge so that 
Mike can update them. Alan asked whether we should attempt to drum up 
interest in the DFDL session to encourage attendance; Simon thought that 
advertising may not make much difference and that Steve had a reasonable 
audience when he presented. 

2. Specification drafts 
Steve and Alan had previously assigned ownership of individual items from 
Mike's plan of contents for the next few drafts. Alan will assemble the 
next draft, due at the end of the month, and asked for input as soon as 
possible. 

Looking at the plan for the next, "vX+1", draft, the group reported the 
following status: 
Nulls/default/optionals - Mike reported no update. 
Description of schema components - Simon is still working on this. 
Regular expressions for lengths - Alan reported no progress. 
Expression language - Alan will shortly distribute a new version of the 
proposal for review. 
valueCalc - Mike is still to write this. 
Property precedence - Following a discussion on the call last week, please 
provide review comments. Mike will add this to the agenda for next week. 
Entities - Alan's recent proposal is to be discussed on the current call. 
White space handling - Discussion is ongoing, and Steve is to make a 
proposal.

The plan calls for subsequent versions of the specification, including the 
following items with status: 
Supplements - Steve is working to update the supplements 
Speculative parsing - IBM has internally been discussing and reviewing WTX 
function, though no documentation presently exists covering this.

3. UML diagrams 
Simon is revising the UML diagrams which describe the DFDL schema 
components. The previous meeting minutes included a number of comments on 
these diagrams, and the group took this opportunity to look at some of 
those comments: 

"...I think it would be better to use the open source XML schema model as 
source model and show relationship of DFDL Annotations attached to the XSD 
schema model" - Mike noted that DFDL makes use of annotations on objects 
which are absent from the XSD schema model, and hence that it may be 
unnatural to base the DFDL schema model directly on the XSD model. Simon 
suggested that it would be cleanest to describe a modified version of the XSD 
model including those XSD elements that we need to annotate, and use this 
as a basis for the DFDL model. 

"The current diagram suggests that 'variable definition' can both be part 
of a format base or as a standalone annotation (outside of a format). Is 
this true?" - Mike suggested that variable definitions don't have to be 
part of a format block: so, yes, this is true. 

Mike agreed to respond further to the set of comments by email. 

4. Review of Entities proposal 
Alan has distributed a proposal covering entities in DFDL, intended to 
allow characters which are disallowed by XML 1.0 (or XML 1.1) to be included 
in DFDL schemas. These follow a similar syntax to XML, using % instead of 
& as an escape, with an additional mechanism for specifying raw data. The 
latter is intended to supplant the escaping mechanism described in current 
versions of the specification (which also uses % as an escape). 

The group felt that the description of the raw data entities should not be 
cast in terms of characters and character sets, but rather in terms of 
bytes. If treated as characters, schemas may need to be rewritten when 
moving from single-byte to double-byte character sets; further, this 
incorrectly implies some codepage conversion is involved. 

The proposal also introduces a list of predefined names for certain common 
control characters. Mike asked whether these are the existing XML names - 
Alan replied that XML does not define names for control characters. 

Ian asked how we should represent the literal % character in strings given 
this form of escaping. The present draft of the specification uses "%%" to 
handle this; Simon suggested a string like "%pc;". The meeting felt that 
%% might be marginally preferable. 
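
As an illustration of how "%%" and "%name;" entities could coexist, here is a minimal Python sketch. The entity names in the table are hypothetical placeholders (the proposal's actual list is not reproduced in these minutes), and the function name is invented:

```python
import re

# Hypothetical entity names; the proposal's actual list is not reproduced here.
NAMED_ENTITIES = {"CR": "\r", "LF": "\n", "TAB": "\t"}

# Either the "%%" escape or a "%NAME;" entity reference.
_TOKEN = re.compile(r"%%|%([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text: str) -> str:
    """Replace "%%" with a literal "%" and "%NAME;" with its character."""
    def substitute(match: "re.Match[str]") -> str:
        if match.group(0) == "%%":
            return "%"
        name = match.group(1)
        if name not in NAMED_ENTITIES:
            raise ValueError(f"unknown entity: %{name};")
        return NAMED_ENTITIES[name]

    return _TOKEN.sub(substitute, text)
```

Because "%%" is consumed first, the string "%%TAB;" comes out as the literal text "%TAB;" rather than a tab. A stray "%" that starts neither form is passed through unchanged in this sketch; a real definition might prefer to reject it.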

Finally, the proposal defines some labels which aim to reduce the 
complexity of dealing with whitespace and newlines. The %NL; entity 
represents a newline on "the target platform" - Mike observed that DFDL 
presently does not have a concept of a target platform. Alan felt it 
important that a single DFDL schema be able to generate output documents 
targeted at different platforms. Mike proposed that we introduce a new 
property, "generatedNewLine", which describes the meaning of %NL; during 
unparse, and that %NL; should be tolerant of any common new line 
representation during parse. The group discussed whether this could 
instead be handled using a list of optional new line values, however this 
would not support schema portability. Simon suggested we introduce another 
new property to mean that %NL; should be the conventional new line 
representation on the platform on which an engine is running, however Mike 
pointed out that this simply requires appropriate configuration of the 
generatedNewLine property. 
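
Mike's suggestion can be pictured with the following Python sketch. The set of newline forms accepted on parse is an assumption of the sketch, and the function and parameter names are invented; only the idea (tolerant on parse, one configured form on unparse) comes from the discussion:

```python
import re
from typing import List

# Newline forms accepted on parse. Which forms count as "common" is an
# assumption of this sketch; "\r\n" is listed first so it matches as one unit.
_ANY_NEWLINE = re.compile(r"\r\n|\n|\r")


def parse_lines(data: str) -> List[str]:
    """Split on %NL;, tolerating any of the newline forms above."""
    return _ANY_NEWLINE.split(data)


def unparse_lines(lines: List[str], generated_new_line: str = "\n") -> str:
    """Join records using the single form chosen by generatedNewLine."""
    return generated_new_line.join(lines)


records = parse_lines("a\r\nb\nc\rd")
```

Setting generated_new_line to the convention of the platform the engine runs on gives the behaviour Simon asked for without a second property.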

%WSP; and %OWSP; are introduced to mean any whitespace, and optional 
whitespace. This will be useful in describing some formats which allow 
arbitrary whitespace, such as MIME. Mike pointed out that we could model 
such whitespace using hidden fields, but that these entities may make a 
schema clearer. PolarLake have found that only one such label is 
necessary, which means, "one or more whitespace characters", and that this 
needs only to be made available as a delimiter - Mike agreed that this 
label may represent a special type of delimiter rather than a general 
purpose entity. Alan would like to work through the potential use cases to 
see if we can restrict it in this fashion, and will update the proposal to 
specify that these relate to just one character. Simon suggested we could 
introduce an extra label, perhaps %WSP*;, to match multiple whitespace 
characters. 

Meeting closed, 18:15 

Ian Parkinson
WebSphere ESB Development
Mail Point 211, Hursley Park, Hursley, Winchester, SO21 2JN, UK

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 

--
dfdl-wg mailing list
dfdl-wg at ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg 

