[DFDL-WG] Minutes for OGF DFDL Working Group Call, January-13-2010

Thu Jan 14 05:03:42 CST 2010

Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, January-13-2010

Attendees
Mike Beckerle (Oco)
Steve Hanson (IBM)
Alan Powell (IBM)
Steve Marting (Progeny)
Stephanie Fetzer (IBM)
Suman Kalia (IBM)
Peter Lambros (IBM)
Tim Kimber (IBM)

Apologies

1.   045  - Discriminators
Stephanie took us through her email Subject: [DFDL-WG] Bob & Steph's WTX 
'Discriminators' write-up

 WTX Identifiers are similar to DFDL discriminators

-Discriminators may only be placed on the physical representation of a 
group.  That is why we see them on partition groups and sequence groups 
but not on choice groups (or unordered groups - covered below). 

In partitioned groups we have a subtype of each possible group - so each 
possible group may have a discriminator. 

When WTX expresses choice groups it expresses them as a group containing 
all of the possible child groups - so at the top level 'choice group' 
there is no component of the actual group content - so no use for a 
discriminator. But each choice which may itself be a group may have a 
discriminator.  Choice groups are special in that the choice model 
construct simply lists the components and only one may occur...at this 
level a discriminator on one of the choices may not be very useful. Inside 
of each choice's components a discriminator could be used to indicate the 
existence of that choice. 

-The WTX UI does not allow discriminators on the components of Unordered 
Groups.  This may be due to the fact that the position of the 
discriminator has significance (all rules at or above the discriminator 
must evaluate to true).  If the group is unordered it would be difficult 
to enforce.  Will need to discuss for DFDL. 

-A group may have either zero or one discriminators.  No group may have 
more than one discriminator. 

-The discriminator may have two significant parts 
o        its location (mandatory).  The discriminator is placed on a 
component of a group and makes all of the cardinality and rules at that 
point and above become part of its concept. 
o        its rule (optional) 
A group with a component which has a discriminator should have some 'rule' 
associated with it. In WTX if there is no explicit rule then the implicit 
rule is 'PRESENT($)'. We will need to decide if such implied rules will be 
allowed in DFDL. 

-A group may only have a discriminator on a mandatory component. Once 
again, this impacts a choice group where by definition all components are 
optional - which will not have a discriminator. 

This has been an issue of debate in WTX. We could have implemented 
checking on optional elements quite easily.  Over the years this has been 
questioned (as our UI allows them to be placed on optional elements) but 
once we explained the way the engine worked no customers perceived this as 
a deficiency.  In DFDL we will need to determine if this is needed. 

-In WTX we do allow a discriminator to be placed on a mandatory fixed size 
array (a repeating mandatory component with n:n cardinality).  Its 
component rule can either refer to the entirety of the array (PRESENT($) 
meaning the whole of the array is present) or can call out a specific rule 
against one of the iterations.  This is not done often in practice. 

-In WTX it is common to have multiple levels of discriminators when we are 
working with nested groups. 

We discussed whether DFDL should disallow discriminators on unordered 
groups or groups with floating elements. Agreed that discriminators should 
be allowed.

Also discussed whether the timing property ('before'/'after') is required, 
as WTX only has 'after'. Decided to keep the timing property.

It was suggested that discriminators should not be allowed on variable 
length arrays, to be consistent with not being allowed on optional 
elements.

Mike agreed to write up the rules in DFDL terms and extend them to cover 
other points of uncertainty besides choices.
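
As an illustration only (not part of the write-up), the WTX placement 
rules listed above could be checked roughly as in the Python sketch below. 
The Component and Group classes and both function names are hypothetical. 
The unordered-group check models the WTX UI restriction (the call agreed 
that DFDL should allow it), and the variable length array check reflects 
the suggestion made on the call, not a settled rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """One component of a group (hypothetical model, not a WTX or DFDL API)."""
    name: str
    min_occurs: int = 1
    max_occurs: int = 1
    has_discriminator: bool = False
    rule: Optional[str] = None  # explicit component rule, if any


@dataclass
class Group:
    name: str
    unordered: bool = False
    components: List[Component] = field(default_factory=list)


def check_discriminators(group: Group) -> List[str]:
    """Return the placement violations found in one group."""
    errors = []
    marked = [c for c in group.components if c.has_discriminator]
    # A group may have zero or one discriminators, never more.
    if len(marked) > 1:
        errors.append("group has more than one discriminator")
    for c in marked:
        # WTX UI restriction only; the WG agreed DFDL should allow this.
        if group.unordered:
            errors.append(f"{c.name}: discriminator in unordered group")
        if c.min_occurs == 0:
            errors.append(f"{c.name}: discriminator on optional component")
        elif c.min_occurs != c.max_occurs:
            # Suggested on the call; fixed size (n:n) arrays are allowed.
            errors.append(f"{c.name}: discriminator on variable length array")
    return errors


def effective_rule(c: Component) -> str:
    """WTX uses the implicit rule PRESENT($) when no rule is given."""
    return c.rule if c.rule is not None else "PRESENT($)"
```

A fixed size array such as Component("arr", 3, 3, has_discriminator=True) 
passes these checks, whereas a component with min_occurs=0 is rejected.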

2. Zero length elements 
Steve H took us through his email subject: [DFDL] zero length (was Re: Fw: 
TDS length reference) ** updated ** 
This proposes that zero length fields should not be a processing error.

Proposal: 

1. Parsing

Simple elements

1) It is not a schema definition error nor a processing error if a length 
is being used to extract data and it is zero. This covers dfdl:lengthKind 
implicit, explicit, prefixed and endOfParent (when parent length is 
known). The result is 'empty content'. (Note that for implicit, XSDL 
allows maxLength/length facet to be 0, so disallowing it for others is not 
consistent). 

2) It is not a processing error if scanning for data and the length of the 
returned bytes is zero. This applies to dfdl:lengthKind delimited, pattern 
and  endOfParent (when parent length is not known). The result is 'empty 
content'. (This is just stating the obvious).

(The above two rules ensure that it is possible to apply empty content to 
trigger optional, nil value or default value processing regardless of data 
type and dfdl:lengthKind). 

3) Optional, nil and default processing are applied as per spec.

4) If the element is required, and nil value or default value is not used, 
and empty string is not in the lexical space of the element's type, then 
it is a processing error. 

The two initiator related properties dfdl:nilValueInitiatorPolicy and 
dfdl:defaultValueInitiatorPolicy define whether nils and defaults are 
applied when initiated empty content is found, they don't affect the 
definition of empty content or what it means for the type.

[Note: If you recall, this discussion was triggered by a customer that was 
using an expression to calculate the length of a standard text decimal. He 
wanted a length of 0 to mean that the value 0 ended up in the infoset. He 
can achieve this by making the element required with a default value of 0.]
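
A minimal Python sketch of how rules 3) and 4) might resolve empty content 
for a simple element is given below, for illustration only. The Outcome 
enum and resolve_empty_simple are hypothetical names. The order in which 
optional, nil and default processing are tried is an assumption (the 
proposal only says 'as per spec'), and the two initiator policy properties 
are not modelled.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ABSENT = "element not added to infoset"
    NIL = "element added as nil"
    DEFAULT = "element added with its default value"
    EMPTY_VALUE = "element added with an empty value"
    PROCESSING_ERROR = "processing error"


def resolve_empty_simple(required: bool,
                         nil_applies: bool,
                         default: Optional[str],
                         empty_is_lexically_valid: bool) -> Outcome:
    """Decide what empty content means for a simple element.

    Assumed precedence: optional, then nil, then default, then the
    lexical space of the type. Only the final error case is stated
    explicitly in the proposal (rule 4).
    """
    if not required:
        return Outcome.ABSENT
    if nil_applies:
        return Outcome.NIL
    if default is not None:
        return Outcome.DEFAULT
    if empty_is_lexically_valid:   # e.g. xs:string accepts ""
        return Outcome.EMPTY_VALUE
    return Outcome.PROCESSING_ERROR


# The customer case from the note: required decimal with default "0".
customer = resolve_empty_simple(required=True, nil_applies=False,
                                default="0", empty_is_lexically_valid=False)
```

With these assumptions the customer case resolves to Outcome.DEFAULT, and 
the same decimal without a default resolves to Outcome.PROCESSING_ERROR.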

Complex elements

It is possible to get returned empty content for a complex element for 
cases 1) and 2) above. 

1) If the complex element is optional then it is not added to the infoset. 

2) If the complex element does not have an initiator specified & is 
required then it is added to the infoset.

3) If the element has an initiator specified then 
dfdl:defaultValueInitiatorPolicy applies
        - required => element is added to infoset only if initiator is 
present (processing error if no initiator & empty content)
        - prohibited => element is added to infoset only if initiator is 
not present (initiator implies real content follows so processing error if 
initiator & empty content)

4) If the complex element is added to the infoset, then the parser 
processes the child content of the complex type. This may or may not cause 
a processing error.
<tk>I presume a processing error would be caused by
- any group having an initiator or terminator ( same as 5. below )
- any group having a prefix or postfix delimiter
- any group with more than one member having an infix delimiter
- any required element within the complex element having an initiator and 
dfdl:defaultValueInitiatorPolicy="required"
- any required element within the complex element having a terminator
- any required element which does not have a default value specified, and 
for which a zero-length representation is illegal
- other error scenarios?
</tk>
<smh>Correct. Basically you are going through the element's content (model 
group plus children) and attempting to parse. When you extract the data 
you get back empty content. This may or may not cause a processing error. This 
was agreed on the call as the correct behaviour. In summary, for empty 
content to be valid for the complex element then it must also be valid for 
at least one content model</smh> 
 If it doesn't then default value processing applies for required child 
elements. If we don't do this then we will not create default values for 
all missing required simple elements, and that would be wrong.

5) If the contained sequence or choice has an initiator or terminator then 
it is a processing error.
<tk>
So it's OK to have a choice among the children of the complex element? If 
so, the specification should define the rules for picking a branch of the 
choice. The DFDL processor *could* always pick the first branch, but what 
if the first branch triggers a processing error and a different branch 
would not have done?
</tk>
<smh>I think it's the same as with real content. Parser will start against 
the first branch of the choice and see where it gets. Usual speculative 
parsing rules apply. If it has not discriminated successfully and a 
processing error occurs it will cause backtracking and the next branch 
will be tried. If it finds a valid content model for the empty content we 
are ok. If it doesn't it's a processing error.</smh> 
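
For illustration, the complex element rules 1) to 3) and the choice 
handling described in the replies above could be sketched in Python as 
follows. All names are hypothetical. Applying rule 1) before rule 3) is an 
assumption taken from the numbering, and discriminators (which would stop 
backtracking once satisfied) are deliberately left out of the choice 
sketch.

```python
from typing import Callable, Sequence


class ProcessingError(Exception):
    """Stands in for a DFDL processing error."""


def complex_added_to_infoset(required: bool,
                             has_initiator: bool,
                             initiator_found: bool,
                             policy: str = "required") -> bool:
    """Rules 1-3: is a complex element with empty content added?

    'policy' models dfdl:defaultValueInitiatorPolicy.
    """
    if not required:
        return False                      # rule 1
    if not has_initiator:
        return True                       # rule 2
    if policy == "required":              # rule 3, first case
        if not initiator_found:
            raise ProcessingError("no initiator and empty content")
        return True
    if policy == "prohibited":            # rule 3, second case
        if initiator_found:
            raise ProcessingError("initiator found but content is empty")
        return True
    raise ValueError(f"unknown policy: {policy!r}")


def parse_empty_choice(branches: Sequence[Callable[[], object]]) -> object:
    """Try each branch in schema order against the empty content.

    A branch signals failure by raising ProcessingError, which causes
    backtracking to the next branch. If no branch accepts the empty
    content the choice as a whole is a processing error.
    """
    for branch in branches:
        try:
            return branch()
        except ProcessingError:
            continue                      # backtrack, try the next branch
    raise ProcessingError("no branch of the choice accepts empty content")
```

Each branch is passed as a callable so that the sketch stays independent 
of any particular parser structure.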

2. Unparsing

Simple elements

Data in the infoset can result in empty content being added to the bit 
stream (ie, nothing), with an accompanying 0 value in any length prefix or 
length expression field, if appropriate to the dfdl:lengthKind.

Complex elements

The absence from the infoset of a required complex element will cause any 
specified initiator to be output, plus if there are required children then 
default values will be output for those children. If we don't do this then 
we will not create default values for nested missing required simple 
elements, and that would be wrong. This enables creation of a sparse 
infoset containing just the elements with explicit values, with the rest 
defaulting regardless of nesting. 
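
The sparse infoset behaviour described above can be sketched as a 
recursive fill of defaults, shown here in Python for illustration only. 
The dictionary layout of 'schema' and the name fill_defaults are 
hypothetical, output of initiators is not modelled, and raising an error 
for a required simple element with neither a value nor a default is an 
assumption.

```python
from typing import Any, Dict


def fill_defaults(schema: Dict[str, dict],
                  infoset: Dict[str, Any]) -> Dict[str, Any]:
    """Return the infoset with defaults added for missing required elements.

    Each schema entry is a dict with 'required' (bool) and either
    'default' (simple element) or 'children' (complex element).
    """
    out: Dict[str, Any] = {}
    for name, spec in schema.items():
        required = spec.get("required", True)
        if "children" in spec:
            if name in infoset:
                out[name] = fill_defaults(spec["children"], infoset[name])
            elif required:
                # Missing required complex element: default its children.
                out[name] = fill_defaults(spec["children"], {})
        elif name in infoset:
            out[name] = infoset[name]
        elif required:
            if "default" not in spec:
                raise ValueError(f"required element {name!r} has no value "
                                 "and no default")
            out[name] = spec["default"]
    return out


schema = {
    "header": {"required": True, "children": {
        "version": {"required": True, "default": "1"},
        "note": {"required": False},
    }},
    "amount": {"required": True, "default": "0"},
    "id": {"required": True},
}
filled = fill_defaults(schema, {"id": "42"})
# filled == {"header": {"version": "1"}, "amount": "0", "id": "42"}
```

Only 'id' was supplied explicitly; the nested 'version' and the top level 
'amount' are defaulted, and the optional 'note' is left out.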

3. Choices

Worth noting that the concept of 'required' for the elements of a choice 
does not apply. Even if minOccurs > 0.

4. Outstanding Issues

Is it ok to reuse dfdl:defaultValueInitiatorPolicy for complex elements? 
Should it be renamed? Should we add a separate property for complex 
elements?

Steve H to propose new  name for dfdl:defaultValueInitiatorPolicy

3. Difference between dfdl:lengthKind 'delimited' and 'endOfParent' 

'delimited' means the item is delimited by the item's terminator (if 
specified) or an enclosing construct's separator or the end of the 
enclosing construct designated by its known length or its terminator. 
The only difference with dfdl:lengthKind='endOfParent' is that the latter 
includes the 'end of the data stream' and applies to binary fields. 
We should either 
- Add 'end of data stream' to delimited and remove 'endOfParent', or 
- Make 'endOfParent' be specifically for only 'end of data stream'

Short discussion. Alan agreed to try to write up description of 
endOfParent for review

4. Go through remaining actions 
Not enough time

5 Draft 037 review 
From comments: 
a. DFDL Subset of XML Schema   (TBD: need means for an implementation to 
indicate it is using non-standard extensions?) 
Believe that this was to allow users to indicate they are using 
unsupported schema components. Agreed to defer from DFDL v1
b. Question whether infoset MUST be in schema order.  Request for 
'bitstream order' 
Short discussion. Main reason for schema order is to allow the infoset to 
be validated against a schema. Agreed to leave as schema order
c. Dealing with 'Grammar ambiguity' errors 
Not discussed

6 Review Schedule 
Activity                                                           Schedule             Who
Complete Action items                                              - 18 Dec 2009        WG
Complete Spec              Write up work items                     - 23 Dec 2009        AP
                           Restructure and complete specification  - 23 Dec 2009        AP
                           Issue Draft 038                         23 Dec 2009
WG review                  WG review                               7 Dec - 08 Jan 2010  WG
                           Incorporate review comments             4 Jan - 29 Jan 2010  AP +
                           Issue Draft 039                         15 Jan 2010
                           Incorporate review comments             4 Jan - 29 Jan 2010  AP +
                           Issue Draft 040                         29 Jan 2010
Initial OGF Editor Review  Initial Editor review                   1 Feb - 1 Mar 2010   OGF
                           Initial GFSG review                     1 Feb - 1 Mar 2010
                           Issue Draft 041                         1 Mar 2010
OGF Public Comment period (60 days)                                1 Mar - 30 Apr 2010  OGF
OGF 28 Munich                                                      15-19 March 2010
Incorporate comments       Incorporate comments                    28 May 2010
                           Issue Draft 042                         28 May 2010
Final OGF Editor Review    Final Editor review                     June 2010            OGF
                           final GFSG review                       June 2010
                           Issue Final specification               30 June 2010
Publish proposed recommendation                                    1 July 2010
Grid recommendation process                                        1 Jan - 1 April 2011

Meeting closed, 15:20

Next call 20 January 2010  13:00 UK

Next action: 074
Actions raised at this meeting

No
Action 

Current Actions:
No
Action 

045
20/05 AP: Speculative Parsing
27/05: Pseudo code has been circulated. Review for next call
03/06: Comments received and will be incorporated
09/06: Progress but not discussed
17/06: Discussed briefly
24/06: No Progress
01/07: No Progress
15/07: No progress. MB not happy with the way the algorithm is documented, 
need to find a better way.
29/07: No Progress 
05/08: No Progress. Will document behaviour as a set of rules.
12/08: No Progress 
...
16/09: no progress
30/09: AP distributed proposal and others commented. Brief discussion AP 
to incorporate update and reissue
07/10: Updated proposal was discussed. Comments will be incorporated into 
the next version.
14/10: Alan to update proposal to include array scenario where minOccurs > 
0
21/10: Updated proposal reviewed
28/10: Updated proposal reviewed see minutes
04/11: Discussed semantics of discriminators on arrays. MB to produce 
examples
11/11: Absorbing action 033 into 045.  Maybe decorated discriminator kinds 
are needed after all. MB and SF to continue with examples. 
18/11: Went through WTX implementation of example. SF to gather more 
documentation about WTX discriminator rules.
25/11: Further discussion. Will get more WTX documentation. Need to 
confirm that no changes need to Resolving Uncertainty doc.
04/12: Further discussion about arrays.
09/12: Reviewed proposed discriminator semantic.
16/12: Reviewed discriminator examples and WTX semantic.
23/12: SF to provide better description of WTX behaviour and invite B 
Connolley to next call
06/01: B Connolly not available. SF to provide more complete description.
13/01: Stephanie took us through a description of WTX identifiers. Mike 
agreed to write up in DFDL terms.
049
20/05 AP Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use 
the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed the real need for this being in the specification. It 
seemed that the main value is that it defines a schema location for 
downloading 'known' defaults from the web. 
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource to complete this action it can 
be deferred
13/01: no progress
064
MB/SH Request WG presentation at OGF 28
25/11: Session requested
04/12: no update
09/12: no update
16/12: SH has changed request to a general session rather than WG in the 
hope of attracting more people.
23/12: no update
06/01: not heard anything yet
13/01: no update
066
Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
068
Should the roots of messages be designated?
09/12: Yes. New dfdl:documentRoot property
Closed
16/12: reopened and decided to drop property subject to agreement from SKK 
and SF
23/12: SKK review decision to drop  dfdl:documentRoot 
13/01: closed
071
Semantics of length=0, nil handling and defaults.
23/12: SH no update
06/01: SH has started
13/01: SH proposal review. Minor updates to be made
073
SH: Control of overpunching zoned positive sign
13/01: no update

Closed actions
No
Action 
056
MB Resolve lengthUnits=bits including fillbytes
12/08: No Progress
...
28/10: no progress
04/11: MB to look at lengthUnits = bits
11/11: no update
18/11: no update
25/11: no update
04/12: no update. Alan will set up a separate call to progress this 
action.
09/12: no update. Alan will set up a separate call to progress this 
action.
16/12: MB, SH and AP had a separate call. MB to distribute proposal
23/12: Discussed proposal. MB will update
06/01: V4 discussed and approved
13/01: Mike updated proposal. Closed
068
Should the roots of messages be designated?
09/12: Yes. New dfdl:documentRoot property
Closed
16/12: reopened and decided to drop property subject to agreement from SKK 
and SF
23/12: SKK review decision to drop  dfdl:documentRoot 
13/01: closed

Work items:
No
Item
target version
status
005
Improvements on property descriptions 

not started
011
How speculative parsing works (combining choice and variable-occurrence - 
currently these are separate) (from action 045)

awaiting completion of action 045
012
Reordering the properties discussion: move representation earlier, improve 
flow of topics 

not started 
036
Update dfdl schema with change properties 
ongoing

038
Improve length section including bit handling

some improvement in 036
042
Mapping of the DFDL infoset to XDM 
none
not required for V1 specification
069
ICU fractional seconds

070
Write DFDL primer 

071
Write test cases.

072
it is a processing error if the number of occurrences in the data does not 
match the value of the expression or prefix

073
Rename dfdl:separatorPolicy="required" to "always". 

074
- Last 'postFix' separator is not optional
- Terminators are mandatory.
- dfdl:documentFinalTerminatorCanBeMissing
- dfdl:documentFinalSeparatorCanBeMissing  (Action (70))

075
Remove occursCountKind="useAvailableSpace".

076
 dfdl:documentRoot,  will be defined that can only be on global elements.
The DFDL spec does not have to define the format of parameters to the DFDL 
processor but will indicate that it must be possible to address any 
element.
Agreed that ANY element within the schema can be the starting point for 
parsing or unparsing.
dfdl:documentRoot no longer required

077
 'delimited' means the item is delimited by the item's terminator (if 
specified) or an enclosing construct's separator or end of the enclosing 
construct designated by its known length or its terminator.  
The definition of EndOfParent also needs improving.

078
document UPA checks

079
Restrictions on use of 'special' entities in regular expressions

080
LengthUnit=bits  (A056)

Alan Powell

 MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
 Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com 
 Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
