[DFDL-WG] Minutes for OGF DFDL Working Group Call, March 31-2010

Alan Powell alan_powell at uk.ibm.com
Wed Mar 31 12:01:14 CDT 2010


Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, March 31-2010

Attendees
Steve Hanson (IBM) 
Alan Powell (IBM) 
Steve Marting (Progeny) 
Stephanie Fetzer (IBM)
Mike Beckerle (Oco)
Tim Kimber (IBM)

Apologies
Suman Kalia (IBM)


 0 DFDL specification status

The draft specification was passed by the OGF technical committee on March 
30th, so it will now be made available for public comment.


 1 Nils, defaults and unparsing 
The semantics need clarifying.

I have reorganized the table:

Logical value                       | Condition                                | Initiator region contains | Content region contains
Nil (implies nillable)              | nilValueInitiatorPolicy 'prohibited'     | empty                     | representation of nil
                                    | nilValueInitiatorPolicy 'required'       | initiator string          | representation of nil
"" (empty string)                   | missingValueInitiatorPolicy 'prohibited' | empty                     | empty
(type is xs:string or xs:hexBinary) | missingValueInitiatorPolicy 'required'   | initiator string          | empty
Missing: useNilForDefault           | nilValueInitiatorPolicy 'prohibited'     | empty                     | representation of nil
                                    | nilValueInitiatorPolicy 'required'       | initiator string          | representation of nil
Missing: default is empty string    | missingValueInitiatorPolicy 'prohibited' | empty                     | empty
(type is xs:string or xs:hexBinary) | missingValueInitiatorPolicy 'required'   | initiator string          | empty
Missing: other default              |                                          | initiator string          | representation of the default value
A non-nil, non-empty-string value   |                                          | initiator string          | representation of the logical value
Note: If nilValue is the empty string then missingValueInitiatorPolicy is 
not used. 
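The unparsing table above can be read as a small decision procedure. The following is a minimal sketch, not the DFDL implementation: the function name, the NIL/MISSING markers and the parameter names are hypothetical stand-ins for the nil, default and initiator properties. It returns the contents of the initiator region and the content region for one element.

```python
NIL = object()      # hypothetical marker for a nil logical value
MISSING = object()  # hypothetical marker for an element missing from the infoset


def unparse_regions(value, initiator, nil_repr,
                    nil_policy="required", missing_policy="required",
                    use_nil_for_default=False, default=None):
    """Return (initiator_region, content_region) following the unparsing table.

    An empty string means the region is empty. All parameter names are
    illustrative only, not DFDL property names.
    """
    if value is MISSING:
        # A missing element is first replaced by nil or by its default value.
        if use_nil_for_default:
            value = NIL
        elif default is not None:
            value = default
        else:
            # Not covered by the table; this sketch simply rejects it.
            raise ValueError("missing element has no nil or default")
    if value is NIL:
        # nilValueInitiatorPolicy governs nils, even when the nil value is "".
        return (initiator if nil_policy == "required" else ""), nil_repr
    if value == "":
        # Empty string (xs:string / xs:hexBinary): missingValueInitiatorPolicy applies.
        return (initiator if missing_policy == "required" else ""), ""
    # Any other non-nil, non-empty value, including a non-empty default.
    return initiator, value
```

For example, a missing element with a default of "42" and an initiator of "[" unparses as ("[", "42"), matching the "Other default" row.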

Other changes: 

missingValueInitiatorPolicy 
Enum 
Valid values 'required', 'prohibited' 

Specifies whether to expect an initiator when an element is missing. 
Ignored unless dfdl:initiator is specified and is not "" (empty string). 

'required' - Indicates that the dfdl:initiator followed by empty content is 
the required syntax to indicate that the element is missing. 
  
'prohibited' - Indicates that empty content is the required syntax to 
indicate that the element is missing. The presence of an initiator implies 
that real content must follow. 
Use of 'prohibited' implies an ordered sequence. If it is used on an 
initiated element of an unordered group, it is a schema definition error. 

If the element is required, defaulting occurs as defined above. 
This property also applies on unparsing, when the data to be written 
(after nil value and default value processing) is empty content.

Annotation: dfdl:element (xs:string or xs:hexBinary)
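The parse-side wording of missingValueInitiatorPolicy above can be sketched as two checks. This is a minimal sketch under stated assumptions: the helper names are hypothetical, dfdl:initiator is assumed to be non-empty (otherwise the property is ignored), and errors are modelled as Python ValueError rather than DFDL processing or schema definition errors. Parsing behaviour was still under discussion at this meeting, so this only restates the proposed text.

```python
def is_missing_syntax(policy: str, initiator_found: bool, content: str) -> bool:
    """Return True if the parsed syntax is the 'missing' syntax for `policy`.

    Hypothetical helper; assumes dfdl:initiator is non-empty.
    """
    if policy == "required":
        # The initiator followed by empty content means "missing".
        return initiator_found and content == ""
    if policy == "prohibited":
        if initiator_found and content == "":
            # An initiator implies that real content must follow.
            raise ValueError("initiator found but no content follows")
        # Empty content with no initiator means "missing".
        return not initiator_found and content == ""
    raise ValueError(f"unknown missingValueInitiatorPolicy: {policy!r}")


def check_policy_placement(policy: str, has_initiator: bool,
                           in_unordered_group: bool) -> None:
    """Schema-time check: 'prohibited' is not allowed on an initiated
    element of an unordered group."""
    if policy == "prohibited" and has_initiator and in_unordered_group:
        raise ValueError("schema definition error")
```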

The WG agreed that the unparsing table was correct, though there was some 
discussion and misunderstanding about the comment that a nilValue of the 
empty string does not use missingValueInitiatorPolicy. The comment was 
asking for agreement from reviewers and does not add any new rule to the 
table. The meaning of the columns needs a better explanation.

SH pointed out that missingValueInitiatorPolicy applies to all simple 
types, not just xs:string and xs:hexBinary, and also to complex types. The 
description is written in terms of parsing and needs to cover unparsing too.

There was a lot of discussion about parsing behaviour, in particular 
whether there needs to be a distinction between 'zero-length content' 
and 'missing from the data stream'. 

Action: Tim will document the parsing use cases so we can verify the 
parsing table.


2 dfdl:choiceKind 'fixedLength'


The main issues are: 
a) The calculation of the length of the longest branch is not obvious. 
b) The length units to use: the dfdl:lengthUnits property does not exist 
on a choice. 
c) The name could be better. 

The proposal is therefore to retain the property but to: 
i) State the conditions that must apply to use this property, and enforce 
them in the validator (schema definition error otherwise). 
ii) Decouple the choice from its parent by calculating the length of each 
branch based solely on the properties of the branch's components, 
irrespective of any parent dfdl:lengthKind. 

The use cases are COBOL REDEFINES, C unions and PL/I (?).

Restrictions:
- At least one of the choice branches must be of calculable length. (Tim 
to provide a definition of 'calculable'.) 

To be calculable: 
- minOccurs must equal maxOccurs for all children. 
- All lengths will be calculated in bytes. All encodings must be 
fixed-width.
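The calculability rules above can be sketched as follows. This is a minimal sketch, not the agreed definition (Tim was still to provide that): the Component type and function names are hypothetical, a length of None stands for anything that is not fixed-width, and the choice length is taken as the maximum over the calculable branches only, since how non-calculable branches should be treated was still open.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Component:
    min_occurs: int
    max_occurs: int
    length_bytes: Optional[int]  # None = variable length or variable-width encoding


def branch_length(branch: List[Component]) -> Optional[int]:
    """Length in bytes of one branch, or None if it is not calculable."""
    total = 0
    for c in branch:
        # minOccurs must equal maxOccurs and every length must be fixed.
        if c.min_occurs != c.max_occurs or c.length_bytes is None:
            return None
        total += c.max_occurs * c.length_bytes
    return total


def fixed_choice_length(branches: List[List[Component]]) -> int:
    """Longest calculable branch; at least one branch must be calculable."""
    lengths = [n for n in (branch_length(b) for b in branches) if n is not None]
    if not lengths:
        raise ValueError("schema definition error: no branch has calculable length")
    return max(lengths)
```

For a COBOL REDEFINES-style choice between an 8-byte field and an array of exactly three 4-byte items, this gives 12 bytes; a third branch with minOccurs 0 and maxOccurs 3 would be ignored as non-calculable.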

Actions:
- Investigate how PL/I allocates storage for varchars.
- Investigate how COBOL allocates storage for OCCURS DEPENDING ON and how 
these are modelled in DFDL.


3 dfdl:occursCountKind 'expression' 

When dfdl:occursCountKind is 'expression', the occursCount should only be 
used when parsing. On unparsing, the number of occurrences is specified by 
minOccurs etc. 
  
Note that this behaviour is different from dfdl:lengthKind. 

Agreed that the dfdl:occursCountKind 'expression' is only used on parsing.
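The parse-only rule can be illustrated with a small sketch. The function and parameter names are hypothetical, the expression is modelled as a Python callable, and checking the infoset count against minOccurs/maxOccurs on unparsing is an assumption about what "specified by minOccurs etc." means. The point it shows is that the expression is evaluated when parsing and never evaluated when unparsing.

```python
from typing import Callable, List, Optional


def occurs_count(direction: str, expression: Callable[[], int],
                 min_occurs: int, max_occurs: Optional[int],
                 infoset_items: Optional[List] = None) -> int:
    """Number of occurrences to process; max_occurs None means unbounded."""
    if direction == "parse":
        # Parsing: the dfdl:occursCount expression decides how many to read.
        return expression()
    if direction == "unparse":
        # Unparsing: the expression is NOT evaluated; the infoset decides,
        # checked here (an assumption) against minOccurs/maxOccurs.
        if infoset_items is None:
            raise ValueError("unparsing needs the infoset array")
        n = len(infoset_items)
        if n < min_occurs or (max_occurs is not None and n > max_occurs):
            raise ValueError(f"{n} occurrences outside [{min_occurs}, {max_occurs}]")
        return n
    raise ValueError(f"unknown direction: {direction!r}")
```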

4 Unsigned decimal 

Whether a logical type is signed or not is used to determine whether a 
sign is output in some representations. However, there is no unsigned form 
of xs:decimal or xs:integer, so these will always be output with a sign, 
and COBOL declarations such as 
05  CFCFDN-FLD12             PIC  9(05)V99 COMP-3. 

cannot be supported. 

Proposed solution: 

- Allow xs:nonNegativeInteger, which enables unsigned unbounded integers to 
be modelled. The problem to solve is then just for xs:decimal. 

- Call the new property dfdl:decimalSigned. It only applies to xs:decimal 
or user defined restrictions thereof.  It applies to all physical 
decimals, as its name implies (not just zoned or packed). 


We considered adding a dfdl:unsignedDecimal type but rejected it as causing 
confusion between DFDL type restrictions and user restrictions. 

The preferred solution is as proposed. Action: Mike to agree.
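To show what the proposed property would control for the COBOL example above, here is a minimal sketch of IBM-style packed decimal (COMP-3) encoding: two digits per byte, with the sign in the last nibble (X'C' positive, X'D' negative, X'F' unsigned). The function name is hypothetical and the `signed` flag merely stands in for the proposed dfdl:decimalSigned; this is not a DFDL implementation.

```python
from decimal import Decimal


def pack_decimal(value: Decimal, digits: int, scale: int, signed: bool) -> bytes:
    """Encode value as packed decimal with `digits` digits, `scale` of them fractional.

    `signed` plays the role of the proposed dfdl:decimalSigned property.
    """
    scaled = value.scaleb(scale)  # e.g. 123.45 with scale 2 -> 12345
    if scaled != scaled.to_integral_value():
        raise ValueError("value has more fractional digits than the scale allows")
    unscaled = int(scaled)
    if unscaled < 0 and not signed:
        raise ValueError("negative value for an unsigned decimal")
    text = str(abs(unscaled)).rjust(digits, "0")
    if len(text) > digits:
        raise ValueError("value does not fit in the declared number of digits")
    # Sign nibble: C = positive, D = negative, F = unsigned.
    if signed:
        sign = 0xD if unscaled < 0 else 0xC
    else:
        sign = 0xF
    nibbles = [int(ch) for ch in text] + [sign]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad on the left to a whole number of bytes
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))
```

For PIC 9(05)V99 COMP-3 (7 digits, 2 after the implied point), the value 123.45 encodes as X'0012345F' when unsigned and X'0012345C' when signed, which is the difference the new property has to express.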

5 dfdl:separatorPolicy

dfdl:separatorPolicy enumeration definitions are not complete as they do 
not discuss arrays or defaulting. 

separatorPolicy="required" 
All the separators must be there.
No group member, except for required members with a default, may be 
omitted from the data stream. 
Every group member must have a separator in the correct position. 
Optional members and variable-occurrence arrays are allowed, but 
separators will be expected for all maxOccurs occurrences. 
** It is a schema definition error if maxOccurs="unbounded" for any member 
of the group 
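Because 'required' expects a separator for every maxOccurs slot, the number of separators is fixed by the schema. A minimal sketch, with a hypothetical function name and None standing for maxOccurs="unbounded"; the 'infix', 'prefix' and 'postfix' strings mirror the separator positions discussed below.

```python
from typing import List, Optional


def expected_separators(max_occurs: List[Optional[int]], position: str) -> int:
    """Separators expected for a group with separatorPolicy='required'.

    max_occurs holds maxOccurs for each group member; None means 'unbounded'.
    """
    if any(m is None for m in max_occurs):
        raise ValueError("schema definition error: maxOccurs='unbounded' "
                         "with separatorPolicy='required'")
    slots = sum(max_occurs)  # every occurrence slot gets a separator
    if position == "infix":
        return max(slots - 1, 0)
    if position in ("prefix", "postfix"):
        return slots
    raise ValueError(f"unknown separator position: {position!r}")
```

A group of one scalar, a three-occurrence array and another scalar has five slots, so four infix separators or five prefix/postfix separators are expected even if the array holds fewer items.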

separatorPolicy="suppressed" 
Any group member (optional, or required with a default) can be omitted 
from the data stream, in which case its separator may also be omitted. 

For prefix and infix, if a separator is found then another group member is 
expected. 
For postfix, if enclosing markup is not found immediately after a postfix 
separator then another group member is expected. 

Enclosing markup or end of data can occur where an infix or prefix 
separator is expected. This terminates the group. 
Enclosing markup or end of data can occur immediately after a postfix 
separator (if dfdl:documentFinalSeparatorCanBeMissing is 'yes', then the 
postfix separator may be missing at the end of the data stream). This 
terminates the group. 

If the group terminates before all of its members have been parsed then 
any required members in the remainder of the group must be defaulted into 
the infoset. 
The entire group can have a zero-length representation, which is indicated 
by enclosing markup at the start of the group's content region. 

If another group member is expected and the next group member is a point 
of uncertainty, then the identity of the next group member must be 
determined by resolving the point of uncertainty as described in section 
XXX. 

If another group member is expected but no group member is found in the 
data stream then it is a processing error. 
If another group member is expected but all group members have been parsed 
then it is a processing error. 

separatorPolicy="suppressedAtEnd" 
'suppressedAtEnd' has the same rules as 'required' up to the last required 
member that does not have a default; after that it has the same rules as 
'suppressed'. 
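The end-of-group rules for 'suppressed' with infix separators can be sketched as below. This is a deliberately simplified sketch: the Member type and function name are hypothetical, members are assumed to be identified purely by position (no point of uncertainty), values are plain strings, and a required member with no default at the point where the group ends is treated as an error, which the text does not spell out.

```python
from typing import Dict, List, NamedTuple, Optional


class Member(NamedTuple):
    name: str
    required: bool
    default: Optional[str] = None


def parse_suppressed_infix(data: str, sep: str,
                           members: List[Member]) -> Dict[str, str]:
    """Parse a group's content with infix separators and separatorPolicy='suppressed'."""
    # A zero-length group representation yields no members at all.
    fields = data.split(sep) if data else []
    if len(fields) > len(members):
        # Another member is expected but all group members have been parsed.
        raise ValueError("processing error: more members in data than in group")
    result = dict(zip((m.name for m in members), fields))
    # The group terminated early: default the remaining required members.
    for member in members[len(fields):]:
        if member.required:
            if member.default is None:
                raise ValueError(f"processing error: {member.name} missing, no default")
            result[member.name] = member.default
    return result
```

With members a (required), b (required, default "0") and c (optional), the data "x" parses to {"a": "x", "b": "0"}, while "x,y,z,w" is a processing error.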

Open questions 
1. If a group has separatorPolicy='required' and one of its optional 
members has a zero-length representation, and that member is allowed to 
have a zero-length representation, what should the parser do?   
A similar question arises for required, defaultable elements with a 
zero-length representation when they are based on a type with 
xs:minLength="0". Should the default value be used, or the empty string? 
2. How should the occursXXX properties be applied when there are other 
ways to determine the number of occurrences (e.g. by counting separators)? 
3. What algorithm should the parser use to decide whether to default a 
complex element or group (when the element/group is a child of a 
separated group)? 
4. How does unparsing work for separated groups in all of these scenarios? 



The changes above were suggested. The unparsing behaviour needs adding.
There was discussion about defaulting of complex elements and when this 
should occur. More discussion is needed.


6 Current Actions 
Updated below
Meeting closed, 15:00

Next call: Wednesday 7 April 2010, 13:00 UK (8:00 ET)

Next action: 091
Actions raised at this meeting

No  | Action
088 | Define semantics of choiceKind 'fixedLength'
    | 31/03: TK to provide definition of calculable length.
    |        Investigate PL/I varchars and COBOL OCCURS DEPENDING ON.
089 | Need to be able to define unsigned decimal and integer
    | 31/03: MB to agree proposed solution
090 | Semantics of separatorPolicy


Current Actions:

No  | Action
066 | Investigate format for defining test cases
    | 25/11: IBM to see if it is possible to publish its test case format.
    | 04/12: no update
    | 09/12: no update
    | 16/12: reminder sent to project manager
    | 23/12: SH will send another reminder.
    | 06/01: Another reminder will be sent
    | 13/01: no update
    | 20/01: no update
    | 27/01: no progress
    | 29/01: no progress
    | 03/02: IBM is still investigating
    | 10/02: IBM is still investigating
    | 17/02: IBM is willing in principle to publish the test case format and
    |        some of the test cases. May need some time to build a 'compliance suite'
    | 24/02: No progress
    | 03/03: Discussions have been taking place on the subset of tests that
    |        will be provided.
    | 10/03: work is progressing
    | 17/03: work is progressing
    | 31/03: work is progressing
084 | Check behaviour of dfdl:inputValueCalc and outputValueCalc.
085 | ALL: publicize the public comment phase to ensure a good review.
086 | AP: Nils and Defaults during unparsing - update table
    | 31/03: TK to document use cases for parsing
088 | Define semantics of choiceKind 'fixedLength'
    | 31/03: TK to provide definition of calculable length.
    |        Investigate PL/I varchars and COBOL OCCURS DEPENDING ON.
089 | Need to be able to define unsigned decimal and integer
    | 31/03: MB to agree proposed solution
090 | Semantics of separatorPolicy

Closed actions

No  | Action

Work items:

No  | Item                                                        | Target version | Status
005 | Improvements on property descriptions                       |                | not started
012 | Reordering the properties discussion: move representation   |                | not started
    | earlier, improve flow of topics                             |                |
036 | Update dfdl schema with change properties                   |                | ongoing
042 | Mapping of the DFDL infoset to XDM                          | none           | not required for V1 specification
070 | Write DFDL primer                                           |                |
071 | Write test cases.                                           |                |
083 | Implement RFC2116                                           |                |
097 | Remove functions that return durations                      |                |
098 | occursCountKind is parsing only                             |                |





 
Regards

 
Alan Powell
 
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell at uk.ibm.com










