[DFDL-WG] Minutes for OGF DFDL Working Group Call, March 03-2010

Alan Powell alan_powell at uk.ibm.com
Wed Mar 3 11:23:24 CST 2010


Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, March 03-2010

Attendees
Suman Kalia (IBM) 
Steve Hanson (IBM) 
Alan Powell (IBM) 
Steve Marting (Progeny) 
Peter Lambros (IBM) 
Stephanie Fetzer (IBM)
Tim Kimber (IBM)

Apologies
Mike Beckerle (Oco)

1. 16.2 Scannability with lengthKind pattern:

In summary, you can use a data pattern on any element (complex, simple 
text, simple binary) as long as the bytes are legal in the stated 
encoding. Where binary data is involved, this in practice means an 8-bit 
ASCII-based encoding. 

Binary data can be handled with some of the conveniences of text by 
treating it as text with encoding="iso-8859-1". In this case literal 
text, such as length patterns, is interpreted in the iso-8859-1 
character encoding, and the correspondence between byte values in the data 
and characters of the string in the DFDL infoset is one to one. That is, a 
byte with value N produces an infoset character with character code N. 
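
A minimal Python sketch of this one-to-one correspondence, assuming the data arrive as raw bytes. Python's built-in "iso-8859-1" codec (an alias of "latin-1") maps every byte value N to the character with code N, so a text regular expression can be applied to binary data and the matched characters mapped back to the original bytes without loss. The function names and the NUL-based length pattern are hypothetical illustrations, not DFDL syntax.

```python
import re

def bytes_to_infoset_string(raw: bytes) -> str:
    """Decode bytes so that a byte with value N becomes the character with code N."""
    # Every byte value 0..255 is defined in ISO-8859-1, so decoding never fails.
    return raw.decode("iso-8859-1")

def infoset_string_to_bytes(text: str) -> bytes:
    """Inverse mapping; raises UnicodeEncodeError for characters above U+00FF."""
    return text.encode("iso-8859-1")

# Hypothetical length pattern: everything up to (not including) the first 0x00 byte.
LENGTH_PATTERN = re.compile(r"[^\x00]*")

def match_binary_field(raw: bytes) -> bytes:
    """Apply a text regex to binary data by way of the iso-8859-1 view."""
    text = bytes_to_infoset_string(raw)
    m = LENGTH_PATTERN.match(text)  # always matches, possibly the empty string
    return infoset_string_to_bytes(m.group(0))

if __name__ == "__main__":
    all_bytes = bytes(range(256))
    text = bytes_to_infoset_string(all_bytes)
    assert all(ord(ch) == n for n, ch in enumerate(text))  # byte N -> char N
    assert infoset_string_to_bytes(text) == all_bytes       # lossless round trip
    print(match_binary_field(b"\x01\xff\x80\x00rest"))      # b'\x01\xff\x80'
```

Note that bytes above 0x7F pass through unchanged, which is what makes this encoding suitable for binary fields where a multi-byte encoding such as UTF-8 would reject or reinterpret them.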

The WG agreed to the solution of recommending the use of "iso-8859-1" for 
binary fields. The write-up should note that the main use case is for a 
complex element that contains binary fields.

2. Current Actions: 
Updated below

3. Steve H's issues with draft 039 

1) Name of property dfdl:textNumberRepresentation is not consistent with 
dfdl:binaryNumberRep, dfdl:binaryFloatRep, etc. 
Agreed

2) The dfdl:numberPattern etc. properties that have been moved from the 
defunct dfdl:textNumberFormat annotation to dfdl:element etc. should be 
called dfdl:textNumberPattern etc. Otherwise users will think they apply 
to binary numbers too. 
Agreed

3) In section 14.3 on sequences, there are several sub-sections that talk 
about parsing according to different ways of specifying length (i.e., 
lengthKind). But dfdl:sequence no longer carries dfdl:lengthKind, so I 
think these sub-sections are not in the right place. I think they should 
be in section 12, under the correct 12.3.x lengthKind sub-section. 
MB has moved these sections.

4) Section 19 on built-in specifications. Given that we don't have any for 
the public comment phase, we should reword this section. 
Agreed


4. Tim's (major) issues with draft 039 

12.2 Delimiters: Text Markup 
- The term 'Delimiters' is not accurate. Most readers will not think of 
an initiator as a 'delimiter'. 
- It's not 'Text' markup any more, especially since v0.39 has allowed 
lengthKind="delimited" for elements with binary representation. 
The title should be 'Markup', and the explanation can then deal with what it 
really is, rather than justifying the inaccurate title :-) 
Will be changed to 'Delimiters'. Add 'delimiters' to the glossary (DFDL 
limits delimiters to terminators and separators when 
dfdl:lengthKind="delimited").

Syntax for specifying markup: 
It's not clear from this description that each item in the space-separated 
list is a DFDL string literal. 
Section 6.3.1 will be revised to separate string literals, DFDL 
expressions, etc.
If possible, we will add the list of property types (list of DFDL string 
literals, etc.) and update the property tables.

initiator ( and all other space-separated properties ) 
It is not clear whether the order of the space-separated properties 
matters. Must the parser test them in the order in which they are 
specified? 
( Q: What if %ES; is the first in the list? ) 
The parser must look for the longest first. Unparsing will output the 
first in the list.
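
A minimal Python sketch of the agreed rule, assuming the list entries have already been resolved from DFDL string literals into plain strings. The helper names are hypothetical and not taken from any DFDL implementation. When parsing, the longest matching entry wins regardless of its position in the list; when unparsing, the first entry is written.

```python
from typing import Optional, Sequence

def match_delimiter(data: str, pos: int, delimiters: Sequence[str]) -> Optional[str]:
    """Parsing: return the longest listed delimiter found at data[pos:], or None."""
    # sorted() is stable even with reverse=True, so equal-length
    # candidates keep their original list order.
    for candidate in sorted(delimiters, key=len, reverse=True):
        # An empty string (what %ES; would denote) matches everywhere, so skip it here.
        if candidate and data.startswith(candidate, pos):
            return candidate
    return None

def delimiter_to_write(delimiters: Sequence[str]) -> str:
    """Unparsing: output the first delimiter in the list."""
    return delimiters[0]

if __name__ == "__main__":
    terminators = ["\n", "\r\n"]  # list order deliberately puts the shorter one first
    print(repr(match_delimiter("abc\r\ndef", 3, terminators)))  # '\r\n'
    print(repr(delimiter_to_write(terminators)))                # '\n'
```

Because an empty string matches at every position, the sketch has to skip it explicitly; an entry such as %ES; would otherwise always "match", which illustrates why it is awkward in initiator, separator and terminator.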

The %ES; entity should not be allowed for initiator, separator and 
terminator. It should probably be limited to use in nilValues, but we will 
check whether it is useful in other properties.

occursStopValue should be a value in the value space of the base type of 
the array.
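
A small Python sketch of what a value-space comparison means, assuming an integer base type and already-decoded text items. The helper is hypothetical, and so is its choice to consume the stop item without adding it to the array. The stop value is compared after conversion to the base type, so "00" and "0" both end the array, which a comparison of the raw text would not do.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def parse_until_stop(raw_items: Sequence[str],
                     convert: Callable[[str], T],
                     stop_value: T) -> List[T]:
    """Convert items to the array's base type, stopping at the first one equal
    to stop_value. The comparison is on typed values, not on the raw text.
    Assumption: the stop item is consumed but not added to the array."""
    result: List[T] = []
    for raw in raw_items:
        value = convert(raw)
        if value == stop_value:
            break
        result.append(value)
    return result

if __name__ == "__main__":
    # With an integer base type, "00" and "0" are the same value, so both stop the array.
    print(parse_until_stop(["17", "42", "00", "99"], int, 0))  # [17, 42]
```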

terminator: 
Is it OK if the final terminator is missing within the scope of a 
known-length parent? This seems like a reasonable extension of the rule (in 
all other scenarios, the end of a known-length parent acts like the end of 
the data stream for items within its scope). 

documentFinalTerminatorCanBeMissing: 
Let's try to avoid creating another property for the postfix separator 
scenario. I think this property provides a way of modelling the data 
naturally. 
We can recommend use of infix-with-a-terminator rather than 'postfix' if 
the final terminator can be missing. 

outputNewLine 
Should we validate that the 'characterOrCharacters' are all newline 
characters from the set described by the %NL; mnemonic? Otherwise the DFDL 
serializer will output data which cannot be parsed by the DFDL parser. 
Agreed
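
A possible Python sketch of the proposed check. The membership of %NL; used here (CRLF, LF, CR, NEL U+0085 and LS U+2028) is an assumption and should be confirmed against the current draft; the function names are hypothetical.

```python
# Assumed membership of %NL;: CRLF, LF, CR, NEL (U+0085), LS (U+2028).
NL_SEQUENCES = frozenset({"\r\n", "\n", "\r", "\u0085", "\u2028"})

def is_valid_output_new_line(value: str) -> bool:
    """True if the value is a single newline sequence that %NL; would match."""
    return value in NL_SEQUENCES

def validate_output_new_line(value: str) -> None:
    """Reject values the parser could not read back as a newline."""
    if not is_valid_output_new_line(value):
        raise ValueError(f"dfdl:outputNewLine {value!r} is not in the %NL; set")

if __name__ == "__main__":
    for candidate in ["\r\n", "\n", ";"]:
        print(repr(candidate), is_valid_output_new_line(candidate))
```

Validating at schema-load time, rather than when the serializer runs, would surface the mismatch before any unparseable data is written.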

dfdl:lengthKind endOfParent 
'endOfParent' has almost the same meaning as 'delimited', so it should have 
the same semantics. The element is ended by whichever of these comes first: 
·        the item's terminator (if specified) 
·        an enclosing construct's separator or terminator 
·        the end of an enclosing construct designated by its known length 
·        the end of the data stream 
The effect would be that the element could be ended by the nearest 
known-length parent, not just the immediate parent. Also, the immediate 
parent could have lengthKind 'implicit'. 
Not discussed but emails indicate Agreed
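
A simplified Python sketch of the end-of-element rules listed above, assuming decoded text, delimiters already resolved to plain strings, and no escape schemes (all names hypothetical). The scan is bounded by the nearest enclosing known-length construct when there is one, otherwise by the end of the data; within that bound the element ends at the first occurrence of its own terminator or of any enclosing separator or terminator.

```python
from typing import Optional, Sequence

def find_element_end(data: str, start: int,
                     own_terminator: Optional[str],
                     enclosing_delimiters: Sequence[str],
                     known_length_end: Optional[int]) -> int:
    """Return the index at which a delimited (or endOfParent) element ends.

    known_length_end is the end offset of the nearest enclosing construct with
    a known length, or None if there is none (then the end of data applies).
    """
    # The nearest known-length ancestor bounds the scan, otherwise the data stream does.
    limit = len(data) if known_length_end is None else min(known_length_end, len(data))
    # Drop a missing terminator (None) and any empty strings.
    candidates = [d for d in [own_terminator, *enclosing_delimiters] if d]
    for i in range(start, limit):
        for d in candidates:
            # A delimiter counts only if it lies entirely inside the bound.
            if i + len(d) <= limit and data.startswith(d, i):
                return i
    return limit

if __name__ == "__main__":
    record = "AB|CD;EF"
    print(find_element_end(record, 0, "|", [";"], None))  # 2: own terminator
    print(find_element_end(record, 3, "|", [";"], None))  # 5: enclosing separator
    print(find_element_end("ABCDEF", 0, None, [";"], 4))  # 4: known-length parent
```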


choiceKind 'Fixed' 
When lengthKind='implicit', all alternative branches of the choice are 
padded to the length of the largest one, so that the choice construct as 
a whole is fixed length. 

There must be a restriction that the length of at least one choice branch 
must be statically defined. 
Not discussed but emails indicate Agreed
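
A Python sketch of the 'Fixed' choice length and the proposed restriction, under stated assumptions: branch lengths are in bytes, None marks a branch whose length is not statically defined, and padding uses a single hypothetical fill byte (0x00). The choice length is taken as the largest statically known branch length, and a shorter branch is padded out to it.

```python
from typing import Optional, Sequence

def choice_fixed_length(branch_lengths: Sequence[Optional[int]]) -> int:
    """Length of the whole choice: the largest statically known branch length.
    None marks a branch whose length is not statically defined."""
    known = [n for n in branch_lengths if n is not None]
    if not known:
        raise ValueError("at least one choice branch needs a statically defined length")
    return max(known)

def pad_branch(value: bytes, fixed_length: int, fill: bytes = b"\x00") -> bytes:
    """Pad a branch's representation out to the fixed choice length."""
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    if len(value) > fixed_length:
        raise ValueError("branch is longer than the fixed choice length")
    return value + fill * (fixed_length - len(value))

if __name__ == "__main__":
    size = choice_fixed_length([4, None, 10])
    print(size)                       # 10
    print(pad_branch(b"ABCD", size))  # b'ABCD\x00\x00\x00\x00\x00\x00'
```

Without at least one statically known length there is nothing to pad to, which is the situation the restriction rules out.
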

5. DFDL v1 Specification completion

Draft 40 will be published early next week. It will be sent to OGF next 
week, prior to OGF 28.
Meeting closed, 14:10

Next call: Wednesday 10 March 2010, 13:00 UK 

Next action: 084
Actions raised at this meeting

No
Action 








Current Actions:
No
Action 
066
Investigate format for defining test cases
25/11:IBM to see if it is possible to publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
20/01: no update
27/01: no progress
29/01: no progress
03/02: IBM is still investigating
10/02: IBM is still investigating
17/02: IBM is willing in principle to publish the test case format and 
some of the test cases. May need some time to build a 'compliance suite'
24/02: No progress
03/03: Discussions have been taking place on the subset of tests that will 
be provided.
080
AP: Clarify semantics of fn:position and fn:count
17/02: no progress
24/02: No progress
03/03: No progress. There are other functions which return a duration 
which need investigating.

Closed actions
No
Action 
049
20/05 AP Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use 
the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed the real need for this being in the specification. It 
seemed that the main value is that it defines a schema location for 
downloading 'known' defaults from the web. 
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource to complete this action it can 
be deferred
13/01:no progress
20/01: no progress
27/01: no progress
29/01: No progress.  The predefined formats do not need to be available 
when the spec is published.
Suman said that he had been mapping COBOL structures to DFDL and it didn't 
look as though the way text numbers are defined is very usable. He will 
document this for the next call.
03/02: No progress
10/02: No progress
17/02: No progress
24/02: No progress
03/03: The wording of this section will be changed to say that 
'implementations may choose to provide predefined formats. The DFDL WG 
intends to supply a limited set'
Closed
079
MB: Encoding for binary fields when lengthKind is pattern
17/02: Discussed but no conclusion
24/02: Mike has found an encoding that matches code points 0 to 255 of 
ISO 10646. Will document its use for binary fields.
03/03: Wording in minutes agreed. 
Closed
083
MB:To correct syntax diagram for FinalUnused and suggest wording for the 
Sequence section
03/03: Mike has supplied updates to the syntax, length and Sequence sections.
Closed









Work items:
No
Item
target version
status
005
Improvements on property descriptions 

not started
012
Reordering the properties discussion: move representation earlier, improve 
flow of topics 

not started 
036
Update DFDL schema with changed properties 
ongoing

042
Mapping of the DFDL infoset to XDM 
none
not required for V1 specification
070
Write DFDL primer 


071
Write test cases.


083
Implement RFC2116


084
MB: Encoding for binary fields when lengthKind is pattern


085
MB:To correct syntax diagram for FinalUnused and suggest wording for the 
Sequence section


086
dfdl:textNumberRepresentation to dfdl:textNumberRep


087
dfdl:numberPattern etc. to dfdl:textNumberPattern (not for calendarformat, 
but check individual properties)


088
Update built-in specifications section.


089
12.2 will be changed to 'Delimiters'. Add 'delimiters' to the glossary 
(DFDL limits delimiters to terminators and separators when 
dfdl:lengthKind="delimited")


090
Section 6.3.1 will be revised to separate string literals and dfdl 
expressions etc.
If possible, we will add the list of property types (list of DFDL string 
literals, etc.) and update the property tables.


091
The parser must look for the longest first. Unparsing will output the 
first in the list.
The %ES; entity should not be allowed for initiator, separator and 
terminator. It should probably be limited to use in nilValues, but we will 
check whether it is useful in other properties.


092
occursStopValue should be a value in the value space of the base type of 
the array.


093
outputNewLine MUST be one of newline characters from the set described by 
the %NL; mnemonic


094
dfdl:lengthKind endOfParent 
'endOfParent' has almost the same meaning as 'delimited' so should have 
the same semantics. 
·        the item's terminator (if specified) 
·        an enclosing construct's separator or terminator 
·        the end of an enclosing construct designated by its known length 
·        the end of the data stream 
The effect would be that the element could be ended by the nearest known 
length parent not just the immediate parent. Also the immediate parent 
could have lengthKind 'implicit' 


095
choiceKind 'Fixed': There must be a restriction that the length of at 
least one choice branch must be statically defined. 


096
The WG agreed to the solution of recommending the use of "iso-8859-1" for 
binary fields. The write-up should note that the main use case is for a 
complex element that contains binary fields.









 
Regards

 
Alan Powell
 
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell at uk.ibm.com












