[DFDL-WG] Minutes for OGF DFDL Working Group Call, February 24-2010

Alan Powell alan_powell at uk.ibm.com
Wed Feb 24 10:11:35 CST 2010


Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, February 24-2010

Attendees
Mike Beckerle (Oco) 
Suman Kalia (IBM) 
Steve Hanson (IBM) 
Alan Powell (IBM) 
Steve Marting (Progeny) 
Peter Lambros (IBM) 
Stephanie Fetzer (IBM)

Apologies
Tim Kimber (IBM) 


1. Remaining 037 review issues 

A:
16.2 Scannability with lengthKind pattern:   
Confirm that this is what we agreed.
In summary, you can use a data pattern on any element (complex, simple 
text, simple binary) as long as the bytes are legal in the stated 
encoding, which, where binary data is involved, in practice means an 
8-bit ASCII encoding. 

Mike B
I found an official reference which has no "greyed out" codepoints. All 
256 values are "mapped". 
The following FTP table (see URL below) officially defines the mapping from 
ISO-8859-1 to Unicode/ISO 10646.

The table includes all 256 codepoints. Some are specified as just 
<control>, i.e., they have no specific meaning, but their 8859 codepoints 
map one-to-one and onto Unicode/10646 codepoints with the same values.

Note that this property holds for 8859-1. It does not hold for 8859-2 to 
8859-16, as these have character codes substituted into them that map to 
other places in the iso10646 codepoint space.

Here's the correspondence table: 

ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT

If we reference this mapping table in the references of the DFDL spec, 
then I believe we can say that using encoding="iso-8859-1", you can treat 
binary data as textual, use patterns, etc., and the relationship to/from 
the infoset always ensures preservation of the values of the bytes 
(parsing), and creation of bytes whose values exactly match the string 
codepoints (unparsing).
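To make the round-trip claim concrete, here is a small Python check. Python 
is used only for illustration; nothing in DFDL prescribes it. It assumes 
Python's built-in "iso-8859-1" and "iso-8859-2" codecs follow the 
unicode.org mapping tables. It decodes all 256 byte values, confirms that 
byte N becomes code point N and that re-encoding reproduces the bytes, and 
contrasts this with ISO-8859-2, where the property does not hold.

```python
# Sketch: check that ISO-8859-1 maps each byte value N to code point N.
all_bytes = bytes(range(256))

# Parsing direction: decode the bytes as iso-8859-1 text.
latin1_text = all_bytes.decode("iso-8859-1")
assert [ord(ch) for ch in latin1_text] == list(range(256))

# Unparsing direction: encoding the string reproduces the bytes exactly.
assert latin1_text.encode("iso-8859-1") == all_bytes

# Contrast with ISO-8859-2: every byte still decodes, but some bytes map
# to code points with a different value (e.g. 0xA1 -> U+0104).
latin2_text = all_bytes.decode("iso-8859-2")
mismatches = [n for n, ch in enumerate(latin2_text) if ord(ch) != n]
print(len(mismatches) > 0)           # True
print(hex(ord(latin2_text[0xA1])))   # 0x104
```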

This language can be added to the section on lengthKind="pattern" and 
binary data: 

Binary data can be handled using some of the conveniences of text by 
treating it as text with encoding="iso-8859-1". In this case literal 
text, such as length patterns, is interpreted in the iso-8859-1 
character encoding, and the correspondence of byte values in the data to a 
string in the DFDL infoset is one-to-one. That is, a byte with value N 
produces an infoset character with character code N. [reference to above 
FTP site].
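A sketch of how a length pattern might be applied to binary data treated as 
iso-8859-1 text, again in Python for illustration only. The record bytes, 
the NUL-terminated field, and the pattern [^\x00]* are hypothetical 
examples, not taken from the DFDL spec. The point is that, because the 
mapping is one-to-one, the number of characters the pattern matches equals 
the number of bytes consumed.

```python
import re

# Hypothetical binary data: a variable-length field ended by a 0x00 byte,
# followed by more data. The bytes need not be printable.
data = bytes([0x01, 0xFF, 0x41, 0x80, 0x00, 0x7F, 0x10])

# Treat the binary data as text in iso-8859-1: byte N -> character N.
text = data.decode("iso-8859-1")

# Hypothetical length pattern: everything up to (not including) the first NUL.
length_pattern = re.compile(r"[^\x00]*")
field_text = length_pattern.match(text).group(0)

# One-to-one mapping: character count equals byte count, and encoding the
# matched text recovers exactly the field's bytes.
field_bytes = field_text.encode("iso-8859-1")
print(len(field_text))    # 4
print(field_bytes.hex())  # 01ff4180
```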


B: 
Glossary 

Variable-Occurrence Item - Optional elements have a variable number of 
occurrences (0 or 1), and arrays can also have a variable number of 
occurrences (when minOccurs < maxOccurs). So an item with a variable 
number of occurrences is either an optional element or an array where 
minOccurs < maxOccurs. For both arrays and optional elements there is the 
additional constraint that the DFDL representation properties must not 
preclude a variable number of occurrences. When 
dfdl:occursCountKind='explicit' and dfdl:occursCount has a literal 
constant as its value, or an expression that statically evaluates to a 
constant, then the DFDL properties specify exactly the number of 
occurrences for all instances and so are said to preclude a variable 
number of occurrences. If dfdl:occursCount is an expression that does not 
statically evaluate to a constant, then the DFDL properties do not 
preclude a variable number of occurrences. 
MikeB Comment: 
This idea that you can have minOccurs < maxOccurs, but dfdl:occursCount 
equal to a constant and dfdl:occursCountKind="explicit", is causing us a 
bunch of grief in these definitions. 
Can we be conservative and just say it is a schema definition error if 
minOccurs < maxOccurs but the occurrence count is static, i.e., an 
explicit constant-valued expression? 
WG decided that the wording can remain as currently written.
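The glossary definition can be read as a simple predicate. The Python 
sketch below is hypothetical: the function and parameter names are 
illustrative, None stands in for maxOccurs="unbounded", and the analysis 
of whether dfdl:occursCount is statically constant is assumed to have been 
done elsewhere and passed in as a flag.

```python
from typing import Optional


def is_variable_occurrence_item(min_occurs: int,
                                max_occurs: Optional[int],
                                occurs_count_kind: str,
                                occurs_count_is_static_constant: bool) -> bool:
    """Hypothetical predicate mirroring the glossary wording.

    max_occurs=None stands for maxOccurs="unbounded" (assumption of this sketch).
    occurs_count_is_static_constant should be True when dfdl:occursCount is a
    literal constant or an expression that statically evaluates to a constant.
    """
    # The schema must permit a varying count: an optional element (0..1)
    # or an array with minOccurs < maxOccurs.
    schema_allows_variation = max_occurs is None or min_occurs < max_occurs
    if not schema_allows_variation:
        return False
    # The DFDL properties preclude variation when the count is explicit
    # and fixed for all instances.
    precluded = (occurs_count_kind == "explicit"
                 and occurs_count_is_static_constant)
    return not precluded


# An array 0..5 whose count comes from a non-constant expression varies;
# the same array with a static explicit count does not.
print(is_variable_occurrence_item(0, 5, "explicit", False))  # True
print(is_variable_occurrence_item(0, 5, "explicit", True))   # False
```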

C: 
DFDL Schema Component Model 


The only changes needed are to remove wildcards and to add the following 
to describe shading:
The shaded boxes have directly corresponding element syntax and therefore 
appear in a DFDL schema.

D: 
Sequence Groups 
Mike B: To correct syntax diagram for FinalUnused and suggest wording for 
the Sequence section
 

E: Check other comments in document. 
Please look at the remaining comments in draft 039 and suggest solutions.

2. Go through Actions 
Updated below
3. DFDL v1 Specification completion
Draft 039 will be published today. 
WG review and comments by 3 March. 
Draft 040 with updates for OGF submission available 5 March. 
Meeting closed, 14:00

Next call: Wednesday 3 March 2010, 13:00 UK 

Next action: 084
Actions raised at this meeting

No
Action 
083
MB: To correct the syntax diagram for FinalUnused and suggest wording for 
the Sequence section

Current Actions:
No
Action 
049
20/05 AP Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use 
the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed the real need for this being in the specification. It 
seemed that the main value is that it defines a schema location for 
downloading 'known' defaults from the web. 
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource to complete this action it can 
be deferred
13/01: no progress
20/01: no progress
27/01: no progress
29/01: No progress.  The predefined formats do not need to be available 
when the spec is published.
Suman said that he had been mapping COBOL structures to DFDL and it didn't 
look as though the way text numbers are defined is very usable. He will 
document this for the next call.
03/02: No progress
10/02: No progress
17/02: No progress
24/02: No progress
066
Investigate format for defining test cases
25/11:IBM to see if it is possible to publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
20/01: no update
27/01: no progress
29/01: no progress
03/02: IBM is still investigating
10/02: IBM is still investigating
17/02: IBM is willing in principle to publish the test case format and 
some of the test cases. May need some time to build a 'compliance suite'
24/02: No progress
079
MB: Encoding for binary fields when lengthKind is pattern
17/02: Discussed but no conclusion
24/02: Mike has found an encoding (ISO-8859-1) whose 256 codepoints map 
one-to-one onto the first 256 codepoints of ISO 10646. Will document its 
use for binary fields.
080
AP: Clarify semantics of fn:position and fn:count
17/02: no progress
24/02: No progress
081
AP: INF and NaN
The description is the way ICU behaves but needs clarification. It isn't 
clear how INF and NaN are represented in the infoset. Need to investigate 
whether XML allows these values.
17/02: XML Schema allows NaN and INF for float and double; DFDL will do 
the same. Requires more investigation of ICU. 
24/02: Alan sent clarification that INF and NaN are limited to float and 
double. Closed.
083
MB: To correct the syntax diagram for FinalUnused and suggest wording for 
the Sequence section

Closed actions
No
Action 
081
AP: INF and NaN
The description is the way ICU behaves but needs clarification. It isn't 
clear how INF and NaN are represented in the infoset. Need to investigate 
whether XML allows these values.
17/02: XML Schema allows NaN and INF for float and double; DFDL will do 
the same. Requires more investigation of ICU. 
24/02: Alan sent clarification that INF and NaN are limited to float and 
double. Closed.
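Action 081 concludes that DFDL follows XML Schema: INF and NaN exist only 
for float and double. A minimal Python sketch of what that implies when 
writing values into an infoset, assuming the XML Schema 1.0 spellings INF, 
-INF and NaN. The function name xsd_lexical is hypothetical and not part of 
any DFDL or ICU API.

```python
import math

# Per action 081, INF and NaN are allowed only for float and double.
SPECIAL_VALUE_TYPES = {"float", "double"}


def xsd_lexical(value: float, xsd_type: str = "double") -> str:
    """Hypothetical helper: render a float using XML Schema spellings."""
    if math.isnan(value) or math.isinf(value):
        # Special values are rejected for any other simple type.
        if xsd_type not in SPECIAL_VALUE_TYPES:
            raise ValueError(f"INF/NaN not allowed for xs:{xsd_type}")
        if math.isnan(value):
            return "NaN"
        return "INF" if value > 0 else "-INF"
    # Finite values: repr() yields a valid xs:float/xs:double lexical form.
    # This sketch does not produce xs:decimal lexical forms.
    return repr(value)


print(xsd_lexical(float("inf")))        # INF
print(xsd_lexical(-math.inf, "float"))  # -INF
print(xsd_lexical(math.nan))            # NaN
```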

Work items:
No
Item
target version
status
005
Improvements on property descriptions 

not started
012
Reordering the properties discussion: move representation earlier, improve 
flow of topics 

not started 
036
Update dfdl schema with change properties 
ongoing

042
Mapping of the DFDL infoset to XDM 
none
not required for V1 specification
069
ICU fractional seconds
039

070
Write DFDL primer 


071
Write test cases.


072
it is a processing error if the number of occurrences in the data does not 
match the value of the expression or prefix
039


073
Rename dfdl:separatorPolicy="required" to "always". 
039
Deferred until action 071 agreed
078
document UPA checks
039

079
Semantics of length=0, nil handling and defaults. (A071)
039

080
Tlog: Allow LengthKind delimited for packed/bcd (A074)
039

081
Update empty sequence section (A075)
039

082
semantics of minOccurs= 0 on choice branches (A076)
039

083
Implement RFC2116


084
lengthKind pattern scannability rules
039

085
Invalid character substitution
039

086
Infoset round trip: Rephrase sentence 'It is possible to define a schema 
so that when infoset unparsed and the datastream reparsed, the same 
infoset will be produced'
039

087
Clarify use of relative paths in global components.
039

088
'DFDL expression'
039

089
Agreed that dfdl:representation 'text' is implied for strings and 
dfdl:representation 'binary' is implied for hexBinary
039

091
Must textStringPadCharacter and textNumberPadCharacter be a 1-byte 
character if the character set encoding is variable width?
039

092
finalDocumentTerminatorCanBeMissing and 
finalDocumentSeparatorCanBeMissing allowed only in 'default' format 
039

093
Remove textNumberFormat and textCalendarFormat.
039

094
Alignment should be 1-based
039

095
AP: INF and NaN
039

Regards

 
Alan Powell
 
Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell at uk.ibm.com

