[DFDL-WG] Minutes for OGF DFDL Working Group Call, February 02 & 03-2010

Thu Feb 4 10:55:56 CST 2010

Open Grid Forum: Data Format Description Language Working Group

OGF DFDL Working Group Call, February 02 & 03-2010

Attendees
Mike Beckerle (Oco) 
Steve Hanson (IBM) 
Alan Powell (IBM) 
Suman Kalia (IBM) 
Peter Lambros (IBM) 
Tim Kimber (IBM)

Apologies
Stephanie Fetzer (IBM)
Steve Marting (Progeny) 

1. Discriminators 

Discussed the 'parent exists' and 'component exists' options for 
dfdl:discriminator semantics. The WG agreed that DFDL would adopt the 
'component exists' semantics, where the discriminator indicates that the 
component it is on exists and says nothing about the component's 
parent.

The examples need minor changes and full syntax. Arrays would be treated 
the same as optional components.

Discriminators are not allowed on:
- Global groups and the top level sequence or choice of a global group.
- Global element declarations.
- The top level group of a complex type.
- Anonymous groups, other than when the group is the top level of a choice branch.

The discriminator timing property was also discussed. It was agreed that a 
parser should be able to tell when it is possible to evaluate the 
expression, so the timing property will be removed.
The following will be added instead: 'The expression will be evaluated when 
the referenced elements are known to exist or known not to exist.'

Timing on asserts was also mentioned but deferred for the time being.

Alan will update the 'component exists' proposal.

1. Action 077 COBOL and numberFormats
Suman highlighted a problem using textNumberFormat for COBOL numbers where 
a separate textNumberFormat is required for each length. This is because a 
different numberPattern is required for each one. It was suggested that 
numberPattern be moved from textNumberFormat to the standard properties, 
but it was decided to get rid of textNumberFormat altogether and move all 
of its properties. Tim pointed out that this meant that it was no longer 
possible to vary just the number properties, so there may be more 
definedFormats, but this was not felt to be a problem.
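
The length dependence Suman described can be illustrated with a minimal 
Python sketch. It assumes, purely for illustration, that an unsigned COBOL 
picture such as 9(5) maps to a zero-filled numberPattern with one '0' per 
digit. The helper name number_pattern_for_picture is hypothetical and the 
mapping is not taken from the DFDL specification.

```python
import re

# Hypothetical helper: the PIC-to-pattern mapping is an assumption made for
# illustration, not a rule from the DFDL specification.
_PICTURE = re.compile(r"(?:9(?:\(\d+\))?)+")
_TOKEN = re.compile(r"9(?:\((\d+)\))?")


def number_pattern_for_picture(picture: str) -> str:
    """Return a zero-filled pattern with one '0' per digit position."""
    if not _PICTURE.fullmatch(picture):
        raise ValueError(f"unsupported picture: {picture!r}")
    digits = sum(int(m.group(1)) if m.group(1) else 1
                 for m in _TOKEN.finditer(picture))
    return "0" * digits


# Each field length yields a different pattern, so a format definition that
# carries the pattern cannot be shared between fields of different lengths.
patterns = {pic: number_pattern_for_picture(pic)
            for pic in ("9(3)", "9(5)", "9(7)")}
```

With the pattern available as an ordinary property, each element could 
carry its own pattern while sharing the remaining number properties.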

For consistency it was agreed that textCalendarFormat would also be 
removed. EscapeScheme is different so will be retained.

2. Remaining 037 review issues 

2. I agree with the existing comment that the RFC2119 key words should be 
upper case. Agreed that the RFC2119 keywords should be in upper case and 
wherever possible the spec should be reworded to use the key words.

16.2 Scannability with lengthKind pattern: One use case is to find the 
end of a character element by looking for the 'binary' bytes of the 
following element.
Discussed whether dfdl:lengthKind pattern should be allowed for binary 
elements. Pattern scanning inherently treats the data as characters, but 
for single byte encodings a binary byte can be specified by its character 
codepoint. Agreed that dfdl:lengthKind pattern will be allowed for 
binary elements when the encoding is US-ASCII. (Why didn't we say any 
single byte character set?)
Section 16.2 says that the children of a complex type must not change 
the encoding. This will be relaxed when the encoding is US-ASCII.
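
The use case can be sketched in Python, assuming a text field of printable 
US-ASCII characters followed directly by a binary field. The function name 
pattern_length and the sample data are illustrative only. Because US-ASCII 
is a single byte encoding, the regular expression can name byte values by 
their codepoints.

```python
import re


def pattern_length(data: bytes, pattern: bytes) -> int:
    """Return the number of leading bytes of data matched by pattern."""
    match = re.compile(pattern).match(data)
    if match is None:
        raise ValueError("pattern does not match at the start of the data")
    return match.end()


# A text element followed by a 4-byte binary integer whose first byte is 0x00.
data = b"WIDGET" + (1234).to_bytes(4, "big")

# Printable US-ASCII codepoints only; scanning stops at the first other byte.
length = pattern_length(data, rb"[\x20-\x7e]*")
text = data[:length].decode("ascii")
```

The scan only finds the right boundary when the first bytes of the binary 
field cannot be mistaken for text characters, which is a property of the 
data and not something the pattern can guarantee.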

DFDL raw entities will not be allowed in a pattern.

Tracker Issue: illegal character encodings for parsing and unparsing. 
What should DFDL do when it finds illegal bytes when converting strings 
during parsing and unparsing? Discussed adding a new property to declare a 
substitution character. Discussed what products and ICU do. Decided to 
follow ICU.
Conversions to Unicode (during parsing) will replace characters that 
cannot be converted with the Unicode replacement character (U+FFFD). When 
converting from Unicode (during unparsing) they will be replaced with 
the substitution character for that encoding, for example 0x1A (Control-Z) 
for ASCII.
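
A minimal Python sketch of this behaviour follows. It is an approximation 
built on Python's codecs module, not ICU itself. Python's built-in 
"replace" handler already yields U+FFFD when decoding, but it emits '?' 
when encoding, so the sketch registers its own handler to emit 0x1A. The 
handler name "dfdl_sub" and the function names are made up for this 
example.

```python
import codecs


def _substitute_control_z(error: UnicodeError):
    """Replace each unencodable character with 0x1A (Control-Z)."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    return "\x1a" * (error.end - error.start), error.end


# The handler name "dfdl_sub" is made up for this sketch.
codecs.register_error("dfdl_sub", _substitute_control_z)


def parse_text(data: bytes, encoding: str = "ascii") -> str:
    # Bytes that cannot be decoded become U+FFFD.
    return data.decode(encoding, errors="replace")


def unparse_text(text: str) -> bytes:
    # Only ASCII is handled here; other encodings have other substitutes.
    return text.encode("ascii", errors="dfdl_sub")
```

For example, parse_text(b"AB\xffCD") gives "AB\ufffdCD" and 
unparse_text("caf\u00e9") gives b"caf\x1a".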

Tracker Issue: Processing-time Schema Definition Errors 
Mike will reword Section 2.3.1

Tim will supply rules for the order in which terminators and separators 
are looked for in the data stream.

Tracker Issue: "round trip" for infoset. Should we omit the whole point? 
Rephrase the sentence as 'It is possible to define a schema so that when 
the infoset is unparsed and the data stream reparsed, the same infoset 
will be produced'.

3. Go through Actions 

Meeting closed, 15:00

Next call: Tuesday 10 February 2010, 13:00 UK

Next action: 079
Actions raised at this meeting

No
Action 
078
MB: Reword section 2.3.1 incorporating markup order rules.

Current Actions:
No
Action 

045
20/05 AP: Speculative Parsing
27/05: Pseudocode has been circulated. Review for next call
03/06: Comments received and will be incorporated
09/06: Progress but not discussed
17/06: Discussed briefly
24/06: No Progress
01/07: No Progress
15/07: No progress. MB not happy with the way the algorithm is documented, 
need to find a better way.
29/07: No Progress 
05/08: No Progress. Will document behaviour as a set of rules.
12/08: No Progress 
...
16/09: no progress
30/09: AP distributed proposal and others commented. Brief discussion AP 
to incorporate update and reissue
07/10: Updated proposal was discussed. Comments will be incorporated into 
the next version.
14/10: Alan to update proposal to include array scenario where minOccurs > 
0
21/10: Updated proposal reviewed
28/10: Updated proposal reviewed see minutes
04/11: Discussed semantics of discriminators on arrays. MB to produce 
examples
11/11: Absorbing action 033 into 045. Maybe decorated discriminator kinds 
are needed after all. MB and SF to continue with examples.
18/11: Went through WTX implementation of example. SF to gather more 
documentation about WTX discriminator rules.
25/11: Further discussion. Will get more WTX documentation. Need to 
confirm that no changes are needed to the Resolving Uncertainty doc.
04/11: Further discussion about arrays.
09/12: Reviewed proposed discriminator semantic.
16/12: Reviewed discriminator examples and WTX semantic.
23/12: SF to provide better description of WTX behaviour and invite B 
Connolley to next call
06/01: B Connolly not available. SF to provide more complete description.
13/01: Stephanie took us through a description of WTX identifiers. Mike 
agreed to write up in DFDL terms.
20/01: Mike will write up
27/01: further discussion of discriminators
29/01: Alan had emailed both proposals but not enough time to discuss
02/02: Agreed to adopt 'component exists' semantics for discriminators
049
20/05 AP Built-in specification description and schemas
03/06: not discussed
24/06: No Progress
24/06: No Progress (hope to get these from test cases)
15/07: No progress. Once available, the examples in the spec should use 
the dfdl:defineFormat annotations they provide.
...
14/10: no progress
21/10: Discussed the real need for this being in the specification. It 
seemed that the main value is that it defines a schema location for 
downloading 'known' defaults from the web.
28/10: no progress
04/11: no progress
11/11: no update
18/11: no update
25/11: Agreed to try to produce for CSV and fixed formats
04/12: no update
09/12: no update
16/12: no update
23/12: no update
06/01: no progress. If there is no resource to complete this action it can 
be deferred
13/01: no progress
20/01: no progress
27/01: no progress
29/01: No progress.  The predefined formats do not need to be available 
when the spec is published.
Suman said that he had been mapping COBOL structures to DFDL and it didn't 
look as though the way text numbers are defined is very usable. He will 
document for next call.
03/02: No progress
066
Investigate format for defining test cases
25/11: IBM to see if it is possible to publish its test case format.
04/12: no update
09/12: no update
16/12: reminder sent to project manager
23/12: SH will send another reminder.
06/01: Another reminder will be sent
13/01: no update
20/01: no update
27/01: no progress
29/01: no progress
03/02: IBM is still investigating
077
SKK: Mapping of COBOL numbers to textNumberFormats.
03/02: Suman documented the problem. Agreed to remove textNumberFormat and 
textCalendarFormat.
078
MB: Reword section 2.3.1 incorporating markup order rules.

Closed actions
No
Action 

Work items:
No   Item                                                      Target version  Status
005  Improvements on property descriptions                                     not started
012  Reordering the properties discussion: move                                not started
     representation earlier, improve flow of topics
036  Update dfdl schema with change properties                 ongoing
042  Mapping of the DFDL infoset to XDM                        none            not required for V1 specification
069  ICU fractional seconds                                    039
070  Write DFDL primer
071  Write test cases.
072  it is a processing error if the number of occurrences     039
     in the data does not match the value of the expression
     or prefix
073  Rename dfdl:separatorPolicy="required" to "always".       039             Deferred until action 071 agreed
078  document UPA checks                                       039
079  Semantics of length=0, nil handling and defaults. (A071)  039
080  Tlog: Allow LengthKind delimited for packed/bcd (A074)    039
081  Update empty sequence section (A075)                      039
082  semantics of minOccurs=0 on choice branches (A076)        039
083  Implement RFC2119
084  LengthKind pattern scannability rules
085  Invalid character substitution

Regards

Alan Powell

Development - MQSeries, Message Broker, ESB
IBM Software Group, Application and Integration Middleware Software
-------------------------------------------------------------------------------------------------------------------------------------------
IBM
MP211, Hursley Park
Hursley, SO21 2JN
United Kingdom
Phone: +44-1962-815073
e-mail: alan_powell at uk.ibm.com

