[DFDL-WG] One email or a flock or... - re: 10.03 draft - open review items - update

Wed Oct 17 06:56:53 EDT 2012

I have added some comments in-line to reflect the WG call on Tuesday, we 
will continue on Friday.

(Andy - please see 5 below)

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson/UK/IBM at IBMGB
Date:   02/10/2012 18:41
Subject:        One email or a flock or... - re: 10.03 draft - open review 
items

Steve,

I've got the issues below left after your review pass on 10.03, minus two 
that I sent emails to you about separately.

Should I issue this email to the WG, or do you want me to decompose this 
into separate emails, or do you just want to list these as agenda topics 
for next call? I think it is good if people get to look at them in advance 
of a call. 

...mikeb

--------------------------------------------------------

This is a list of items left open after a review pass by SMH (on the draft 
in preparation, r010.03).

These items need specific WG discussion on a call. They may be small 
enough to resolve there, or may be escalated into action items. (A couple 
of issues that are already clearly action items are not listed here.)

Note: please ignore the identifiers like SMH107 or m236 that I'm tagging 
these with. Those are just for me editing the text. (Those change 
...grrr... if someone inserts a comment into the document, so they're not 
good issue identifiers.)

1.      SMH107   Spec says: When the separator and terminator on a group 
have the same value, then at a point where either separator or terminator 
could be found, the separator is tried first.
The issue is that this language still feels ambiguous. E.g., it tries the 
separator first; let’s say it finds it. Will a subsequent processing error 
cause it to backtrack, revisit this, and try the terminator? Or does 
finding the separator confirm that it IS a separator and resolve that 
point of uncertainty forever? I believe the latter is what was intended 
(delimiter decisions drive parsing and are not revisited), but we need to 
state this (or do we already, somewhere else?).
SMH: Would like to discuss this when Tim is present, as we need to see 
what IBM DFDL does with separators when backtracking.
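
A minimal Python sketch of the "separator is tried first" rule quoted in 
item 1. The function name and return shape are hypothetical, not taken from 
any implementation, and the sketch deliberately does not model backtracking, 
which is the open question here.

```python
def classify_delimiter(data, pos, separator, terminator):
    """Decide which delimiter starts at data[pos], trying the separator first.

    Returns (kind, next_pos), or None if neither delimiter matches.
    """
    # The separator is tested first, so it wins when both have the same value.
    if data.startswith(separator, pos):
        return ("separator", pos + len(separator))
    if data.startswith(terminator, pos):
        return ("terminator", pos + len(terminator))
    return None
```

With separator and terminator both set to ";", every match is reported as a 
separator; whether a later processing error may revisit that decision is 
exactly what still needs to be stated.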

2.      SMH169 - Some numeric types are signed, others unsigned. Some 
representations are sign-capable, some are not (BCD specifically). Right 
now spec draft says you can't have bcd as rep for signed integer types 
long, int, short, byte. But you CAN have bcd for rep of decimal, integer. 
We could allow bcd only for nonNegativeInteger type, but there is no 
nonNegativeDecimal type, so....how to resolve? I would suggest that we 
simply allow bcd as rep for both signed and unsigned types, and it's a 
processing error to unparse a negative value into bcd rep.
        SMH: Noted that for a decimal, property decimalSigned is used to 
indicate whether the logical value is signed or not. So we could disallow 
BCD for integer and for decimal when decimalSigned is 'yes'. 
Interestingly, section 3.7.1 states "Signed numbers with 
dfdl:binaryNumberRep 'bcd' are always positive. On unparsing it is a 
processing error if the data is negative." which is admitting that BCD can 
be used with signed types. IBM DFDL currently implements the table in the 
description of binaryNumberRep, and so allows BCD for integer and decimal 
regardless of decimalSigned, but does not allow long, short, int, byte. 
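
A minimal Python sketch of the behaviour suggested in item 2: BCD is 
accepted as a representation, and unparsing a negative value is a processing 
error. The names ProcessingError and unparse_bcd are hypothetical, and the 
fixed byte length is an assumption made to keep the example short.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def unparse_bcd(value, num_bytes):
    """Pack a non-negative integer as BCD, two decimal digits per byte."""
    # BCD has no sign nibble, so a negative value cannot be represented.
    if value < 0:
        raise ProcessingError("cannot unparse a negative value as BCD")
    digits = str(value)
    if len(digits) > 2 * num_bytes:
        raise ProcessingError("value does not fit in the given length")
    # Left-pad with zeros so every byte carries exactly two digits.
    digits = digits.zfill(2 * num_bytes)
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )
```

For example, unparse_bcd(1234, 2) gives the two bytes 0x12 0x34, while 
unparse_bcd(-1, 2) raises ProcessingError.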

3.      m229 - textStandardZeroRep - should this allow %ES; as one of the 
list of possibles?
        SMH: Decided not to allow %ES; because it adds some complexity to 
the 'empty representation' processing rules, in the same way that 
xs:string and xs:hexBinary do. Can always make an element required and use 
default of 0.

4.      m236 - are V (virtual decimal point position) and P allowed in 
the textNumberPattern for double and float types? 
        SMH: Post-call investigation: Errata 2.80 says they are allowed, 
but not in conjunction with E, @ and * symbols. This is reflected in BNF 
as subpattern := prefix? ((number exponent?) | vpinteger) suffix?
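
A minimal Python sketch of the restriction described in item 4: V and P 
must not be combined with the E, @ and * symbols. The function name is 
hypothetical, and the handling of quoted literals is a simplification (text 
between single quotes is ignored), not a full pattern parser.

```python
def vp_pattern_conflicts(pattern):
    """Return the symbols among E, @ and * that appear alongside V or P."""
    unquoted = []
    in_quote = False
    for ch in pattern:
        if ch == "'":
            # Single quotes delimit literal text, which is skipped.
            in_quote = not in_quote
        elif not in_quote:
            unquoted.append(ch)
    if not any(ch in "VP" for ch in unquoted):
        return []
    return sorted({ch for ch in unquoted if ch in "E@*"})
```

An empty result means the pattern does not mix V or P with the disallowed 
symbols; a non-empty result lists the offending symbols.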

5.      m237 - Do we check that the various symbols used for infinity, 
digits, grouping separators, decimal separators are properly distinct to 
allow parsing? E.g., that the decimal separator and grouping separator 
aren't the same, and that the positive and negative pattern variants are 
distinguishable? The ICU library supposedly doesn't do this checking. Do we 
state this is an SDE in DFDL? If so, then is this checking required? Can we 
make it possible for implementations to not check somehow? Other grammar 
ambiguity situations, like separator and terminator being ambiguous, are 
specifically NOT checked for, because determining if a grammar is 
ambiguous is hard or undecidable, and would have to be done at runtime 
because delimiters can be run-time computed. But for the syntax components 
of text numbers, do we require checking or not?
        SMH: Post-call investigation: IBM DFDL gives an error if the 
decimal & grouping separators are the same, but does not check any of the 
other characters for uniqueness.  (Andy - please can you check what ICU 
does if you set various of the text number characters to be the same 
value, e.g., decimal sep, grouping sep, exponent and (for floats) NaN and 
Inf reps, in both strict & lax modes?)
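
A minimal Python sketch of the kind of distinctness check discussed in item 
5. The function name and the dictionary keys are hypothetical labels rather 
than DFDL property names, and the sketch only detects identical values, not 
every way two symbols could be confused during parsing.

```python
def find_symbol_clashes(symbols):
    """Return pairs of symbol names that are configured with the same value."""
    names = sorted(symbols)
    clashes = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if symbols[first] == symbols[second]:
                clashes.append((first, second))
    return clashes


clashes = find_symbol_clashes({
    "decimal separator": ".",
    "grouping separator": ".",
    "exponent": "E",
    "infinity": "INF",
    "NaN": "NaN",
})
```

Here clashes contains the single pair ("decimal separator", "grouping 
separator"), which is the one case IBM DFDL is reported to reject.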

6.      m370 - multiple PoU resolutions: If you have initiatedContent, AND 
a choiceBranchRef, AND a discriminator all on the same element, and there 
are 3 enclosing nested PoU, which one controls which? Precedence is the 
issue. Or..... do we really need to allow this? Why don’t we just disallow 
this kind of piling-on of complexity and make the user choose which PoU 
resolution technique they want?
        SMH: a) initiatedContent 'yes' on a choice/sequence, and a 
discriminator on a child of the choice/sequence. Allowed at the moment, 
and it is quite possible that users of IBM DFDL have this combination, so 
I would prefer not to make this an SDE (it's a difficult check to get 
right anyway due to the nature of discriminator placement). Issue a 
warning if a discriminator is found on a direct child of the choice?
        b) initiatedContent 'yes' and choiceBranchRef together on a 
choice. Yes, one is redundant, but remember that initiatedContent could 
have been obtained via scoping rules, so if we made the combination an SDE 
then we would be forcing users to explicitly set initiatedContent to 'no'. 
I'd be ok with ignoring initiatedContent's PoU resolution behaviour if 
choiceBranchRef was present on a choice (note that choiceBranchRef cannot 
be scoped).
        c) choiceBranchRef on choice and a discriminator on a child of the 
choice. Issue a warning if a discriminator is found on a direct child of 
the choice?

7.      m396 - is BCD representation a mandatory feature, or optional?
        SMH: BCD calendars and BCD numbers are independent optional 
features. Will update errata document to make this clear.

8.      m398 - portability at risk if subset processors ignore properties 
they don't implement. We relaxed this from a more rigid policy, and now 
allow subsets to not validate properties they don't implement. However, is 
there a better compromise, e.g., require a warning about all 
unimplemented/unrecognized properties? E.g., dfdl:textBiDi='no' yields SDE 
"unrecognized property 'textBiDi' with value 'no'".
        SMH: Agreed that this had been relaxed too much and a warning MUST 
be issued by implementations. 
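
A minimal Python sketch of the agreed behaviour in item 8: a subset 
processor reports a warning for each property it does not implement rather 
than silently ignoring it. The function name and the set of implemented 
properties are hypothetical; the message text follows the example in the 
item above.

```python
def unimplemented_property_warnings(properties, implemented):
    """Return one warning message per property the processor does not implement."""
    return [
        f"unrecognized property '{name}' with value '{value}'"
        for name, value in properties.items()
        if name not in implemented
    ]


warnings_found = unimplemented_property_warnings(
    {"separator": ",", "textBiDi": "no"},
    implemented={"separator", "terminator"},
)
```

Here warnings_found holds a single message, for textBiDi.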

-- 
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel:  781-330-0412
