[DFDL-WG] One email or a flock or... - re: 10.03 draft - open review items - update

Mike Beckerle mbeckerle.dfdl at gmail.com
Mon Oct 22 12:29:13 EDT 2012


Suggested wording for issue 1 below. Change in italics:

When the separator and terminator on a group have the same value, then at a
point where either separator or terminator could be found, the separator is
tried first. *(Speculative execution may try the terminator subsequently.)*
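
A minimal sketch of that behaviour, in Python. The names resolve_delimiter,
demo_parse_after and parse_after are made up for illustration; they are not
taken from the spec or from any implementation. The sketch assumes that
"speculative execution" means the parse following a matched separator may
fail, in which case the same point is retried as a terminator.

```python
from typing import Callable, Optional


def resolve_delimiter(
    data: str,
    pos: int,
    separator: str,
    terminator: str,
    parse_after: Callable[[str, int], bool],
) -> Optional[str]:
    """Decide what the delimiter at pos is: 'separator', 'terminator' or None.

    Hypothetical helper. parse_after(kind, next_pos) stands in for the rest
    of the parse and reports whether it succeeds under that interpretation.
    """
    # The separator is tried first; the terminator is only tried if the
    # separator does not match or the speculative parse after it fails.
    candidates = (("separator", separator), ("terminator", terminator))
    for kind, text in candidates:
        if data.startswith(text, pos) and parse_after(kind, pos + len(text)):
            return kind
    return None


def demo_parse_after(data: str) -> Callable[[str, int], bool]:
    """Toy continuation: a separator must be followed by more data."""
    def parse_after(kind: str, next_pos: int) -> bool:
        if kind == "separator":
            return next_pos < len(data)
        return True  # anything may follow a terminator in this toy model
    return parse_after


data = "a;b;"  # separator and terminator are both ";"
check = demo_parse_after(data)
first = resolve_delimiter(data, 1, ";", ";", check)
last = resolve_delimiter(data, 3, ";", ";", check)
```

Here the ";" at offset 1 resolves as a separator, while the one at offset 3
fails as a separator (nothing follows it) and is then accepted as the
terminator.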



On Wed, Oct 17, 2012 at 6:56 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> I have added some *comments* in-line to reflect the WG call on Tuesday,
> we will continue on Friday.
>
> (Andy - please see 5 *below*)
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        Steve Hanson/UK/IBM at IBMGB
> Date:        02/10/2012 18:41
> Subject:        One email or a flock or... - re: 10.03 draft - open
> review items
> ------------------------------
>
>
>
> Steve,
>
> I've got the issues below left after your review pass on 10.03, minus 2 I
> sent emails to you about separately.
>
> Should I issue this email to the WG, or do you want me to decompose this
> into separate emails, or do you just want to list these as agenda topics
> for next call? I think it is good if people get to look at them in advance
> of a call.
>
> ...mikeb
>
> --------------------------------------------------------
>
> This is a list of items left open after a review pass by SMH (on the draft
> in preparation, r010.03).
>
> These items need specific WG discussion on a call. They may be small
> enough to resolve there, or may be escalated into action items. (A couple
> issues already clearly action-item related are not listed here.)
>
> Note: please ignore the identifiers like SMH107 or m236 I'm tagging these
> with. Those are just for me editing the text. (Those change ...grrr... if
> someone inserts a comment into the document, so they're not good issue
> identifiers).
>
> 1.        SMH107   Spec says: When the separator and terminator on a
> group have the same value, then at a point where either separator or
> terminator could be found, the separator is tried first.
>
>    - Issue is that this language still feels ambiguous. E.g., say it tries
>    the separator first and finds it. Will a subsequent processing error
>    cause it to backtrack, revisit this point, and try the terminator? Or
>    does finding the separator confirm that it IS a separator, resolving
>    that point of uncertainty for good? I believe the latter is what was
>    intended (delimiter decisions drive parsing and are not revisited), but
>    we need to state this (or do we already state it somewhere else?)
>
> *SMH: Would like to discuss this when Tim is present, as need to see what
> IBM DFDL does with separators when backtracking*.
>
> 2.        SMH169 - Some numeric types are signed, others unsigned. Some
> representations are sign-capable, some are not (BCD specifically). Right
> now spec draft says you can't have bcd as rep for signed integer types
> long, int, short, byte. But you CAN have bcd for rep of decimal, integer.
> We could allow bcd only for nonNegativeInteger type, but there is no
> nonNegativeDecimal type, so....how to resolve? I would suggest that we
> simply allow bcd as rep for both signed and unsigned types, and it's a
> processing error to unparse a negative value into bcd rep.
> *        SMH: Noted that for a decimal, property decimalSigned is used to
> indicate whether the logical value is signed or not. So we could disallow
> BCD for integer and for decimal when decimalSigned is 'yes'.
> Interestingly, section 3.7.1 states "Signed numbers with
> dfdl:binaryNumberRep 'bcd' are always positive. On unparsing it is a
> processing error if the data is negative." which is admitting that BCD
> can be used with signed types. IBM DFDL currently implements the table in
> the description of binaryNumberRep, and so allows BCD for integer and
> decimal regardless of decimalSigned, but does not allow long, short, int,
> byte.*
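
For illustration, a minimal Python sketch of the "allow BCD, but a negative
value is a processing error on unparse" option. It assumes plain packed BCD
with two decimal digits per byte and no sign nibble; ProcessingError and
unparse_bcd are hypothetical names, not part of any DFDL implementation.

```python
class ProcessingError(Exception):
    """Hypothetical stand-in for a DFDL processing error."""


def unparse_bcd(value: int, length_bytes: int) -> bytes:
    """Encode a non-negative integer as BCD, two digits per byte."""
    # BCD has no place to carry a sign, so a negative logical value
    # cannot be represented and is rejected at unparse time.
    if value < 0:
        raise ProcessingError("cannot unparse negative value %d as BCD" % value)
    digits = str(value)
    if len(digits) > 2 * length_bytes:
        raise ProcessingError("value %d does not fit in %d bytes" % (value, length_bytes))
    # Left-pad with zeros so every byte holds exactly two digits.
    digits = digits.rjust(2 * length_bytes, "0")
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


encoded = unparse_bcd(1234, 3)

try:
    unparse_bcd(-1, 1)
    negative_rejected = False
except ProcessingError:
    negative_rejected = True
```

With this shape the schema can declare BCD for a signed type, and the
restriction only bites when a negative value actually reaches the unparser.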
>
> 3.        m229 - textStandardZeroRep - should this allow %ES; as one of
> the list of possibles?
> *        SMH: Decided not to allow %ES; because it adds some complexity
> to the 'empty representation' processing rules, in the same way that
> xs:string and xs:hexBinary do. Can always make an element required and use
> default of 0.*
>
> 4.        m236 - is V (virtual decimal point position) and also P allowed
> in the textNumberPattern for double and float types?
> *        SMH: Post-call investigation: Errata 2.80 says they are allowed,
> but not in conjunction with E, @ and * symbols. This is reflected in the
> BNF as:*
>         subpattern := prefix? ((number exponent?) | vpinteger) suffix?
>
> 5.        m237 - Do we check that the various symbols used for infinity,
> digits, grouping separators, decimal separators are properly distinct to
> allow parsing? E.g., that the decimal separator and grouping separator
> aren't the same, and that the positive and negative pattern variants are
> distinguishable? The ICU library supposedly doesn't do this checking. Do we
> state this is an SDE in DFDL? If so, then is this checking required? Can we
> make it possible for implementations to not check somehow? Other grammar
> ambiguity situations like separator and terminator being ambiguous are
> specifically NOT checked for, because determining if a grammar is ambiguous
> is hard or undecidable, and would have to be done at runtime because
> delimiters can be run-time computed. But for the syntax components of text
> numbers do we require checking or not?
> *        SMH: Post-call investigation: IBM DFDL gives an error if the
> decimal & grouping separators are the same, but does not check any of the
> other characters for uniqueness. (Andy - please can you check what ICU
> does if you set various of the text number characters to be the same
> value, e.g., decimal sep, grouping sep, exponent and (for floats) NaN and
> Inf reps, in both strict & lax modes?)*
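
As a rough idea of what a distinctness check could look like, here is a
small Python sketch. The dictionary keys are informal labels chosen for the
example, not DFDL property names, and find_symbol_clashes is a hypothetical
helper. Whether a clash should be an SDE, a warning, or left unchecked is
exactly the open question above.

```python
from typing import Dict, List, Tuple


def find_symbol_clashes(symbols: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of symbol names that share the same text value."""
    names = sorted(symbols)
    clashes = []
    # Compare each unordered pair once; names are sorted so the result
    # is deterministic regardless of dictionary ordering.
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if symbols[first] == symbols[second]:
                clashes.append((first, second))
    return clashes


clashes = find_symbol_clashes({
    "decimal": ".",
    "grouping": ".",   # same as decimal: cannot be told apart when parsing
    "exponent": "E",
    "infinity": "INF",
    "nan": "NaN",
})
```

Since these symbols can be computed at run time, a check like this would
have to run once the values are known rather than at schema compile time.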
>
> 6.        m370 - multiple PoU resolutions: If you have initiatedContent,
> AND a choiceBranchRef, AND a discriminator all on the same element, and
> there are 3 enclosing nested PoU, which one controls which? Precedence is
> the issue. Or..... do we really need to allow this? Why don’t we just
> disallow this kind of piling-on of complexity and make the user choose
> which PoU resolution technique they want?
>         *SMH: a) initiatedContent 'yes' on a choice/sequence, and a
> discriminator on a child of the choice/sequence. Allowed at the moment, and
> it is quite possible that users of IBM DFDL have this combination, so I
> would prefer not to make this an SDE (it's a difficult check to get right
> anyway due to the nature of discriminator placement). Issue a warning if a
> discriminator is found on a direct child of the choice?*
> *        b) initiatedContent 'yes' and choiceBranchRef together on a
> choice. Yes one is redundant, but remember that initiatedContent could have
> been obtained via scoping rules, so if we made the combination an SDE then
> we would be forcing users to explicitly set initiatedContent to 'no'.  I'd
> be ok with ignoring initiatedContent's PoU resolution behaviour if
> choiceBranchRef was present on a choice (note that choiceBranchRef can not
> be scoped).*
> *        c) choiceBranchRef on choice and a discriminator on a child of
> the choice. Issue a warning if a discriminator is found on a direct child
> of the choice?*
>
> 7.        m396 - is BCD representation a mandatory feature, or optional?
> *        SMH: BCD calendars and BCD numbers are independent optional
> features. Will update errata document to make this clear.*
>
> 8.        m398 - portability at risk if subset processors ignore
> properties they don't implement. We relaxed this from a more rigid policy,
> and now allow subsets to not validate properties they don't implement.
> However, is there a better compromise, e.g., require a warning about all
> unimplemented/unrecognized properties? E.g., dfdl:textBiDi='no' yields SDE
> "unrecognized property 'textBiDi' with value 'no'".
> *        SMH: Agreed that this had been relaxed too much and a warning
> MUST be issued by implementations. *
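
A minimal Python sketch of the agreed behaviour, assuming a subset processor
that knows the set of properties it implements. The function name and the
example property set are hypothetical; the message text follows the example
given in item 8.

```python
from typing import Dict, List, Set


def unimplemented_property_warnings(
    properties: Dict[str, str],
    implemented: Set[str],
) -> List[str]:
    """Return one warning per property the processor does not implement."""
    return [
        "unrecognized property '%s' with value '%s'" % (name, value)
        for name, value in properties.items()
        if name not in implemented
    ]


warnings_issued = unimplemented_property_warnings(
    {"encoding": "UTF-8", "textBiDi": "no"},
    implemented={"encoding"},  # assumed subset, for illustration only
)
```

The schema is still accepted; the processor just reports what it skipped.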
>
> --
> Mike Beckerle | OGF DFDL WG Co-Chair
> Tel:  781-330-0412
>
>



-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412