[DFDL-WG] Action 242 - valueLength and contentLength function wording

Fri Apr 11 09:04:16 EDT 2014

The revised Word document with the proposed changes for Action 242 is
attached. I have incorporated the discussion in this thread (I hope).
Please review.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property
Policy<http://www.ogf.org/About/abt_policies.php>

On Tue, Mar 25, 2014 at 10:56 AM, Mike Beckerle <mbeckerle.dfdl at gmail.com> wrote:

>
> This language is consistent with what we say for lengthKind pattern in
> section 12.3.5:
>
> "When unparsing, the dfdl:valueLength of a complex type element when the
> length units is 'characters' is computed as if the entire structure was
> unparsed into a temporary data stream beginning at position 1, and then
> this data stream is considered to be text in the character set encoding
> specified by the dfdl:encoding property, regardless of the actual
> representation of the complex type element or the elements contained within
> it. The number of characters in this temporary data stream is the value
> length of the complex type."
>
> The behavior of the IBM DFDL implementation for valueLength, as
> described, is consistent with the above, except that it will not detect a
> decode error, and it gives an SDE (?) if the encoding is not fixed width.
>
> Since we have decided not to require that a complex type element is
> recursively all text all the way down, I believe we have to tolerate
> implementations having different behaviors in the potentially meaningless
> cases where there is binary data or encoding changes in the complex type.
> So I would add to the above suggested language this:
>
> "However, if creation of this data stream would cause an encoding error,
> or parsing of this data stream as characters would cause a decoding error,
> then the behavior and return value of dfdl:valueLength are implementation
> dependent."
>
> Looking at the DFDL spec, I am concerned that we never really say what we
> mean by the "length of the ComplexContent region." (Last sentence before
> Table 7 in section 12.3.7) Section 12.3.7.3 doesn't do it. The
> dfdl:valueLength function may be the first place where we have to actually
> say how the various sub-regions contribute to the ComplexContent region's
> length.
>
> I believe this is the obvious "sum of length of all contained regions",
> but keep in mind that alignment region lengths will vary depending on the
> starting alignment, so the length is, in general, dependent on the position
> within the bit stream.
>
> Hence when unparsing we have to specify that the dfdl:valueLength is
> measured as if the ComplexContent region started at position 1 (as I did
> above) so that internal alignment regions can be given meaningful lengths.
>
> The general clarification should be added to 12.3.7.3, or to section
> 12.3.7 immediately before section 12.3.7.1. Something like this:
>
> "The length of the ComplexContent region is the sum of the lengths of the
> contained regions. However, note that alignment regions inside the
> ComplexContent may be of different lengths depending on the
> ComplexContent's starting position alignment."
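
A minimal sketch of why the starting position matters, assuming (for
illustration only) that each child contributes an alignment fill region
followed by fixed-length content, and that bit positions are 1-based. The
names alignment_fill and complex_content_length are hypothetical and are not
taken from the spec or from any implementation.

```python
def alignment_fill(position_bits, alignment_bits):
    """Fill bits needed so content starting at a 1-based bit position is aligned."""
    offset = (position_bits - 1) % alignment_bits
    return (alignment_bits - offset) % alignment_bits


def complex_content_length(children, start_position=1):
    """Sum the lengths of all contained regions, including alignment fill.

    children is a list of (alignment_bits, length_bits) pairs; this is an
    assumed, simplified model of the regions inside a ComplexContent region.
    """
    position = start_position
    for alignment_bits, length_bits in children:
        position += alignment_fill(position, alignment_bits)  # alignment region
        position += length_bits                               # content region
    return position - start_position


# One byte-aligned 8-bit child followed by one 32-bit-aligned 32-bit child.
children = [(8, 8), (32, 32)]
length_from_1 = complex_content_length(children, start_position=1)
length_from_9 = complex_content_length(children, start_position=9)
```

In this model the same children measure 64 bits when the region starts at
position 1 but 56 bits when it starts at position 9, which is the reason for
fixing the starting position when defining the length for unparsing.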
>
>
> On Mon, Mar 24, 2014 at 11:34 AM, Andrew Edwards <andy.edwards at uk.ibm.com> wrote:
>
>> Steve (et al) - Resending as the last one bounced.
>>
>> I'll usurp Tim and respond :)
>>
>> Currently the IBM implementation insists on using a fixed-width encoding
>> and returns an "unsupported" error message for a variable-width encoding.
>> With a fixed-width encoding, we "do the maths" using the
>> bytes-per-character and the bytes written by this complex element.
>>
>> HTH,
>> Andy
>>
>> *Andy Edwards* - *IBM Integration Bus* <http://www-03.ibm.com/software/products/us/en/integration-bus> -
>> *DFDL* <https://w3-connections.ibm.com/wikis/home?lang=en-gb#!/wiki/IBM%20Data%20Format%20Description%20Language>
>> *Email:* andy.edwards at uk.ibm.com
>> *Snail Mail:* MP211, Hursley Park, Hursley, WINCHESTER, Hants, SO21 2JN
>> *Tel int:* 247222 *Tel ext:* +44 (0)1962 817222 *Desk:* DE3 V17
>>
>> *The Feynman problem solving Algorithm*
>>  1) Write down the problem
>>  2) Think real hard
>>  3) Write down the answer
>> -- Murray Gell-Mann in the NY Times
>>
>>
>>
>> From:        Steve Hanson/UK/IBM
>> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
>> Cc:        Mike Beckerle <mbeckerle.dfdl at gmail.com>, Andrew Edwards/UK/IBM at IBMGB
>> Date:        24/03/2014 14:52
>> Subject:        Re: [DFDL-WG] Action 242 - valueLength and contentLength
>> function        wording
>> ------------------------------
>>
>>
>>
>> Note errata 3.9, my bolding:
>>
>> *"3.9. Section 12.3.5, 7.3.1, 7.3.2. The spec originally allows
>> lengthKind 'pattern' to be used when the representation of the current
>> element, or of a child element, is binary, but imposes restrictions on the
>> encoding that can be in force. *
>>
>> *Clarify that the encoding property must be defined for the element (else
>> schema definition error), and that a decoding processing error is possible
>> if the match of the regex encounters data that does not decode in that
>> encoding, dependent on the setting of encodingErrorPolicy. Remove section
>> 12.3.5.1.*
>>
>> *Same clarifications needed for testKind "pattern" property for asserts
>> and discriminators.*
>>
>> *For consistency, the restriction that a complex element of specified
>> length and lengthUnits 'characters' must have children that are all text
>> and that have the same encoding as the complex element, is dropped."*
>>
>> That's the restriction that I was referring to in my comment below.  I
>> can see why it was dropped - basically the parser now just tries to decode
>> n characters using the complex element's encoding (and
>> encodingErrorPolicy). We could apply the same principle for
>> dfdl:valueLength & dfdl:contentLength - you build the stream from the
>> bottom up, and then decode it using the complex element's encoding (and
>> encodingErrorPolicy ?) to get the length in characters.
>>
>> Note that's how unparsing for lengthKind 'prefixed' with lengthUnits
>> 'characters' would work as well  - the spec just says "*For a complex
>> element, the length is that of the ComplexContent region*" which is not
>> sufficient (12.3.4). Similar deal for lengthKind 'explicit' - in order to
>> know the size in chars of *ElementUnused* the unparser needs to know the
>> size in chars of the data first (12.3.7.3).
>>
>> (Of course, for a fixed width encoding, you don't need to decode, you can
>> just do the maths, but for the general case you need to decode. Also just
>> doing the maths does not take encodingErrorPolicy into account).
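
The two approaches above (doing the maths for a fixed-width encoding versus
decoding in the general case) can be sketched as follows. This is only an
illustration under stated assumptions: length_in_characters is a hypothetical
helper, Python's errors argument stands in loosely for encodingErrorPolicy,
and nothing here reflects how IBM DFDL or any other implementation works.

```python
def length_in_characters(data, encoding, bytes_per_char=None, errors="strict"):
    """Length in characters of an already-unparsed byte stream.

    If bytes_per_char is given, the encoding is assumed to be fixed width and
    the length is computed arithmetically, so decode errors go undetected.
    Otherwise the stream is decoded, and errors ("strict" or "replace")
    decides what happens to bytes that do not decode.
    """
    if bytes_per_char is not None:
        count, remainder = divmod(len(data), bytes_per_char)
        if remainder:
            raise ValueError("byte length is not a whole number of characters")
        return count
    return len(data.decode(encoding, errors=errors))


fixed = length_in_characters("abc".encode("utf-32-be"), "utf-32-be", bytes_per_char=4)
variable = length_in_characters("h\u00e9llo".encode("utf-8"), "utf-8")
```

Here the UTF-32 data is measured without decoding (12 bytes, 3 characters),
while the UTF-8 data (6 bytes) has to be decoded to learn that it holds 5
characters. With errors="strict" an undecodable byte raises an error, which
the arithmetic path would never notice.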
>>
>> Regards
>>
>> Steve Hanson
>> Architect, *IBM DFDL*<http://www.ibm.com/developerworks/library/se-dfdl/index.html>
>> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
>> IBM SWG, Hursley, UK
>> *smh at uk.ibm.com* <smh at uk.ibm.com>
>> tel:+44-1962-815848
>>
>>
>>
>>
>> From:        Steve Hanson/UK/IBM
>> To:        Mike Beckerle <mbeckerle.dfdl at gmail.com>,
>> Cc:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, dfdl-wg-bounces at ogf.org
>> Date:        24/03/2014 12:55
>> Subject:        Re: [DFDL-WG] Action 242 - valueLength and contentLength
>> function        wording
>> ------------------------------
>>
>>
>> Mike
>>
>> 23.5.3.1. Value length is only a function of the dfdl:encoding property
>> if the element has a text representation. Not sure this needs to be
>> (re)stated here.
>>
>> 23.5.3.1. *"The value length is computed from the DFDL infoset value,
>> ignoring the dfdl:length or dfdl:textOutputMinLength property. Other DFDL
>> properties which affect the length of a text or binary representation are
>> respected, it is only an explicit length which is ignored."* The last
>> sentence is too imprecise - it should be phrased in terms of the grammar.
>>
>> 23.5.3.1. *"If the second argument is 'characters' then the element
>> must have text representation and it is a schema definition error
>> otherwise"*. Yes, but only for a simple type, so this should be qualified.
>>
>> 23.5.3.1. *"If the second argument, giving the length units, is
>> 'characters', then recursively, this complex type element must have text
>> representation throughout all its contained elements and framing, all of
>> which must also use a uniform character set encoding."* I can't see
>> that restriction elsewhere in the spec when it talks about length of
>> ComplexContent and lengthUnits 'characters' - I was expecting it to be in
>> section 12.3.4 or 12.3.7.3, which face the same issue - but it isn't. Did we
>> decide not to have this restriction? Without such a restriction, how does
>> the unparser come up with a meaningful length (unless it re-parses)?
>> (*Tim* - what does IBM DFDL do here?) What about delimiters and padding
>> of children that use %#r entities?
>>
>> 23.5.3.2. The points in 23.5.3.1 about escape characters, length as a
>> function of encoding, and bottom up for complex elements, apply equally to
>> 23.5.3.2. It might be easier just to say in 23.5.3.2 that
>> dfdl:contentLength for complex elements is the same as dfdl:valueLength, and
>> for simple elements differs only by the additional inclusion of LeftPadding
>> and RightPadOrFill regions.
>>
>> Also noted in passing:
>>
>> *Specified length* - An item has specified length when dfdl:lengthKind
>> is "implicit", "explicit", or "prefixed".
>>
>> should be
>>
>> *Specified length* - An element has specified length when
>> dfdl:lengthKind is "implicit" (simple type only), "explicit", or
>> "prefixed".
>>
>> Regards
>>
>>
>>
>>
>>
>> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
>> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
>> Date:        20/03/2014 17:21
>> Subject:        [DFDL-WG] Action 242 - valueLength and contentLength
>> function        wording
>> Sent by:        dfdl-wg-bounces at ogf.org
>> ------------------------------
>>
>>
>>
>> See attached doc which is proposed revisions to section 23.5.3
>>
>> [attachment "Action-252-DFDL-Functions-23.5.3.docx" deleted by Andrew
>> Edwards/UK/IBM] --
>>  dfdl-wg mailing list
>>  dfdl-wg at ogf.org
>>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>>
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number
>> 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Action-252-DFDL-Functions-23.5.3.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 37596 bytes
Desc: not available
URL: <http://www.ogf.org/pipermail/dfdl-wg/attachments/20140411/b38be05f/attachment-0001.docx>