[DFDL-WG] Action 242 - valueLength and contentLength function wording

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue Mar 25 10:56:01 EDT 2014


This language is consistent with what we say for lengthKind pattern in
section 12.3.5:

"When unparsing, the dfdl:valueLength of a complex type element when the
length units is 'characters' is computed as if the entire structure was
unparsed into a temporary data stream beginning at position 1, and then
this data stream is considered to be text in the character set encoding
specified by the dfdl:encoding property, regardless of the actual
representation of the complex type element or the elements contained within
it. The number of characters in this temporary data stream is the value
length of the complex type."

The behavior of the IBM DFDL implementation for valueLength, as described,
is consistent with the above, except that it will not detect a decode
error, and it gives an SDE (?) if the encoding is not fixed width.
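
As a rough illustration of that fixed-width behavior (see also Andy's note
quoted further down), the calculation might look like the Python sketch
below. The function name, the width table, and the use of
NotImplementedError to stand in for the "unsupported" error are all
hypothetical; this is not the IBM implementation.

```python
# Hypothetical table of fixed-width encodings: bytes per character.
FIXED_WIDTH_BYTES = {"ascii": 1, "iso-8859-1": 1, "utf-32-be": 4}


def value_length_fixed_width(byte_count: int, encoding: str) -> int:
    """Length in characters, computed without decoding any data."""
    width = FIXED_WIDTH_BYTES.get(encoding.lower())
    if width is None:
        # Stand-in for the "unsupported" error on variable-width encodings.
        raise NotImplementedError(f"variable-width encoding: {encoding}")
    # Assumption: integer division; how a trailing partial character
    # should be treated is not addressed in this thread.
    return byte_count // width
```

Because nothing is decoded, this approach cannot notice a decode error,
which matches the exception noted above.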

Since we have decided not to require that a complex type element is
recursively all text all the way down, I believe we have to tolerate
implementations having different behaviors in the potentially meaningless
cases where there is binary data or encoding changes in the complex type.
So I would add to the above suggested language this:

"However, if creation of this data stream would cause an encoding error, or
parsing of this data stream as characters would cause a decoding error,
then the behavior and return value of dfdl:valueLength are implementation
dependent."
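
A minimal Python sketch of the proposed wording, under simplifying
assumptions: the children are modeled as a list of str (text) and bytes
(binary) values, every text child is encoded with the complex element's
own encoding, a "character" is taken to be one decoded code point, and
None stands in for "implementation dependent". The function name is
hypothetical.

```python
from typing import Optional, Sequence, Union


def value_length_chars(parts: Sequence[Union[str, bytes]],
                       encoding: str) -> Optional[int]:
    """Unparse into a temporary stream, then count decoded characters."""
    stream = bytearray()
    try:
        for part in parts:
            # Text children are encoded; binary children are copied as-is.
            stream += part.encode(encoding) if isinstance(part, str) else part
        # Treat the whole temporary stream as text in the one encoding.
        return len(bytes(stream).decode(encoding))
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Encoding or decoding error: the result is implementation dependent.
        return None
```

For example, value_length_chars(["ab", b"\x00\x01"], "ascii") yields 4
because the binary bytes happen to decode, which is one of the
potentially meaningless cases mentioned above.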

Looking at the DFDL spec, I am concerned that we never really say what we
mean by the "length of the ComplexContent region." (Last sentence before
Table 7 in section 12.3.7) Section 12.3.7.3 doesn't do it. The
dfdl:valueLength function may be the first place where we have to actually
say how the various sub-regions contribute to the ComplexContent region's
length.

I believe this is the obvious "sum of length of all contained regions", but
keep in mind that alignment region lengths will vary depending on the
starting alignment, so the length is, in general, dependent on the position
within the bit stream.

Hence when unparsing we have to specify that the dfdl:valueLength is
measured as if the ComplexContent region started at position 1 (as I did
above) so that internal alignment regions can be given meaningful lengths.
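
To make the position dependence concrete, here is a small Python sketch.
It assumes lengths in bits, 1-based positions, and alignment measured from
the start of the stream; the region model (a list of ("data", n) and
("align", n) tuples) and the function name are invented for illustration.

```python
from typing import Sequence, Tuple


def complex_content_length(regions: Sequence[Tuple[str, int]],
                           start_position: int = 1) -> int:
    """Sum of region lengths in bits, given a 1-based start position."""
    pos = start_position
    for kind, n in regions:
        if kind == "align":
            # Fill up to the next multiple of n (zero-based offset is pos - 1).
            pos += (-(pos - 1)) % n
        else:
            pos += n
    return pos - start_position


regions = [("data", 8), ("align", 32), ("data", 32)]
print(complex_content_length(regions, start_position=1))   # 64
print(complex_content_length(regions, start_position=17))  # 48
```

The same regions give 64 bits when measured from position 1 but 48 bits
when measured from position 17, because the alignment fill shrinks from
24 bits to 8.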

The general clarification should be added to 12.3.7.3, or to section 12.3.7
immediately before section 12.3.7.1. Something like this:

"The length of the ComplexContent region is the sum of the lengths of the
contained regions. However, note that alignment regions inside the
ComplexContent may be of different lengths depending on the
ComplexContent's starting position alignment."




Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property
Policy<http://www.ogf.org/About/abt_policies.php>



On Mon, Mar 24, 2014 at 11:34 AM, Andrew Edwards <andy.edwards at uk.ibm.com> wrote:

> Steve (et al) - Resending as the last one bounced.
>
> I'll usurp Tim and respond :)
>
> Currently the IBM implementation insists on using a fixed-length encoding
> and returns an "unsupported" error message for a variable width encoding.
>  With a fixed width encoding, we "do the maths" using the
> bytes-per-character and the bytes written by this complex element.
>
> HTH,
> Andy
>
> Andy Edwards - IBM Integration Bus <http://www-03.ibm.com/software/products/us/en/integration-bus> - DFDL
> <https://w3-connections.ibm.com/wikis/home?lang=en-gb#!/wiki/IBM%20Data%20Format%20Description%20Language>
> Email: andy.edwards at uk.ibm.com
> Snail Mail: MP211, Hursley park, Hursley, WINCHESTER, Hants, SO21 2JN
> Tel int: 247222  Tel ext: +44 (0)1962 817222  Desk: DE3 V17
> *The Feynman problem solving Algorithm*
>  1) Write down the problem
>  2) Think real hard
>  3) Write down the answer
> -- Murray Gell-Mann in the NY Times
>
>
>
> From:        Steve Hanson/UK/IBM
> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
> Cc:        Mike Beckerle <mbeckerle.dfdl at gmail.com>, Andrew Edwards/UK/IBM at IBMGB
> Date:        24/03/2014 14:52
> Subject:        Re: [DFDL-WG] Action 242 - valueLength and contentLength
> function        wording
>
>
>
> Note errata 3.9, my bolding:
>
> *"3.9.** Section 12.3.5, 7.3.1, 7.3.2.  The spec originally allows
> lengthKind 'pattern' to be used when the representation of the current
> element, or of a child element, is binary, but imposes restrictions on the
> encoding that can be in force. *
>
> *Clarify that the encoding property must be defined for the element (else
> schema definition error), and that a decoding processing error is possible
> if the match of the regex encounters data that does not decode in that
> encoding, dependent on the setting of encodingErrorPolicy. Remove section
> 12.3.5.1.*
>
> *Same clarifications needed for testKind "pattern" property for asserts
> and discriminators**.*
>
> *For consistency, the restriction that a complex element of specified
> length and lengthUnits 'characters' must have children that are all text
> and that have the same encoding as the complex element, is dropped."*
>
> That's the restriction that I was referring to in my comment below.  I can
> see why it was dropped - basically the parser now just tries to decode n
> characters using the complex element's encoding (and encodingErrorPolicy).
> We could apply the same principle for dfdl:valueLength & dfdl:contentLength
> - you build the stream from the bottom up, and then decode it using the
> complex element's encoding (and encodingErrorPolicy ?) to get the length in
> characters.
>
> Note that's how unparsing for lengthKind 'prefixed' with lengthUnits
> 'characters' would work as well  - the spec just says "*For a complex
> element, the length is that of the ComplexContent region*" which is not
> sufficient (12.3.4). Similar deal for lengthKind 'explicit' - in order to
> know the size in chars of *ElementUnused* the unparser needs to know the
> size in chars of the data first (12.3.7.3).
>
> (Of course, for a fixed width encoding, you don't need to decode, you can
> just do the maths, but for the general case you need to decode. Also just
> doing the maths does not take encodingErrorPolicy into account).
>
> Regards
>
> Steve Hanson
> Architect, *IBM DFDL*<http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
>
>
>
>
> From:        Steve Hanson/UK/IBM
> To:        Mike Beckerle <mbeckerle.dfdl at gmail.com>,
> Cc:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, dfdl-wg-bounces at ogf.org
> Date:        24/03/2014 12:55
> Subject:        Re: [DFDL-WG] Action 242 - valueLength and contentLength
> function        wording
> ------------------------------
>
>
> Mike
>
> 23.5.3.1. Value length is only a function of the dfdl:encoding property if
> the element has a text representation. Not sure this needs to be (re)stated
> here.
>
> 23.5.3.1. "The value length is computed from the DFDL infoset value,
> ignoring the dfdl:length or dfdl:textOutputMinLength property. Other DFDL
> properties which affect the length of a text or binary representation are
> respected, it is only an explicit length which is ignored." The last
> sentence is too imprecise - it should be phrased in terms of the grammar.
>
> 23.5.3.1. "If the second argument is 'characters' then the element
> must have text representation and it is a schema definition error
> otherwise". Yes, but only for a simple type, so this should be qualified.
>
> 23.5.3.1. "If the second argument, giving the length units, is
> 'characters', then recursively, this complex type element must have text
> representation throughout all its contained elements and framing, all of
> which must also use a uniform character set encoding." I can't see
> that restriction elsewhere in the spec when it talks about length of
> ComplexContent and lengthUnits 'characters' - I was expecting it to be in
> section 12.3.4 or 12.3.7.3 which face the same issue - but it isn't. Did we
> decide not to have this restriction? Without such a restriction, how does
> the unparser come up with a meaningful length (unless it re-parses)?
> (*Tim* - what does IBM DFDL do here?) What about delimiters and padding
> of children that use %#r entities?
>
> 23.5.3.2. The points in 23.5.3.1 about escape characters, length as a
> function of encoding, and bottom up for complex elements, apply equally to
> 23.5.3.2.  It might be easier just to say in 23.5.3.2 that
> dfdl:contentLength for complex elements is the same as dfdl:valueLength,
> and for simple elements differs only by the additional inclusion of the
> LeftPadding and RightPadOrFill regions.
>
> Also noted in passing:
>
> *Specified length* - An item has specified length when dfdl:lengthKind is
> "implicit", "explicit", or "prefixed".
>
> should be
>
> *Specified length* - An element has specified length when dfdl:lengthKind
> is "implicit" (simple type only), "explicit", or "prefixed".
>
> Regards
>
> Steve Hanson
> Architect, *IBM DFDL*<http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
>
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
> Date:        20/03/2014 17:21
> Subject:        [DFDL-WG] Action 242 - valueLength and contentLength
> function        wording
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
> See attached doc which is proposed revisions to section 23.5.3
>
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
> *www.tresys.com* <http://www.tresys.com/>
> Please note: Contributions to the DFDL Workgroup's email discussions are
> subject to the *OGF Intellectual Property Policy*<http://www.ogf.org/About/abt_policies.php>
> [attachment "Action-252-DFDL-Functions-23.5.3.docx" deleted by Andrew
> Edwards/UK/IBM] --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>