[DFDL-WG] Missing restriction in 'endOfParent' length kind ??
Steve Hanson
smh at uk.ibm.com
Mon Apr 28 05:49:16 EDT 2014
Mike
For an element with lengthKind 'endOfParent', the lengthUnits property is
not applicable, so there is no concept of lengthUnits being the same. I
think what you are trying to say is that when the 'parent lengthUnits' is
not 'characters' then the endOfParent element must have an SBCS encoding.
We need to have a meaning for 'parent lengthUnits' for all the endOfParent
scenarios. I believe it is:
- lengthKind 'explicit' - dfdl:lengthUnits
- lengthKind 'prefixed' - dfdl:lengthUnits
- lengthKind 'pattern' - always 'characters'
- choiceLengthKind 'explicit' - always 'bytes'
- lengthKind 'endOfParent' - recursively apply above
And your suggestion also applies to other types when representation is
'text', and not just to strings.
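The mapping above can be read as a recursive lookup. Below is a minimal Python sketch, not DFDL processor code. `Box`, `parent_length_units`, `check_end_of_parent_text`, `SchemaDefinitionError` and the small `SINGLE_BYTE_ENCODINGS` table are all hypothetical names assumed for illustration. The check encodes the SBCS formulation above and applies to any element with text representation, not only strings.

```python
from dataclasses import dataclass
from typing import Optional


class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL processor's SDE."""


@dataclass
class Box:
    """Hypothetical model of the component enclosing an endOfParent element."""
    kind: str                            # 'element' or 'choice'
    length_kind: str                     # dfdl:lengthKind, or dfdl:choiceLengthKind for a choice
    length_units: Optional[str] = None   # dfdl:lengthUnits, where applicable
    parent: Optional["Box"] = None       # next enclosing component, if any


# Illustrative subset only; a real processor would consult its own encoding table.
SINGLE_BYTE_ENCODINGS = {"us-ascii", "iso-8859-1", "windows-1252"}


def parent_length_units(parent: Box) -> str:
    """Return the 'parent lengthUnits' seen by an endOfParent child of `parent`."""
    if parent.kind == "choice":
        if parent.length_kind == "explicit":
            return "bytes"                     # choiceLengthKind 'explicit': always bytes
    elif parent.length_kind in ("explicit", "prefixed"):
        if parent.length_units is not None:
            return parent.length_units         # the parent's own dfdl:lengthUnits
    elif parent.length_kind == "pattern":
        return "characters"                    # pattern lengths are always in characters
    elif parent.length_kind == "endOfParent" and parent.parent is not None:
        return parent_length_units(parent.parent)  # recursively apply the rules above
    raise SchemaDefinitionError(
        f"no parent lengthUnits defined for {parent.kind} "
        f"with length kind {parent.length_kind!r}")


def check_end_of_parent_text(encoding: str, parent: Box) -> None:
    """A text-representation endOfParent element needs a single-byte encoding
    unless the parent lengthUnits is 'characters'."""
    if (parent_length_units(parent) != "characters"
            and encoding.lower() not in SINGLE_BYTE_ENCODINGS):
        raise SchemaDefinitionError(
            f"encoding {encoding!r} is not single-byte, but the parent "
            "length is not measured in characters")


# Usage: an endOfParent string inside a fixed-length record measured in bytes.
record = Box("element", "explicit", length_units="bytes")
check_end_of_parent_text("ISO-8859-1", record)      # accepted
try:
    check_end_of_parent_text("UTF-16BE", record)    # rejected
except SchemaDefinitionError as err:
    print("SDE:", err)
```

Nesting works through the recursive case: an endOfParent parent simply defers to whatever encloses it.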
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
Date: 24/04/2014 17:49
Subject: [DFDL-WG] Missing restriction in 'endOfParent' length kind ??
Sent by: dfdl-wg-bounces at ogf.org
I think we're missing a restriction for 'endOfParent':
Restriction: If an element has lengthKind 'endOfParent', then its
lengthUnits must be the same as the lengthUnits of the parent. It is a
Schema Definition Error (SDE) otherwise.
Rationale:
If a string has fixed length and lengthUnits 'bytes', then currently we
require the textStringPadCharacter to be a single-byte character. This
prevents cases where some bytes can neither be filled in by a pad
character nor trimmed as a pad character.
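The existing rule can be stated as a small check. This is a minimal Python sketch assuming the pad character is given as a Python string and that its encoded width is what matters. `pad_char_allowed` is a hypothetical helper, not part of any DFDL implementation.

```python
def pad_char_allowed(pad_char: str, encoding: str, length_units: str) -> bool:
    """Sketch of the existing rule: when lengthUnits is 'bytes', the
    textStringPadCharacter must encode to exactly one byte."""
    if length_units != "bytes":
        return True  # the rule only constrains byte-measured lengths
    return len(pad_char.encode(encoding)) == 1


# A single-byte pad can fill (or be trimmed from) any number of bytes.
print(pad_char_allowed(" ", "ascii", "bytes"))       # True
print(pad_char_allowed(" ", "utf-16-le", "bytes"))   # False: space is 2 bytes in UTF-16
```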
But if the string has 'endOfParent' length, it could have lengthUnits
'characters' and encoding utf-16, and would then typically have a 2-byte
pad character, while the enclosing parent could have lengthUnits 'bytes'
and an odd fixed/specified length. Hence a byte could be left over that
can neither be filled with a pad character nor trimmed as a pad
character.
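The leftover byte can be made concrete with a little arithmetic. This Python sketch assumes big-endian UTF-16 without a byte-order mark and uses a hypothetical helper, `unfillable_bytes`. It only counts bytes and does not model a real unparser.

```python
def unfillable_bytes(parent_length: int, content: str, encoding: str, pad_char: str) -> int:
    """Bytes left over after padding `content` toward the end of a parent whose
    length is `parent_length` bytes, using whole copies of `pad_char`."""
    used = len(content.encode(encoding))
    if used > parent_length:
        raise ValueError("content is longer than the parent")
    pad_width = len(pad_char.encode(encoding))
    return (parent_length - used) % pad_width


# The scenario above: UTF-16 text with a 2-byte pad in an odd-length (7-byte) parent.
print(unfillable_bytes(7, "AB", "utf-16-be", " "))   # 1: one byte cannot be padded
print(unfillable_bytes(7, "AB", "iso-8859-1", " "))  # 0: a 1-byte pad always fits
```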
It is noisome to have this corner case/excess byte problem for
'endOfParent' when we have so cleverly dodged it for all the
specified-length kinds with the restriction to single-byte pad characters
when lengthUnits is bytes.
There is language in the spec about dealing with these trailing RightFill
or ElementUnused regions. With the above restriction I believe RightFill
becomes associated only with the hexBinary type being extended to fill out
a specified-length or endOfParent box. The much more complex issue of
RightFill appearing after text characters remains, but outside of the
context where padding/trimming are involved.
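For comparison, RightFill after a hexBinary value is purely byte-granular, so no partial-character residue can arise. This is a minimal Python sketch. It assumes the RightFill region is written with the value of dfdl:fillByte, passed here as an int whose default of 0x00 is arbitrary. `unparse_hex_binary` is a hypothetical helper.

```python
def unparse_hex_binary(value: bytes, box_length: int, fill_byte: int = 0x00) -> bytes:
    """Place a hexBinary value in a box of `box_length` bytes; the bytes it does
    not cover form a RightFill region written with `fill_byte`."""
    if len(value) > box_length:
        raise ValueError("value is longer than the specified length")
    right_fill = bytes([fill_byte]) * (box_length - len(value))
    return value + right_fill


print(unparse_hex_binary(b"\xca\xfe", 5).hex())  # cafe000000
```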
It is really problematic language, fraught with possibilities for
misinterpretation. To me it is much preferable to just make 'endOfParent'
consistent with the other constraints associated with padding for
specified-length elements.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg