[DFDL-WG] Missing restriction in 'endOfParent' length kind ??

Steve Hanson smh at uk.ibm.com
Mon Apr 28 05:49:16 EDT 2014


Mike

For an element with lengthKind 'endOfParent', the lengthUnits property is 
not applicable, so there is no concept of lengthUnits being the same. I 
think what you are trying to say is that when the 'parent lengthUnits' is 
not 'characters' then the endOfParent element must have an SBCS encoding.

We need to have a meaning for 'parent lengthUnits' for all the endOfParent 
scenarios. I believe it is:
- lengthKind 'explicit' - dfdl:lengthUnits 
- lengthKind 'prefixed' - dfdl:lengthUnits 
- lengthKind 'pattern' - always 'characters'
- choiceLengthKind 'explicit' - always 'bytes'
- lengthKind 'endOfParent' - recursively apply above
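
The list above can be read as a small recursive lookup. What follows is only a sketch of that reading, not spec text: the Box class, its field names and the function parent_length_units are hypothetical names invented for illustration, and the error cases are assumptions about what an implementation might do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """Hypothetical model of an element or choice that encloses other elements."""
    length_kind: Optional[str] = None         # dfdl:lengthKind (elements)
    length_units: Optional[str] = None        # dfdl:lengthUnits, where applicable
    choice_length_kind: Optional[str] = None  # dfdl:choiceLengthKind (choices)
    parent: Optional["Box"] = None            # enclosing box, if any


def parent_length_units(box: Box) -> str:
    """Return the 'parent lengthUnits' that `box` imposes on an endOfParent child."""
    if box.choice_length_kind == "explicit":
        return "bytes"                        # choiceLengthKind 'explicit': always bytes
    if box.length_kind in ("explicit", "prefixed"):
        if box.length_units is None:
            raise ValueError("lengthUnits is required for this lengthKind")
        return box.length_units               # taken from dfdl:lengthUnits
    if box.length_kind == "pattern":
        return "characters"                   # pattern: always characters
    if box.length_kind == "endOfParent":
        if box.parent is None:
            raise ValueError("endOfParent box has no enclosing parent")
        return parent_length_units(box.parent)  # recursively apply the rules above
    raise ValueError(f"no 'parent lengthUnits' defined for {box!r}")


# Example: an endOfParent element nested in another endOfParent element,
# inside an explicit-length parent measured in bytes.
outer = Box(length_kind="explicit", length_units="bytes")
middle = Box(length_kind="endOfParent", parent=outer)
```

Here parent_length_units(middle) resolves through the explicit-length outer element, so a child of middle with lengthKind 'endOfParent' would see 'bytes'.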

And your suggestion also applies to other types when representation is 
'text', and not just to strings.
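
As a rough illustration of the reformulated restriction, the check might look like the sketch below. The function name, its parameters and the SBCS_ENCODINGS allow-list are all hypothetical; a real processor would decide whether an encoding is single-byte from its own encoding tables.

```python
# Hypothetical allow-list of single-byte character set (SBCS) encodings,
# used here only to keep the example self-contained.
SBCS_ENCODINGS = {"us-ascii", "iso-8859-1", "ibm037"}


def violates_end_of_parent_rule(parent_units: str,
                                representation: str,
                                encoding: str) -> bool:
    """True if an endOfParent element would break the proposed restriction.

    parent_units   -- the 'parent lengthUnits' ('bytes', 'bits', 'characters')
    representation -- the element's dfdl:representation ('text' or 'binary')
    encoding       -- the element's dfdl:encoding
    """
    if representation != "text":
        return False              # the restriction only concerns text
    if parent_units == "characters":
        return False              # no byte-level remainder can arise
    return encoding.lower() not in SBCS_ENCODINGS
```

Under these assumptions a UTF-16 text element inside a parent measured in bytes is flagged, while the same element inside a parent measured in characters is not.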

Regards
 
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:   24/04/2014 17:49
Subject:        [DFDL-WG] Missing restriction in 'endOfParent' length kind ??
Sent by:        dfdl-wg-bounces at ogf.org




I think we're missing a restriction for 'endOfParent'

Restriction: If an element has lengthKind 'endOfParent', then its 
lengthUnits must be the same as the lengthUnits of the parent. SDE 
otherwise.

Rationale:

If a string has fixed length and lengthUnits 'bytes', then currently we 
require the textStringPadCharacter to be a single-byte character. This 
prevents issues where there are bytes that can neither be filled in by a 
pad character nor trimmed as a pad character. 

But if the string has 'endOfParent' length, the string could have 
lengthUnits 'characters', encoding utf-16, and would then typically have a 
2-byte pad character. But the enclosing parent could have lengthUnits 
bytes and an odd fixed/specified length. Hence, a byte could be left over 
that cannot be filled with a pad character, nor trimmed as a pad 
character. 
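
The leftover byte can be shown with a little arithmetic. This is only a sketch: leftover_bytes is a hypothetical helper, and Python's 'utf-16-be' codec is used so that no byte order mark is counted in the lengths.

```python
def leftover_bytes(box_bytes: int, text: str, pad_char: str, encoding: str) -> int:
    """Bytes of the parent box that whole pad characters cannot fill.

    box_bytes -- length of the enclosing parent, in bytes
    text      -- the string content before padding
    pad_char  -- the pad character (one character)
    encoding  -- a Python codec name standing in for dfdl:encoding
    """
    content = len(text.encode(encoding))
    pad_width = len(pad_char.encode(encoding))
    if content > box_bytes:
        raise ValueError("content does not fit in the box")
    # Whatever remains after fitting whole pad characters is unfillable.
    return (box_bytes - content) % pad_width


# A 9-byte parent holding "abc" in UTF-16: 6 bytes of content, 3 bytes to
# fill, and a 2-byte pad character leaves 1 byte over.
utf16_leftover = leftover_bytes(9, "abc", " ", "utf-16-be")

# The same box with a single-byte encoding always divides evenly.
ascii_leftover = leftover_bytes(9, "abc", " ", "ascii")
```

With a single-byte pad character the remainder is always zero, which is the property the existing restriction on specified-length strings relies on.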

It is noisome to have this corner case/excess byte problem for 
'endOfParent' when we have so cleverly dodged it for all the 
specified-length kinds with the restriction to single-byte pad characters 
when lengthUnits is bytes. 

There is language in the spec about dealing with these trailing RightFill 
or ElementUnused regions. With the above restriction I believe RightFill 
becomes associated only with the hexBinary type being extended to fill out 
a specified-length or endOfParent box. The much more complex issue of 
RightFill appearing after text characters remains, but outside of the 
context where padding/trimming are involved.

It is really problematic language, fraught with possibilities for 
misinterpretation. To me it is very preferable to just make 'endOfParent' 
consistent with other constraints associated with padding for 
specified-length elements.




Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU