[DFDL-WG] dfdl:lengthKind='prefixed', different encodings for prefix and content where dfdl:prefixIncludesPrefixLength is 'yes'
Steve Hanson
smh at uk.ibm.com
Tue Aug 27 12:44:57 EDT 2013
Assuming that the prefix contains the length in characters then I think
this works ok when the encoding is different. The parser will first parse
the prefix according to prefixLengthType to get the prefix value, which is
always of known length. If prefixIncludesPrefixLength is 'yes' then it
subtracts this known length from the prefix value, giving the length of
the data, which might be in a different encoding.
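As an illustration only, the steps above can be sketched in Python. This is not taken from any DFDL implementation. It assumes lengthUnits 'characters', fixed-width encodings, and a prefix that is a text integer of a fixed, known number of characters. The names parse_prefixed and bytes_per_char are hypothetical.

```python
def bytes_per_char(encoding: str) -> int:
    """Width of one character in bytes. Only meaningful for fixed-width encodings."""
    return len("0".encode(encoding))


def parse_prefixed(data: bytes, prefix_chars: int, prefix_encoding: str,
                   content_encoding: str, includes_prefix_length: bool) -> str:
    # Step 1: parse the prefix, whose length is always known in advance.
    prefix_bytes = prefix_chars * bytes_per_char(prefix_encoding)
    prefix_value = int(data[:prefix_bytes].decode(prefix_encoding))

    # Step 2: if the prefix value includes the prefix itself, subtract
    # the known prefix length to get the length of the data.
    if includes_prefix_length:
        content_chars = prefix_value - prefix_chars
    else:
        content_chars = prefix_value

    # Step 3: read that many characters in the content's own encoding.
    content_bytes = content_chars * bytes_per_char(content_encoding)
    return data[prefix_bytes:prefix_bytes + content_bytes].decode(content_encoding)


# Prefix "04" in UTF-16BE (2 characters), content "AB" in UTF-32BE (2 characters).
data = "04".encode("utf-16-be") + "AB".encode("utf-32-be") + b"trailing"
content = parse_prefixed(data, 2, "utf-16-be", "utf-32-be", True)
```

The subtraction happens entirely in characters, so the parser never needs to convert between the two encodings.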
I think we should continue to allow this. In the past we have talked
about a DFDL 2.0 feature that allowed the initiator and terminator to be
specified using a simple type, precisely to cover the (rare) cases where
the characteristics of these delimiters are different to the data itself.
Doing it this way prevents a property explosion on the element itself. I
view prefixLengthType as the first example of this principle.
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 23/08/2013 16:20 -----
From: Alex Wood1/UK/IBM
To: Steve Hanson/UK/IBM at IBMGB, Mark Frost/UK/IBM at IBMGB,
dfdl-wg at ogf.org,
Date: 23/08/2013 14:58
Subject: dfdl:lengthKind='prefixed', different encodings for
prefix and content where prefixIncludesPrefixLength is 'yes'.
Hi All,
I am considering a case similar to the one excluded by erratum 2.76: an
element with lengthKind 'prefixed' and prefixIncludesPrefixLength 'yes',
where the prefix type and the element both have lengthUnits 'characters'
but have different encodings (specifically, encodings whose characters
have different lengths).
I believe 2.76 is trying to avoid the problem of determining the length
value in, say, characters when the prefix contains no characters.
I am wondering whether there is also a slightly subtler issue when we are
calculating a length in characters but one part of that length is in a
different encoding from the other.
For example, the prefix contains 2 UTF-16 (2-byte) characters and the
content contains 2 UTF-32 (4-byte) characters.
Do we just quote a length in characters regardless of encoding, e.g. 4
characters? Or is this confusing?
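To make the arithmetic in this example concrete, here is a small Python sketch. It is only an illustration: the helper char_width is hypothetical, and the big-endian codec names are an assumption chosen so that no byte order mark is counted.

```python
def char_width(encoding: str) -> int:
    """Bytes per character. Only meaningful for fixed-width encodings."""
    return len("0".encode(encoding))


prefix_chars, prefix_encoding = 2, "utf-16-be"    # 2-byte characters
content_chars, content_encoding = 2, "utf-32-be"  # 4-byte characters

# The length quoted in characters, regardless of encoding.
total_chars = prefix_chars + content_chars

# The actual byte length, with each part measured in its own encoding.
total_bytes = (prefix_chars * char_width(prefix_encoding)
               + content_chars * char_width(content_encoding))

# What a reader would compute by wrongly applying one encoding to the whole length.
naive_utf16_bytes = total_chars * char_width(prefix_encoding)
naive_utf32_bytes = total_chars * char_width(content_encoding)
```

Here the quoted length is 4 characters and the real byte length is 12, whereas applying a single encoding to all 4 characters gives 8 or 16.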
2.76. Section 12.3.4. When property prefixIncludesPrefixLength is 'yes'
there are some restrictions that need to be added to enable reliable
lengths to be calculated:
o If the prefix type is lengthKind 'implicit' or 'explicit' then the
lengthUnits properties of both the prefix type and the element must be
the same.
Kind Regards,
- Alex
Alex Wood -
Software Engineer -
WebSphere Message Broker Development
IBM DFDL Development
MP 211, IBM UK Labs, Hursley Park, Winchester, Hants. SO21 2JN.
Tel: Internal 246272, External 01962 816272
Notes: Alex Wood1/UK/IBM at IBMGB
e-mail: wooda at uk.ibm.com