[DFDL-WG] dfdl:lengthKind='prefixed', different encodings for prefix and content where dfdl:prefixIncludesPrefixLength is 'yes'

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue Aug 27 17:49:23 EDT 2013


I concur. The prefix is effectively a different element containing an
integer; specifically, a specially hidden integer element. Varying the
encodings is no more of a problem than having a binary prefix while the
data itself is text in some encoding.

There are many kinds of things like this that can be expressed in DFDL
which don't exist anywhere in real data, but which DFDL has to be powerful
enough to express; otherwise DFDL would be too limited in other cases.

So nobody *should* ever create data where the prefix is text but in a
different encoding from the data. That said, it is no more difficult to
implement than two adjacent elements with a transition from one encoding
to another. I've never seen alternating fields in different encodings, but
I have seen files with mixtures of ASCII, binary, and EBCDIC data.
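The point that an encoding transition between adjacent fields is routine can be sketched in Python. The three-field layout below (4 ASCII characters, a big-endian 16-bit unsigned binary integer, then 3 EBCDIC characters in code page 037) is a hypothetical record invented for illustration, not a real format; each field is simply decoded with its own encoding as the parser advances.

```python
import struct

# Hypothetical mixed record: ASCII text, then binary, then EBCDIC (cp037) text.
record = b"ABCD" + struct.pack(">H", 513) + "XYZ".encode("cp037")

def parse_record(buf: bytes) -> tuple[str, int, str]:
    pos = 0
    ascii_field = buf[pos:pos + 4].decode("ascii")     # 4 one-byte ASCII chars
    pos += 4
    (binary_field,) = struct.unpack_from(">H", buf, pos)  # big-endian uint16
    pos += 2
    ebcdic_field = buf[pos:pos + 3].decode("cp037")    # 3 one-byte EBCDIC chars
    return ascii_field, binary_field, ebcdic_field
```

Switching encodings costs nothing beyond choosing a different decoder for the next field.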


On Tue, Aug 27, 2013 at 12:44 PM, Steve Hanson <smh at uk.ibm.com> wrote:

> Assuming that the prefix contains the length in characters, I think
> this works OK when the encoding is different. The parser first parses
> the prefix according to prefixLengthType to get the prefix value; the
> prefix itself is always of known length. If prefixIncludesPrefixLength is
> 'yes', the parser subtracts this known length from the prefix value,
> giving the length of the data, which might be in a different encoding.
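A minimal Python sketch of the parse order described above, under assumptions that are not stated in the thread: the prefix is a fixed number of text digits, lengthUnits is 'characters', and both encodings are fixed-width. The names FIXED_WIDTH and parse_prefixed are hypothetical, and treating UTF-16 as 2 bytes per character only holds when no surrogate pairs occur.

```python
# Hypothetical table of bytes per character for some fixed-width encodings.
# Variable-width encodings such as UTF-8 are deliberately left out; UTF-16 is
# assumed to contain no surrogate pairs.
FIXED_WIDTH = {"ascii": 1, "cp037": 1, "utf-16-be": 2, "utf-32-be": 4}

def parse_prefixed(data: bytes, prefix_chars: int, prefix_enc: str,
                   content_enc: str, includes_prefix_length: bool):
    """Return (content_text, bytes_consumed) for a prefixed-length field."""
    # 1. The prefix has a known length, so it can be parsed on its own.
    prefix_bytes = prefix_chars * FIXED_WIDTH[prefix_enc]
    prefix_value = int(data[:prefix_bytes].decode(prefix_enc))
    # 2. If the prefix counts itself, subtract its own length (in characters).
    content_chars = prefix_value - prefix_chars if includes_prefix_length else prefix_value
    if content_chars < 0:
        raise ValueError("prefix value is smaller than the prefix's own length")
    # 3. Convert the content length to bytes using the *content* encoding.
    end = prefix_bytes + content_chars * FIXED_WIDTH[content_enc]
    if end > len(data):
        raise ValueError("not enough data for the content")
    return data[prefix_bytes:end].decode(content_enc), end
```

For example, an ASCII prefix "05" followed by "ABC" in UTF-32BE covers the prefix's own 2 characters plus 3 content characters, so the call consumes 2 + 3 × 4 = 14 bytes.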
>
> I think we should continue to allow this.  In the past we have talked
> about a DFDL 2.0 feature that allowed the initiator and terminator to be
> specified using a simple type, precisely to cover the (rare) cases where
> the characteristics of these delimiters are different to the data itself.
>  Doing it this way prevents a property explosion on the element itself.  I
> view prefixLengthType as the first example of this principle.
>
> Regards
>
> Steve Hanson
> Architect, IBM Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
> ----- Forwarded by Steve Hanson/UK/IBM on 23/08/2013 16:20 -----
>
> From:        Alex Wood1/UK/IBM
> To:        Steve Hanson/UK/IBM at IBMGB, Mark Frost/UK/IBM at IBMGB,
> dfdl-wg at ogf.org,
> Date:        23/08/2013 14:58
> Subject:        dfdl:lengthKind='prefixed', different encodings for
> prefix and content where prefixIncludesPrefixLength is 'yes'
> ------------------------------
>
>
> Hi All,
>
> Consider a case similar to the one excluded by errata 2.76: an element
> with lengthKind 'prefixed' and prefixIncludesPrefixLength 'yes', where
> the prefix type and the element both have lengthUnits 'characters' but have
> different encodings (specifically, encodings whose characters have
> different byte lengths).
>
> I believe the problem 2.76 is trying to avoid is determining the length
> value in, say, characters when the prefix contains no characters (e.g. a
> binary prefix).
>
> I am wondering if there is also a slightly subtler issue when we are
> calculating a length in characters but part of the length is in a
> different encoding from the rest.
> For example, the prefix contains 2 UTF-16 (2-byte) characters and the
> content contains 2 UTF-32 (4-byte) characters.
> Do we just quote a length in characters regardless of encoding, e.g. 4
> characters? Or is this confusing?
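The arithmetic in this example can be made concrete with a short Python sketch. The prefix value "04" and the content "AB" are made-up values chosen to match the 2 + 2 character case; big-endian codecs are assumed so that no byte-order marks are written.

```python
prefix_text = "04"    # prefix value, written as 2 UTF-16 characters
content_text = "AB"   # 2 content characters in UTF-32
record = prefix_text.encode("utf-16-be") + content_text.encode("utf-32-be")

total_chars = len(prefix_text) + len(content_text)  # 4, equals the prefix value
total_bytes = len(record)                           # 2*2 + 2*4 = 12 bytes

# Reading that works: subtract the prefix's own 2 characters first, then size
# only the remainder with the content encoding's width (4 bytes per char).
content_chars = int(prefix_text) - len(prefix_text)  # 2
content_bytes = content_chars * 4                    # 8
content = record[4:4 + content_bytes].decode("utf-32-be")

# Naive reading: treat all 4 characters as content-encoding characters.
naive_bytes = int(prefix_text) * 4                   # 16, more than the record holds
```

A single "4 characters" value therefore spans 12 bytes, and it is only interpretable if the parser applies each encoding to its own part of the count.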
>
>
>
> 2.76
> Section 12.3.4
>
> - When property prefixIncludesPrefixLength is 'yes' there are some
>   restrictions that need to be added to enable reliable lengths to be
>   calculated:
>   o If the prefix type is lengthKind 'implicit' or 'explicit' then the
>     lengthUnits properties of both the prefix type and the element must
>     be the same.
>
>
> Kind Regards,
>
> - Alex
>
> Alex Wood -
> Software Engineer -
> WebSphere Message Broker Development
> IBM DFDL Development
>
> MP 211, IBM UK Labs, Hursley Park, Winchester, Hants. SO21 2JN.
> Tel: Internal 246272, External 01962 816272
> Notes: Alex Wood1/UK/IBM at IBMGB
> e-mail: wooda at uk.ibm.com
>



-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com