[DFDL-WG] Suggest should be optional feature of DFDL - dfdl:utf16Width='variable' and other corner cases

Steve Hanson smh at uk.ibm.com
Wed Sep 14 03:52:25 EDT 2016


Actions 290 and 291 have been raised to investigate this further - see the minutes.

Regards
 
Steve Hanson
IBM Integration Bus, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890



From:   Steve Hanson/UK/IBM
To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date:   13/09/2016 13:14
Subject:        Re: [DFDL-WG] Suggest should be optional feature of DFDL - 
dfdl:utf16Width='variable' and other corner cases


Mike

I am assuming that the processing for utf-16 'fixed' or 'variable' is 
entirely handled by ICU so there should be no coding overhead.

IBM DFDL does handle dfdl:lengthKind='explicit' for an element of complex 
type with dfdl:lengthUnits='characters' and dfdl:encoding='utf-8'. However, 
the content of the complex type must satisfy certain conditions, otherwise 
a schema definition error (SDE) results, for example: 

CTDV1524E : For a complex element, when 'lengthKind' is 'explicit' or 
'prefixed', and 'lengthUnits' is characters, all simple child elements 
must have text representation, 'lengthUnits' set to 'characters' and the 
same encoding. 

So we insist that the properties of the children are consistent with the 
properties of the parent.  If you recall, IBM DFDL does all these kinds of 
validation checks in a pre-processing phase.
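
As an illustration only, the kind of pre-processing check described above 
could be sketched as follows. This is not IBM DFDL code: the Element class, 
its field names and check_complex_char_length are hypothetical, and treating 
nested simple elements the same as direct children is an assumption, since 
the message text only says "simple child elements".

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Element:
    """Hypothetical, heavily simplified stand-in for a DFDL element."""
    name: str
    length_kind: str = "implicit"
    length_units: str = "bytes"
    encoding: Optional[str] = None
    representation: str = "text"
    children: List["Element"] = field(default_factory=list)

    @property
    def is_complex(self) -> bool:
        # Simplification: an element is complex if it has any children.
        return bool(self.children)


def simple_descendants(elem: Element) -> Iterator[Element]:
    """Yield every simple element beneath elem (assumption: nested ones too)."""
    for child in elem.children:
        if child.is_complex:
            yield from simple_descendants(child)
        else:
            yield child


def check_complex_char_length(elem: Element) -> List[str]:
    """Return error strings; an empty list means the properties are consistent."""
    errors: List[str] = []
    # The rule only applies to complex elements whose length is given
    # in characters with lengthKind 'explicit' or 'prefixed'.
    if not elem.is_complex:
        return errors
    if elem.length_kind not in ("explicit", "prefixed"):
        return errors
    if elem.length_units != "characters":
        return errors

    parent_encoding = (elem.encoding or "").lower()
    for child in simple_descendants(elem):
        if child.representation != "text":
            errors.append(f"{child.name}: representation must be 'text'")
        if child.length_units != "characters":
            errors.append(f"{child.name}: lengthUnits must be 'characters'")
        if (child.encoding or "").lower() != parent_encoding:
            errors.append(f"{child.name}: encoding must match the parent")
    return errors
```

Calling check_complex_char_length on a parent with 
length_kind="explicit", length_units="characters" and a binary child 
returns a non-empty list, which a processor could report as an SDE before 
any data is parsed.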

That seems a pretty sensible rule but I am not sure if the rule appears in 
the spec as such - I just had a quick look but didn't spot anything.

So I guess I don't see a need for these things to be optional features?





From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date:   10/08/2016 18:57
Subject:        [DFDL-WG] Suggest should be optional feature of DFDL - 
dfdl:utf16Width='variable' and other corner cases
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>




Given the limited set of required encodings for a conforming DFDL 
processor, I believe dfdl:utf16Width='variable' should be an optional 
feature.

That's just consistency with what is optional already. But it is also 
quite hard to implement. 
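
For readers less familiar with the property, the difference can be 
sketched as below. This reflects my reading of the spec, namely that 
'fixed' counts each 16-bit code unit as a character (so a surrogate pair 
counts as two) while 'variable' counts a surrogate pair as one character. 
The function name is hypothetical and big-endian UTF-16 without a 
byte-order mark is assumed.

```python
def utf16_char_count(data: bytes, utf16_width: str) -> int:
    """Count the characters in big-endian UTF-16 data under either setting."""
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must contain an even number of bytes")
    if utf16_width == "fixed":
        # Every 2-byte code unit is one character; no decoding is needed.
        return len(data) // 2
    if utf16_width == "variable":
        # A surrogate pair is one character, so the data must be decoded.
        # len() of a Python str counts code points.
        return len(data.decode("utf-16-be"))
    raise ValueError(f"unknown utf16Width: {utf16_width!r}")
```

With 'fixed' the character count follows directly from the byte count. 
With 'variable' it depends on the content, which is where the 
implementation cost comes from.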

There are other situations that are very hard to implement and probably 
never used by real users, yet are non-optional in the spec:

I would suggest that dfdl:lengthKind='explicit' for elements of complex 
type, with dfdl:lengthUnits='characters' and a variable-width encoding 
like utf-8, is very problematic to implement. Going by earlier email 
threads, I am pretty sure IBM DFDL does not implement this, and I know I 
don't want to implement it in Daffodil, even though we are aiming for a 
very comprehensive implementation eventually.
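
To make the difficulty concrete: with a variable-width encoding, the 
number of bytes a complex element occupies cannot be computed from its 
length in characters alone; the data has to be scanned. A minimal sketch, 
assuming well-formed UTF-8 and using a hypothetical helper name (this is 
not how Daffodil or IBM DFDL does it):

```python
def utf8_byte_length(data: bytes, n_chars: int) -> int:
    """Return how many bytes the first n_chars UTF-8 characters occupy."""
    pos = 0
    for _ in range(n_chars):
        if pos >= len(data):
            raise ValueError("not enough data for the requested length")
        lead = data[pos]
        # The lead byte of each character announces its width in bytes.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {pos}")
        pos += width
    if pos > len(data):
        raise ValueError("data ends in the middle of a character")
    return pos
```

The same length of 4 characters can therefore cover anything from 4 to 16 
bytes, and the end of the complex element is only known once the text 
inside it has been examined.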

I think implementations should be free to just not implement this.  These 
sorts of cases often exist just because we're trying to preserve some 
orthogonality of composition in the language. So it's possible to do quite 
a few things that probably aren't ever needed by anyone, that reflect 
ill-defined data formats, etc.

I'd rather not document a bunch of "non-conformances" for Daffodil or 
other implementations for these sorts of things. I'd like to say we don't 
implement them, but they're optional, and so that's allowed.

Comments?



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg


Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

