[DFDL-WG] Part 1 - Re: Action 307 - Demonstrate implementation interoperability

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Oct 3 18:00:22 EDT 2018


I'm going to reply to this in a few parts.

With respect to:
- dfdl:binaryBooleanTrueRep with value empty string
- dfdl:assert on global element and simple type
- dfdl:discriminator on global element and simple type
- Multiple xs:appinfo elements within each xs:annotation element
I think these are minor non-compliances with the DFDL spec, and for
interoperability testing we can just revise schemas under test to not use
these constructs.

With respect to:
- When parsing, the distinction between an element being 'missing', having
an 'empty representation' and having an 'absent representation', is not in
accordance with the specification.
I think time will tell here; there is nothing we can yet anticipate having
to do because of this. If this non-compliance does not cause
interoperability problems for realistic, published DFDL schemas, then I
wouldn't worry about it. Like IBM DFDL, Daffodil does not implement default
values during parsing, and that is a likely area where the
missing/empty/absent distinction affects behavior. It is quite possible that
despite this lack of conformance to the DFDL spec., interoperability
testing would be successful.

With respect to:
- When encoding is 'UTF-8' or 'UTF-16', byte order marks are not processed
Daffodil also does not implement byte-order-mark processing. We can dodge
this issue entirely if we make the UTF-16 charset encoding (specifically
UTF-16 without the BE or LE suffix) an optional DFDL feature. That
effectively makes byte-order-mark processing an optional feature as well,
and then both IBM DFDL and Daffodil would be compliant and interoperable.
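
For readers less familiar with what byte-order-mark processing involves,
here is a minimal Python sketch. It is a generic illustration, not the
DFDL spec's exact rules and not code from either implementation; the
function names and the big-endian default are assumptions made for the
example.

```python
import codecs

def split_utf16_bom(data: bytes, default: str = "big"):
    """Return (byte_order, payload) for UTF-16 data that may start with a BOM.

    Hypothetical helper: 'default' is the byte order assumed when no BOM
    is present (an assumption of this sketch).
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big", data[2:]
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little", data[2:]
    return default, data

def decode_utf16(data: bytes, default: str = "big") -> str:
    """Decode UTF-16 data, honoring a leading BOM if there is one."""
    order, payload = split_utf16_bom(data, default)
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    return payload.decode(codec)
```

With 'UTF-16BE' or 'UTF-16LE' the byte order is fixed by the charset name,
so no such inspection of the data is needed, which is why restricting the
required charsets to the suffixed forms sidesteps the problem.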

With respect to:
- dfdl:encodingErrorPolicy "replace"
This one is harder. Daffodil doesn't implement encodingErrorPolicy='error',
and IBM DFDL doesn't implement 'replace', so we have no common ground here
for interoperability testing.
Making the entire encodingErrorPolicy property optional, meaning behavior
in the presence of encoding errors is implementation-specified, is super
undesirable to me.
I suspect that implementing encodingErrorPolicy 'error' will be necessary
for Daffodil. If we do that, then IBM DFDL can continue to document that it
lacks this required feature of DFDL, or we can make 'replace' optional
in the spec., or IBM could implement 'replace'.
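
To make the difference between the two policies concrete, here is a
minimal Python sketch. It assumes 'replace' substitutes U+FFFD for
undecodable bytes and 'error' fails the decode; decode_with_policy is a
hypothetical name, and Python's codec error handlers only stand in for
what a DFDL processor would do.

```python
def decode_with_policy(data: bytes, encoding: str, policy: str) -> str:
    """Decode bytes under an encodingErrorPolicy-like setting (sketch only)."""
    if policy == "replace":
        # Undecodable input becomes the Unicode replacement character.
        return data.decode(encoding, errors="replace")
    if policy == "error":
        # Undecodable input raises UnicodeDecodeError.
        return data.decode(encoding, errors="strict")
    raise ValueError(f"unknown policy: {policy!r}")
```

The same malformed input therefore either parses with a substituted
character or does not parse at all, so a test case that contains an
encoding error cannot produce the same result on both implementations
today.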

*Additional Non-portable/Problematic Required Features*

I did an analysis of all DFDL properties to find those that must be
implemented to meet the minimum, non-optional functionality of a DFDL
implementation per Section 21 of the spec.
Starting from a list of all DFDL properties, I eliminated any specific to
unparsing, and then any that are relevant only to something Section 21
makes optional.

Here are the remaining properties I found. Where a property's full
functionality is considered optional, the restriction on its values is
noted:

   - length - integer values only
   - lengthKind - explicit, implicit only
   - lengthUnits - bytes or characters only
   - representation - binary only
   - byteOrder
   - alignment - number or 'implicit'
   - alignmentUnits - bytes only
   - fillByte
   - leadingSkip
   - trailingSkip
   - encoding - 'UTF-8', 'UTF-16', 'UTF-16BE', 'UTF-16LE', 'ASCII', and
   'ISO-8859-1'
   - encodingErrorPolicy - (Already discussed above, so not further
   discussed in this section)
   - utf16Width - because UTF-16 is allowed for encoding, 'variable' is
   problematic.
   - textPadKind
   - textTrimKind
   - textStringJustification
   - textStringPadCharacter
   - binaryNumberRep - binary only
   - binaryFloatRep - ieee only
   - binaryBooleanTrueRep
   - binaryBooleanFalseRep - IBM DFDL doesn't allow empty string for this.
   (Minor.)
   - binaryCalendarRep - binarySeconds, binaryMilliseconds only
   - binaryCalendarEpoch
   - occursCountKind - fixed only
   - occursCount - integer only

Looking at this list, it raises only 1 additional
portability/interoperability issue today, given what I know about the
Daffodil implementation and the IBM implementation.

*Issue: utf16Width='variable'*

This issue can be addressed with a minor change to the DFDL specification.

When utf16Width is 'variable', the type is xs:string, and lengthUnits is
'characters', the length in characters should count each surrogate pair
found in the UTF-16 data as occupying 1 character.


This utf16Width='variable' feature of DFDL should be optional, as Java
JVM-based implementations will find this extremely difficult to support,
since JVM standard string representations cannot represent individual
characters with code points greater than 0xFFFF occupying 1 location in a
string.
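
The two ways of counting can be shown with a short Python sketch. It
assumes that 'fixed' treats every 16-bit code unit as one character and
that 'variable' treats a surrogate pair as one character; the function
names are made up for the illustration.

```python
def length_fixed_width(text: str) -> int:
    """Length when every 16-bit UTF-16 code unit counts as 1 character."""
    return len(text.encode("utf-16-be")) // 2

def length_variable_width(text: str) -> int:
    """Length when a surrogate pair counts as 1 character (code points)."""
    return len(text)  # Python strings are indexed by code point

# U+1F600 lies above 0xFFFF, so UTF-16 encodes it as a surrogate pair.
sample = "a\U0001F600b"
print(length_fixed_width(sample), length_variable_width(sample))
```

A JVM string's length is the first of these counts, so a JVM-based parser
asked for N 'variable' characters cannot simply take N string positions.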


Daffodil does not implement this 'variable' behavior, and we have no good
pathway to do so. Hence, I prefer to change the DFDL spec to make
'variable' optional. Only 'fixed' would be required. I could even support
deprecating the whole property.


*Issue: lengthUnits='characters' and variable-width charset encodings*


I believe this is required behavior. I also believe that the lack of
support for it is missing from IBM's list of non-compliances. I recall
discussion that IBM DFDL requires a fixed-width encoding when lengthUnits
is 'characters'. (Please correct me if I am wrong.)


I suggest that making this combination an optional feature of the DFDL
spec. would resolve the issue.


This complex feature was added to support naive data format conversions,
where a format that originally had ASCII encoding with lengthUnits 'bytes'
is changed to UTF-8 with lengthUnits 'characters'. This is a rational way
to modernize a data format by adding internationalization capability.
However, it requires a significant change in runtime behavior because UTF-8
characters occupy between 1 and 4 bytes each.
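
To show why the runtime behavior changes, here is a minimal Python sketch
of finding how many bytes a field of N UTF-8 characters occupies. It is
an illustration only: the function names are hypothetical, and it reads
just the lead byte of each character without validating continuation
bytes.

```python
def utf8_char_width(lead: int) -> int:
    """Number of bytes in a UTF-8 character, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#04x}")

def bytes_for_n_chars(data: bytes, n: int, start: int = 0) -> int:
    """Return how many bytes the next n characters occupy from 'start'."""
    pos = start
    for _ in range(n):
        if pos >= len(data):
            raise ValueError("not enough data for the requested length")
        pos += utf8_char_width(data[pos])
    if pos > len(data):
        raise ValueError("data ends in the middle of a character")
    return pos - start
```

With ASCII and lengthUnits 'bytes', a length of N always means N bytes.
With UTF-8 and lengthUnits 'characters', the byte extent is known only
after scanning the data, as above.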


*Optional Features that are Partially Implemented*

The bigger set of concerns for interoperability is the behavior of a DFDL
processor for features that are optional by strict interpretation of
Section 21 and are implemented by a specific DFDL implementation, but only
partially. This is the subject of other email messages, however.



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>



On Tue, Sep 11, 2018 at 11:33 AM Steve Hanson <smh at uk.ibm.com> wrote:

> Action 307 was raised recently and first task is for implementations to
> identify which core spec behaviour is not implemented.
>
> *IBM DFDL *
>
> The following is the list of DFDL 1.0 spec core features that IBM DFDL
> does not yet implement.
>
> - dfdl:encodingErrorPolicy "replace"
> - dfdl:binaryBooleanTrueRep with value empty string
> - dfdl:assert on global element and simple type
> - dfdl:discriminator on global element and simple type
> - Multiple xs:appinfo elements within each xs:annotation element
> - When parsing, the distinction between an element being 'missing',
> having an 'empty representation' and having an 'absent representation', is
> not in accordance with the specification.
> - When encoding is 'UTF-8' or 'UTF-16', byte order marks are not processed
>
> The above lists are derived from information at
> https://www.ibm.com/support/knowledgecenter/en/SSMKHH_10.0.0/com.ibm.etools.mft.doc/df00150_.htm
> and are those that apply to core spec features.
>
> Regards
>
> Steve Hanson
>
> IBM Hybrid Integration, Hursley, UK
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
> --
>   dfdl-wg mailing list
>   dfdl-wg at ogf.org
>   https://www.ogf.org/mailman/listinfo/dfdl-wg
>