[DFDL-WG] Part 2 - Re: Action 307 - Demonstrate implementation interoperability - BOM
Steve Hanson
smh at uk.ibm.com
Wed Mar 27 12:15:05 EDT 2019
Hi Mike
The outstanding item to resolve is what to do about BOMs.
307
Demonstrate implementation interoperability (Steve, Mike)
4/9: Need to make sure that DFDL spec section 21 lists a correct set of
optional features, the implication being that Daffodil and IBM DFDL (and
any other minimally conforming implementation) correctly implement the
remaining required features. First step - see if there are any obvious
omissions.
16/10: Steve sent email stating IBM DFDL's missing core features and
non-compliant behaviour, and Mike responded. Discussion continuing via two
separate email threads. Part 1 for core features. Part 2 for optional
features. For the core features, agreed that the following needs to
happen:
1) IBM adds encodingErrorPolicy='replace'
2) Daffodil adds encodingErrorPolicy='error'
3) Daffodil ensures that, if not implementing default/fixed when parsing,
it gives an SDE if a required occurrence has empty rep and element has
default/fixed set.
4) A position is agreed on BOM handling - ongoing via email.
1/11: Just BOM to conclude on from the above list
15/11: Not discussed
29/11: No further progress.
10/1/19:
1) IBM have started the work to add encodingErrorPolicy='replace'.
2) Daffodil have a temp setting to tolerate encodingErrorPolicy='error'
with a warning.
3) Daffodil to investigate whether this is feasible.
4) More discussion needed on BOM
7/2: Updates:
1) In progress
2) As above.
3) In progress
4) No progress
I can't find the email thread the action mentions, but my thoughts are as
follows:
There are 3 options -
a) keep the spec as it is which implies BOM processing is core - a
problem as neither Daffodil nor IBM DFDL implement it
b) make BOM processing optional - which means there would need to
be a property to switch it on or off in case an implementation started to
support it later
c) remove BOM processing altogether from 1.0 and add to the 2.0
list
I am leaning towards c) on the following grounds:
- Only one customer that I know of ever requested BOM processing for
non-XML data (in 2010, for MRM, before IBM DFDL was available)
- BOM processing only applies to the message as a whole, not to any
embedded Unicode fragments, so support is selective anyway
- It is possible to model an optional BOM and use it to set a user-defined
encoding variable which is then used by the rest of the schema
I have a schema that models the BOM, and it parses and unparses all 3
variants correctly (no BOM present, BOM for BE present, BOM for LE
present).
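As a rough illustration of the idea described above (not the schema itself), here is a minimal Python sketch. It assumes UTF-16 data and a hypothetical default of big-endian when no BOM is present; the function names `parse` and `unparse` are illustrative only. The optional BOM is consumed first, it selects an encoding "variable", and the rest of the data is decoded with that encoding. Unparsing writes the BOM back only if one was present.

```python
import codecs

# Hypothetical default used when no BOM is present (an assumption of this sketch).
DEFAULT_ENCODING = "utf-16-be"


def parse(data: bytes, default: str = DEFAULT_ENCODING):
    """Consume an optional leading UTF-16 BOM and decode the rest.

    Returns (text, encoding, had_bom) so that unparse() can round-trip.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        encoding, had_bom = "utf-16-be", True
        body = data[len(codecs.BOM_UTF16_BE):]
    elif data.startswith(codecs.BOM_UTF16_LE):
        encoding, had_bom = "utf-16-le", True
        body = data[len(codecs.BOM_UTF16_LE):]
    else:
        # No BOM: fall back to the default encoding.
        encoding, had_bom, body = default, False, data
    return body.decode(encoding), encoding, had_bom


def unparse(text: str, encoding: str, had_bom: bool) -> bytes:
    """Re-emit the BOM (if one was parsed) followed by the encoded text."""
    bom = b""
    if had_bom:
        bom = codecs.BOM_UTF16_BE if encoding == "utf-16-be" else codecs.BOM_UTF16_LE
    return bom + text.encode(encoding)
```

The three variants (no BOM, BE BOM, LE BOM) each round-trip through `parse` and `unparse` unchanged.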
If you have the BOM email thread please can you forward it, so I can see
if I have missed any part of the thought process?
I have found the original DFDL WG thread from 2011 when we added BOM
support to the spec via erratum 3.7, which discusses the original
motivation and design. I'll send it to you.
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Steve Hanson/UK/IBM
To: Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc: DFDL-WG <dfdl-wg at ogf.org>
Date: 09/10/2018 12:16
Subject: Re: Part 2 - Re: Action 307 - Demonstrate implementation
interoperability
Mike, responses in-line below.
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Steve Hanson <smh at uk.ibm.com>
Cc: DFDL-WG <dfdl-wg at ogf.org>
Date: 04/10/2018 00:50
Subject: Part 2 - Re: Action 307 - Demonstrate implementation
interoperability
Based on Daffodil JIRA ticket backlog, and documentation at
https://daffodil.apache.org/unsupported/, below are DFDL non-core features
that are not supported by Daffodil, but that seem to be supported by IBM
DFDL (based on my not finding anything that says they aren't implemented),
and so are possibly in use in DFDL schemas we will need to use for
interoperability testing.
Please advise if IBM DFDL does *not* implement any of these.
* default, fixed - for defaulting values at parse time - Daffodil support
for this is partial at parse time, unsupported at unparse time. The fixed
attribute isn't supported at all.
* unordered sequences. IBM DFDL does not support default/fixed when
parsing (see other thread).
* byte-value entities - in contexts other than fillByte
* ICU symbols 'u' and 'I' in calendarPattern
* binaryFloatRep 'ibm390Hex'
* documentFinalTerminatorCanBeMissing
* textStandardBase - with value not equal to 10
* lengthKind 'prefixed', and prefixIncludesPrefixLength, prefixLengthType
- Note IBM restricts prefixLengthType to a type that itself cannot be
prefixed. Correct.
* assert with failure type 'recoverableError'
* calendarObserveDST
* calendarCenturyStart
* textNumberPattern 'V' and 'P' symbols
* CCSID for specifying dfdl:encoding
* nilKind 'literalValue' for binary data
* choiceLengthKind 'explicit' and choiceLength
* separatorSuppressionPolicy - behaviors for these in Daffodil are known
to be both non-standard currently and also different from IBM DFDL. This
needs correcting. IBM DFDL does not support 'trailingEmptyStrict'.
The above list (after review/correction) needs to be crossed with the
published DFDL schemas on github that were published by IBM. The features
required to run those DFDL schemas are required for Daffodil to implement
before the interoperability demonstration.
Below are features of DFDL I believe neither IBM DFDL nor Daffodil
implement, and as they are non-core, they need not be implemented by
either for interoperability testing:
* lengthKind 'endOfParent'
* nilKind 'logicalValue' IBM DFDL implements this.
* occursCountKind 'stopValue' (and occursStopValue)
* textBiDi - and other related biDi properties
* useNilForDefault IBM DFDL implements this
* floating
* fn:exactly-one function
* fn:namespace-uri() function
* dfdl:escapeCharacterPolicy 'delimiters' - daffodil doesn't implement
this property at all.
Below are features of Daffodil that are not implemented by IBM DFDL and so
cannot be used in schemas created using Daffodil that intend to be
interoperable. These are either easy to work around, or impossible to work
around, so are not a big deal, they just have to be kept in mind if
considering a DFDL schema for use in interoperability testing. This
includes schemas published on github, for image formats, CSV, etc., and a
number of the FOUO schemas published on DI2E.net/forge.mil - some of those
quite possibly can work with IBM DFDL, and if they can do so, they should
be modified so that they can be included in the interoperability testing.
The list below mostly comes from
https://www.ibm.com/support/knowledgecenter/en/SSMKHH_10.0.0/com.ibm.etools.mft.doc/df00150_.htm
* calendarTimeZone specified as "" (empty string) - This is the most
problematic one, so I've put it first. The predefined DFDL named format
that is supplied with Daffodil and used as a starting point by most
schemas has calendarTimeZone="". This is because customers didn't like
that their datetimes were all being appended with "+00:00" for UTC time
zone (in the infoset) when the data simply didn't specify a time zone.
Schemas intended for interoperability testing should specify 'UTC' for
this property.
* calendarTimeZone specified as an Olson format time zone
* inputValueCalc, outputValueCalc
* hiddenGroupRef
* Asserts and discriminators on simple type definitions or global element
definitions
* fn:concat with more than 4 arguments
* non-8-bit charset encodings
* bitOrder not mostSignificantBitFirst
* '@' in textNumberPattern (TBD: unsure if Daffodil has this)
* "_" in calendarLanguage
* calendarLanguage an expression
* assert & discriminator messages an expression
* binaryBooleanTrueRep as "" (empty string)
* checks for binary packed numbers with length units 'bits' and not a
multiple of 4 length, and similarly for alignmentUnits bits and alignment
not multiple of 4. (relevant to negative tests only)
* lengthKind 'implicit' complex elements inside lengthKind 'delimited'
complex elements.
Additionally IBM DFDL contains these bugs in its expression processing:
1) Path locations are not correctly validated. Specifically, array
elements without predicates and references into other choice branches are
not flagged as errors.
2) In DFDL expression functions, the namespace prefixes for
http://www.w3.org/2001/XMLSchema and
http://www.w3.org/2005/xpath-functions must be 'xs' and 'fn'
respectively, even if not declared.
3) In DFDL expressions, namespace prefixes in paths are ignored and
matching is against element name only.
For interoperability testing therefore:
- For 1) avoid the use of either example
- For 2) always declare xmlns:xs and xmlns:fn and always use those
prefixes in expressions
- For 3) avoid sibling elements that have same name but different
namespace; use elementFormDefault="unqualified" to avoid namespaces for
local elements altogether
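
The guidance for 2) can be checked mechanically before a schema goes into the interoperability test set. The following is a minimal Python sketch, assuming the schema is available as bytes; the helper name `missing_prefixes` is hypothetical, and it only inspects namespace declarations, not the prefixes actually used inside expressions.

```python
import io
import xml.etree.ElementTree as ET

# Prefix bindings that expressions should rely on, per item 2) above.
REQUIRED = {
    "xs": "http://www.w3.org/2001/XMLSchema",
    "fn": "http://www.w3.org/2005/xpath-functions",
}


def missing_prefixes(schema_bytes: bytes) -> list:
    """Return the required prefixes that are undeclared or bound to another URI."""
    declared = {}
    # "start-ns" events yield (prefix, uri) for every xmlns declaration.
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(schema_bytes),
                                              events=("start-ns",)):
        # Keep the first (outermost) binding seen for each prefix.
        declared.setdefault(prefix, uri)
    return sorted(p for p, uri in REQUIRED.items() if declared.get(p) != uri)
```

An empty result means both prefixes are declared with the expected namespaces; anything else lists the prefixes to fix.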
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU