[DFDL-WG] Clarification needed: stricter separator suppression policy
Steve Hanson
smh at uk.ibm.com
Thu Aug 2 06:55:37 EDT 2018
It is true that we do not cover all possible nuances of separator
suppression. The policies in DFDL 1.0 are essentially equivalent to what is
supported by WTX. The closest to your requirement is 'anyEmpty', which
will suppress adjacent separators when unparsing but will tolerate
adjacent separators when parsing. At IBM we have not yet found concrete
requirements for any others.
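As a rough illustration of the 'anyEmpty' behaviour described above, here
is a minimal Python sketch. It is not DFDL and does not reflect how any
IBM or WTX processor is implemented. The function names, the comma
separator and the string-only fields are assumptions made up for the
example; escaping, initiators, terminators and typed values are ignored.

```python
from typing import List


def parse_any_empty(data: str, sep: str = ",") -> List[str]:
    """Tolerant parse: zero-length fields between separators are skipped."""
    # Hypothetical helper: adjacent, leading or trailing separators are
    # tolerated and simply produce no occurrence.
    return [field for field in data.split(sep) if field != ""]


def unparse_any_empty(values: List[str], sep: str = ",") -> str:
    """Suppressing unparse: a zero-length value produces no separator."""
    # Hypothetical helper: zero-length values are dropped before joining,
    # so the output never contains adjacent separators.
    return sep.join(value for value in values if value != "")
```

With these definitions, parsing the data a,,b, gives the two fields a and
b, and unparsing the values a, (empty), b writes a,b.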
IBM does not yet implement 'trailingEmptyStrict', which if I recall was
something Stephanie Fetzer said was needed for X12 'validation'.
Re your additional thought. I've not noticed an ambiguity.
1) An element such as one of type int, which is not nillable with a
zero-length nil rep and has no default value with a zero-length empty rep,
can still have a zero-length representation: the absent representation,
which is by definition zero-length.
2) The absent rep arises only when parsing, when we encounter adjacent
delimiters (so no content) and there is no zero-length nil or empty rep.
3) The unparser never explicitly outputs an absent rep; it outputs
nothing. But when the next thing that is output is a delimiter, what you
then parse could be the absent rep.
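The three points above can be sketched as a small decision function in
Python. This is only an illustration of the reasoning in this email, not
spec text: the function name and its two boolean parameters are invented
for the example, and checking nil before empty is an assumption of the
sketch.

```python
def zero_length_rep(has_zero_length_nil: bool,
                    has_zero_length_empty: bool) -> str:
    """Classify what zero-length content between delimiters means (sketch)."""
    # Assumption of this sketch: nil is considered before empty.
    if has_zero_length_nil:
        return "nil"
    if has_zero_length_empty:
        return "empty"
    # Neither a zero-length nil rep nor a zero-length empty rep exists,
    # so zero-length content can only be the absent rep (point 2).
    return "absent"
```

For the int element in point 1, both arguments are False, so zero-length
content between adjacent delimiters is classified as absent.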
If you could be more specific with spec section references, then maybe any
ambiguity will become clearer.
I should also add that IBM DFDL has not implemented all the
empty/missing/absent stuff from the erratum that arose from action 140. We
do not make a clear distinction between missing and empty. The main effect
this has is that we can't supply a default value when parsing - so we
currently give a parse-time schema definition error if we find a
zero-length required occurrence for an element with a default value.
Regards
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: dfdl-wg at ogf.org
Date: 20/07/2018 21:06
Subject: Re: [DFDL-WG] Clarification needed: stricter separator
suppression policy
Sent by: "dfdl-wg" <dfdl-wg-bounces at ogf.org>
An additional thought here. The DFDL spec says that to be potentially
trailing an element must have a possible zero-length representation.
So an element such as one of type int, which is not nillable with a
zero-length nil representation and has no default value with a zero-length
empty representation, cannot have a zero-length representation.
If one of these elements is in an all-optional minOccurs=0 array at the
end of a sequence, then trailing extra separators would NOT be acceptable
regardless of trailingEmpty being lax, because the element is not
potentially trailing.
However, elsewhere it says that absent (therefore missing) elements are
never created for optional elements.
Zero length for such an element means "absent" and so missing. And that
means not put into the infoset which suggests that they are acceptable and
ignored (though counted towards maxOccurs positions in a positional
sequence).
This seems completely ambiguous to me.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
On Fri, Jul 20, 2018 at 1:03 PM, Mike Beckerle <mbeckerle.dfdl at gmail.com>
wrote:
I have been trying to rationalize the definitions in the DFDL spec for
separated sequences.
I cannot find a way to express something very simple: a repeating element
where at least minOccurs non-zero-length occurrences, each with its
separator, are required, and up to maxOccurs (or unbounded)
non-zero-length occurrences, each with its separator, are allowed. This
means separators can never be adjacent anywhere in the corresponding data
stream (except if escaped, or hidden inside, say, a fixed-length string
inside a complex type element). Adjacent delimiters would be a parse error.
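To make the requested behaviour concrete, here is a minimal Python sketch
of such a strict parse. It is not DFDL and does not correspond to any
existing separatorSuppressionPolicy value. The function name parse_strict,
its parameters and the comma separator are assumptions made up for the
example, and escaping and nested fixed-length content are ignored.

```python
from typing import List, Optional


def parse_strict(data: str, sep: str = ",", min_occurs: int = 0,
                 max_occurs: Optional[int] = None) -> List[str]:
    """Parse separated occurrences, rejecting any zero-length occurrence."""
    # Empty input means zero occurrences, not one zero-length occurrence.
    fields = data.split(sep) if data else []
    for index, field in enumerate(fields):
        if field == "":
            # A leading, adjacent or trailing separator shows up here.
            raise ValueError(f"zero-length occurrence at position {index}")
    if len(fields) < min_occurs:
        raise ValueError(f"found {len(fields)} occurrences, need {min_occurs}")
    if max_occurs is not None and len(fields) > max_occurs:
        raise ValueError(f"found {len(fields)} occurrences, max {max_occurs}")
    return fields
```

Under these assumptions the data 1,2,3 parses to three occurrences, while
1,,3 and 1,2, both raise an error instead of being silently absorbed.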
I expected this to be occursCountKind 'implicit' with
separatorSuppressionPolicy 'never', but that appears to mean that
maxOccurs must be bounded and there are always exactly maxOccurs
separators, the latter of which (maxOccurs - minOccurs of them) can be
empty strings, meaning optional elements will not be created for them.
The other three separator suppression policies all absorb adjacent
separators, except that trailingEmptyStrict doesn't absorb them at the end
of the group.
There doesn't seem to be a way to be strict about the format and
speculatively parse only non-zero-length elements, requiring each optional
occurrence to appear with its associated separator. I.e., no trailing
adjacent separators, and no adjacent separators in the middle or at the
beginning either.
Are we missing separatorSuppressionPolicy='neverEmpty' or 'anyEmptyStrict'
perhaps?
Comments?
...mikeb