[DFDL-WG] Clarification needed: stricter separator suppression policy

Steve Hanson smh at uk.ibm.com
Thu Aug 2 06:55:37 EDT 2018


It is true that we do not cover all possible nuances of separator 
suppression. The ones in DFDL 1.0 are essentially equivalent to what is 
supported by WTX. The closest to your requirement is 'anyEmpty', which 
suppresses adjacent separators when unparsing but tolerates adjacent 
separators when parsing. At IBM we have not yet found concrete 
requirements for any others.
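A minimal sketch of the 'anyEmpty' behavior described above. It rests on a simplified model and is not IBM DFDL's (or any processor's) implementation. Occurrences are strings joined by an infix ',' separator, and an empty string stands for a zero-length occurrence. Escaping is ignored, and the function names are hypothetical.

```python
# Simplified model (assumption): a repeating element is a list of text
# values joined by an infix separator; "" is a zero-length occurrence.

def unparse_any_empty(values, sep=","):
    """'anyEmpty'-like unparsing: zero-length occurrences are suppressed,
    so adjacent separators are never written."""
    return sep.join(v for v in values if v != "")


def parse_any_empty(data, sep=","):
    """'anyEmpty'-like parsing: adjacent separators anywhere (leading,
    middle, trailing) are tolerated and the empty fields are dropped."""
    return [field for field in data.split(sep) if field != ""]


print(unparse_any_empty(["1", "", "3", ""]))  # 1,3
print(parse_any_empty(",1,,3,,"))            # ['1', '3']
```

The asymmetry shows up here: unparsing never produces adjacent separators, but parsing still accepts them.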

IBM does not yet implement 'trailingEmptyStrict', which, if I recall 
correctly, was something Stephanie Fetzer said was needed for X12 'validation'.

Re your additional thought: I have not noticed an ambiguity.
1) An element, such as one of type int, that is not nillable with a 
zero-length nil representation and has no zero-length empty representation 
with a default value can still have a zero-length representation: the 
absent representation, which is by definition zero-length.
2) The absent rep only arises when parsing and we encounter adjacent 
delimiters (so no content) and there is no zero-length nil or empty rep. 
3) The unparser never explicitly outputs an absent rep; it outputs 
nothing. But when the next thing that is output is a delimiter, what 
you parse could be the absent rep.
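Points 1) to 3) can be illustrated with a small parsing sketch for an int element. This is a simplified model under assumptions, not DFDL processor code. The fields are separated by an infix ','. The parameters `zero_length_nil` and `empty_default` are hypothetical flags standing for "nillable with a zero-length nil rep" and "zero-length empty rep with this default value". `ABSENT` is a hypothetical marker.

```python
# Hypothetical marker for the absent representation (not a real DFDL API).
ABSENT = object()


def classify_field(text, zero_length_nil=False, empty_default=None):
    """Classify one delimited field of an int element (simplified model).

    zero_length_nil: the element is nillable with a zero-length nil rep.
    empty_default:   default used when the element has a zero-length empty
                     rep (None means there is no such rep).
    """
    if text != "":
        return int(text)
    if zero_length_nil:
        return "nil"
    if empty_default is not None:
        return empty_default
    # Adjacent delimiters and no zero-length nil or empty rep: absent.
    return ABSENT


def parse_infoset(data, sep=",", **options):
    """Split on the separator and drop absent occurrences, which are never
    added to the infoset."""
    fields = [classify_field(text, **options) for text in data.split(sep)]
    return [value for value in fields if value is not ABSENT]


print(parse_infoset("1,,3"))                   # [1, 3]
print(parse_infoset("1,,3", empty_default=0))  # [1, 0, 3]
```

The unparser side (point 3) is implicit in the first call. Writing nothing for an occurrence and then a separator yields "1,,3", which parses back with an absent occurrence.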
If you could be more specific with spec section references, then maybe any 
ambiguity will become clearer.

I should also add that IBM DFDL has not implemented all the 
empty/missing/absent stuff from the erratum that arose from action 140. We 
do not make a clear distinction between missing and empty. The main effect 
this has is that we can't supply a default value when parsing - so we 
currently give a parse-time schema definition error if we find a 
zero-length required occurrence for an element with a default value.
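The practical effect described above can be contrasted in a short sketch. It is a hypothetical model, not IBM DFDL code. One branch applies the default to a zero-length required occurrence, which is the behavior that distinguishing missing from empty would allow. The other branch raises an error, modeling the current behavior described in this message. `SchemaDefinitionError` and `parse_required` are illustrative names only.

```python
class SchemaDefinitionError(Exception):
    """Stand-in for a parse-time schema definition error (hypothetical)."""


def parse_required(text, default, distinguish_missing_from_empty):
    """Parse one required occurrence of an int element that has a default.

    distinguish_missing_from_empty=True models supplying the default for a
    zero-length (empty) required occurrence; False models the behavior
    described for IBM DFDL in this message.
    """
    if text != "":
        return int(text)
    if distinguish_missing_from_empty:
        return default
    raise SchemaDefinitionError(
        "zero-length required occurrence for an element with a default value")


print(parse_required("", 5, True))  # 5
```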

Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     dfdl-wg at ogf.org
Date:   20/07/2018 21:06
Subject:        Re: [DFDL-WG] Clarification needed: stricter separator 
suppression policy
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>



An additional thought here. The DFDL spec says that, to be potentially 
trailing, an element must have a possible zero-length representation. 

So an element, such as one of type int, that is not nillable with a 
zero-length nil representation and has no zero-length empty representation 
with a default value cannot have a zero-length representation.

If one of these elements is in an all-optional minOccurs=0 array at the 
end of a sequence, then trailing extra separators would NOT be acceptable 
regardless of trailingEmpty being lax, because the element is not 
potentially trailing.
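The reading above can be written as a small predicate. This is a sketch under assumptions. It covers only the zero-length condition quoted here, not the full spec definition of "potentially trailing". The `ElementReps` flags and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ElementReps:
    """Hypothetical flags for an element's possible zero-length reps."""
    zero_length_nil: bool = False                 # nillable, nil rep is zero-length
    zero_length_empty_with_default: bool = False  # zero-length empty rep with default


def has_zero_length_rep(e: ElementReps) -> bool:
    # Under this reading only a zero-length nil or empty rep counts.
    return e.zero_length_nil or e.zero_length_empty_with_default


def accepts_trailing_separators(e: ElementReps, policy: str) -> bool:
    """Extra trailing separators are accepted only if the policy is lax about
    trailing empties AND the element is potentially trailing."""
    potentially_trailing = has_zero_length_rep(e)
    return policy == "trailingEmpty" and potentially_trailing


plain_int = ElementReps()
print(accepts_trailing_separators(plain_int, "trailingEmpty"))  # False
print(accepts_trailing_separators(ElementReps(zero_length_nil=True), "trailingEmpty"))  # True
```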

However, elsewhere it says that absent (therefore missing) elements are 
never created for optional elements. 

Zero length for such an element means "absent", and so missing. That in 
turn means it is not put into the infoset, which suggests that such 
zero-length occurrences are acceptable and ignored (though counted towards 
maxOccurs positions in a positional sequence).
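The second interpretation can be sketched for a positional sequence. Absent occurrences use up a position, and so count towards maxOccurs, but add nothing to the infoset. This is a simplified model assuming an infix ',' separator and int values. `parse_positional` is a hypothetical name.

```python
def parse_positional(data, max_occurs, sep=","):
    """Each separated field is one position; zero-length fields are absent
    (missing): they use up a position but add nothing to the infoset."""
    fields = data.split(sep)
    if len(fields) > max_occurs:
        raise ValueError(f"{len(fields)} positions exceed maxOccurs={max_occurs}")
    infoset = []
    for position, text in enumerate(fields, start=1):
        if text == "":
            continue  # absent: counted as a position, not added to the infoset
        infoset.append((position, int(text)))
    return infoset


print(parse_positional("1,,3", max_occurs=3))  # [(1, 1), (3, 3)]
```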

This seems completely ambiguous to me. 



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy


On Fri, Jul 20, 2018 at 1:03 PM, Mike Beckerle <mbeckerle.dfdl at gmail.com> 
wrote:

I have been trying to rationalize the definitions in the DFDL spec for 
separated sequences.

I cannot find a way to express something very simple: a repeating element 
where minOccurs non-zero-length occurrences, each with its separator, are 
required, and up to maxOccurs (or unbounded) non-zero-length occurrences, 
each with its separator, are allowed. This means separators can never be 
adjacent anywhere in the corresponding data stream (except if escaped, or 
hidden inside, say, a fixed-length string inside a complex type element). 
Adjacent delimiters would be a parse error. 

I expected this to be occursCountKind 'implicit' with 
separatorSuppressionPolicy 'never', but that appears to mean that 
maxOccurs must be bounded and that there are always exactly maxOccurs 
separators, with the last maxOccurs - minOccurs occurrences allowed to be 
empty strings, meaning optional elements will not be created for them.
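A minimal sketch of that reading of 'implicit' plus 'never'. It is a simplified model, not a processor's behavior. The assumptions are that the data always carries exactly maxOccurs separated positions, that required positions must be non-empty int text, and that empty optional positions create no element. `parse_never` is a hypothetical name.

```python
def parse_never(data, min_occurs, max_occurs, sep=","):
    """Exactly max_occurs separated positions must be present; positions
    after the first min_occurs may be empty strings, and no element is
    created for an empty optional position."""
    fields = data.split(sep)
    if len(fields) != max_occurs:
        raise ValueError(f"expected exactly {max_occurs} positions, got {len(fields)}")
    values = []
    for i, text in enumerate(fields):
        if text == "":
            if i < min_occurs:
                raise ValueError(f"required occurrence {i + 1} is empty")
            continue  # optional position left empty: no element created
        values.append(int(text))
    return values


print(parse_never("1,2,,", min_occurs=2, max_occurs=4))  # [1, 2]
```

Under this model the compact form "1,2" is rejected, because the unused optional positions must still be delimited.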

All the other three separator suppression policies absorb adjacent 
separators, except that trailingEmptyStrict does not absorb them at the end 
of the group. 
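The difference at the end of the group can be sketched by contrasting 'trailingEmpty' and 'trailingEmptyStrict' on trailing adjacent separators. The model is simplified and assumes an infix ',' separator. Empty fields before the end are simply dropped, following the "absorb" description in the message rather than any particular implementation. `parse_trailing_policy` is a hypothetical name.

```python
def parse_trailing_policy(data, policy, sep=","):
    """Contrast 'trailingEmpty' and 'trailingEmptyStrict' at the end of the
    group (simplified model; empties elsewhere are simply dropped here)."""
    if policy not in ("trailingEmpty", "trailingEmptyStrict"):
        raise ValueError(f"policy not covered by this sketch: {policy}")
    if data == "":
        return []  # zero occurrences, no separators at all
    fields = data.split(sep)
    end = len(fields)
    while end > 0 and fields[end - 1] == "":
        end -= 1  # walk back over trailing zero-length fields
    if end < len(fields) and policy == "trailingEmptyStrict":
        raise ValueError("trailing adjacent separators are not allowed")
    return [text for text in fields[:end] if text != ""]


print(parse_trailing_policy("1,2,,", "trailingEmpty"))  # ['1', '2']
```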

There doesn't seem to be a way to be strict about the format and 
speculatively parse only non-zero-length elements, requiring each optional 
occurrence to appear with its associated separator, i.e., no trailing 
adjacent separators, and no adjacent separators in the middle or at the 
beginning either. 

Are we missing separatorSuppressionPolicy='neverEmpty' or 'anyEmptyStrict' 
perhaps? 
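The behavior being asked for can be sketched as a hypothetical 'anyEmptyStrict' / 'neverEmpty' parse. Every occurrence must be non-zero-length, so adjacent, leading, or trailing separators are always an error. This is an illustration under assumptions, not a proposed spec definition. It uses an infix ',' separator and does not model escaping. It also treats all of the input as the array, so occurrences beyond maxOccurs are reported as errors rather than left for following content.

```python
def parse_any_empty_strict(data, min_occurs, max_occurs=None, sep=","):
    """Hypothetical strict policy: every occurrence must be non-zero-length,
    so adjacent (or leading/trailing) separators are always a parse error.
    max_occurs=None stands for unbounded. Escaping is not modeled."""
    fields = [] if data == "" else data.split(sep)
    for position, text in enumerate(fields, start=1):
        if text == "":
            raise ValueError(f"zero-length occurrence at position {position}")
    if len(fields) < min_occurs:
        raise ValueError(f"{len(fields)} occurrences, minOccurs={min_occurs}")
    if max_occurs is not None and len(fields) > max_occurs:
        # This sketch treats all of `data` as the array, so extras are errors.
        raise ValueError(f"{len(fields)} occurrences, maxOccurs={max_occurs}")
    return fields


print(parse_any_empty_strict("1,2,3", min_occurs=1))  # ['1', '2', '3']
```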

Comments?

...mikeb



--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  
https://www.ogf.org/mailman/listinfo/dfdl-wg
