[DFDL-WG] Spec question: Parsing sequence groups with separators

Mon Jun 9 05:35:41 EDT 2014

To put Tim's concerns another way, the spec defines 'positional 
sequence' and 'non-positional sequence' in terms of the value of 
separatorSuppressionPolicy (section 14.2). But separatorSuppressionPolicy 
only applies when occursCountKind is 'implicit'; for other 
occursCountKinds there is an implied separatorSuppressionPolicy value 
(section 14.2.2). We did this partly so that separatorSuppressionPolicy 
can be put in scope and not cause errors. However, when you create a 
sequence that contains elements with different occursCountKinds, you can 
end up with a hybrid which is positional in places and non-positional in 
others. We need to decide whether these kinds of sequences are allowed. 
You can always wrap a group of elements in a sequence in order to change 
separatorSuppressionPolicy.

Consider occursCountKind 'expression'. This is stated as having implied 
separatorSuppressionPolicy 'never' on the grounds that it is very like 
'fixed'. That implies positional behaviour. But you need to parse the data 
in order to know the number of occurrences, so doesn't that make it 
non-positional? Also, section 16 states that when unparsing, 'expression' 
behaves like 'parsed', and 'parsed' has implied 
separatorSuppressionPolicy 'empty'. Something is not quite straight here.
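
One way to picture the hybrid case is the minimal Python sketch below. The 
implied values are only those mentioned in this thread ('fixed' and 
'expression' behaving as 'never', 'parsed' as 'empty'). Treating 'never' as 
the only positional value is an assumption made for illustration, and the 
names IMPLIED_SSP, POSITIONAL, effective_ssp and is_hybrid are hypothetical; 
none of this is taken from the spec text.

```python
# Implied separatorSuppressionPolicy per occursCountKind, as described in
# this thread only (assumption: not copied from the specification).
IMPLIED_SSP = {"fixed": "never", "expression": "never", "parsed": "empty"}

# Assumption for illustration: only 'never' is treated as positional here.
POSITIONAL = {"never"}


def effective_ssp(sequence_ssp, occurs_count_kind):
    """Return the policy that actually governs one element of the sequence."""
    if occurs_count_kind == "implicit":
        # Only 'implicit' elements honour the policy carried on the sequence.
        return sequence_ssp
    return IMPLIED_SSP[occurs_count_kind]


def is_hybrid(sequence_ssp, occurs_count_kinds):
    """True when some elements behave positionally and others do not."""
    behaviours = {
        effective_ssp(sequence_ssp, kind) in POSITIONAL
        for kind in occurs_count_kinds
    }
    return len(behaviours) > 1


print(is_hybrid("never", ["implicit", "fixed", "expression"]))
print(is_hybrid("never", ["implicit", "parsed"]))
```

Under these assumptions, a sequence with separatorSuppressionPolicy 'never' 
becomes a hybrid as soon as one of its elements uses occursCountKind 
'parsed'.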

Regards

Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From:   Tim Kimber/UK/IBM at IBMGB
To:     dfdl-wg at ogf.org, 
Date:   07/06/2014 21:31
Subject:        [DFDL-WG] Spec question: Parsing sequence groups with 
separators
Sent by:        dfdl-wg-bounces at ogf.org

The rules outlined in section 14.2.2 'Parsing Sequence Groups with 
Separators' are not properly specified, and probably cannot be 
consistently implemented. 

The last paragraph of Section 14.2.1 says this: "In the sections that 
follow, it is important to remember that the 
dfdl:separatorSuppressionPolicy property is carried on the sequence, while 
the XSDL minOccurs, XSDL maxOccurs and dfdl:occursCountKind properties are 
is carried on an element in that sequence." 
This is true, and this 'local overriding' of separatorSuppressionPolicy ( 
by arrays within the group ) is the cause of most of the problems. 

Problem #1: Complexity 
Consider a sequence group that has SSP='never' and the separator is a 
comma. Its members ( A,B,C ) must always be represented as follows: 
"a,b,c" or ",b,c" or ",,c" 
but never "b,c" because that would imply that the separator for an empty A 
had been suppressed. 
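
The positional rule in this example can be sketched in Python as follows. 
This is a deliberately naive illustration: parse_positional is a 
hypothetical helper, it assumes no escaping, initiators or terminators, and 
it only checks that every member keeps its separator.

```python
def parse_positional(data, member_names, separator=","):
    """Split data into one field per member; empty members keep their separator."""
    fields = data.split(separator)
    if len(fields) != len(member_names):
        # A missing field means a separator was suppressed, which this
        # sketch treats as an error for a sequence with SSP='never'.
        raise ValueError(
            f"expected {len(member_names)} fields, found {len(fields)} in {data!r}"
        )
    return dict(zip(member_names, fields))


print(parse_positional("a,b,c", ["A", "B", "C"]))
print(parse_positional(",,c", ["A", "B", "C"]))
try:
    parse_positional("b,c", ["A", "B", "C"])
except ValueError as err:
    print("rejected:", err)
```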

Now suppose that B is an array with minOccurs=0 and maxOccurs=3 and 
occursCountKind='implicit'. Acceptable representations are now: 
"a,b1,b2,b3,c" or "a,b1,,,c" or even "a,,,,c" 

But if occursCountKind is changed to 'parsed' then the acceptable 
representations suddenly alter, and empty occurrences of B can be 
completely omitted. 
"a,b1,b2,b3,c" or "a,b1,c" or even "a,c" 
[ or should that be "a,,c" ] 
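
The contrast between the two occursCountKinds can be made concrete with a 
small Python sketch. It assumes that 'implicit' keeps a separator for every 
one of the maxOccurs slots, and that 'parsed' drops absent occurrences 
entirely (the first of the two readings above). unparse_abc is a 
hypothetical helper, not something defined by the spec.

```python
def unparse_abc(a, b_occurrences, c, occurs_count_kind, max_occurs=3, separator=","):
    """Render A, the B array and C as one separated string."""
    b = list(b_occurrences)
    if len(b) > max_occurs:
        raise ValueError("too many occurrences of B")
    if occurs_count_kind == "implicit":
        # Positional reading: pad B with empty slots up to maxOccurs.
        b += [""] * (max_occurs - len(b))
    elif occurs_count_kind != "parsed":
        raise ValueError(f"occursCountKind not covered by this sketch: {occurs_count_kind}")
    # For 'parsed', absent occurrences contribute neither data nor separator.
    return separator.join([a, *b, c])


print(unparse_abc("a", ["b1"], "c", "implicit"))
print(unparse_abc("a", ["b1"], "c", "parsed"))
print(unparse_abc("a", [], "c", "parsed"))
```

Counting slots, the padded form always carries five fields (A, three B 
slots, C), so the same logical data yields different separator counts 
depending only on the occursCountKind of B.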

This seems wrong. The logic that implements suppression policy is hard 
enough to implement already. Bringing in an extra layer of complexity 
around arrays will make it so hard that most implementations would contain 
defects, leading to interoperability issues. 

Problem #2: Ambiguity 
See the brackets in the preceding paragraph. 
[ or should that be "a,,c" ] 

It is far from obvious whether the group should insist on having a 
delimiter for the array ( because its SSP is 'never' ) or whether the 
array is at liberty to suppress the separators for all of its 
members ( as I assumed when I wrote this email ). The text of the 
specification is either silent or unclear on this point. 
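
The two readings can be put side by side in a short Python sketch. The flag 
group_insists_on_slot and the function unparse_parsed_array are 
hypothetical names introduced only to show the difference; neither reading 
is claimed to be the one the specification intends.

```python
def unparse_parsed_array(a, b_occurrences, c, group_insists_on_slot, separator=","):
    """Render A, a 'parsed' array B and C under one of the two readings."""
    b = list(b_occurrences)
    if not b and group_insists_on_slot:
        # Reading 1: the enclosing group (SSP='never') keeps one separator
        # for the array even when the array has no occurrences.
        b = [""]
    # Reading 2: an empty array contributes nothing at all.
    return separator.join([a, *b, c])


print(unparse_parsed_array("a", [], "c", group_insists_on_slot=True))
print(unparse_parsed_array("a", [], "c", group_insists_on_slot=False))
```

The two calls differ only in the flag, yet produce different data for the 
same logical content, which is the interoperability risk in question.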

Possible resolution: 
Rather than attempting to specify implied behaviours for the various 
occursCountKind settings, I believe the specification should 
a) prohibit the use of certain occursCountKinds within positional 
sequences 
b) require array occurrences to use the same SSP as other sequence 
members. 

After some discussion with the IBM team, I believe a) will not generate 
too many prohibited combinations, and the rationale for those prohibitions 
will be consistent with already-existing schema definition errors. 
b) will simplify the implementation of separation suppression, thus 
addressing the complexity problem. 

I expect we will need an action to be opened so that this can be discussed 
in the working group meetings. 

regards,

Tim Kimber, 
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg
