[DFDL-WG] Spec question: Parsing sequence groups with separators
Tim Kimber
KIMBERT at uk.ibm.com
Sat Jun 7 16:30:34 EDT 2014
The rules outlined in section 14.2.2 'Parsing Sequence Groups with
Separators' are not properly specified, and probably cannot be
consistently implemented.
The last paragraph of Section 14.2.1 says this: "In the sections that
follow, it is important to remember that the
dfdl:separatorSuppressionPolicy property is carried on the sequence, while
the XSDL minOccurs, XSDL maxOccurs and dfdl:occursCountKind properties are
carried on an element in that sequence."
This is true, and this 'local overriding' of separatorSuppressionPolicy
(by arrays within the group) is the cause of most of the problems.
Problem #1: Complexity
Consider a sequence group that has separatorSuppressionPolicy (SSP) set to
'never' and a comma as its separator. Its members (A, B, C) must always be
represented as follows:
"a,b,c" or ",b,c" or ",,c"
but never "b,c" because that would imply that the separator for an empty A
had been suppressed.
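
As a rough illustration of the rule above, here is a minimal Python sketch.
It is not taken from the specification: the function name is_positional is
hypothetical, and it assumes an infix comma separator, no escaping, and no
initiators or terminators. It only checks that every member keeps its
position, that is, that the number of separators is one less than the
number of members.

```python
def is_positional(text: str, member_count: int, separator: str = ",") -> bool:
    """Return True when text has exactly one position per sequence member.

    Assumption: infix separator, no escape scheme, members may be empty.
    """
    return len(text.split(separator)) == member_count


# The four representations discussed above, for a group with members A, B, C.
for candidate in ["a,b,c", ",b,c", ",,c", "b,c"]:
    print(repr(candidate), is_positional(candidate, member_count=3))
```

Under these assumptions the first three strings are accepted and "b,c" is
rejected, because it has only two positions for three members.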
Now suppose that B is an array with minOccurs=0 and maxOccurs=3 and
occursCountKind='implicit'. Acceptable representations are now:
"a,b1,b2,b3,c" or "a,b1,,,c" or even "a,,,,c"
But if occursCountKind is changed to 'parsed' then the acceptable
representations suddenly alter, and empty occurrences of B can be
completely omitted.
"a,b1,b2,b3,c" or "a,b1,c" or even "a,c"
[ or should that be "a,,c" ]
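
The difference between the two occursCountKind settings, as I read them in
this email, can be sketched in Python. This is an illustration of my
interpretation only, not the behaviour defined by the specification. The
function render and the flag keep_empty_array_slot are hypothetical names;
the flag switches between the "a,c" and "a,,c" readings for an array with
no occurrences.

```python
def render(a, b_occurrences, c, occurs_count_kind,
           max_occurs=3, keep_empty_array_slot=False):
    """Join A, the occurrences of array B, and C with a comma separator."""
    if len(b_occurrences) > max_occurs:
        raise ValueError("too many occurrences of B")
    if occurs_count_kind == "implicit":
        # Assumed reading: every position up to maxOccurs keeps its separator.
        slots = list(b_occurrences) + [""] * (max_occurs - len(b_occurrences))
    elif occurs_count_kind == "parsed":
        # Assumed reading: absent occurrences are dropped with their separators.
        slots = list(b_occurrences)
        if not slots and keep_empty_array_slot:
            # Alternative reading: the array still holds one position in the group.
            slots = [""]
    else:
        raise ValueError("unsupported occursCountKind: " + occurs_count_kind)
    return ",".join([a, *slots, c])


print(render("a", ["b1"], "c", "implicit"))  # a,b1,,,c
print(render("a", ["b1"], "c", "parsed"))    # a,b1,c
print(render("a", [], "c", "parsed"))        # a,c
print(render("a", [], "c", "parsed", keep_empty_array_slot=True))  # a,,c
```

With zero occurrences of B under the 'implicit' reading, this sketch leaves
three empty positions, which means four separators between "a" and "c".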
This seems wrong. The logic for the suppression policy is already hard to
implement. Adding an extra layer of complexity around arrays will make it
so hard that most implementations would contain defects, leading to
interoperability issues.
Problem #2: Ambiguity
See the brackets in the preceding paragraph.
[ or should that be "a,,c" ]
It is far from obvious whether the group should insist on having a
separator for the array (because its SSP is 'never') or whether the
array is free to suppress the separators for all of its occurrences
(as I assumed when I wrote this email). The text of the
specification is either silent or unclear on this point.
Possible resolution:
Rather than attempting to specify implied behaviours for the various
occursCountKind settings, I believe the specification should
a) prohibit the use of certain occursCountKinds within positional
sequences
b) require array occurrences to use the same SSP as other sequence
members.
After some discussion with the IBM team, I believe a) will not generate
too many prohibited combinations, and the rationale for those prohibitions
will be consistent with already-existing schema definition errors.
b) will simplify the implementation of separator suppression, thus
addressing the complexity problem.
I expect we will need an action to be opened so that this can be discussed
in the working group meetings.
regards,
Tim Kimber,
IBM Integration Bus Development (Industry Packs)
Hursley, UK
Internet: kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742