[DFDL-WG] Clarification needed: sequence terminator that exists or not depending on expression

Mike Beckerle mbeckerle.dfdl at gmail.com
Mon Oct 1 15:30:29 EDT 2018


Consider the following:

<element name="value" type="xs:string" ...../>
<sequence dfdl:terminator="{ if (fn:string-length(./value) eq 32)
                             then '%ES;' else '%NUL;' }"/>

This adds a NUL at the end of a string when the string is shorter than the
maximum length of 32. It comes up often in the fixed-length and
variable-length-with-max data we've seen. I've put the terminator on a
separate sequence after the element to emphasize that we're not scanning
for terminating markup here; this has nothing to do with lengthKind
'delimited'.
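
As a rough illustration of the intended output behaviour (not of how any DFDL processor implements it), here is a minimal Python sketch. The names MAX_LEN, terminator_for, and unparse_value are hypothetical, and the sketch assumes single-byte ASCII data so that character count and byte count agree.

```python
MAX_LEN = 32  # maximum field length taken from the example above


def terminator_for(value: str) -> bytes:
    """Mirror the terminator expression: empty at full length, NUL otherwise."""
    # length eq 32 -> '%ES;' (empty string), else -> '%NUL;'
    return b"" if len(value) == MAX_LEN else b"\x00"


def unparse_value(value: str) -> bytes:
    """Write the string followed by its conditional terminator."""
    if len(value) > MAX_LEN:
        raise ValueError("value exceeds the maximum length")
    # Assumption: ASCII only, so one character is one byte.
    return value.encode("ascii") + terminator_for(value)
```

A short value such as "abc" is written as the three bytes followed by one NUL byte, while a value of exactly 32 characters is written with nothing after it.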

However, the DFDL spec says (for the terminator property):

  · ES must not appear as the only DFDL string literal in the property.
    It can only appear as a member of a list.

  · Neither the ES entity nor the WSP* entity may appear on their own as
    one of the string literals in the list when the parser is determining
    the length of a component by scanning for delimiters.

The second bullet doesn't apply to my example.

Re: first bullet, I think my terminator expression is illegal, because
when it evaluates to '%ES;' the result is a list of literals containing ES
as the only DFDL string literal.

But this is a really flawed constraint, as "%ES;%ES;" and "%ES; %ES;" both
skirt the constraint, but mean the same thing as just "%ES;" which is
illegal.
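
To make the loophole concrete, here is a minimal Python sketch of a strictly literal reading of the first bullet. The function name violates_first_bullet is hypothetical, and the sketch assumes the property value is a whitespace-separated list of string literals.

```python
def violates_first_bullet(prop: str) -> bool:
    """Literal reading: illegal only if the sole string literal is exactly %ES;."""
    literals = prop.split()  # whitespace separates the string literals
    return literals == ["%ES;"]


for prop in ["%ES;", "%ES;%ES;", "%ES; %ES;"]:
    verdict = "illegal" if violates_first_bullet(prop) else "accepted"
    print(repr(prop), verdict)
```

Under this reading only the first value is rejected; the other two are accepted even though they describe the same empty terminator.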

So, if we don't want to allow these hack workarounds, we need a statement
that says adjacent runs of %ES; mean the same thing as a single %ES;, and
that more than one identical-meaning delimiter in a list of string literals
means the same as just one. Or we can make these hack workarounds illegal.

However, why are we disallowing these?

The construct in my example is very useful, and really hard to work around
unless we can have a terminator that is '%ES;' as the only string literal.
Actually, I have no workaround for this. I am guessing I could come up with
something, but the various things I've tried either don't pan out or
prevent the string named 'value' above from being modeled as a simple type.

I know we don't want lengthKind='delimited' with terminator="%ES;", as that
is most likely just a schema definition error, but when we're not dealing
with lengthKind 'delimited', we really do seem to need a way to specify
situations where the terminator region is conditionally empty.

So I think we need to:
1) Clarify that %ES; cannot be combined with any other character or entity
within a single member of a list of string literals.
   1a) At the same time, I would also disallow combinations of WSP* that are
misleading and unnecessary, i.e., disallow %WSP*; adjacent to any other
%WSP;, %WSP+;, or %WSP*;.
2) Clarify that the constraint that %ES;, for terminator and separator,
cannot appear as the only string literal in a list of string literals
applies only when the parser is determining the length of a component by
scanning for delimiters. This just rephrases the two bullets above so that
the clause about scanning applies to both, not just the second.
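
The numbered points above could be checked roughly as in the following Python sketch. It is only one interpretation of the proposal, not spec text: the names check_delimiter, TOKEN, and WSP_FAMILY are hypothetical, the property value is assumed to be a whitespace-separated list of string literals, and duplicate %ES; literals are assumed to count as one.

```python
import re

WSP_FAMILY = {"%WSP;", "%WSP+;", "%WSP*;"}
# Split a literal into the %% escape, %...; entities, or single characters.
TOKEN = re.compile(r"%%|%[^;%\s]+;|.")


def check_delimiter(prop: str, scanning: bool) -> list:
    """Return a list of problems found in a terminator/separator value."""
    errors = []
    literals = prop.split()
    for lit in literals:
        tokens = TOKEN.findall(lit)
        # Point 1: %ES; may not be combined with anything else in a literal.
        if "%ES;" in tokens and tokens != ["%ES;"]:
            errors.append(f"{lit}: %ES; combined with other content")
        # Point 1a: %WSP*; may not sit next to another WSP-family entity.
        for a, b in zip(tokens, tokens[1:]):
            if a in WSP_FAMILY and b in WSP_FAMILY and "%WSP*;" in (a, b):
                errors.append(f"{lit}: %WSP*; adjacent to {a if b == '%WSP*;' else b}")
                break
    # Point 2: only when scanning, %ES; may not be the only literal.
    # Assumption: repeated %ES; literals are treated the same as a single one.
    if scanning and literals and set(literals) == {"%ES;"}:
        errors.append("%ES; is the only string literal while scanning")
    return errors
```

Under this sketch, check_delimiter("%ES;", scanning=False) reports no problems, which is what the example at the top of this message needs, while the same value with scanning=True is rejected.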

I believe this preserves the intent that, when lengthKind="delimited" and we
are scanning for delimiters, there must be *some* delimiter that is
potentially not zero length. You still have to cope with the match possibly
being zero length, either because %ES; is in the list of terminating markup
or because WSP* matched no whitespace. But the case where there is NO
scanning to be done can't happen. That is, a schema that specifies
lengthKind delimited but no delimiters at all is still ruled out.


Comments?

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>