[DFDL-WG] Clarification needed: sequence terminator that exists or not depending on expression

Steve Hanson smh at uk.ibm.com
Tue Oct 9 09:37:37 EDT 2018


"%ES;%ES;" is already disallowed, as ES can only appear once; see the 
entity syntax table. 

"%ES; %ES;" is also disallowed, because it contravenes the first sentence: "ES 
must not appear as the only DFDL string literal in the property." It 
appears twice, but it is still the only DFDL string literal :)  The 
wording is clearly ambiguous, since we interpreted it differently.

Note that IBM DFDL has not yet implemented the erratum (2.148) that allows 
ES to appear anywhere other than dfdl:nilValue. (This all started from this 
public comment: https://redmine.ogf.org/boards/15/topics/40)

IBM has also encountered this type of "variable-length-with-max" string. 
I'm sure I raised it in the WG a long time ago, and we discussed (and 
presumably rejected) whether it should be a new lengthKind, e.g. 
"delimitedMax", for convenience. I can't find anything in my email logs, 
though, and I'm not sure how we modelled it. My memory could be 
playing tricks.

Whatever we decide, each of initiator, terminator and separator needs to be 
considered separately. Note that ES is currently allowed (with stated 
restrictions) for initiator and terminator only, not for separator, which 
makes sense to me but is contrary to 2.148?

We also need to be wary of EVDP.

Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     DFDL-WG <dfdl-wg at ogf.org>
Date:   01/10/2018 20:31
Subject:        [DFDL-WG] Clarification needed: sequence terminator that 
exists or not depending on expression
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>



Consider the following:

<element name="value" type="xs:string" ...../>
<sequence dfdl:terminator="{ if (fn:string-length(./value) eq 32) then 
'%ES;' else '%NUL;' }"/>

This is used to add a NUL at the end of a string if the string is shorter 
than the maximum length of 32. This comes up often in the fixed-length and 
variable-length-with-max data we've seen. I've put this terminator on a 
separate sequence after the element to emphasize that we're not scanning 
for terminating markup here; this has nothing to do with lengthKind 
'delimited'. 
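To make the intended data layout concrete, here is a minimal Python sketch that models it outside of DFDL. It assumes ASCII text, the 32-character maximum from the example, and that a value shorter than the maximum is always followed by a NUL byte. The names parse_value and unparse_value are hypothetical and do not correspond to any DFDL processor API.

```python
MAX_LEN = 32  # maximum length of 'value', as in the example above
NUL = b"\x00"


def unparse_value(value: str) -> bytes:
    """Hypothetical model of the output: the string, then %NUL; unless the
    string is exactly MAX_LEN characters long (then %ES;, i.e. nothing)."""
    if len(value) > MAX_LEN:
        raise ValueError("value exceeds the maximum length")
    terminator = b"" if len(value) == MAX_LEN else NUL
    return value.encode("ascii") + terminator


def parse_value(data: bytes) -> tuple[str, int]:
    """Hypothetical model of reading the same layout back.

    Returns the string and the number of bytes consumed (including any NUL).
    """
    window = data[:MAX_LEN]
    end = window.find(NUL)
    if end >= 0:
        # Shorter than the maximum: the NUL terminator is present and consumed.
        return window[:end].decode("ascii"), end + 1
    if len(window) < MAX_LEN:
        raise ValueError("data ended before a NUL or the maximum length")
    # Exactly MAX_LEN characters: the terminator region is empty.
    return window.decode("ascii"), MAX_LEN
```

For example, unparse_value("abc") gives b"abc\x00", while a 32-character string is written with no terminator at all. That empty case is exactly the '%ES;' branch of the expression.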

However, the DFDL spec says (for the terminator property):

·         ES must not appear as the only DFDL string literal in the 
property. It can only appear as a member of a list.
·         Neither the ES entity nor the WSP* entity may appear on their 
own as one of the string literals in the list when the parser is 
determining the length of a component by scanning for delimiters.

The second bullet doesn't apply to my example.

Re: the first bullet, I think my terminator expression is illegal, because 
the '%ES;' branch is a list of literals containing ES as the only DFDL 
string literal. 

But this is a really flawed constraint, as "%ES;%ES;" and "%ES; %ES;" both 
skirt the constraint, but mean the same thing as just "%ES;", which is 
illegal. 

So, if we don't want to allow these hack workarounds, we need a statement 
that says adjacent runs of %ES; mean the same thing as one %ES;, and that 
specifying more than one identical-meaning delimiter in a list of string 
literals means the same as specifying just one. Or we can make these hack 
workarounds illegal.
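The first option above (treating adjacent runs of %ES; as a single %ES;, and duplicate literals in the list as one) could be sketched as a normalization step applied before the ES constraints are checked. This is a minimal Python sketch under stated assumptions: the property value is a whitespace-separated list of literals, entities are recognized by a simplified pattern rather than DFDL's full entity grammar, and "identical meaning" is approximated as "textually identical after collapsing runs of %ES;". The function normalize_literals is hypothetical.

```python
import re

# Simplified entity pattern (assumption): "%%" is an escaped percent sign,
# "%NAME;" is an entity such as %ES;, %NUL;, %WSP*; or %#x0A;, and anything
# else is a single literal character.
_TOKEN = re.compile(r"%%|%[^%;\s]+;|.", re.DOTALL)


def normalize_literals(value: str) -> list[str]:
    """Collapse runs of adjacent %ES; within each literal, then drop
    duplicate literals, keeping the first occurrence of each."""
    result: list[str] = []
    for literal in value.split():
        collapsed: list[str] = []
        for token in _TOKEN.findall(literal):
            # A second %ES; directly after another one adds nothing.
            if token == "%ES;" and collapsed and collapsed[-1] == "%ES;":
                continue
            collapsed.append(token)
        normalized = "".join(collapsed)
        if normalized not in result:
            result.append(normalized)
    return result
```

Under this sketch both "%ES;%ES;" and "%ES; %ES;" normalize to ["%ES;"], so whatever rule rejects a lone "%ES;" would reject them too.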

However, why are we disallowing these? 

The above construct in my example is very useful, and really hard to work 
around unless we can have a terminator that is '%ES;' as the only string 
literal. Actually, I really have no workaround for this. I could probably 
come up with something, but the various things I've guessed at don't pan 
out, or they prevent the string named 'value' above from being modeled as 
a simple type.  

I know we don't want lengthKind='delimited' with terminator="%ES;", as that 
is most likely just a schema-definition error, but when no lengthKind is 
involved, we really do seem to need a way to specify that the terminator 
region is conditionally empty. 

So I think we need to:
1) clarify that %ES; cannot be used in combination with any other 
character or entity within a member of a list of string literals. 
   1a) At the same time, I would also disallow combinations of WSP* that 
are misleading and unnecessary, i.e., disallow %WSP*; adjacent to any other 
WSP, WSP+, or WSP*. 
2) clarify that the constraint that %ES; for terminator and separator 
cannot appear as the only string literal in a list of string literals 
applies only when the parser is determining the length of a component by 
scanning for delimiters. This just rephrases the two bullets above so 
the clause about scanning applies to both, not just the second.
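Proposals 1, 1a and 2 could be expressed as a check on a delimiter property value, roughly as in the following Python sketch. It is illustrative only: check_delimiter and its scanning flag are hypothetical names, the entity tokenizer is simplified, and treating a list in which every literal is "%ES;" the same as a lone "%ES;" is an assumption.

```python
import re

# Simplified entity pattern (assumption, not DFDL's full entity grammar).
_TOKEN = re.compile(r"%%|%[^%;\s]+;|.", re.DOTALL)
_WSP_ENTITIES = {"%WSP;", "%WSP+;", "%WSP*;"}


def check_delimiter(value: str, scanning: bool) -> list[str]:
    """Return violations of proposals 1, 1a and 2 (an empty list means OK).

    value:    a delimiter property value, i.e. whitespace-separated literals
    scanning: True when the parser determines the component's length by
              scanning for delimiters (e.g. lengthKind="delimited")
    """
    errors: list[str] = []
    literals = value.split()
    for literal in literals:
        tokens = _TOKEN.findall(literal)
        # Proposal 1: %ES; may not be combined with any other character or entity.
        if "%ES;" in tokens and len(tokens) > 1:
            errors.append(f"1: %ES; combined with other content in {literal!r}")
        # Proposal 1a: %WSP*; may not be adjacent to %WSP;, %WSP+; or %WSP*;.
        for left, right in zip(tokens, tokens[1:]):
            if "%WSP*;" in (left, right) and {left, right} <= _WSP_ENTITIES:
                errors.append(f"1a: {left}{right} in {literal!r}")
    # Proposal 2: "%ES; as the only literal" is an error only when scanning.
    # Assumption: a list whose literals are all %ES; counts as a lone %ES;.
    if scanning and literals and all(lit == "%ES;" for lit in literals):
        errors.append("2: %ES; is the only literal while scanning for delimiters")
    return errors
```

With this check, either value produced by the terminator expression in the example passes, because that sequence is not scanning (scanning=False). The same "%ES;" on a delimited element would be rejected.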

I believe this preserves the intent that when lengthKind="delimited" and 
we are scanning for delimiters, there must be *some* delimiter that is 
potentially not zero length. You still have to cope with the possible 
match being zero length due to %ES; being in the list of terminating 
markup, or WSP* similarly when no whitespace is found. But the case where 
there is nothing at all to scan for can't arise. That is, a schema that 
specifies lengthKind delimited but also specifies no delimiters at 
all is still ruled out. 


Comments?

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  
https://www.ogf.org/mailman/listinfo/dfdl-wg

