[DFDL-WG] clarification: on suppressed ZL string/hexBinary - do we keep variable assignments?

Steve Hanson smh at uk.ibm.com
Thu Aug 2 05:54:57 EDT 2018


There are only 2 cases. Section 22 (property precedence) makes it clear 
that EVDP and NVDP are only ever examined if there is an initiator and/or 
terminator.

Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson <smh at uk.ibm.com>
Cc:     dfdl-wg at ogf.org
Date:   01/08/2018 20:53
Subject:        Re: [DFDL-WG] clarification: on suppressed ZL 
string/hexBinary - do we keep variable assignments?



I spoke too soon.

I'm good with this: 
If EVDP is 'none', then empty strings have no representation syntax at 
all. So we don't create optional elements for them, period. 

But this statement I made is not correct:

If EVDP is not 'none', then empty string requires some syntax to be there, 
and based on this syntax appearing an empty string value is added to the 
infoset for the optional element. 

Suppose EVDP is not 'none', but initiator and terminator are both "". Then 
the empty string does *not* require syntax to be there, so the above 
statement is simply wrong.

When EVDP is 'both' but neither initiator nor terminator are defined, then 
since EVDP is not 'none', a zero-length string would still cause an 
optional element to be added to the infoset. In a separated sequence, this 
would mean one could control how many empty strings go into optional 
values by providing more separators.

So "a,b,,,,,," would add 2 non-empty and 6 empty strings to the infoset, 
regardless of whether they are required or optional. 

If the element is named 'x' and has minOccurs="3", maxOccurs="12", 
default="c", and occursCountKind 'implicit', then the first empty would 
trigger defaulting, and the infoset would be 

     <x>a</x><x>b</x><x>c</x><x/><x/><x/><x/><x/>

So we would get defaulting at the required index locations, but creation 
of elements with empty string values at the optional index locations. 
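
To make the example concrete, here is a small Python sketch that reproduces 
the infoset above. It is only a toy model of the reading described in this 
message, not how Daffodil or IBM DFDL is implemented; the function names 
parse_occurrences and to_xml are invented for illustration, and maxOccurs 
is not enforced.

```python
def parse_occurrences(data, min_occurs, default, sep=","):
    """Return the values placed in the infoset for a repeating string element.

    Assumed reading: EVDP is not 'none' and the empty representation has no
    syntax, so every zero-length field yields an occurrence.
    """
    values = []
    for index, field in enumerate(data.split(sep)):
        if field != "":
            values.append(field)      # normal representation
        elif index < min_occurs:
            values.append(default)    # required position: default applies
        else:
            values.append("")         # optional position: empty string kept
    return values


def to_xml(values, name="x"):
    """Render the values as a flat run of XML elements."""
    return "".join(
        f"<{name}>{v}</{name}>" if v else f"<{name}/>" for v in values
    )


print(to_xml(parse_occurrences("a,b,,,,,,", min_occurs=3, default="c")))
```

Running it on "a,b,,,,,," with min_occurs=3 and default "c" gives one 
defaulted element at index 3 followed by five empty elements.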

This behavior allows us to construct an XSD-invalid document. I suppose that is no 
big deal, there are many ways to construct data by parsing with DFDL where 
the data proves to be invalid per XSD rules. 

So we really have 3 cases:
1.      EVDP is 'none'
2.      EVDP not 'none' but initiator/terminator such that empty 
representation has no syntax
3.      EVDP not 'none' but initiator/terminator such that empty 
representation DOES have syntax. 
Case 1 = optional elements never populated from ZL strings
Case 2 = optional elements are always populated from ZL strings
Case 3 = optional elements are populated from the non-ZL empty 
representation, optional elements are not populated from ZL strings. 
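
The three cases can be sketched in Python as below. This is a simplified 
illustration of the reading in this message, not a settled interpretation 
of the spec and not code from any DFDL processor. The names Case, classify 
and optional_populated are hypothetical, and delimiter matching is reduced 
to checking whether the relevant property is the empty string.

```python
from enum import Enum


class Case(Enum):
    EVDP_NONE = 1          # Case 1
    NO_EMPTY_SYNTAX = 2    # Case 2
    EMPTY_SYNTAX = 3       # Case 3


def classify(evdp, initiator, terminator):
    """Map EVDP plus initiator/terminator onto the three cases."""
    if evdp == "none":
        return Case.EVDP_NONE
    # A delimiter only contributes syntax if EVDP asks for it and it is non-empty.
    uses_init = evdp in ("initiator", "both") and initiator != ""
    uses_term = evdp in ("terminator", "both") and terminator != ""
    if uses_init or uses_term:
        return Case.EMPTY_SYNTAX
    return Case.NO_EMPTY_SYNTAX


def optional_populated(case, saw_empty_syntax):
    """Is an optional element added to the infoset?

    saw_empty_syntax is True when the data held the delimiters of the empty
    representation, and False for a bare zero-length (ZL) string.
    """
    if case is Case.EVDP_NONE:
        return False               # never populated from ZL strings
    if case is Case.NO_EMPTY_SYNTAX:
        return True                # always populated from ZL strings
    return saw_empty_syntax        # only from the non-ZL empty representation
```

For example, classify("both", "", "") falls into Case 2, so a bare 
zero-length string would populate an optional element under this reading.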

Daffodil has heretofore been crushing Cases 1 and 2 together with behavior 
of Case 1. 

Does IBM DFDL distinguish all 3 cases? 


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy


On Wed, Aug 1, 2018 at 2:33 PM, Mike Beckerle <mbeckerle.dfdl at gmail.com> 
wrote:
ok, so here's the way I understand this now.

If EVDP is 'none', then empty strings have no representation syntax at 
all. So we don't create optional elements for them, period. 

If EVDP is not 'none', then empty string requires some syntax to be there, 
and based on this syntax appearing an empty string value is added to the 
infoset for the optional element. 

Seems simple in retrospect....


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com


On Wed, Aug 1, 2018 at 11:07 AM, Steve Hanson <smh at uk.ibm.com> wrote:
9.4.2.2 Simple element (xs:string or xs:hexBinary) 
Required occurrence: If the element has a default value then an item is 
added to the infoset using the default value, otherwise an item is added 
to the Infoset using empty string (type xs:string) or empty hexBinary 
(type xs:hexBinary) as the value. 
Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then 
an item is added to the Infoset using empty string (type xs:string) or 
empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is 
added to the Infoset. 
Note: To prevent unwanted empty strings or empty hexBinary values from 
being added to the Infoset, use XSD minLength > '0' and a dfdl:assert that 
uses the dfdl:checkConstraints() function, to raise a processing error. 

9.2.2 Empty Representation 
An element occurrence has an empty representation if the occurrence does 
not have a nil representation and it conforms to the grammar for 
SimpleEmptyElementRep or ComplexEmptyElementRep. Specifically, the 
EmptyElementInitiator and EmptyElementTerminator regions must be 
conformant with dfdl:emptyValueDelimiterPolicy and the occurrence's 
content in the data stream is of length zero. (If non-conformant it is not 
a processing error and the representation is not empty). LeadingAlignment, 
TrailingAlignment, PrefixLength regions may be present. 

Regards
 
Steve Hanson 



From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        Steve Hanson <smh at uk.ibm.com> 
Cc:        dfdl-wg at ogf.org 
Date:        01/08/2018 15:13 
Subject:        Re: [DFDL-WG] clarification: on suppressed ZL 
string/hexBinary - do we keep variable assignments? 



To follow up then, 

I have assumed that dfdl:emptyValueDelimiterPolicy isn't even examined 
unless the element has default="..." i.e., a non-zero-length default value 
specified. I.e., without a default value, there is no concept of emptiness 
as distinct from "normal" representation. 

Are you suggesting that it is also used to control when an empty string 
(or empty hexBinary) is accepted as a normal representation value for an 
optional element, vs. treated as a missing value? That's a reasonable 
interpretation that I would support, but I don't know that the spec says 
that anywhere, so we need to add a sentence. (Unless I'm missing where 
this is stated.) 

I have also thought that dfdl:emptyValueDelimiterPolicy must be combined 
with dfdl:initiator and dfdl:terminator. If the combination of these is 
such that the empty representation is zero-length, that is what creates 
the situation of interest here, where it is ambiguous whether the value is 
the official empty representation or is the normal representation that 
just so happens to be of zero length.  That is, there's no special 
significance to the 'none' EVDP property value. 

For example, if dfdl:emptyValueDelimiterPolicy is 'both', but 
dfdl:initiator="" and dfdl:terminator="", then that's just as good as 
dfdl:emptyValueDelimiterPolicy='none' in terms of whatever effect this has 
on a decision about normal vs. missing. 

Does this match your understanding? 



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com 


On Wed, Aug 1, 2018 at 9:26 AM, Steve Hanson <smh at uk.ibm.com> wrote: 
Whether to add a zero-length string or hexBinary to the infoset for an 
optional element depends on the setting of emptyValueDelimiterPolicy. A 
setting of 'none' stops it from being added. 

Either way, the parse does not raise a processing error, so the element is 
known-to-exist and no backtracking occurs. Discriminators and variable 
assignments are therefore preserved. 

Regards
 
Steve Hanson 



From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        dfdl-wg at ogf.org 
Date:        24/07/2018 15:15 
Subject:        [DFDL-WG] clarification: on suppressed ZL string/hexBinary 
- do we keep variable assignments? 
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org> 




In some situations we parse and get a successful zero-length parse for a 
string or hexBinary. 

But because the occurrence is optional, we do NOT add an element to the 
infoset. 

In that case, what happens to side-effects that occurred during the 
successful parse? There are two possible kinds of side-effects: variables 
can be set, and a discriminator can be set to true. 

It seems to me that if a discriminator is set, then that *must* be 
preserved, and in that case it would seem the variable settings should be 
retained as well. 

Comments? 

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com 
--
 dfdl-wg mailing list
 dfdl-wg at ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 


