[DFDL-WG] Clarification discussion points for next call(s) - DFDL Spec issues around nil, empty, normal, absent, defaulting, and separator suppression.

Thu Aug 2 12:33:31 EDT 2018

Mike

I'm not sure whether any of the other threads have answered any of these 
questions; I will not have time to look at this before the next call.

I will say though that an assertion failure is just another way of getting 
a processing error. I think that's why it's not called out explicitly in 
9.2.1 to 9.2.4.

IBM DFDL has not implemented stopValue yet, but during action 140 there was 
a lot of discussion about whether stopValue was a point of uncertainty. 
I suggest reading the original action 140 document, which is on Redmine.

Regards

Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
Note: I work Tuesday to Friday 

From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     dfdl-wg at ogf.org
Date:   20/07/2018 20:51
Subject:        [DFDL-WG] Clarification discussion points for next call(s) 
- DFDL Spec issues around nil, empty, normal, absent, defaulting, and 
separator suppression.
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>

I'm trying to tighten up my understanding of the section 9 material on 
establishing representation, known-to-exist/known-not-to-exist, and 
defaulting, particularly as they interact with separators and separator 
suppression for "absent" representations.

Section 9.2.5

This sentence
     The nil representation can be a zero-length representation if 
dfdl:nilValue is "%ES;", and there is no framing or framing is suppressed 
by dfdl:nilValueDelimiterPolicy. 
Should say
      ".... if dfdl:nilValue is a list containing either %ES; alone, or 
%WSP*; alone, and ...."
This matches existing erratum 5.32. It's just another place that needs the 
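To make the proposed wording concrete, here is a minimal Python sketch of 
the check. It assumes "alone" means a dfdl:nilValue list entry consisting 
solely of %ES; or %WSP*; (so "%WSP*;nil" does not qualify), and it collapses 
dfdl:nilValueDelimiterPolicy into a single hypothetical framing_suppressed 
flag. The function name is illustrative only.

```python
# Hypothetical check for the proposed 9.2.5 wording (not taken from any
# DFDL implementation). Assumes "alone" means a nilValue list entry
# consisting solely of %ES; or %WSP*;.
ZL_NIL_ENTRIES = {"%ES;", "%WSP*;"}


def nil_rep_can_be_zero_length(nil_value: str, has_framing: bool,
                               framing_suppressed: bool) -> bool:
    """Return True if the nil representation may be zero-length."""
    # dfdl:nilValue is a whitespace-separated list of literal values.
    entries = nil_value.split()
    has_zl_entry = any(entry in ZL_NIL_ENTRIES for entry in entries)
    # Framing must be absent, or suppressed by dfdl:nilValueDelimiterPolicy
    # (modelled here as a single boolean).
    framing_ok = not has_framing or framing_suppressed
    return has_zl_entry and framing_ok


print(nil_rep_can_be_zero_length("nil %WSP*;", False, False))  # True
print(nil_rep_can_be_zero_length("%WSP*;nil", False, False))   # False
```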
same update.

Sections 9.2.1 to 9.2.4 do not say what happens if an assertion failure 
occurs while trying to establish the representation. In particular, can an 
assertion failure cause establishing the nil, empty, or normal 
representation to fail, resulting in absent representation? For example, 
if an assert requires the length of the value to be greater than 1 
character, then a zero-length string cannot be the normal representation. 

But do we parse for the nil representation and run the assertions; if they 
fail, parse for the empty representation and re-run the assertions; if 
those fail, parse for the normal representation and re-run the assertions; 
and if those fail and the representation was "trivially zero-length" (ZL), 
treat it as absent?
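The cascade being asked about can be written down as a small Python sketch. 
It is one possible reading, not what the spec currently says; the Candidate 
record, the asserts_pass callback, and the "processing error" fallback for 
data that was not trivially ZL are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Candidate:
    kind: str      # "nil", "empty", or "normal"
    value: object  # value that would go into the infoset
    length: int    # length of the matched representation, in characters


def establish_representation(
    candidates: Sequence[Optional[Candidate]],
    asserts_pass: Callable[[Candidate], bool],
    trivially_zl: bool,
) -> str:
    """One possible reading: try nil, then empty, then normal,
    re-running the element's asserts after each attempt."""
    for cand in candidates:  # ordered nil, empty, normal
        if cand is None:
            continue  # this representation did not match the data at all
        if asserts_pass(cand):
            return cand.kind
    # Every candidate failed: absent only if the data was trivially ZL.
    return "absent" if trivially_zl else "processing error"


# The example from the text: an assert requiring length > 1 on a ZL string.
zl_string = [None, Candidate("empty", "", 0), Candidate("normal", "", 0)]
print(establish_representation(zl_string, lambda c: c.length > 1, True))  # absent
```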

Section 9.2.3 doesn't say what happens if a type conversion error occurs 
when trying to establish the normal representation, for example, when a 
non-defaultable text integer is parsed from a zero-length string.

A section 9.2.5 should be added describing "No representation" - Computed 
elements (dfdl:inputValueCalc) have no representation at all. Not zero 
length, not anything. They have no implications for parsing of, or 
unparsing of, delimiters of any kind. 
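A minimal Python sketch of the distinction, assuming a simple 
infix-separated sequence and ignoring separator suppression: a computed 
element is dropped before separators are placed, whereas a zero-length 
element still occupies a slot. The Child tuple and unparse_infix are 
hypothetical names.

```python
from typing import List, Tuple

# (name, text, is_computed); is_computed marks a dfdl:inputValueCalc element.
Child = Tuple[str, str, bool]


def unparse_infix(children: List[Child], sep: str) -> str:
    """Join child representations with an infix separator. Computed
    elements have no representation, so they are skipped outright and
    contribute neither content nor a separator."""
    return sep.join(text for _, text, computed in children if not computed)


# A computed element vanishes; a zero-length element still gets separators.
print(unparse_infix([("a", "1", False), ("len", "3", True), ("b", "2", False)], ","))  # 1,2
print(unparse_infix([("a", "1", False), ("z", "", False), ("b", "2", False)], ","))    # 1,,2
```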

Section 9.3

Section 9.3.1.1 says that, when establishing known-to-exist, no processing 
error can occur.

For example, an element can have a zero-length representation if it is 
nillable, dfdl:nilValue is "%ES;", and there are no delimiters or other 
framing. But an assert can fail for this by specifically excluding 
dfdl:valueLength(., 'characters') eq 0. Or an element of type string can 
have zero length, but an assert can insist the string contain 1 or more 
characters.

Section 9.3.1.3 says an occurrence is known-not-to-exist if it is Missing, 
and absent representation is one way to be Missing. So can an assert that 
causes the nil, empty, and normal representations all to fail (assuming 
asserts are evaluated and contribute to that decision about representation) 
cause the occurrence to be absent, and hence Missing? The section also says 
that a processing error when parsing the component means it is 
known-not-to-exist.

Section 9.3.2 Establishing Representation

Section 9.3.2.1 Simple Element
(1) Nil representation: already has an erratum allowing %WSP*; alone as 
well as %ES;.
(2) Empty representation: qualified by "empty representation must be able 
to be of zero-length".
(3) Normal representation: does not say whether asserts can fail and 
prevent this zero-length representation from being acceptable.
(Actually, this point about assertions applies to all of (1), (2), and (3).)

Section 9.3.2.2 Complex Element
If a complex type element has dfdl:lengthKind 'delimited', do we still have 
to recurse into the type before deciding whether it is trivially 
zero-length, or can we look at the data stream without recursing in? The 
5th bullet of section 9.3.2 says that we can check whether a delimiter is 
immediately encountered, and does not say this is for simple types only.

A complex type element can have empty representation if the content region 
is empty and the delimiters required by dfdl:emptyValueDelimiterPolicy 
(EVDP) are found in the data stream. 
Absent representation for a complex element can only occur if the 
representation is zero-length after recursing through the type tree. This 
implies that if EVDP requires delimiters for the empty representation, then 
ZL means absent. 
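The two steps discussed above, the delimiter peek and the post-recursion 
classification, can be sketched in Python as follows. Both functions are 
hypothetical; classify_complex assumes dfdl:emptyValueDelimiterPolicy is 
not 'none' and takes the lengths as already measured by recursing through 
the type.

```python
def delimiter_immediately_follows(data: str, pos: int, delimiters) -> bool:
    """The 9.3.2 shortcut: peek at the data, without recursing into the
    type, to see whether an in-scope delimiter starts right at pos."""
    return any(data.startswith(d, pos) for d in delimiters)


def classify_complex(content_len: int, framing_len: int) -> str:
    """Classify a complex element after recursing through its type.

    Assumes dfdl:emptyValueDelimiterPolicy is not 'none', so the empty
    representation requires the element's own delimiters to be present.
    content_len: length consumed by the content region.
    framing_len: length of the element's own initiator/terminator found.
    """
    if content_len > 0:
        return "normal"
    # Empty content: positive syntax means empty; a ZL representation means absent.
    return "empty" if framing_len > 0 else "absent"


print(delimiter_immediately_follows("a,,b", 2, [","]))  # True
print(classify_complex(0, 0))                           # absent
```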

Section 9.3.3 

StopValue seems to have a point of uncertainty for every occurrence. 
The fact that a stop value must exist doesn't mean there are no points of 
uncertainty: it is uncertain whether each logical value will be the stop 
value or not.

But another way of thinking about it is that the stopValue parser does not 
need to establish points for backtracking. Every occurrence MUST succeed 
until a stop value is successfully parsed, and if any failure occurs we 
backtrack the entire array, not just one element. 
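A minimal Python sketch of that second view, with integer occurrences and 
a caller-supplied parse function standing in for element parsing (both 
assumptions). The stop-value occurrence is returned separately, so the 
sketch takes no position on whether it belongs in the infoset.

```python
from typing import Callable, List, Sequence, Tuple


def parse_until_stop(raw_items: Sequence[str], stop_value: int,
                     parse_item: Callable[[str], int]) -> Tuple[List[int], int]:
    """occursCountKind 'stopValue' with no per-occurrence backtracking
    point: any failure before the stop value propagates and fails the
    whole array, which is then backtracked as a unit."""
    values: List[int] = []
    for raw in raw_items:
        value = parse_item(raw)  # an exception here fails the entire array
        if value == stop_value:
            # The stop occurrence is returned separately from the values.
            return values, value
        values.append(value)
    raise ValueError("data ended before the stop value was found")


print(parse_until_stop(["1", "2", "-1", "9"], -1, int))  # ([1, 2], -1)
```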

Use of stopValue with type xs:string creates many ambiguities. E.g., ZL 
can be a valid normal representation, but the stop value may be "stop", 
i.e., non-ZL. In that case, since all occurrences are optional (minOccurs 
0), when a ZL value is parsed, is the optional occurrence suppressed? 
Or not, meaning you get an array full of empty-string "normal" values?
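The two readings produce different infosets from the same data. A small 
Python sketch, with the fields already split out and a hypothetical 
suppress_zl flag selecting the reading:

```python
from typing import List, Sequence


def parse_string_array(fields: Sequence[str], stop: str,
                       suppress_zl: bool) -> List[str]:
    """Collect xs:string occurrences up to the stop value under two
    readings of a ZL occurrence of an optional (minOccurs 0) element."""
    out: List[str] = []
    for field in fields:
        if field == stop:
            break
        if field == "" and suppress_zl:
            continue  # reading A: the ZL optional occurrence is suppressed
        out.append(field)  # reading B: ZL is a valid empty-string value
    return out


fields = ["a", "", "", "b", "stop"]
print(parse_string_array(fields, "stop", True))   # ['a', 'b']
print(parse_string_array(fields, "stop", False))  # ['a', '', '', 'b']
```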

Section 9.4.2

It says a complex type element must have been descended into and returned 
with no processing error, but does not say whether processing errors 
signaled by asserts on simple type elements also prevent the empty 
representation from being established.

Section 9.4.2.2

When EVDP is not 'none', can an assert insist on something that subverts 
establishment of the empty representation, such as that the length is > 0?
Or can the assert test something orthogonal and entirely unrelated, such as 
whether some variable is set to a certain value?

E.g., <dfdl:defineVariable name="disallowEmptyValues" type="xs:boolean" .../>
Then an assert on the element says { if ($disallowEmptyValues) then 
fn:false() else fn:true() }.

For an optional occurrence, if EVDP is not 'none', then the empty 
representation is established by the presence of some positive syntax; the 
representation is not ZL. However, the section says an empty string or 
empty hexBinary becomes the value, not the default value of the element. Is 
that correct? I would think having positive syntax that matches the empty 
representation would satisfy known-to-exist, and that would then trigger 
assigning the default value. However, I suppose the rationale is that such 
an empty value for an optional element means there is no defaulting; hence, 
the empty representation corresponds to an empty string or empty hexBinary 
as the "normal" representation. 
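A minimal Python sketch of the behaviour as read above, where defaulting 
applies only to a required occurrence. The helper name is hypothetical, and 
types other than xs:string and xs:hexBinary are deliberately not modelled.

```python
from typing import Optional


def value_for_empty_rep(required: bool, type_name: str,
                        default: Optional[object] = None) -> object:
    """Infoset value once the empty representation has been established,
    following the reading of 9.4.2.2 described above."""
    if required and default is not None:
        return default  # required occurrence: the default value is applied
    if type_name == "xs:string":
        return ""       # optional occurrence: empty string, no defaulting
    if type_name == "xs:hexBinary":
        return b""      # optional occurrence: empty hexBinary, no defaulting
    raise ValueError(f"{type_name}: not modelled in this sketch")


print(repr(value_for_empty_rep(False, "xs:string", "N/A")))  # ''
print(repr(value_for_empty_rep(True, "xs:string", "N/A")))   # 'N/A'
```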

The suggestion to use an assert that checks a non-zero minLength facet 
only makes sense if the processing error will cause the end of the array 
(occursCountKind 'parsed' or 'implicit'). If the occursCountKind (OCK) is 
such that there is no point of uncertainty, then this processing error 
would cause the whole array-element to fail. That is, the assert suggested 
here really does too much to be used only to filter empty strings/hexBinary 
out of the infoset. 
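The consequence can be sketched in Python. The occurrences are pre-split 
strings, and a hypothetical has_point_of_uncertainty flag stands in for the 
occursCountKind-dependent behaviour; the point is that the assert ends (or 
fails) the array rather than merely filtering out the empty value.

```python
from typing import Callable, List, Sequence


def parse_array(fields: Sequence[str], has_point_of_uncertainty: bool,
                assert_ok: Callable[[str], bool]) -> List[str]:
    """Apply an element assert to each occurrence. With a point of
    uncertainty, a failed assert ends the array; without one, the
    failure propagates and fails the whole array."""
    out: List[str] = []
    for field in fields:
        if not assert_ok(field):
            if has_point_of_uncertainty:
                return out  # backtrack this occurrence; the array ends here
            raise ValueError("assert failed with no point of uncertainty")
        out.append(field)
    return out


def non_empty(s: str) -> bool:
    """Stands in for an assert checking a non-zero minLength facet."""
    return len(s) > 0


print(parse_array(["a", "", "b"], True, non_empty))  # ['a'] ("b" is never reached)
```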

Or, for a delimited simple type value whose text is ZL, does a failure 
(the type conversion fails, or an assert fails) create absent 
representation, resulting in no empty string going into the infoset?

I suspect this issue is tied up with separator suppression policy: when a 
ZL occurrence is suppressed, its separator is absorbed and nothing goes 
into the infoset. 
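The alternative being asked about can be sketched in Python for a delimited 
text integer field. zl_failure_is_absent is a hypothetical switch between 
the two readings, and separator suppression policy is not modelled beyond 
dropping the field.

```python
from typing import List


def parse_int_fields(data: str, sep: str, zl_failure_is_absent: bool) -> List[int]:
    """Parse separated text integers. If zl_failure_is_absent is True, a ZL
    field whose type conversion fails is treated as absent: its separator
    is absorbed and nothing enters the infoset."""
    out: List[int] = []
    for field in data.split(sep):
        try:
            out.append(int(field))
        except ValueError:
            if field == "" and zl_failure_is_absent:
                continue  # absent representation: separator absorbed
            raise  # otherwise the conversion failure is a processing error
    return out


print(parse_int_fields("1,,2", ",", True))  # [1, 2]
```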

Section 9.4.2.3 suggests that processors must keep track of an "all empty" 
flag for every infoset node and, recursively, all of its child nodes. 

This section should say that a complex type element has empty 
representation if it is known to exist and the position in the data 
doesn't change after a recursive traversal. 
However, the last paragraph of the section may contradict what is said in 
the 2nd sentence of the section. The point the example is making, the 
principle it is illustrating, does not seem to be explicitly stated. 
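A minimal Python sketch contrasting the two approaches, assuming each 
parsed node records its start and end data positions (the Node class and 
both functions are hypothetical). As written, the two checks disagree when 
a complex element consumes separators or other framing between otherwise 
empty children.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    start: int  # data position before parsing this element
    end: int    # data position after parsing this element
    children: List["Node"] = field(default_factory=list)


def all_empty_flag(node: Node) -> bool:
    """Recursive 'all empty' flag: a leaf is empty if it consumed
    nothing, a complex node if every child is empty."""
    if not node.children:
        return node.end == node.start
    return all(all_empty_flag(child) for child in node.children)


def empty_by_position(node: Node, known_to_exist: bool) -> bool:
    """Suggested alternative: known to exist, and the position is
    unchanged after the recursive traversal; no per-node flag is kept."""
    return known_to_exist and node.end == node.start


# Two ZL children separated by a 1-character separator.
sep_only = Node(5, 6, [Node(5, 5), Node(6, 6)])
print(all_empty_flag(sep_only), empty_by_position(sep_only, True))  # True False
```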

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy
--
  dfdl-wg mailing list
  dfdl-wg at ogf.org

https://www.ogf.org/mailman/listinfo/dfdl-wg
