[DFDL-WG] Clarification discussion points for next call(s) - DFDL Spec issues around nil, empty, normal, absent, defaulting, and separator suppression.

Fri Jul 20 15:45:28 EDT 2018

I'm trying to tighten up my understanding of the section 9 material around
establishing representation, known-to-exist/known-not-to-exist, and
defaulting, particularly as they interact with separators and separator
suppression for "absent" representations.

Section 9.2.5

This sentence

     The nil representation can be a zero-length representation if
dfdl:nilValue is "%ES;", and there is no framing or framing is suppressed
by dfdl:nilValueDelimiterPolicy.

Should say

      ".... if dfdl:nilValue is a list containing either %ES; alone, or
%WSP*; alone, and ...."

This matches existing erratum 5.32. It's just another place that needs the
same update.

Sections 9.2.1 to 9.2.4 do not say what happens if an assertion failure
occurs while trying to establish the representation. In particular, can an
assertion failure cause establishment of the nil, empty, or normal
representation to fail, resulting in absent representation? For example, if
an assert requires the length of the value to be greater than 1 character,
then a zero-length string cannot be the normal representation.

That is, do we parse for nil representation and run the assertions; if they
fail, parse for empty representation and re-run the assertions; if they
fail, parse for normal representation and re-run the assertions; and if
they fail and the representation was "trivially ZL" (zero-length), is it
then absent?
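The cascade asked about above can be modeled as a small decision procedure.
The Python below is a hypothetical sketch of that one reading, not of
settled spec behavior. The names `establish`, `try_nil`, `try_empty`,
`try_normal`, `NIL`, and `ParseFailure` are invented for illustration, and
it assumes an xs:string element with dfdl:nilValue "%ES;", no framing, and
asserts evaluated against each candidate value.

```python
from typing import Callable, List, Tuple

class ParseFailure(Exception):
    """Hypothetical stand-in for a DFDL processing error."""

NIL = object()  # hypothetical marker for a nil infoset value

def try_nil(text: str) -> object:
    # Assumes dfdl:nilValue="%ES;" with no framing: only "" is nil.
    if text != "":
        raise ParseFailure("not the nil representation")
    return NIL

def try_empty(text: str) -> object:
    # Assumes the empty representation is zero-length and no default applies.
    if text != "":
        raise ParseFailure("not the empty representation")
    return ""

def try_normal(text: str) -> object:
    # For xs:string, any text, including "", converts successfully.
    return text

ATTEMPTS = [("nil", try_nil), ("empty", try_empty), ("normal", try_normal)]

def establish(text: str,
              asserts: List[Callable[[object], bool]]) -> Tuple[str, object]:
    """Try nil, then empty, then normal, re-running the asserts each time.

    If every attempt fails and the text is trivially zero-length, the
    occurrence is reported as absent (the reading being asked about).
    """
    for rep, attempt in ATTEMPTS:
        try:
            value = attempt(text)
        except ParseFailure:
            continue
        if all(check(value) for check in asserts):
            return rep, value
    if text == "":
        return "absent", None
    raise ParseFailure("no representation could be established")

def longer_than_one(value: object) -> bool:
    # Hypothetical assert: value must be a string of more than 1 character.
    return isinstance(value, str) and len(value) > 1
```

Under this reading, `establish("", [])` yields `("nil", NIL)`,
`establish("", [longer_than_one])` yields `("absent", None)`, and
`establish("a", [longer_than_one])` raises `ParseFailure`, because the text
is not zero-length and so cannot fall back to absent.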

Section 9.2.3 doesn't say what happens if a type conversion error occurs
when trying to establish normal representation, e.g., if a non-defaultable
text integer is parsed from a zero-length string.

A section 9.2.5 should be added describing "no representation": computed
elements (dfdl:inputValueCalc) have no representation at all, not even a
zero-length one. They have no implications for the parsing or unparsing of
delimiters of any kind.

Section 9.3

Section 9.3.1.1 says that, in establishing known-to-exist, no processing
error can occur.

For example, an element can have zero-length representation if it is
nillable, dfdl:nilValue is "%ES;", and there are no delimiters or framing.
But an assert can fail in this case by specifically excluding
dfdl:valueLength(., 'bytes') eq 0. Or an element of type string can have
zero length, but an assert can insist that the string contain 1 or more
characters.

Section 9.3.1.3 says an occurrence is known not to exist if it is Missing,
and absent representation is one way to be Missing. So can an assert that
causes establishment of nil, empty, and normal representation to fail
(assuming asserts are evaluated and contribute to that decision about
representation) cause the occurrence to be absent, and hence Missing? The
section also says that a processing error when parsing the component means
it is known not to exist.

Section 9.3.2 Establishing Representation

Section 9.3.2.1 Simple Element
(1) Nil representation: already has an erratum allowing %WSP*; alone as
well as %ES;.
(2) Empty representation, qualified by "empty representation must be able
to be of zero-length": does not say whether asserts can fail and prevent
this.
(3) Normal representation: does not say whether asserts can fail and
prevent this zero-length representation from being acceptable. (Actually,
this point about assertions applies to all of (1), (2), and (3).)

Section 9.3.2.2 Complex Element
If a complex element has dfdl:lengthKind 'delimited', do we still have to
recurse into the type before deciding whether it is trivially zero-length,
or can we look at the data stream without recursing? The 5th bullet of
Section 9.3.2 says that we can check whether a delimiter is immediately
encountered, and does not say this is for simple types only.
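To see why the question matters, here is a hypothetical Python sketch
contrasting the two readings: peeking for an immediate delimiter versus
recursing into the content and checking whether the position advanced. For
illustration only, the complex content is assumed to be a single child with
a fixed length of 2 characters that is not scanned for delimiters; with
data ",x" and an in-scope separator ",", the two readings disagree. All
function names are invented.

```python
from typing import Callable, Sequence

def peek_is_zero_length(data: str, pos: int,
                        delimiters: Sequence[str]) -> bool:
    """Shortcut reading: ZL if an in-scope delimiter starts right at pos."""
    return any(d and data.startswith(d, pos) for d in delimiters)

def recurse_is_zero_length(data: str, pos: int,
                           parse_content: Callable[[str, int], int]) -> bool:
    """Recursive reading: ZL only if parsing the content consumes nothing."""
    return parse_content(data, pos) == pos

def fixed_two_char_child(data: str, pos: int) -> int:
    """Hypothetical content: one child of fixed length 2, no delimiter scan."""
    if len(data) - pos < 2:
        raise ValueError("not enough data for the fixed-length child")
    return pos + 2
```

With `data = ",x"`, `peek_is_zero_length(data, 0, [","])` is `True` while
`recurse_is_zero_length(data, 0, fixed_two_char_child)` is `False`.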

A complex element can have empty representation if the content region is
empty and the delimiters required by dfdl:emptyValueDelimiterPolicy (EVDP)
are found in the data stream.

Absent representation for a complex element can only occur if the
representation is zero length after recursing through the type tree. This
implies that if EVDP requires delimiters for the empty value, then ZL means
absent.

Section 9.3.3

With occursCountKind 'stopValue', there seems to be a point of uncertainty
for every occurrence. The fact that a stop value must exist doesn't mean
there are no points of uncertainty: it is uncertain whether the logical
value will be the stop value or not.

But another way of thinking about it is that the stopValue parser does not
need to establish points for backtracking. Every occurrence MUST succeed
until the parser successfully parses the stop value, and if any failure
occurs we backtrack the entire array, not just one occurrence.
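A hypothetical Python sketch of that second view follows. It assumes the
occurrences have already been split into text items and that the
stop-value occurrence itself is not kept in the result; both are
assumptions of the sketch, not spec statements. Every occurrence is parsed
and compared against the logical stop value, and a failure anywhere
propagates, so the caller would backtrack the whole array. The names
`parse_int` and `parse_stop_value_array` are invented.

```python
from typing import Callable, List, Sequence, Tuple

class ParseFailure(Exception):
    """Hypothetical stand-in for a DFDL processing error."""

def parse_int(text: str) -> int:
    # Hypothetical element parser for a text integer.
    try:
        return int(text)
    except ValueError as err:
        raise ParseFailure(f"not an integer: {text!r}") from err

def parse_stop_value_array(items: Sequence[str], stop_value: object,
                           parse_item: Callable[[str], object]
                           ) -> Tuple[List[object], int]:
    """Parse occurrences until the logical stop value is seen.

    No per-occurrence point of uncertainty is set up: any ParseFailure
    propagates, failing the array as a whole. Returns the values before
    the stop value and the number of items consumed, including the stop.
    """
    values: List[object] = []
    for i, text in enumerate(items):
        value = parse_item(text)  # a failure here fails the whole array
        if value == stop_value:
            return values, i + 1
        values.append(value)
    raise ParseFailure("stop value not found")
```

For example, `parse_stop_value_array(["1", "2", "-1", "9"], -1, parse_int)`
returns `([1, 2], 3)`, while `["1", "x", "-1"]` raises `ParseFailure`
rather than ending the array early at "x".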

Use of stopValue with type xs:string creates many ambiguities. E.g., ZL can
be a valid normal representation, but the stop value may be "stop", i.e.,
non-ZL. In that case, since all occurrences are optional according to
minOccurs 0, when a ZL value is parsed, is the optional occurrence
suppressed? Or is it not, meaning you get an array full of empty-string
"normal" values?
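The two readings can be contrasted with a hypothetical Python sketch.
`suppress_zero_length` is an invented switch, not a DFDL property, and the
items are assumed to be already split on separators.

```python
from typing import List, Sequence

def strings_until_stop(items: Sequence[str], stop_value: str,
                       suppress_zero_length: bool) -> List[str]:
    """Collect xs:string occurrences until stop_value is seen.

    suppress_zero_length=True models the reading where a ZL occurrence of
    the optional element is suppressed; False models the reading where it
    is kept as an empty-string "normal" value.
    """
    result: List[str] = []
    for text in items:
        if text == stop_value:
            return result
        if text == "" and suppress_zero_length:
            continue  # treat the ZL occurrence as not present
        result.append(text)
    raise ValueError("stop value not found")
```

For items `["a", "", "", "b", "stop"]`, the first reading gives
`["a", "b"]` and the second gives `["a", "", "", "b"]`.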

Section 9.4.2

It says that for a complex type the parser must have descended into the
type and returned with no processing error, but it does not say whether
processing errors signaled by asserts on simple type elements also prevent
empty representation from being established.

Section 9.4.2.2

If EVDP is not 'none', can an assert insist on something that subverts
establishment of empty representation, such as requiring the length to be
greater than 0? Or can the assert test something orthogonal, entirely
unrelated, such as whether some variable is set to a certain value?

For example, given

     <defineVariable name="disallowEmptyValues" type="xs:boolean".../>

an assert on the element could say

     { if ($disallowEmptyValues) then fn:false() else fn:true() }

For an optional occurrence, if EVDP is not 'none', then empty
representation is established by the presence of some positive syntax; the
representation is not ZL. However, this section says that an empty string
or empty hexBinary becomes the value, not the default value of the element.
Is that correct? I would think that positive syntax matching the empty
representation would satisfy known-to-exist, and that this would then
trigger assigning the default value. However, I suppose the rationale is
that such an empty value for an optional element means there is no
defaulting; hence, the empty representation corresponds to an empty string
or empty hexBinary as the "normal" representation.

The suggestion to use an assert that checks a non-zero minLength facet only
makes sense if the processing error will end the array (occursCountKind
'parsed' or 'implicit'). If the occursCountKind (OCK) is such that there is
no point of uncertainty, then this processing error would cause the whole
array element to fail. That is, the assert suggested here does too much to
be used only to keep empty strings/hexBinary out of the infoset.
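A hypothetical Python sketch of the difference, assuming minOccurs 0 and
items already split on separators. `has_point_of_uncertainty` is an
invented flag standing in for whether the OCK in use sets up a point of
uncertainty per occurrence; `parse_array` and `non_empty` are also
invented.

```python
from typing import Callable, List, Sequence

class ParseFailure(Exception):
    """Hypothetical stand-in for a DFDL processing error."""

def parse_array(items: Sequence[str], check: Callable[[str], bool],
                has_point_of_uncertainty: bool) -> List[str]:
    """Parse occurrences, applying an assert-like check to each.

    With a point of uncertainty, a failing check backtracks only that
    occurrence and ends the array there. Without one, the failure
    propagates and the whole array fails.
    """
    result: List[str] = []
    for text in items:
        if not check(text):
            if has_point_of_uncertainty:
                break  # array ends; later items are left for what follows
            raise ParseFailure(f"assert failed on {text!r}")
        result.append(text)
    return result

def non_empty(text: str) -> bool:
    # Hypothetical assert equivalent to a minLength facet of 1.
    return len(text) > 0
```

With `["a", "", "b"]`, the first case returns `["a"]` (the array ends at
the empty string rather than skipping it and continuing to "b"), and the
second case raises. In neither case does the assert act as a simple filter.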

Or, for a delimited simple type value whose text is ZL but whose type
conversion or assert fails, does that failure create absent representation,
so that no empty string goes into the infoset?

I suspect this issue is tied up with separator suppression policy, and with
when a ZL occurrence is suppressed, its separator absorbed, and nothing put
into the infoset.

Section 9.4.2.3 suggests that processors must keep track of an "all empty"
flag for every infoset node and, recursively, for all of its child nodes.

This section should say that a complex type has empty representation if it
is known to exist and the position in the data doesn't change after a
recursive traversal.
But the last paragraph of the section (maybe) contradicts what is said in
the 2nd sentence of the section. The principle that the example is
illustrating does not seem to be explicitly stated.
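A hypothetical Python sketch of the position-based rule proposed above,
assuming each parsed node records its start and end positions and that
child spans nest within their parent's span. Under that assumption, the
position check on the complex element alone gives the same answer as a
recursive "all empty" flag over every child, so the per-node flag would not
need to be tracked. `Node`, `has_empty_representation`, and
`all_empty_flag` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """Hypothetical parsed node: where its representation began and ended."""
    name: str
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)

def has_empty_representation(node: Node, known_to_exist: bool) -> bool:
    """Proposed rule: known to exist and the data position did not advance."""
    return known_to_exist and node.end == node.start

def all_empty_flag(node: Node) -> bool:
    """Recursive alternative: this node and every descendant consumed nothing."""
    return node.end == node.start and all(
        all_empty_flag(child) for child in node.children)
```

For a node spanning positions 5 to 5 whose children also span 5 to 5, both
checks report empty; once any child advances the position, the parent's
span advances too and both report non-empty.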

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>