[DFDL-WG] Clarification discussion points for next call(s) - DFDL Spec issues around nil, empty, normal, absent, defaulting, and separator suppression.

Mike Beckerle mbeckerle.dfdl at gmail.com
Thu Aug 2 16:19:09 EDT 2018


I've annotated the discussion below to inform discussion at the next call.
Some of the issues are no longer relevant to discuss.
See RED text.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>


On Thu, Aug 2, 2018 at 2:33 PM, Mike Beckerle <mbeckerle.dfdl at gmail.com>
wrote:

> Before the next call, I'll prune these to only the ones that are
> unresolved as yet in my mind, or where I think we really need to improve
> the spec.
>
> The responses already provided in other threads address some of these;
> some of these points are related to those threads, and others are
> redundant with them.
>
> ...mikeb
>
>
> On Thu, Aug 2, 2018 at 12:33 PM, Steve Hanson <smh at uk.ibm.com> wrote:
>
>> Mike
>>
>> I'm not sure if any of the other threads have answered any of these
>> questions. I will not have time to look at this before the next call.
>>
>> I will say though that an assertion failure is just another way of
>> getting a processing error. I think that's why it's not called out
>> explicitly in 9.2.1 to 9.2.4.
>>
>> IBM DFDL has not implemented stopValue yet, but there was a lot of
>> discussion about whether stopValue was a point of uncertainty during
>> action 140. I suggest reading the original action 140 document, which is
>> on Redmine.
>>
>> Regards
>>
>> Steve Hanson
>>
>> IBM Hybrid Integration, Hursley, UK
>> Architect, *IBM DFDL*
>> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
>> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
>> *smh at uk.ibm.com* <smh at uk.ibm.com>
>> tel:+44-1962-815848
>> mob:+44-7717-378890
>> Note: I work Tuesday to Friday
>>
>>
>>
>> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
>> To:        dfdl-wg at ogf.org
>> Date:        20/07/2018 20:51
>> Subject:        [DFDL-WG] Clarification discussion points for next
>> call(s) - DFDL Spec issues around nil, empty, normal, absent, defaulting,
>> and separator suppression.
>> Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>
>> ------------------------------
>>
>>
>>
>> I'm trying to tighten up my understanding of section 9 materials around
>> establishing representation, known to/not-to exist, and defaulting,
>> particularly as they interact with separators and separator suppression
>> for "absent" representations.
>>
>> OPEN (minor - just an update to the erratum): Section 9.2.5
>>
>> This sentence
>>
>>      The nil representation can be a zero-length representation if
>> dfdl:nilValue is "%ES;", and there is no framing or framing is suppressed
>> by dfdl:nilValueDelimiterPolicy.
>>
>> Should say
>>
>>       ".... if dfdl:nilValue is a list containing either %ES; alone, or
>> %WSP*; alone, and ...."
>>
>> This matches existing erratum 5.32. It's just another place that needs
>> the same update.
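A minimal sketch of the condition in the proposed wording, reading "alone" as "a list entry consisting solely of %ES; or %WSP*;". The function name and the two framing flags are hypothetical simplifications (has_framing stands in for initiators/terminators, framing_suppressed for dfdl:nilValueDelimiterPolicy suppressing them); this is not taken from any DFDL implementation.

```python
from typing import Sequence

# Entries that, standing alone in the dfdl:nilValue list, can match zero characters.
ZERO_LENGTH_NIL_ENTRIES = {"%ES;", "%WSP*;"}


def nil_rep_can_be_zero_length(
    nil_values: Sequence[str],
    has_framing: bool,
    framing_suppressed: bool,
) -> bool:
    """Return True if the nil representation can be zero-length.

    nil_values: the whitespace-separated dfdl:nilValue list, already split.
    has_framing: hypothetical flag for initiator/terminator framing.
    framing_suppressed: hypothetical flag for dfdl:nilValueDelimiterPolicy
    suppressing that framing.
    """
    entry_allows_zl = any(v in ZERO_LENGTH_NIL_ENTRIES for v in nil_values)
    framing_allows_zl = not has_framing or framing_suppressed
    return entry_allows_zl and framing_allows_zl
```

For example, nil_rep_can_be_zero_length("NIL %WSP*;".split(), False, False) is True, while a list of only "NIL" never allows a zero-length nil representation.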
>>
>> OPEN: (This is confusion about the difference between establishing
>> representation and known (or not) to exist, and how the recursion of those
>> concepts plays out.) Sections 9.2.1 to 9.2.4 do not say what happens if
>> an assertion failure occurs when trying to establish the representation.
>> In particular, can an assertion failure cause establishing of nil, empty,
>> or normal representation to fail, resulting in absent representation? For
>> example, if an assert requires the length of the value to be greater than 1
>> character, then a zero-length string cannot be the normal representation.
>>
>> But do we parse for nil representation and run the assertions; if they
>> fail, parse for empty representation and re-run the assertions; if they
>> fail, parse for normal representation and re-run the assertions; and if
>> they fail and the representation was "trivially ZL", then it is absent?
>>
At this point I believe the spec says that normal representation requires
type convertibility to the primitive simple type, so a processing error
caused by type conversion is part of determining normal representation.
But asserts, discriminators, and expression evaluation errors (for
setVariable, etc.) are all evaluated after the representation (including
type conversion, for normal representation) is established, as part of
determining known (or not) to exist.
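A minimal sketch of the ordering described in this annotation, assuming a text simple element whose data has already been isolated as a string. Rep, ProcessingError, establish_representation and parse_occurrence are hypothetical names; nil matching is reduced to string equality ("" standing in for %ES;), and asserts are Python predicates. It models one reading of the spec, not a definitive algorithm.

```python
from enum import Enum
from typing import Callable, Iterable, Optional, Set, Tuple


class Rep(Enum):
    NIL = "nil"
    EMPTY = "empty"
    NORMAL = "normal"
    ABSENT = "absent"


class ProcessingError(Exception):
    """Stand-in for a DFDL processing error."""


def establish_representation(
    text: str,
    nil_literals: Set[str],
    empty_allowed: bool,
    convert: Callable[[str], object],
) -> Tuple[Rep, Optional[object]]:
    # 1. nil: the text matches a nil literal ("" models %ES;).
    if text in nil_literals:
        return Rep.NIL, None
    # 2. empty: zero-length content where empty representation is allowed.
    if text == "" and empty_allowed:
        return Rep.EMPTY, None
    # 3. normal: type conversion is part of establishing normal representation.
    try:
        return Rep.NORMAL, convert(text)
    except ValueError:
        pass
    # 4. absent: none of the above, and zero-length.
    if text == "":
        return Rep.ABSENT, None
    raise ProcessingError(f"no representation for {text!r}")


def parse_occurrence(
    text: str,
    nil_literals: Set[str],
    empty_allowed: bool,
    convert: Callable[[str], object],
    asserts: Iterable[Callable[[Rep, object], bool]] = (),
) -> Tuple[Rep, Optional[object], bool]:
    """Return (representation, value, known_to_exist)."""
    rep, value = establish_representation(text, nil_literals, empty_allowed, convert)
    if rep is Rep.ABSENT:
        return rep, value, False
    # Asserts run only after the representation is fixed. A failure makes the
    # occurrence known not to exist; it does not retry another representation.
    known = all(check(rep, value) for check in asserts)
    return rep, value, known
```

With convert=int, a zero-length string is neither nil, nor empty, nor convertible, so it ends up absent; an assert failing on "42" leaves the representation normal and only makes the occurrence known not to exist.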

>> OPEN: (same topic - processing errors when establishing representation
>> vs. known/known-not to exist.) Section 9.2.3 doesn't say what happens if
>> a type conversion error happens when trying to establish Normal
>> representation. (No, but 9.3.1.1 DOES say that type conversion is part
>> of determining normal representation.) E.g., if a non-defaultable text
>> integer is parsed from a zero-length string.
>>
>> RESOLVED (Related and should be handled in same clarification about model
>> groups as children of a sequence always behaving as "required".) A
>> section 9.2.5 should be added describing "No representation" - Computed
>> elements (dfdl:inputValueCalc) have no representation at all. Not zero
>> length, not anything. They have no implications for parsing of, or
>> unparsing of, delimiters of any kind.
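A minimal sketch of what "no representation" would mean for a separated sequence, assuming a hypothetical Child record whose computed flag stands in for dfdl:inputValueCalc. Computed children are skipped entirely: they consume no field and no separator when parsing, and emit nothing when unparsing. Evaluating the inputValueCalc expression itself is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Child:
    name: str
    computed: bool = False  # hypothetical flag: True models dfdl:inputValueCalc


def parse_sequence(children: List[Child], data: str, separator: str) -> Dict[str, str]:
    fields = iter(data.split(separator))
    result: Dict[str, str] = {}
    for child in children:
        if child.computed:
            continue  # no representation: consumes no field and no separator
        result[child.name] = next(fields)
    return result


def unparse_sequence(children: List[Child], values: Dict[str, str], separator: str) -> str:
    # Only represented children produce content and separators.
    return separator.join(values[c.name] for c in children if not c.computed)
```

With children a, a computed len, and b, the data "1,2" parses to {"a": "1", "b": "2"}, and unparsing writes "1,2" regardless of the value held by len.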
>>
>> Section 9.3
>>
>> RESOLVED: (Just misreading. The status known-to-exist means no
>> processing error occurred (along with a few other things). It doesn't mean
>> no processing error can occur while trying to establish known-to-exist
>> status.) Section 9.3.1.1 says that when establishing known-to-exist
>> status, no processing error can occur.
>>
>> For example, an element can have zero-length representation if it is
>> nillable and nilValue=%ES; and there are neither delimiters nor framing.
>> But an assert can fail for this by specifically excluding
>> dfdl:valueLength(., 'bytes') eq 0. Or an element of type string can have
>> zero length, but an assert can insist the string contain 1 or more
>> characters.
>>
>> OPEN: (part of same issue about assertion failures when establishing
>> representation vs. when determining known/not-known above) Section
>> 9.3.1.3 says known not to exist if the occurrence is Missing, and absent
>> representation is one way to be Missing. So can an assert that causes
>> nil, empty, and normal representation to fail (assuming asserts are
>> evaluated and contribute to that decision about representation) cause the
>> occurrence to be absent, and hence missing? It also says a processing
>> error when parsing the component means it is known not to exist.
>>
>>
>> Section 9.3.2 Establishing Representation
>>
>> RESOLVED: (Context of empty representation discussion is about the
>> content region, which must be empty in an empty representation.) Section
>> 9.3.2.1 Simple Element
>> (1) already has an erratum allowing WSP* alone as well as ES.
>> Does not say whether (2) empty representation - qualified by "empty
>> representation must be able to be of zero-length".
>> (3) normal representation - does not say if asserts can fail and prevent
>> this zero-length representation from being acceptable. (Actually, this
>> point about assertions applies to all of 1, 2, and 3.)
>>
>> OPEN: (Email thread about delimited definition in 9.3.2) Section 9.3.2.2
>> Complex Element
>> If a complex type element has lengthKind 'delimited', do we still have to
>> recurse into the type before deciding if it is trivially zero length, or
>> can we look at the data stream without recursing in? Section 9.3.2 5th
>> bullet says that we can check whether a delimiter is immediately
>> encountered, and does not say this is for simple types only.
>>
>> A complex type element can have empty representation if the content
>> region is empty and the delimiters with EVDP specify what is found in the
>> data stream.
>> Absent representation for a complex element can only occur if the
>> representation is zero length after recursing through the type tree. This
>> implies that if EVDP indicates delimiters for empty value, then ZL means
>> absent.
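A minimal sketch of the shortcut the 5th bullet of 9.3.2 seems to permit: peek at the data for an in-scope delimiter at the current position, without descending into the type. The function name and the flattened list of delimiter strings are assumptions; real DFDL delimiters can contain character-class entities such as %NL;, which this sketch ignores. Whether the shortcut is valid for complex types is exactly the open question.

```python
from typing import Iterable


def delimiter_immediately_follows(data: str, pos: int, delimiters: Iterable[str]) -> bool:
    """True if some in-scope delimiter starts exactly at data[pos]."""
    # Empty strings are skipped: an empty delimiter would match everywhere.
    return any(d and data.startswith(d, pos) for d in delimiters)
```

For data "x,,y]" with delimiters "," and "]", position 2 is immediately followed by a separator, so under this reading an element starting there would be trivially zero-length without recursing into its type.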
>>
>> OPEN: (Not very urgent, as nobody has StopValue implementations as yet.)
>> Section 9.3.3
>>
>> StopValue seems like it has a point of uncertainty for every occurrence.
>> The fact that a stop value must exist doesn't mean there are no points of
>> uncertainty. It is uncertain if the logical value will be the stop value or
>> not.
>>
>> But another way of thinking about it is that the stopValue parser does
>> not need to establish points of uncertainty for backtracking. All elements
>> MUST succeed until the parser reaches the stop value, and if any failure
>> occurs we backtrack the entire array, not just one element.
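A minimal sketch of the "backtrack the entire array" reading, assuming the occurrences have already been split into text tokens. parse_stop_value_array is a hypothetical name; the stop value is compared against the logical (converted) value, and dropping the stop-value occurrence from the result is an assumption of the sketch.

```python
from typing import Callable, List, Optional, Set


def parse_stop_value_array(
    tokens: List[str],
    stop_values: Set[object],
    parse_element: Callable[[str], object],
) -> Optional[List[object]]:
    """Return the parsed occurrences, or None if the whole array fails."""
    result: List[object] = []
    for tok in tokens:
        try:
            value = parse_element(tok)
        except ValueError:
            return None  # no per-occurrence point of uncertainty: discard all
        if value in stop_values:  # compare logical values, not raw text
            return result
        result.append(value)
    return None  # the stop value must exist
```

With parse_element=int and stop value -1, the text "-01" also stops the array, because the comparison is on the converted value.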
>>
>> Use of stopValue with type xs:string creates lots of ambiguities. E.g.,
>> ZL can be a valid normal representation, but the stop value may be "stop",
>> i.e., non-ZL. In that case, since all elements are optional according to
>> minOccurs 0, when a ZL value is parsed, is the optional element suppressed?
>> Or not - meaning you get an array full of empty string "normal" values?
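The two readings in the question can be contrasted with a small sketch, assuming pre-split text tokens and a hypothetical suppress_zero_length switch that selects the reading; neither branch is claimed to be what the spec requires.

```python
from typing import List, Optional


def parse_string_stop_array(
    tokens: List[str], stop_value: str, suppress_zero_length: bool
) -> Optional[List[str]]:
    result: List[str] = []
    for tok in tokens:
        if tok == stop_value:
            return result
        if tok == "" and suppress_zero_length:
            continue  # reading A: ZL occurrence treated as absent, so suppressed
        result.append(tok)  # reading B: ZL is a normal "" value
    return None  # stop value never found
```

For tokens a, "", b, "", stop, reading A yields ["a", "b"] and reading B yields ["a", "", "b", ""].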
>>
>> RESOLVED (redundant with discussion 9.3.1.3 above) Section 9.4.2
>>
>> Says a complex type must have descended into the type and returned with
>> no processing error, but does not say whether processing errors signaled by
>> asserts on simple type elements also disable empty representation from
>> being established.
>>
>> RESOLVED (email discussion of EVDP, clarifying whether EVDP is applicable,
>> and whether it is 'none' or not 'none'. The rest is redundant with
>> discussion above about 9.3.1.3.) Section 9.4.2.2
>>
>> If EVDP is not 'none', can an assert insist on something that subverts
>> establishment of empty representation, such as that the length is > 0?
>> Or can the assert test something orthogonal, entirely unrelated, like
>> whether some variable is set to a certain value?
>>
>> E.g., <dfdl:defineVariable name="disallowEmptyValues" type="xs:boolean" .../>
>> Then an assert on the element says { if ($disallowEmptyValues) then false
>> else true }.
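To make the two possible answers concrete, here is a minimal sketch, assuming the data already matches the empty representation. assert_passes models the expression above, and the asserts_in_representation switch (hypothetical) selects between asserts contributing to establishing the representation and asserts only affecting known-to-exist.

```python
def assert_passes(disallow_empty_values: bool) -> bool:
    # Models the DFDL expression { if ($disallowEmptyValues) then false else true }.
    # Its result depends only on the variable, never on the parsed data.
    return False if disallow_empty_values else True


def empty_occurrence_outcome(
    disallow_empty_values: bool, asserts_in_representation: bool
) -> str:
    """Outcome for an occurrence whose data matches the empty representation."""
    if assert_passes(disallow_empty_values):
        return "empty, known to exist"
    if asserts_in_representation:
        # Reading 1: the failed assert stops empty representation being established.
        return "empty rep not established"
    # Reading 2: empty representation stands; the assert failure is a
    # processing error, so the occurrence is known not to exist.
    return "empty, known not to exist"
```

The orthogonal assert only matters when the variable is true; the question is which of the two branches the spec intends in that case.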
>>
>> For optional occurrence, if EVDP is not none, then empty representation
>> is established by the presence of some positive syntax - the representation
>> is not ZL. However, this says an empty string or empty hexBinary becomes the
>> value, not the default value of the element. Is that correct? I would think
>> having positive syntax that matches the empty representation would satisfy
>> known-to-exist, and then that would trigger assigning the default value.
>> However, I suppose the rationale is that such an empty value for an
>> optional element means there will be no defaulting, hence, the empty
>> representation corresponds to empty string or empty hexBinary as "normal"
>> representation.
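A minimal sketch of the reading described above, assuming EVDP is not 'none' and the empty representation has been established. value_for_empty_rep and its parameters are hypothetical; only xs:string and xs:hexBinary are modeled, and raising ValueError for other types is a placeholder for whatever the spec requires.

```python
from typing import Optional

EMPTY_VALUES = {"xs:string": "", "xs:hexBinary": b""}


def value_for_empty_rep(type_name: str, required: bool, default: Optional[object]) -> object:
    if required and default is not None:
        return default  # required occurrence: defaulting applies
    if type_name in EMPTY_VALUES:
        # Optional occurrence (or no default): the empty value itself
        # goes into the infoset, as this reading of 9.4.2.2 says.
        return EMPTY_VALUES[type_name]
    raise ValueError(f"no empty value modeled for {type_name}")
```

Under this reading, an optional xs:string with a default of "N/A" still receives "", while a required occurrence receives "N/A".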
>>
>> The suggestion to use an assert that checks a non-zero minLength facet
>> only makes sense if the processing error will cause the end of the array
>> (occursCountKind parsed or implicit). If the OCK is such that there is no
>> point of uncertainty, then this processing error would cause the whole
>> array-element to fail. That is, the assert suggested here really does too
>> much to be used only to filter empty strings/hexBinary from going into the
>> infoset.
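A minimal sketch of the difference, assuming pre-split tokens and a predicate standing in for the minLength-style assert. parse_array and has_point_of_uncertainty are hypothetical; the flag models whether the occursCountKind gives each occurrence a point of uncertainty (as for 'parsed' or 'implicit' above).

```python
from typing import Callable, List, Optional, Tuple


def parse_array(
    tokens: List[str],
    element_ok: Callable[[str], bool],
    has_point_of_uncertainty: bool,
) -> Optional[Tuple[List[str], List[str]]]:
    """Return (parsed occurrences, unconsumed tokens), or None if the array fails."""
    parsed: List[str] = []
    for i, tok in enumerate(tokens):
        if element_ok(tok):
            parsed.append(tok)
        elif has_point_of_uncertainty:
            # Backtrack to before this occurrence: the array simply ends here,
            # leaving this token and everything after it for later components.
            return parsed, tokens[i:]
        else:
            # No point of uncertainty: the assert failure fails the whole array.
            return None
    return parsed, []
```

With a point of uncertainty, an empty token ends the array and every later token, including non-empty ones, is left unconsumed, which illustrates why the assert does more than filter out empty values.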
>>
>> Or, for a delimited simpleType value where the text is ZL but the type
>> conversion fails or an assert fails, does that create absent
>> representation, resulting in no empty string going into the infoset?
>>
>> I suspect this issue is tied up with separator suppression policy and
>> when a ZL thing is suppressed, the separator absorbed, and nothing goes
>> into the infoset.
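A rough sketch of how the suppression policy changes what reaches the infoset, assuming simple split-on-separator parsing of optional occurrences. The policy strings are dfdl:separatorSuppressionPolicy values (never, trailingEmpty, anyEmpty), but their behavior here is simplified: minOccurs, trailingEmptyStrict, and separator position are ignored.

```python
from typing import List


def parse_separated(data: str, sep: str, policy: str) -> List[str]:
    fields = data.split(sep)
    if policy == "anyEmpty":
        # Every ZL occurrence is suppressed: its separator is absorbed and
        # nothing goes into the infoset for it.
        return [f for f in fields if f != ""]
    if policy == "trailingEmpty":
        # Only ZL occurrences at the end are suppressed.
        while fields and fields[-1] == "":
            fields.pop()
        return fields
    if policy == "never":
        return fields  # in this sketch, every ZL occurrence is an empty-string value
    raise ValueError(f"unsupported policy: {policy}")
```

For "a,,b," the three policies give ["a", "b"], ["a", "", "b"], and ["a", "", "b", ""] respectively in this model.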
>>
>> OPEN (part of discussion of delimited and trivially zero-length for
>> complex type) 9.4.2.3 - suggests that processors must keep track of the
>> "all empty flag" for every infoset node and recursively all child nodes.
>>
>> This section should say that a complex type has empty representation if
>> it is known to exist, and the position in the data doesn't change after a
>> recursive traversal.
>> But the last paragraph of the section (maybe) contradicts what is said in
>> the 2nd sentence of the section. The point the example is making, the
>> principle it illustrates, does not seem to be explicitly stated.
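A minimal sketch of the proposed rule, assuming the complex content is parsed by a function that returns the end position or raises. has_empty_rep, optional_digits and required_digit are hypothetical; framing and EVDP delimiters are ignored. Under this rule no per-node "all empty" flag is needed: comparing positions before and after the recursive parse suffices.

```python
from typing import Callable


class ProcessingError(Exception):
    """Stand-in for a DFDL processing error."""


def has_empty_rep(parse_content: Callable[[str, int], int], data: str, start: int) -> bool:
    try:
        end = parse_content(data, start)
    except ProcessingError:
        return False  # not known to exist, so no empty representation
    return end == start  # known to exist and nothing consumed


def optional_digits(data: str, start: int) -> int:
    # Consumes zero or more digits; never fails.
    pos = start
    while pos < len(data) and data[pos].isdigit():
        pos += 1
    return pos


def required_digit(data: str, start: int) -> int:
    # Needs at least one digit, otherwise a processing error.
    if start >= len(data) or not data[start].isdigit():
        raise ProcessingError("expected a digit")
    return optional_digits(data, start)
```

Content that parses successfully without moving (optional_digits on "abc") has empty representation; content that consumes data, or fails, does not.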
>>
>> --
>>  dfdl-wg mailing list
>>  dfdl-wg at ogf.org
>>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>>
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number
>> 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>
>>
>