[DFDL-WG] Action 280 minOccurs='0' choice branch (was: Re: OCK expression and count of 0 for a choice member....)

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue Jun 2 13:41:34 EDT 2015


I believe this action item is still open, and I would like to revive
the discussion.

I was coding up this aspect of Daffodil and have hit this subject head on.

In section 15 the spec clearly states that the root of a choice branch
cannot be optional, that is cannot have minOccurs="0".

That language is very specific, and it leaves open the possibility of
"effectively optional" things being the roots of choice branches (e.g.,
elements using occursCountKind (OCK) 'parsed' or 'expression').

It also allows one to trivially wrap a sequence (having no delimiters,
alignment or skips) around an element (or element ref) carrying
minOccurs="0" so as to simply dodge the restriction.
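
To make the dodge concrete, here is a minimal Python sketch of the section 15
rule as literally worded. The Term class and violates_literal_rule function are
hypothetical names used only for illustration; they are not part of Daffodil or
any other DFDL implementation, and the check models nothing beyond "an element
root with minOccurs of 0 is rejected".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Term:
    """Hypothetical model of a schema term; not a real DFDL API."""
    kind: str                      # "element", "sequence", or "choice"
    min_occurs: int = 1            # only meaningful for elements
    children: List["Term"] = field(default_factory=list)


def violates_literal_rule(branch_root: Term) -> bool:
    """Literal reading: only an element root with minOccurs 0 is rejected."""
    return branch_root.kind == "element" and branch_root.min_occurs == 0


optional_elem = Term("element", min_occurs=0)
wrapped = Term("sequence", children=[optional_elem])

direct_rejected = violates_literal_rule(optional_elem)   # the rule catches this
wrapped_rejected = violates_literal_rule(wrapped)        # the wrapper slips through
```

The check never looks inside the sequence, which is exactly why the wrapper
sidesteps the restriction.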

It was observed in the thread below that we cannot require choice branches
to be scalar elements, because hidden groups need to be usable as choice
branches, as do empty sequences carrying only asserts (another non-element
example).

Related: the DFDL spec also specifies that an element that is the root of a
choice branch cannot carry dfdl:inputValueCalc. The spec does NOT restrict
use of dfdl:outputValueCalc on the root of a choice branch, but the meaning
of such is unclear to me.

The existing restriction of "no minOccurs="0"" on the root of a choice
branch seems not to accomplish anything. It seems to be meaningful only
for occursCountKind='implicit'.

Requiring the root of a choice branch to not be "variable occurrence" if it
is an element would accomplish something, but it is not clear whether this
is needed to eliminate ambiguity, or whether the ambiguity can be eliminated
without any restriction.

The stable design points I can think of are:
1) root of a choice branch must be scalar (so, only a sequence, choice, or
an element where minOccurs == maxOccurs == 1.)
2) root of a choice branch cannot be optional - for a broad sense of the
word optional - precludes arrays with OCK expression and parsed, and
implicit if minOccurs="0". Fixed length arrays would be allowed.
3) a choice branch must have some syntax

I think we discarded (3) because choice branches that really just reflect
error checking - contain only dfdl:asserts for example - are in use and
serve a useful purpose.

Daffodil's test suite makes heavy use of choice branches that look like this:
<choice>
.....
<sequence>
  <element name="foo" dfdl:inputValueCalc="{....}"/>
</sequence>
</choice>

These have no syntax. This allows a kind of default element to be
computed. In most (possibly all; I have not searched exhaustively) of these
cases the IVC expression is a constant. But note that the sequence wrapped
around the IVC element is just dodging the restriction that a choice branch
cannot be an IVC element (which is another restriction that seems
unnecessary).

...mike


Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>


On Mon, Apr 27, 2015 at 9:30 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> Mike
>
> A couple of comments:
>
> 1) You said below
>
> *Optional here means "not required by the DFDL format", as in
> occursCountKind cannot be 'parsed' at all, because all occurrences are then
> not required, and the min/maxOccurs are only examined for validation
> purposes, also occursCountKind cannot be 'implicit' for the same reasons,
> and occursCountKind 'expression' also.*
>
> OccursCountKind 'implicit' is allowed, because minOccurs is used for
> parsing and minOccurs cannot be 0.
>
> 2) You said below
>
> *Wrapping the array element in a sequence doesn't solve the problem unless
> the sequence has a required piece of syntax such as an initiator or
> terminator, or a hiddenGroupRef to a not-optional (recursively) thing.*
>
> A sequence has minOccurs '1' so it does satisfy the spec rule about the
> child of a choice being required. Such a sequence could have no syntax and
> could contain an element with minOccurs '0' or even be empty. I have seen
> DFDL schemas that contain a choice with the last branch being an empty
> sequence that contains an assert fn:false() in order to throw a processing
> error.
>
> Regards
>
> Steve Hanson
> Architect, *IBM DFDL*
> <http://www.ibm.com/developerworks/library/se-dfdl/index.html>
> Co-Chair, *OGF DFDL Working Group* <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> *smh at uk.ibm.com* <smh at uk.ibm.com>
> tel:+44-1962-815848
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        Alex Wood1/UK/IBM at IBMGB
> Cc:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date:        27/04/2015 13:35
> Subject:        Re: [DFDL-WG] OCK expression and count of 0 for a choice
> member....
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
>
> I believe any use of occursCountKind 'expression' on an element that is
> the first element on a branch of a choice should be an SDE.
>
> This is one of the cases where DFDL requires one to introduce an element
> that would not be necessary in an ordinary XML schema, but is necessary
> because DFDL does not have XML's easily parsed syntax to depend on.
>
> This is my opinion. I think we need to look at whether this restriction is
> either
>
> (a) necessary
> (b) necessary to avoid excessive complexity in implementations
> (c) unnecessary - but is the intention of what is specified already
> (despite shortcomings of the prose/description in the spec, which could be
> corrected.)
> (d) an error in the specification
>
>
>
>
>
> On Mon, Apr 27, 2015 at 5:49 AM, Alex Wood1 <*WOODA at uk.ibm.com*
> <WOODA at uk.ibm.com>> wrote:
> Hi Mike,
>
> Can you clarify if you are saying that OCK expression should be prohibited
> completely on a choice member (as occurrences for OCK expression are
> potentially optional regardless of minOccurs value)
>
> Or is your statement that it should cause an SDE specific to the count==0
> case?
>
>
> Kind Regards,
>
> - Alex
>
> Alex Wood -
> Software Engineer -
> WebSphere Message Broker Development
> DFDL Development
>
> MP 211, IBM UK Labs, Hursley Park, Winchester, Hants. SO21 2JN.
> Tel: Internal 246272, External 01962 816272
> Notes: Alex Wood1/UK/IBM at IBMGB
> e-mail: *wooda at uk.ibm.com* <wooda at uk.ibm.com>
>
>
>
>
> From:        Mike Beckerle <*mbeckerle.dfdl at gmail.com*
> <mbeckerle.dfdl at gmail.com>>
> To:        Alex Wood1/UK/IBM at IBMGB
> Date:        24/04/2015 15:10
> Subject:        Re: [DFDL-WG] OCK expression and count of 0 for a choice
> member....
>  ------------------------------
>
>
>
> I think this is an SDE.
>
> Choice branches cannot be optional.
>
> Optional here, does not mean minOccurs == 0, because for many
> occursCountKinds, that's never checked unless validation is on, and
> validation doesn't guide parsing anyway.
>
> Optional here means "not required by the DFDL format", as in
> occursCountKind cannot be 'parsed' at all, because all occurrences are then
> not required, and the min/maxOccurs are only examined for validation
> purposes, also occursCountKind cannot be 'implicit' for the same reasons,
> and occursCountKind 'expression' also.
>
> Wrapping the array element in a sequence doesn't solve the problem unless
> the sequence has a required piece of syntax such as an initiator or
> terminator, or a hiddenGroupRef to a not-optional (recursively) thing.
>
> Even initiator and terminator are tricky, because in a non-delimited
> format, those can be %WSP*; which can match nothing at all; hence, they do
> not "require" any syntax.
>
>
>
>
>
>
>
> On Fri, Apr 24, 2015 at 9:07 AM, Alex Wood1 <*WOODA at uk.ibm.com*
> <WOODA at uk.ibm.com>> wrote:
> Hi All,
>
> Please see below for a history of the issue.
> This arose from fuzz testing of the IBM DFDL parser, which produced a test
> with a count of 0 for an OCK expression array that was a choice member,
> and from subsequent reference to the specification.
>
> It was not clear what the correct outcome should be in a choice where the
> first member is an array using OCK expression where the count resolves to 0.
> a.) resolve the choice to the zero length array
> b.) move to the next choice branch
> c.) throw an error
>
>
> Kind Regards,
>
> - Alex
>
>
>
>
>
> From:        Steve Hanson/UK/IBM
> To:        Alex Wood1/UK/IBM at IBMGB
> Cc:        Andrew Edwards/UK/IBM at IBMGB, Mark Frost/UK/IBM
> Date:        24/04/2015 09:19
> Subject:        Re: OCK expression and count of 0 for a choice member....
>  ------------------------------
>
>
> When I wrote the paragraph below, the one thing that troubled me was that
> the spec defines known-to-exist and known-not-to-exist in terms of
> occurrences. In the choice branch example, it is the element as a whole we
> are looking at. That's fine for scalar as element == occurrence but for an
> array it's not the same.  I think the spec is missing a definition of what
> 'missing' means for an array element. I would say that an array element is
> missing if all occurrences are missing. And an array element is not missing
> if any occurrence has a representation (empty, nil, normal).  With that in
> place, my paragraph makes sense, I think.
>
> I believe we have the same issue with 'parsed' and 'stopValue'.
>
> Regards
>
> Steve Hanson
>
>
>
>
> From:        Steve Hanson/UK/IBM
> To:        Alex Wood1/UK/IBM at IBMGB
> Cc:        Andrew Edwards/UK/IBM at IBMGB, Mark Frost/UK/IBM at IBMGB
> Date:        23/04/2015 18:52
> Subject:        Re: OCK expression and count of 0 for a choice member....
>  ------------------------------
>
>
> Here is one interpretation...
>
> A choice is resolved by parsing the branches until one is known-to-exist
> as described in section 9.3.3.  Section 9.3.1.2 defines known-to-exist (in
> the absence of a discriminator, initiator or direct dispatch) as an
> occurrence having empty, nil or normal representation. Section 9.3.1.3
> defines known-not-to-exist (again in the absence of a discriminator,
> initiator, direct dispatch, or an assert) as an occurrence being missing
> or causing a processing error. If occursCount is zero, no occurrences are
> looked for in the data (erratum 5.9), so the element has no representation
> and must be missing. Therefore a choice branch containing such an element
> is known-not-to-exist.
>
> So in your example, the first choice branch containing myInt is
> known-not-to-exist and the parser tries the next branch.
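
The interpretation quoted above can be sketched in Python as follows. This is
an illustrative model only: the Branch alias and both function names are
hypothetical, and discriminators, initiators, direct dispatch and asserts are
deliberately left out, as in the quoted paragraph. Each branch is reduced to
the list of representations found for the occurrences the parser looked for,
so an occursCount of zero yields an empty list.

```python
from typing import List, Optional, Tuple

# Representations that make an occurrence known-to-exist in this reading.
PRESENT = ("empty", "nil", "normal")

# Hypothetical stand-in for a branch: its name plus the representation found
# for each occurrence that was looked for in the data.
Branch = Tuple[str, List[str]]


def known_to_exist(occurrence_reps: List[str]) -> bool:
    """True if any occurrence has a representation; [] means none were looked for."""
    return any(rep in PRESENT for rep in occurrence_reps)


def resolve_choice(branches: List[Branch]) -> Optional[str]:
    """Return the first known-to-exist branch, or None if all are exhausted."""
    for name, reps in branches:
        if known_to_exist(reps):
            return name
    return None  # a real parser would raise a processing error here


# myCount == 0, so no occurrences of myInt are looked for.
selected = resolve_choice([("myInt", []), ("myTxt", ["normal"])])
```

Under this reading the zero-length myInt array is treated as missing, so the
choice resolves to myTxt.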
>
> This appears to contradict section 15.1.1 though. I suspect that 15.1.1
> was not updated to match section 9.3 when the latter was added.
>
> If you want to make the first choice branch known-to-exist when the count
> is zero then I think wrapping myInt in a sequence would work. Or wrapping
> myInt in a complex element.
>
> Definitely one to take to the WG though, if only to correct section 15.1.1
> to match section 9.
>
> Regards
>
> Steve Hanson
>
>
>
>
> From:        Alex Wood1/UK/IBM
> To:        Steve Hanson/UK/IBM at IBMGB
> Cc:        Andrew Edwards/UK/IBM at IBMGB, Mark Frost/UK/IBM at IBMGB
> Date:        23/04/2015 16:33
> Subject:        OCK expression and count of 0 for a choice member....
>  ------------------------------
>
>
> Hi Steve
>
> Just been discussing this with Andy and Mark.
> I think the spec
>
> <xs:element name="Choice_Expression" dfdl:ref="config"
>             dfdl:lengthKind="implicit">
>   <xs:complexType>
>     <xs:sequence dfdl:ref="config">
>       <xs:element ref="myCount"/>
>       <xs:choice dfdl:choiceLengthKind="implicit" dfdl:ref="config">
>         <xs:element ref="myInt" minOccurs="1" maxOccurs="3"/>
>         <xs:element ref="myTxt"/>
>       </xs:choice>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> Where *myInt*  has occursCountKind="expression" occursCount="{../myCount}"
>
> A given instance of this message could have *myCount*==0
>
> Is this valid?
> Should it resolve to 0 occurrences of myInt or move on to myTxt ?
>
> Section 15 of the spec says:
>
> The Root of the Branch MUST NOT be optional. That is XSDL minOccurs MUST
> BE greater than 0.
>
> But in this case minOccurs is >0.
>
> Assuming this is not an error then in terms of resolving the choice
> section 15.1.1 says..
>
> 15.1.1 Resolving Choices via Speculation
>
> Speculative resolution works as follows:
> 1) Attempt to parse the first branch of the choice.
> 2) If this fails with a processing error
>    a) If a dfdl:discriminator evaluated to true earlier on this branch then
>       the parser is 'bound' to this branch and parsing of the entire choice
>       construct fails with a processing error.
>    b) If the branch has a dfdl:initiator and the choice has
>       dfdl:initiatedContent ‘yes’ then the parser is 'bound' to this branch
>       and parsing of the entire choice construct fails with a processing
>       error.
>    c) Otherwise we repeat from step 1 for the next branch of the choice.
> 3) It is a processing error if the branches of the choice are exhausted.
> 4) If a branch is successfully parsed without error, then that branch's
> infoset becomes the infoset for the parse of the choice construct.
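
The four steps quoted above can be sketched in Python roughly as follows. The
names ProcessingError, Branch and resolve_by_speculation are hypothetical and
do not come from any DFDL implementation. As a simplifying assumption, the
'bound' conditions of steps 2a and 2b are collapsed into a single static flag
on the branch, whereas a real parser would discover them while parsing.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


class ProcessingError(Exception):
    """Stand-in for a DFDL processing error."""


@dataclass
class Branch:
    name: str
    parse: Callable[[], Any]   # returns an infoset or raises ProcessingError
    bound: bool = False        # simplification of steps 2a and 2b


def resolve_by_speculation(branches: List[Branch]) -> Tuple[str, Any]:
    for branch in branches:                    # step 1, repeated via 2c
        try:
            infoset = branch.parse()
        except ProcessingError:
            if branch.bound:                   # steps 2a / 2b: the choice fails
                raise
            continue                           # step 2c: try the next branch
        return branch.name, infoset            # step 4
    raise ProcessingError("choice branches exhausted")   # step 3


def parse_my_int() -> list:
    return []        # occursCount evaluated to 0: nothing parsed, no error


def parse_my_txt() -> str:
    return "text"    # placeholder value


chosen, infoset = resolve_by_speculation(
    [Branch("myInt", parse_my_int), Branch("myTxt", parse_my_txt)]
)
```

Read literally, a myInt array with zero occurrences parses without error, so
the sketch selects myInt with an empty infoset and never tries myTxt.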
>
> So it seems this is case 4): we did not fail to parse myInt...
>
> However, talking with Mark about real scenarios this might apply to (for
> example, a choice of two repeating fields with counts earlier in the data,
> only one of which must appear), you'd expect 0 of the first to mean >0 of
> the second and vice versa... So you'd probably want 0 myInt to allow the
> choice to resolve to myTxt.
>
> Thoughts ?
>
> If you agree we need more clarity in the spec, I will forward this to the WG.
>
>
> Kind Regards,
>
> - Alex
>
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
> --
>   dfdl-wg mailing list
>   *dfdl-wg at ogf.org* <dfdl-wg at ogf.org>
>   *https://www.ogf.org/mailman/listinfo/dfdl-wg*
> <https://www.ogf.org/mailman/listinfo/dfdl-wg>
>
>