[DFDL-WG] unordered sequence with constrained occurrences

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Mar 6 16:50:23 EST 2013


Yes, they are in DFDL v1.0. They just haven't been implemented by anyone yet.

On Wed, Mar 6, 2013 at 3:10 PM, Cranford, Jonathan W.
<jcranford at mitre.org> wrote:

> > the more I think that unordered sequences should be part of DFDL 1.0.
>
> Did I miss something?  Unordered sequence groups are still a part of DFDL
> 1.0, aren’t they?  They might not be supported in either MBTK or Daffodil
> yet, but they are still in the spec, right?
>
> Just checking,
>
> Jonathan Cranford
>
> From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org] On
> Behalf Of Garriss Jr., James P.
> Sent: Wednesday, March 06, 2013 11:43 AM
> To: dfdl-wg at ogf.org
> Subject: Re: [DFDL-WG] unordered sequence with constrained occurrences
>
> Suppose I’m modeling IMF headers, many of which can have the exact same
> form, stuff like:
>
> From:  john at doe.com
> To:  jane at gmail.com
> Return-Path: bob at yahoo.com
>
> Etc.  Remember that these can be in any order, so they are an unordered
> sequence.
>
> The way that we’ve modeled these headers so far, the “From:” and “To:” and
> so on have been initiators; they aren’t elements.  But when I use our
> workaround for an unordered sequence, which requires discriminators, I am
> in trouble.  Because the thing that discriminates all of these headers is
> an initiator, not an element.
>
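A rough sketch, in Python rather than DFDL, of why the text before the colon is the piece that tells these headers apart: the lines all have the same shape, so only the name can decide which field a line belongs to, regardless of order. The function name `parse_headers` is made up for illustration and is not part of any DFDL processor.

```python
def parse_headers(text: str) -> dict[str, list[str]]:
    """Group 'Name: value' lines by name, whatever order they arrive in."""
    headers: dict[str, list[str]] = {}
    for line in text.splitlines():
        # Everything before the first colon plays the role of the initiator.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a header line: {line!r}")
        headers.setdefault(name.strip(), []).append(value.strip())
    return headers


sample = "From:  john at doe.com\nTo:  jane at gmail.com\nReturn-Path: bob at yahoo.com"
reordered = "Return-Path: bob at yahoo.com\nFrom:  john at doe.com\nTo:  jane at gmail.com"

# The same fields come out no matter how the lines are ordered.
print(parse_headers(sample) == parse_headers(reordered))
```

Whether the name is modeled as an initiator or as an element, it is the only thing available to discriminate on.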
> So, it seems to me that I need to change all my headers so that the
> “From:” and “To:” and such are no longer initiators but elements.
>
> Does that sound right?
>
> The more I work with this workaround, the more hackish it feels, and the
> more I think that unordered sequences should be part of DFDL 1.0.  Maybe?
>
> From: Steve Hanson [mailto:smh at uk.ibm.com]
> Sent: Wednesday, March 06, 2013 4:16 AM
> To: Garriss Jr., James P.
> Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org
> Subject: Re: [DFDL-WG] unordered sequence with constrained occurrences
>
> James,
>
> The checkConstraints function is just a convenience that saves you having
> to duplicate constraints in an assert or discriminator. For now, just
> duplicate the constraint as a discriminator. This works fine as long as you
> can express the constraint as a DFDL expression, which with your example
> you can.
>
> I've tested your xsd exactly as you supplied below (without the
> terminator) on my latest MBTK and it parses 'abc' fine. I don't see the
> infinite loop error. We did have some bugs in that area, where the check
> was being applied too strictly, which we fixed.
>
>
>
> I then tried with 'cba' which parsed without error, except of course that
> the values ended up in the wrong elements. So I added discriminators to
> check that the elements matched their fixed value, and 'cba' then parsed
> into the correct elements.
>
>   <xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
>        dfdl:occursCountKind="implicit" fixed="b" minOccurs="0" name="b"
>        type="xsd:string">
>        <xsd:annotation>
>             <xsd:appinfo source="http://www.ogf.org/dfdl/">
>                  <dfdl:discriminator>{. eq 'b'}</dfdl:discriminator>
>             </xsd:appinfo>
>        </xsd:annotation>
>   </xsd:element>
>
>
>
> I then tried with more complex strings, such as 'cbabaccba', and they
> all parsed ok.
>
>
>
> To make the infoset more symmetric, with one child per array occurrence,
> you can use a choice instead of a sequence.
>
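A toy Python model of the two shapes discussed above, offered as an illustration only: it is a greedy simulation under my own assumptions, not MBTK's algorithm and not the schema Steve attached. `parse_with_sequence` treats each array occurrence as optional a, b, c tried in that order; `parse_with_choice` treats each occurrence as exactly one of them. The `discriminate` flag stands in for the `{. eq 'x'}` discriminators.

```python
FIELDS = ("a", "b", "c")  # each element is one character with a fixed value


def parse_with_sequence(data: str, discriminate: bool = True) -> list[dict[str, str]]:
    """Each occurrence is a sequence of optional a, b, c, tried in that order."""
    occurrences, pos = [], 0
    while pos < len(data):
        occurrence = {}
        for name in FIELDS:
            # With discriminators, a character that does not match is rejected.
            if pos < len(data) and (not discriminate or data[pos] == name):
                occurrence[name] = data[pos]
                pos += 1
        if not occurrence:
            raise ValueError(f"no element matches at offset {pos}")
        occurrences.append(occurrence)
    return occurrences


def parse_with_choice(data: str) -> list[dict[str, str]]:
    """Each occurrence is a choice: exactly one of a, b, c."""
    occurrences = []
    for pos, ch in enumerate(data):
        if ch not in FIELDS:
            raise ValueError(f"no branch matches at offset {pos}")
        occurrences.append({ch: ch})
    return occurrences


print(parse_with_sequence("cba", discriminate=False))  # values land in the wrong elements
print(parse_with_sequence("cba"))                      # discriminators route each value
print(len(parse_with_sequence("cbabaccba")), len(parse_with_choice("cbabaccba")))
```

In this model the sequence form can put two children in one occurrence (for example 'ab'), while the choice form always yields one child per occurrence, which is the symmetry mentioned above.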
>
>
> Making that change then results in:
>
>
>
> Here's the xsd with discriminators and choices. See if it works with your
> MBTK.
>
>
>
> If you are still hitting the infinite loop error then add the %NL;
> terminator to the array element. This will parse data of the form:
>
> c
> b
> a
> b
> a
> c
> c
> b
> a
>
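A small Python sketch of why a terminator sidesteps the forward-progress check: every occurrence consumes at least its line ending, so the position always moves. This is an illustration under my own assumptions, with `str.splitlines` standing in loosely for the %NL; terminator; `parse_terminated` is a made-up name.

```python
FIELDS = ("a", "b", "c")


def parse_terminated(data: str) -> list[dict[str, str]]:
    """Parse newline-terminated occurrences, one element per line."""
    occurrences = []
    for line_no, line in enumerate(data.splitlines(), start=1):
        if line not in FIELDS:
            raise ValueError(f"line {line_no}: expected one of a, b, c, got {line!r}")
        occurrences.append({line: line})
    return occurrences


print(parse_terminated("c\nb\na\nb\na\nc\nc\nb\na\n"))
```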
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        "Garriss Jr., James P." <jgarriss at mitre.org>
> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date:        05/03/2013 19:15
> Subject:        Re: [DFDL-WG] unordered sequence with constrained
> occurrences
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
>
> > The error message is because you don't make forward progress through
> > the data with potentially unbounded occurrences.
>
> I think you just said, “MBTK prevents an infinite loop.”  That makes sense.
>
> > If there are delimiters then model those and you might not get the
> > error.
>
> I think you just said, “To let MBTK know when it should stop checking, you
> need a terminator of some sort.”  That also makes sense.  So I added a
> terminator (%NL;) here:
>
>
>
> Good news:  That fixed the problem, so long as my input is “abc”.
>
> Bad news:  This breaks if the input is any other legal value, such as
> “abbc” or “cba” or “b”.
>
> The problem for all of these is that my dear friend, checkConstraints, is
> not implemented yet, thus I can’t prevent the parser from slurping up the
> wrong character.  I don’t know how anyone can build a non-trivial DFDL
> schema that involves any sort of choice without this method; I swear, it
> must be the single most important thing you guys have created for DFDL.
>
> Until checkConstraints is implemented, I’m not really able to test this
> schema with MBTK.
>
> Thanks so much for your help answering my questions, Steve!
>
>
> From: Steve Hanson [mailto:smh at uk.ibm.com]
> Sent: Tuesday, March 05, 2013 1:46 PM
> To: Garriss Jr., James P.
> Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org
> Subject: Re: [DFDL-WG] unordered sequence with constrained occurrences
>
> James,
>
> The error message is because you don't make forward progress through the
> data with potentially unbounded occurrences. Is this because you are using
> a cut-down schema?  If there are delimiters then model those and you might
> not get the error.
>
> Once you have processed the array you can use asserts to check the count.
> However IBM DFDL does not implement the count functions yet.
>
> Give me a couple of days to look at this more closely. I have a customer
> visit tomorrow hence the delay.
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        "Garriss Jr., James P." <jgarriss at mitre.org>
> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date:        05/03/2013 16:19
> Subject:        Re: [DFDL-WG] unordered sequence with constrained
> occurrences
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
>
>
> Hmmm, maybe not.  I said:
>
> > The unordered sequence can be modeled with a data array
>
> Yet when implemented in MBTK, it throws a fatal error:
>
> fatal: CTDP3148E: Infinite loop at offset 3: The DFDL parser cannot
> process array element 'ABCarray' because maxOccurs is unbounded and the
> length of the previous occurrence was zero.
>
> I think what happens is that on the last pass through the array, it
> doesn’t find a, b, or c, so it throws a fatal error.
>
> So is this a bug in MBTK?  Or can DFDL not model an unordered sequence?
>  Or am I just doing it wrong?
>
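A toy Python model of the situation behind the CTDP3148E message, offered as a sketch and not as MBTK's implementation: an unbounded array whose occurrence is three optional one-character elements with no discriminators. Once the data is used up, the next occurrence has length zero, and repeating it would never end. The `strict` flag and all names here are assumptions for illustration; `strict=False` only shows how a more lenient check could end the array instead of failing.

```python
class NoForwardProgress(Exception):
    """Raised when an array occurrence consumes no data."""


FIELDS = ("a", "b", "c")  # optional, one character each, tried in this order


def parse_array(data: str, strict: bool = True) -> list[dict[str, str]]:
    """Toy model of an unbounded array whose occurrences may be empty."""
    occurrences = []
    pos = 0
    while True:
        start = pos
        occurrence = {}
        for name in FIELDS:
            if pos < len(data):  # no discriminator: any character is accepted
                occurrence[name] = data[pos]
                pos += 1
        if pos == start:
            # Nothing was consumed, so trying again would loop forever.
            if strict:
                raise NoForwardProgress(f"zero-length occurrence at offset {pos}")
            return occurrences
        occurrences.append(occurrence)


try:
    parse_array("abc")
except NoForwardProgress as exc:
    print(exc)
```

With "abc" the first occurrence consumes all three characters, so the empty occurrence is detected at offset 3, the same offset the error message reports.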
> Here’s a sample DFDL schema that illustrates the point:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsd:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>      xmlns:fmt="http://www.ibm.com/dfdl/GeneralPurposeFormat"
>      xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions"
>      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
>      <xsd:import namespace="http://www.ibm.com/dfdl/GeneralPurposeFormat"
>            schemaLocation="IBMdefined/GeneralPurposeFormat.xsd" />
>      <xsd:element ibmSchExtn:docRoot="true" name="ABC">
>            <xsd:complexType>
>                  <xsd:sequence dfdl:separator="">
>                        <xsd:annotation>
>                              <xsd:appinfo source="http://www.ogf.org/dfdl/">
>                                    <dfdl:sequence />
>                              </xsd:appinfo>
>                        </xsd:annotation>
>                        <xsd:element dfdl:occursCountKind="implicit"
>                              maxOccurs="unbounded"
>                              minOccurs="1" name="ABCarray">
>                              <xsd:complexType>
>                                    <xsd:sequence dfdl:separator="">
>                                          <xsd:element dfdl:length="1"
>                                                dfdl:lengthKind="explicit"
>                                                dfdl:occursCountKind="implicit"
>                                                fixed="a" minOccurs="0" name="a"
>                                                type="xsd:string" />
>                                          <xsd:element dfdl:length="1"
>                                                dfdl:lengthKind="explicit"
>                                                dfdl:occursCountKind="implicit"
>                                                fixed="b" minOccurs="0" name="b"
>                                                type="xsd:string" />
>                                          <xsd:element dfdl:length="1"
>                                                dfdl:lengthKind="explicit"
>                                                dfdl:occursCountKind="implicit"
>                                                fixed="c" minOccurs="0" name="c"
>                                                type="xsd:string" />
>                                    </xsd:sequence>
>                              </xsd:complexType>
>                        </xsd:element>
>                  </xsd:sequence>
>            </xsd:complexType>
>      </xsd:element>
>      <xsd:annotation>
>            <xsd:appinfo source="http://www.ogf.org/dfdl/">
>                  <dfdl:format ref="fmt:GeneralPurposeFormat" />
>            </xsd:appinfo>
>      </xsd:annotation>
> </xsd:schema>
>
> Test with “abc” as sample input.
>
> From: Garriss Jr., James P.
> Sent: Tuesday, March 05, 2013 8:43 AM
> To: dfdl-wg at ogf.org
> Subject: unordered sequence with constrained occurrences
>
> Suppose text data has 3 constructs:  a, b, and c.
>
> ·       a must occur exactly 1 time
> ·       b can occur 0 or 1 time
> ·       c can occur any number of times, 0 or more
>
> These 3 constructs can appear in any order.
>
> So these are valid inputs:
>
> abc
> a
> bcccca
>
> But these are not:
>
> ccbcc
> abbc
> abcabc
>
> Can data like this be modeled with DFDL?
>
> The unordered sequence can be modeled with a data array, like this:
>
> Array (0 to unbounded)
>   Sequence
>     a (0 to 1)
>     b (0 to 1)
>     c (0 to 1)
>   /Sequence
> /Array
>
> But I don’t know how to constrain the total number of occurrences.
>
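The occurrence rules stated above (a exactly once, b at most once, c any number of times, in any order) can be written as a plain check on the finished data. This is a minimal Python sketch of the constraint itself, not a DFDL solution; `RULES` and `satisfies_occurrences` are names chosen for illustration.

```python
from collections import Counter

# Occurrence rules from the message above: (min, max); None means unbounded.
RULES = {"a": (1, 1), "b": (0, 1), "c": (0, None)}


def satisfies_occurrences(data: str) -> bool:
    """Return True if data uses only a/b/c and every count is within bounds."""
    counts = Counter(data)
    if any(ch not in RULES for ch in counts):
        return False
    for name, (low, high) in RULES.items():
        n = counts[name]  # Counter returns 0 for a missing key
        if n < low or (high is not None and n > high):
            return False
    return True


for text in ("abc", "a", "bcccca", "ccbcc", "abbc", "abcabc"):
    print(text, satisfies_occurrences(text))
```

Because order does not matter, counting is enough; the hard part in DFDL is expressing the same counts over the parsed array.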
> Appreciate any ideas!
> --
> dfdl-wg mailing list
> dfdl-wg at ogf.org
> https://www.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>



-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com