[DFDL-WG] unordered sequence with constrained occurrences

Steve Hanson smh at uk.ibm.com
Thu Mar 7 06:14:23 EST 2013


James

The purpose of a discriminator is to check that the data matches the model 
at a 'point of uncertainty' such as a choice branch or an optional element 
or a variable occurrence array.  If there is no discriminator then any 
processing error causes the parser to backtrack and try the next thing in 
the model. If there is a discriminator and it evaluates to false then the 
parser again backtracks and tries the next thing in the model. If it 
evaluates to true, then the parser knows for sure it is in the right place 
in the model - and crucially that any subsequent processing error is a 
hard error and does not cause backtracking at that point of uncertainty.
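
The behaviour described above can be pictured with a small Python model. This is only a sketch of the idea, not how any DFDL processor is implemented; the names parse_choice, one_char and expect, and the tuple shape of a branch, are made up for illustration.

```python
class ParseError(Exception):
    """A processing error that can normally be recovered by backtracking."""


class HardError(Exception):
    """A processing error raised after a discriminator evaluated to true."""


def one_char(data, pos):
    """Hypothetical branch head: read a single character."""
    if pos >= len(data):
        raise ParseError("end of data")
    return data[pos], pos + 1


def expect(text):
    """Hypothetical branch tail: require the literal text at the current offset."""
    def tail(data, pos):
        if not data.startswith(text, pos):
            raise ParseError(f"expected {text!r} at offset {pos}")
        return pos + len(text)
    return tail


def parse_choice(data, pos, branches):
    """Try each branch in turn at a point of uncertainty.

    Each branch is (name, head, discriminator, tail); discriminator may be None.
    """
    for name, head, discriminator, tail in branches:
        try:
            value, after_head = head(data, pos)
        except ParseError:
            continue  # processing error before any discriminator: backtrack
        if discriminator is not None and not discriminator(value):
            continue  # discriminator is false: backtrack, try the next branch
        try:
            end = tail(data, after_head)
        except ParseError as err:
            if discriminator is not None:
                # Discriminator was true, so this error is no longer recoverable.
                raise HardError(f"branch {name!r}: {err}") from err
            continue  # no discriminator: still free to backtrack
        return name, value, end
    raise ParseError(f"no branch matched at offset {pos}")


with_disc = [
    ("a", one_char, lambda v: v == "a", expect("!")),
    ("b", one_char, lambda v: v == "b", expect("!")),
]
without_disc = [
    ("a", one_char, None, expect("!")),
    ("b", one_char, None, expect("?")),
]
```

With with_disc, the data "b!" is parsed as branch b, while "a?" raises HardError because branch a was already confirmed by its discriminator. With without_disc, "a?" quietly backtracks and lands in branch b, which is the "values in the wrong elements" effect.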

In the 'abc' example you gave, there are no initiators, so a discriminator 
must be used that looks at the data content. But if you have an initiator, 
like IMF headers have, you can use the initiator to do the discrimination. 
Set property dfdl:initiatedContent 'yes' on the choice itself. This acts 
just like a discriminator when an initiator matches and will stop 
backtracking happening. You no longer need the discriminators. 

   <xsd:element dfdl:occursCountKind="implicit"
                dfdl:terminator="%NL;%WSP*;"
                maxOccurs="unbounded" name="HeaderArray">
     <xsd:complexType>
       <xsd:choice dfdl:choiceLengthKind="implicit"
                   dfdl:initiatedContent="yes">
         <xsd:element name="From" type="xsd:string"
                      dfdl:initiator="From:%WSP*;" dfdl:ignoreCase="yes"/>
         <xsd:element name="To" type="xsd:string"
                      dfdl:initiator="To:%WSP*;" dfdl:ignoreCase="yes"/>
         <xsd:element name="ReturnPath" type="xsd:string"
                      dfdl:initiator="Return-Path:%WSP*;" dfdl:ignoreCase="yes"/>
       </xsd:choice>
     </xsd:complexType>
   </xsd:element>
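
To illustrate what the initiated choice does with the sample headers, here is a rough Python equivalent. It is a sketch under assumptions: the %NL; terminator is approximated by splitting on line breaks, %WSP*; by spaces and tabs, and the function name parse_headers is invented for the example.

```python
import re

# Initiators taken from the schema above. ignoreCase="yes" is modelled with
# re.IGNORECASE; %WSP*; is approximated as optional spaces and tabs.
INITIATORS = {
    "From": re.compile(r"From:[ \t]*", re.IGNORECASE),
    "To": re.compile(r"To:[ \t]*", re.IGNORECASE),
    "ReturnPath": re.compile(r"Return-Path:[ \t]*", re.IGNORECASE),
}


def parse_headers(text):
    """Return (element name, value) pairs, one per array occurrence.

    The array ends at the first line that no initiator matches.
    """
    headers = []
    for line in text.splitlines():
        for name, initiator in INITIATORS.items():
            match = initiator.match(line)
            if match:
                # Initiator matched: this choice branch is now committed,
                # which is the role the discriminator played before.
                headers.append((name, line[match.end():].rstrip()))
                break
        else:
            break  # no branch of the choice applies, so the array is finished
    return headers
```

Because each line is identified by its initiator alone, the headers can arrive in any order and the values still end up in the right elements.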

If the %NL; is not always going to be present, and the data content of one 
element can be terminated by the initiator of the next element, then you 
need to use lengthKind 'pattern' as Mike showed in his mail. (It wasn't 
clear to me whether that was the same or different IMF data).
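
As a rough picture of the 'pattern' idea, the length of each value can be found with a regular expression that stops just before the next initiator. This Python sketch is not the pattern from Mike's mail, which is not reproduced here; the initiator list, the regex and the name split_headers are all assumptions for illustration.

```python
import re

# Assumed set of initiators; a real schema would list every header it models.
INITIATOR = re.compile(r"(From|To|Return-Path):[ \t]*", re.IGNORECASE)

# Shortest run of characters that is followed by another initiator or by the
# end of the data. The lookahead does not consume the next initiator.
VALUE = re.compile(r".*?(?=(?:From|To|Return-Path):|\Z)",
                   re.IGNORECASE | re.DOTALL)


def split_headers(data):
    """Split data into (initiator name, value) pairs without relying on newlines."""
    headers = []
    pos = 0
    while pos < len(data):
        initiator = INITIATOR.match(data, pos)
        if initiator is None:
            break  # unknown content: stop rather than guess
        value = VALUE.match(data, initiator.end())
        headers.append((initiator.group(1), value.group(0).strip()))
        pos = value.end()
    return headers
```

A limitation of this naive lookahead is that a value which itself contains text such as "To:" would be cut short, so a real pattern would need to be more careful.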

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   "Garriss Jr., James P." <jgarriss at mitre.org>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:   06/03/2013 19:05
Subject:        Re: [DFDL-WG] unordered sequence with constrained 
occurrences
Sent by:        dfdl-wg-bounces at ogf.org



Suppose I’m modeling IMF headers, many of which can have the exact same 
form, stuff like:
 
From:  john at doe.com
To:  jane at gmail.com
Return-Path: bob at yahoo.com
 
Etc.  Remember that these can be in any order, so they are an unordered 
sequence.
 
The way that we’ve modeled these headers so far, the “From:” and “To:” and 
so on have been initiators; they aren’t elements.  But when I use our 
workaround for an unordered sequence, which requires discriminators, I am 
in trouble, because the thing that discriminates all of these headers is 
an initiator, not an element.
 
So, it seems to me that I need to change all my headers so that the 
“From:” and “To:” and such are no longer initiators but elements.
 
Does that sound right?
 
The more I work with this workaround, the more hackish it feels, and the 
more I think that unordered sequences should be part of DFDL 1.0.  Maybe?
 
From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: Wednesday, March 06, 2013 4:16 AM
To: Garriss Jr., James P.
Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org
Subject: Re: [DFDL-WG] unordered sequence with constrained occurrences
 
James, 

The checkConstraints function is just a convenience that saves you having 
to duplicate constraints in an assert or discriminator. For now, just 
duplicate the constraint as a discriminator. This works fine as long as 
you can express the constraint as a DFDL expression, which with your 
example you can. 

I've tested your xsd exactly as you supplied below (without the 
terminator) on my latest MBTK and it parses 'abc' fine. I don't see the 
infinite loop error. We did have some bugs in that area where the check 
was being applied too strictly which we fixed. 



I then tried with 'cba' which parsed without error, except of course that 
the values ended up in the wrong elements. So I added discriminators to 
check that the elements matched their fixed value, and 'cba' then parsed 
into the correct elements. 

  <xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
               dfdl:occursCountKind="implicit" fixed="b" minOccurs="0"
               name="b" type="xsd:string">
       <xsd:annotation>
            <xsd:appinfo source="http://www.ogf.org/dfdl/">
                 <dfdl:discriminator>{. eq 'b'}</dfdl:discriminator>
            </xsd:appinfo>
       </xsd:annotation>
  </xsd:element>
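
The difference the discriminators make can be seen in a small Python model of the array of optional a, b, c elements. This is a simplified sketch, not the MBTK behaviour itself; the function name parse_abc and the use_discriminators flag are invented for the example.

```python
def parse_abc(data, use_discriminators):
    """Parse data as repeated occurrences of a sequence of optional a, b, c.

    Each element is one character long. Without discriminators an element
    accepts whatever character is next; with them it only accepts its own
    fixed value and is otherwise treated as absent.
    """
    occurrences = []
    pos = 0
    while pos < len(data):
        start = pos
        occurrence = {}
        for name in ("a", "b", "c"):
            if pos >= len(data):
                break
            char = data[pos]
            if use_discriminators and char != name:
                continue  # discriminator false: optional element is absent
            occurrence[name] = char
            pos += 1
        if pos == start:
            break  # nothing consumed, so stop instead of looping forever
        occurrences.append(occurrence)
    return occurrences
```

For 'cba', the call without discriminators gives one occurrence with c, b, a stored in elements a, b, c respectively, while the call with discriminators gives three occurrences holding c, b and a in the correct elements.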



I then tried with more complex strings, such as 'cbabaccba', and they all 
parsed ok. 



To make the infoset more symmetric, with one child per array occurrence, 
you can use a choice instead of a sequence. 



Making that change then results in: 

[attachment not preserved in the archive]

Here's the xsd with discriminators and choices. See if it works with your 
MBTK. 

[attachment not preserved in the archive]

If you are still hitting the infinite loop error then add the %NL; 
terminator to the array element. This will parse data of the form: 

c 
b 
a 
b 
a 
c 
c 
b 
a 

Regards

Steve Hanson



From:        "Garriss Jr., James P." <jgarriss at mitre.org> 
To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:        05/03/2013 19:15 
Subject:        Re: [DFDL-WG] unordered sequence with constrained 
occurrences 
Sent by:        dfdl-wg-bounces at ogf.org 




> The error message is because you don't make forward progress through the 
> data with potentially unbounded occurrences. 
  
I think you just said, “MBTK prevents an infinite loop.”  That makes 
sense. 
  
> If there are delimiters then model those and you might not get the 
> error. 
  
I think you just said, “To let MBTK know when it should stop checking, you 
need a terminator of some sort.”  That also makes sense.  So I added a 
terminator (%NL;) here: 
  
[attachment not preserved in the archive]
  
Good news:  That fixed the problem, so long as my input is “abc”. 
  
Bad news:  This breaks if the input is any other legal value, such as 
“abbc” or “cba” or “b”. 
  
The problem for all of these is that my dear friend, checkConstraints, is 
not implemented yet, thus I can’t prevent the parser from slurping up the 
wrong character.  I don’t know how anyone can build a non-trivial DFDL 
schema that involves any sort of choice without this method; I swear, it 
must be the single most important thing you guys have created for DFDL. 
  
Until checkConstraints is implemented, I’m not really able to test this 
schema with MBTK. 
  
Thanks so much for your help answering my questions, Steve! 
  
  
From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: Tuesday, March 05, 2013 1:46 PM
To: Garriss Jr., James P.
Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org
Subject: Re: [DFDL-WG] unordered sequence with constrained occurrences 
  
James, 

The error message is because you don't make forward progress through the 
data with potentially unbounded occurrences. Is this because you are using 
a cut-down schema?  If there are delimiters then model those and you might 
not get the error. 

Once you have processed the array you can use asserts to check the count. 
However IBM DFDL does not implement the count functions yet. 
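
Until the count functions are available, the check can be done on the parsed result outside DFDL. The Python sketch below is a stand-in for such asserts, using the constraints from the original question (a exactly once, b at most once, c any number of times); the function name is hypothetical.

```python
from collections import Counter


def satisfies_occurrence_constraints(names):
    """Check the occurrence counts of a, b and c in a parsed array.

    names is any iterable of element names in the order they were parsed,
    for example the string "bcccca" or the list ["b", "c", "a"].
    """
    counts = Counter(names)
    if set(counts) - {"a", "b", "c"}:
        return False  # something other than a, b or c was found
    # a exactly once, b at most once, c unconstrained
    return counts["a"] == 1 and counts["b"] <= 1
```

Applied to the examples in the original question, "abc", "a" and "bcccca" pass, while "ccbcc", "abbc" and "abcabc" fail.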

Give me a couple of days to look at this more closely. I have a customer 
visit tomorrow hence the delay. 

Regards

Steve Hanson



From:        "Garriss Jr., James P." <jgarriss at mitre.org> 
To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:        05/03/2013 16:19 
Subject:        Re: [DFDL-WG] unordered sequence with constrained 
occurrences 
Sent by:        dfdl-wg-bounces at ogf.org 





Hmmm, maybe not.  I said: 
 
> The unordered sequence can be modeled with a data array 
 
Yet when implemented in MBTK, it throws a fatal error: 
 
fatal: CTDP3148E: Infinite loop at offset 3: The DFDL parser cannot 
process array element 'ABCarray' because maxOccurs is unbounded and the 
length of the previous occurrence was zero.   
 
I think what happens is that on the last pass through the array, it 
doesn’t find a, b, or c, so it throws a fatal error. 
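
A small Python model of an unbounded array shows why offset 3 is where things go wrong for 'abc'. It is only a sketch of a forward-progress check; the error class and function names are invented and do not reflect how MBTK implements CTDP3148E.

```python
class InfiniteLoopError(Exception):
    """Raised when an occurrence of an unbounded array consumes no data."""


def parse_optional_abc(data, pos):
    """One occurrence: up to three optional one-character elements.

    Every child is optional, so this never fails; at end of data it simply
    succeeds with zero length.
    """
    return min(pos + 3, len(data))


def parse_line(data, pos):
    """One occurrence terminated by a newline; None means the occurrence fails."""
    end = data.find("\n", pos)
    return None if end == -1 else end + 1


def parse_unbounded_array(data, parse_occurrence):
    """Repeat parse_occurrence until it fails, insisting on forward progress."""
    pos = 0
    count = 0
    while True:
        new_pos = parse_occurrence(data, pos)
        if new_pos is None:
            return count, pos  # occurrence failed: the array ends normally
        if new_pos == pos:
            raise InfiniteLoopError(
                f"infinite loop at offset {pos}: "
                "previous occurrence had zero length")
        pos = new_pos
        count += 1
```

With parse_optional_abc the first occurrence consumes 'abc' and the second succeeds with zero length at offset 3, which triggers the error. With parse_line, the terminator gives the parser a way to fail cleanly at end of data, so the array ends without an error.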
 
So is this a bug in MBTK?  Or can DFDL not model an unordered sequence? Or 
am I just doing it wrong? 
 
Here’s a sample DFDL schema that illustrates the point: 
 
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
    xmlns:fmt="http://www.ibm.com/dfdl/GeneralPurposeFormat"
    xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:import namespace="http://www.ibm.com/dfdl/GeneralPurposeFormat"
      schemaLocation="IBMdefined/GeneralPurposeFormat.xsd" />
  <xsd:element ibmSchExtn:docRoot="true" name="ABC">
    <xsd:complexType>
      <xsd:sequence dfdl:separator="">
        <xsd:annotation>
          <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:sequence />
          </xsd:appinfo>
        </xsd:annotation>
        <xsd:element dfdl:occursCountKind="implicit" maxOccurs="unbounded"
            minOccurs="1" name="ABCarray">
          <xsd:complexType>
            <xsd:sequence dfdl:separator="">
              <xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
                  dfdl:occursCountKind="implicit" fixed="a" minOccurs="0"
                  name="a" type="xsd:string" />
              <xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
                  dfdl:occursCountKind="implicit" fixed="b" minOccurs="0"
                  name="b" type="xsd:string" />
              <xsd:element dfdl:length="1" dfdl:lengthKind="explicit"
                  dfdl:occursCountKind="implicit" fixed="c" minOccurs="0"
                  name="c" type="xsd:string" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="fmt:GeneralPurposeFormat" />
    </xsd:appinfo>
  </xsd:annotation>
</xsd:schema>
 
Test with “abc” as sample input. 
 
From: Garriss Jr., James P. 
Sent: Tuesday, March 05, 2013 8:43 AM
To: dfdl-wg at ogf.org
Subject: unordered sequence with constrained occurrences 
 
Suppose text data has 3 constructs:  a, b, and c. 
 
* a must occur 1 time 
* b can occur 0 or 1 time 
* c can occur any number of times, 0 or more 
 
These 3 constructs can appear in any order. 
 
So these are valid inputs: 
 
abc 
a 
bcccca 
 
But these are not: 
 
ccbcc   
abbc 
abcabc 
 
Can data like this be modeled with DFDL? 
 
The unordered sequence can be modeled with a data array, like this: 
 
Array (0 to unbounded) 
  Sequence 
    a (0 to 1) 
    b (0 to 1) 
    c (0 to 1) 
  /Sequence 
/Array 
 
But I don’t know how to constrain the total number of occurrences.   
 
Appreciate any ideas!
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

[Archive note: an HTML copy of this message and six image attachments (five GIF, one PNG) were scrubbed by the list archive.]

