[DFDL-WG] FW: US NRL's problem format

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue Jul 26 09:18:52 CDT 2011


 

I think the array example from the NRL guys is not an uncommon pattern, and it really does need a general solution.

 

It's a variation on occursCountKind 'parsed'. I think your idea of a way to tell the parser that it is on the last item is the right notion.

 

occursCountKind="parsed"      occursCountParseStop="…expression..."

 

The problem is unparsing: we need to say how whatever we are testing gets assigned its value. This could be done with outputValueCalc expressions.

 

Something like this perhaps:

 

<element name="nrlArray" maxOccurs="unbounded"
   dfdl:occursCountKind="parsed" dfdl:occursCountParseStop="{ ./isLastFlag = 0 }">
  <complexType>
    <sequence>
      <!-- 0 on the last item of the array, 1 on every other item -->
      <element name="isLastFlag" type="int"
               dfdl:outputValueCalc="{ if (dfdl:occursIndex() eq fn:count(../../nrlArray)) then 0 else 1 }" />
      <element name="field1" type="int"/>
      <element name="field2" type="int"/>
    </sequence>
  </complexType>
</element>
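
To make the unparse side concrete, here is a minimal Python sketch of what that outputValueCalc is meant to compute: a flag of 0 on the last item and 1 on every other item. It only models the arithmetic, not a DFDL processor, and the function name `last_flags` is made up for illustration.

```python
from typing import List


def last_flags(count: int) -> List[int]:
    """Return the isLastFlag value for each of `count` array items (1-based positions)."""
    # Position equals count only for the final item, which gets 0; the rest get 1.
    return [0 if position == count else 1 for position in range(1, count + 1)]


print(last_flags(3))  # [1, 1, 0]
```

The point of the sketch is that the flag is fully determined by position and count, which is why it can be recomputed on output rather than carried in the infoset by hand.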

 

 

                                                                                               

From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: Wednesday, July 20, 2011 5:56 AM
To: Mike Beckerle
Subject: RE: US NRL's problem format

 


Hi Mike 

Using a separate scalar last element runs into a UPA violation, because the array element and the scalar last element have the same name.  If the scalar last element had a different name this would work. 

I think we have to be careful with occursStopValue.  The property today only works for arrays of simple elements because we need a logical type, and it does not put the stop value in the infoset.  With NRL's format, not only is the element complex, but the last element is intended to go into the infoset. 

What is needed is an ability to assert 'I am the last item in an array' and for that to stop occursCountKind 'parsed' from looking any further.

I think we should be able to do this with variables. A variable is defined with a default value of (in this case) 1. The array element carries a discriminator that fails if the variable is 0. The trick is to place the setVariable on the child of the array element which contains the last indicator, so that the next time round, the discriminator fails. There is one problem with this: it violates the variable rules, as (1) a variable cannot be set multiple times, and (2) it cannot be set after it has been tested. (Even adding a test attribute to setVariable, whereby it is only set when the test expression evaluates to true, still violates (2).)
        <element name="data">
          <dfdl:defineVariable name="last" type="int" defaultValue="1" />
          <complexType>
            <sequence>
              <element name="array">
                <!-- fails (ending the array) once the variable has been set to 0 -->
                <dfdl:discriminator test="{ $last ne 0 }" />
                <complexType>
                  <sequence>
                    <element name="lastIndicator" type="int">
                      <!-- the test attribute is the hypothetical extension discussed above -->
                      <dfdl:setVariable ref="last" value="{ . }" test="{ . eq 0 }"/>
                    </element>
                    ....
                  </sequence>
                </complexType>
              </element>
            </sequence>
          </complexType>
        </element>
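
The rule violation can be illustrated with a small Python model. This is only a sketch under assumptions: the `Variable` class and `parse_indicators` function are hypothetical, and they model just the two rules named above (set at most once, never set after being read), not the behaviour of any DFDL implementation.

```python
class VariableRuleError(Exception):
    """Raised when one of the two modelled variable rules is broken."""


class Variable:
    def __init__(self, default):
        self._value = default
        self._was_set = False
        self._was_read = False

    def get(self):
        self._was_read = True
        return self._value

    def set(self, value):
        if self._was_set:
            raise VariableRuleError("rule 1: variable already set")
        if self._was_read:
            raise VariableRuleError("rule 2: variable set after being read")
        self._value = value
        self._was_set = True


def parse_indicators(indicators):
    """Walk the lastIndicator values the way the schema sketch would."""
    last = Variable(default=1)
    parsed = []
    for indicator in indicators:
        if last.get() == 0:   # discriminator: stop once the variable is 0
            break
        parsed.append(indicator)
        if indicator == 0:    # setVariable guarded by test="{ . eq 0 }"
            last.set(indicator)
    return parsed


try:
    parse_indicators([1, 1, 0, 1])
except VariableRuleError as err:
    print(err)  # rule 2: variable set after being read
```

Even with the guarded setVariable, the discriminator has already read the variable by the time the last indicator is reached, so rule (2) is what trips.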

A more convenient mechanism is to add a new property occursParsedLast which is an expression (able to look downwards) that returns a boolean and which is tested after each item in the array has been parsed. 
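
As a rough illustration of the intended semantics of such a property, here is a Python sketch. The property is only a proposal in this thread, and the names `parse_array`, `Item` and the three-field item layout are assumptions made for the example. The predicate is tested after each item is parsed, and the item that satisfies it stays in the result.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical item layout: (lastIndicator, field1, field2)
Item = Tuple[int, int, int]


def parse_array(items: Iterator[Item], is_last: Callable[[Item], bool]) -> List[Item]:
    """Consume items until one satisfies is_last; that item is kept in the result."""
    parsed: List[Item] = []
    for item in items:
        parsed.append(item)   # the last item still goes into the infoset
        if is_last(item):     # tested after each item has been parsed
            break
    return parsed


stream = iter([(1, 10, 11), (1, 20, 21), (0, 30, 31), (1, 40, 41)])
array = parse_array(stream, lambda item: item[0] == 0)
print(array)         # [(1, 10, 11), (1, 20, 21), (0, 30, 31)]
print(next(stream))  # (1, 40, 41) is left for whatever follows the array
```

Unlike occursStopValue, nothing is dropped here: the terminating item is an ordinary member of the array.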

An aside ... if lengthUnits is 'bits' we could have lengthKind 'pattern' interpret the regexp as if each bit was a character 0 or 1. Then my pattern idea below would work. Could be useful for bit oriented formats? 

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848


From: "Mike Beckerle" <mbeckerle.dfdl at gmail.com>
To: Steve Hanson/UK/IBM at IBMGB
Date: 19/07/2011 18:32
Subject: RE: US NRL's problem format

 

  _____  




To clarify: I think we can handle this, but it requires us to model it as an array of the not-last element, followed by a separate scalar last element. Then input value calc could put it all into an array, and you could tease this back apart with output value calc on output. 
  
The problem: this is too simple a concept to require that much work. (It might make a good example of input value calc/output value calc. :-)
  
We’re so close with stopValue already… it seems like a small miss. 
  
 ----- Forwarded by Steve Hanson/UK/IBM on 20/07/2011 09:34 ----- 


From: "Mike Beckerle" <mbeckerle.dfdl at gmail.com>
To: Steve Hanson/UK/IBM at IBMGB
Date: 19/07/2011 18:26
Subject: RE: US NRL's problem format

 

  _____  




I thought of a choice that disambiguates on the flag, but you’ve identified what doesn’t work on that below. 
  
The scheme is a variation on stop value as a concept. 
  
I think we need a property dfdl:stopValuePath="…path expression…" which says where (relative to the array element, i.e., inside it) you find the stop value. This would be inverted for output, so it can't be a complex expression; it must simply be a path.
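
A small Python sketch of why a plain path is invertible where a general expression is not: the same list of steps that locates the value on parse can be used to store it on unparse. The helper names and the nested-dict item layout are hypothetical, chosen only for the example.

```python
from typing import Any, Dict, List


def read_stop_value(item: Dict[str, Any], path: List[str]) -> Any:
    """Parse direction: follow the path inside one array item and return the value."""
    node: Any = item
    for step in path:
        node = node[step]
    return node


def write_stop_value(item: Dict[str, Any], path: List[str], value: Any) -> None:
    """Unparse direction: follow the same path and assign the value at its end."""
    node: Any = item
    for step in path[:-1]:
        node = node[step]
    node[path[-1]] = value


item = {"header": {"isLastFlag": None}, "field1": 7}
write_stop_value(item, ["header", "isLastFlag"], 0)
print(read_stop_value(item, ["header", "isLastFlag"]))  # 0
```

An arbitrary expression (say, a comparison or arithmetic over several fields) has no such obvious inverse, which is the reason for restricting the property to a path.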
  
…mikeb 
From: Steve Hanson [mailto:smh at uk.ibm.com]
Sent: Tuesday, July 19, 2011 1:06 PM
To: mbeckerle.dfdl at gmail.com
Subject: US NRL's problem format 
  

Hi Mike 

For a format like... 

......................1xxxxxxxxxxxxxxxxxxxxxxx1xxxxxxxxxxxxxxxxxxxx0xxxxxxxxxxxxxxxxxxx................ 

where the xxxx are all the same length n, you could use lengthKind="pattern" with a regexp that looks for any number of (n+1)-byte records each starting with a 1, followed by exactly one (n+1)-byte record starting with a 0.
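
For illustration, here is what such a regexp might look like, sketched with Python's `re` module rather than in a DFDL schema. It assumes byte-oriented data in which the flag is the ASCII character 1 or 0; the function name `record_run_pattern` and the sample data are made up.

```python
import re


def record_run_pattern(n: int) -> "re.Pattern[bytes]":
    """Zero or more (n+1)-byte records starting with b'1', then exactly one starting with b'0'."""
    payload = b".{%d}" % n  # any n bytes following the flag byte
    return re.compile(b"(?:1" + payload + b")*0" + payload, re.DOTALL)


pattern = record_run_pattern(3)
data = b"1aaa1bbb0ccc1ddd"
match = pattern.match(data)
print(match.end())  # 12: the array ends after the record that starts with 0
```

The match length gives the extent of the whole array, so the records after the terminating one are left alone.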

You could place an assert on the 1/0 element that said it could either be 0 or 1, but that only works if the next things can't be 0 or 1 in that position. 

However, I have a feeling that the NRL's data was at the bit-level which would not allow either of these anyway. 

What was your model? 

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

 

  _____  


  

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 

 










