[DFDL-WG] Action 183 - chicken-and-egg situation with lengths given by expressions

Tim Kimber KIMBERT at uk.ibm.com
Tue Sep 11 07:35:19 EDT 2012


Good point. The problem is that lengthKind 'explicit' is being used for 
two things:
a) a length that is static
b) a length that is calculated
...so the DFDL serializer must assume that the expression needs to be 
evaluated.

For occursCountKind we have separate values for 'fixed' and 'expression'. 
If we did not, then occursCountKind would have the same problem except 
that it would affect defaulting rather than padding. 
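
To make the distinction concrete, here is a minimal Python sketch of how a serializer might tell a literal length from one that has to be evaluated. The function name `length_is_expression` is hypothetical, and the brace test assumes the `{...}` expression syntax used in the schemas quoted in this thread, with a leading `{{` treated as an escaped literal brace.

```python
def length_is_expression(length_property: str) -> bool:
    """Return True if a dfdl:length value must be evaluated as an expression.

    Assumption: expressions are written as {...}; a leading '{{' is an
    escaped literal brace rather than the start of an expression.
    """
    text = length_property.strip()
    return text.startswith("{") and not text.startswith("{{") and text.endswith("}")


# A literal length is static; anything in braces has to be evaluated,
# even when it happens to be a constant such as {10}.
examples = ["2", "{/message1/len}", "{10}"]
results = {value: length_is_expression(value) for value in examples}
```

Under lengthKind 'explicit' both forms share one property, so a check of this kind is the only way to separate case (a) from case (b).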

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742




From:   Steve Hanson/UK/IBM at IBMGB
To:     dfdl-wg at ogf.org, 
Date:   11/09/2012 12:27
Subject:        [DFDL-WG] Action 183 - chicken-and-egg situation with 
lengths given   by expressions
Sent by:        dfdl-wg-bounces at ogf.org



This mail is on the expected behaviour of the DFDL unparser when writing 
out a 'data' element the length of which is held in an earlier 'len' 
element. 

There are several scenarios, some straightforward and some that exhibit 
chicken-and-egg behaviour. The principle of what happens is understood; 
the action is to make sure that the behaviour is explained in enough 
detail in the spec for implementations to be consistent. (Note: IBM 
DFDL does not yet support outputValueCalc, so it has not hit this yet.) 

Scenarios follow. The 'data' element shown is simple, but the same 
principles apply if it is complex. 

1) 'len' is set from infoset 

- 'len' can be set in augmented infoset 
- No issue as 'data's length expression may be evaluated 

  <xsd:element name="message1"> 
    <xsd:complexType> 
      <xsd:sequence> 
        <xsd:element name="len" type="xsd:int" 
                     dfdl:lengthKind="explicit" dfdl:length="2" /> 
        <xsd:element name="data" type="xsd:string" 
                     dfdl:length="{/message1/len}" 
                     dfdl:lengthKind="explicit" /> 
      </xsd:sequence> 
    </xsd:complexType> 
  </xsd:element>

2) 'len' is set using outputValueCalc with fixed expression 

- When 'len's outputValueCalc is encountered, it can be evaluated then and 
there 
- 'len' can be set in augmented infoset 
- No issue as 'data's length expression may be evaluated 

  <xsd:element name="message1"> 
    <xsd:complexType> 
      <xsd:sequence> 
        <xsd:element name="len" type="xsd:int" 
                     dfdl:outputValueCalc="{10}" 
                     dfdl:lengthKind="explicit" dfdl:length="2" /> 
        <xsd:element name="data" type="xsd:string" 
                     dfdl:length="{/message1/len}" 
                     dfdl:lengthKind="explicit" /> 
      </xsd:sequence> 
    </xsd:complexType> 
  </xsd:element>

3) 'len' is set using outputValueCalc with reference to 'data' (unpadded) 

- When 'len's outputValueCalc is encountered, it cannot yet be evaluated 
as it depends on the length of 'data' 
- 'len' cannot yet be set in augmented infoset 
- Problem as 'data's length expression cannot be evaluated 
- But we do know the unpadded length of 'data', so 'len's outputValueCalc 
can now be evaluated 
- In turn this means that 'data's length expression can now be evaluated 

  <xsd:element name="message1"> 
    <xsd:complexType> 
      <xsd:sequence> 
        <xsd:element name="len" type="xsd:int" 
                     dfdl:outputValueCalc="{dfdl:unpaddedLength(/message1/data)}" 
                     dfdl:lengthKind="explicit" dfdl:length="2" /> 
        <xsd:element name="data" type="xsd:string" 
                     dfdl:length="{/message1/len}" 
                     dfdl:lengthKind="explicit" /> 
      </xsd:sequence> 
    </xsd:complexType> 
  </xsd:element>
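
The order of evaluation in scenario 3 can be sketched in Python as below. This is an illustration only, not how any particular DFDL implementation works. The function name `unparse_scenario_3` is hypothetical, and the sketch assumes details the schema does not state: a text representation for both elements, lengths counted in characters, and 'len' written right-justified and zero-padded to its fixed length of 2.

```python
def unparse_scenario_3(data_value: str, len_width: int = 2) -> str:
    """Unparse 'len' followed by 'data' for scenario 3.

    Assumptions (not stated in the schema): both elements use a text
    representation, lengths are counted in characters, and 'len' is
    right-justified and zero-padded to its fixed length.
    """
    # 1. 'len' is reached first, but its outputValueCalc refers to 'data',
    #    so evaluation is deferred.
    # 2. The unpadded length of 'data' depends only on its infoset value.
    unpadded_length = len(data_value)
    # 3. The deferred outputValueCalc for 'len' can now be evaluated.
    len_value = unpadded_length
    # 4. With 'len' known, the dfdl:length expression of 'data' resolves.
    data_length = len_value

    len_text = str(len_value).rjust(len_width, "0")
    if len(len_text) > len_width:
        raise ValueError("value of 'len' does not fit in its fixed length")
    # No padding is ever added here, because data_length == unpadded_length.
    return len_text + data_value.ljust(data_length)
```

For example, `unparse_scenario_3("hello")` produces `05hello` under these assumptions.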

4) 'len' is set using outputValueCalc with reference to 'data' (padded) 

- When 'len's outputValueCalc is encountered, it cannot yet be evaluated 
as it depends on the length of 'data' 
- 'len' cannot yet be set in augmented infoset 
- Problem as 'data's length expression cannot be evaluated 
- We don't know the padded length of 'data' because we don't know 'len' 
- Problem: 'data's length expression can never be evaluated 

  <xsd:element name="message1"> 
    <xsd:complexType> 
      <xsd:sequence> 
        <xsd:element name="len" type="xsd:int" 
                     dfdl:outputValueCalc="{dfdl:representationLength(/message1/data)}" 
                     dfdl:lengthKind="explicit" dfdl:length="2" /> 
        <xsd:element name="data" type="xsd:string" 
                     dfdl:length="{/message1/len}" 
                     dfdl:lengthKind="explicit" /> 
      </xsd:sequence> 
    </xsd:complexType> 
  </xsd:element>
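
One way to view the four scenarios is as dependency graphs between the quantities the unparser needs. The Python sketch below is a hypothetical model, not taken from the spec or from any implementation: the names `len.value`, `data.unpaddedLength` and `data.length` are labels invented for this illustration, and `evaluation_order` is an ordinary depth-first topological sort that raises an error on a cycle.

```python
class CircularDependency(Exception):
    """Raised when a quantity depends, directly or indirectly, on itself."""


def evaluation_order(deps):
    """Return an order in which every quantity follows its dependencies."""
    order = []
    state = {}  # name -> "visiting" or "done"

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise CircularDependency(name)
        state[name] = "visiting"
        for dep in deps.get(name, []):
            visit(dep)
        state[name] = "done"
        order.append(name)

    for name in deps:
        visit(name)
    return order


# Each scenario maps a quantity to the quantities it depends on.
SCENARIOS = {
    1: {"len.value": [], "data.length": ["len.value"]},  # from infoset
    2: {"len.value": [], "data.length": ["len.value"]},  # constant {10}
    3: {
        "len.value": ["data.unpaddedLength"],
        "data.unpaddedLength": [],  # known from the infoset value alone
        "data.length": ["len.value"],
    },
    4: {"len.value": ["data.length"], "data.length": ["len.value"]},
}
```

In this model scenarios 1 to 3 yield a valid order, while scenario 4 raises `CircularDependency`, which corresponds to the length expression that can never be evaluated.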



Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 



From:        Steve Hanson/UK/IBM 
To:        dfdl-wg at ogf.org 
Date:        04/09/2012 17:31 
Subject:        Fw: Behaviour for lengthKind 'endOfparent' is still not 
fully specified 


DFDL WG call 4th Sept 2012: 

1) Agreed that for binary data, only xs:hexBinary and packed/BCD allowed 
to have endOfParent 

2) Agreed this is the correct behaviour when filling to a known length 

3) Agreed this is the correct behaviour when filling to a known length 

4) Agreed this is the correct behaviour when filling to a known length 

It was noted that lengthKind 'explicit' on the parent may not result in a 
known length if the length is an expression. This is an example of a more 
general chicken-and-egg situation with lengths given by expressions, for 
which outputValueCalc and DFDL functions such as unpaddedLength() were 
added. Action raised to ensure that the behaviour of an implementation 
is fully defined by the spec. 

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 04/09/2012 17:24 ----- 

From:        Steve Hanson/UK/IBM 
To:        dfdl-wg at ogf.org <dfdl-wg at ogf.org> 
Date:        04/09/2012 14:08 
Subject:        Behaviour for lengthKind 'endOfparent' is still not fully 
specified 


Noted when I reviewed the latest spec: endOfParent and unparsing are not 
fully thought through. 

The spec today says that I can use endOfParent with binary data. There is 
a restriction in section 12.3.8, but it only applies when an element is 
endOfParent and its parent is lengthKind delimited. 

There are a couple of cases to consider: 

1) Binary data of restricted length (see list in other email "proposed 
clarification/narrowing - delimited binary data should decimal"). I don't 
think it makes sense to allow these. We don't allow these binary reps for 
delimited. 

2) Text data of variable length when unparsing. Box scenario. If the data 
in the infoset is shorter than the space in the box, what do we do?  I think 
we should pad to box length with appropriate padChar, according to 
justification, as that is effectively a 'specified length'. Error if 
textPadKind is 'none'. Use parent's lengthUnits. 

3) HexBinary data of variable length when unparsing. Box scenario. If the 
data in the infoset is shorter than the space in the box, what do we do?  I 
think we should right-pad to box length with fill byte, as that is 
effectively a 'specified length'. 

4) Packed/BCD binary data of variable length when unparsing. Box scenario. 
If the data in the infoset is shorter than the space in the box, what do 
we do?  I think we should pad to box length with zero bytes, according to 
justification, as that is effectively a 'specified length'.  (Must be zero 
bytes and not fill byte, as the result must be numeric in order to be parsed). 

In relation to 2 - 4, note that lengthKind 'endOfParent' can only be used 
when the parent has lengthKind 'explicit', 'pattern', 'prefixed' or 
'endOfParent', or is a choice with choiceLengthKind 'explicit'. The box 
scenario when unparsing therefore occurs only when the parent's lengthKind 
is 'explicit' or choiceLengthKind is 'explicit', as these are the cases 
where the length is known.  Also note that when 'endOfParent' elements are 
nested (which is allowed), all padding must be done on the simple 
element (i.e. the innermost element), to ensure that what is output can be 
parsed. 
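
The padding proposed in cases 2 - 4 above could be sketched in Python as follows. The function names, parameter names and defaults are hypothetical, lengths are counted in characters for text and in bytes for the binary cases, and only 'left' and 'right' justification are handled. This illustrates the proposal and is not a statement of what the spec requires.

```python
def _pad_count(actual: int, box_length: int) -> int:
    """Number of padding units needed to fill the box."""
    if actual > box_length:
        raise ValueError("value is longer than the box")
    return box_length - actual


def pad_text(value: str, box_length: int, pad_char: str = " ",
             justification: str = "left", pad_kind: str = "padChar") -> str:
    """Case 2: pad text to the box length according to justification."""
    count = _pad_count(len(value), box_length)
    if count and pad_kind == "none":
        raise ValueError("value is shorter than the box and textPadKind is 'none'")
    if justification == "left":
        return value + pad_char * count
    if justification == "right":
        return pad_char * count + value
    raise ValueError("unsupported justification in this sketch")


def pad_hex_binary(value: bytes, box_length: int, fill_byte: int = 0x00) -> bytes:
    """Case 3: right-pad hexBinary to the box length with the fill byte."""
    return value + bytes([fill_byte]) * _pad_count(len(value), box_length)


def pad_packed(value: bytes, box_length: int, justification: str = "right") -> bytes:
    """Case 4: pad packed/BCD data with zero bytes so it stays numeric."""
    zeros = b"\x00" * _pad_count(len(value), box_length)
    if justification == "right":
        return zeros + value
    if justification == "left":
        return value + zeros
    raise ValueError("unsupported justification in this sketch")
```

With nested 'endOfParent' elements, padding of this kind would be applied only to the innermost simple element, as noted above.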

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg
