[DFDL-WG] outputValueCalc and unparse example

Tue Jun 9 05:42:16 CDT 2009

Mike

I'd like to restate the use cases we identified yesterday. There were 
three.

Use case 1
Element "val" is fixed length, length known at design time and provided by 
dfdl:length="x" on input and output.
On output, the infoset data for "val" is padded to the length.

Use case 2
Element "val" is fixed length, length known at runtime and provided by 
dfdl:length="{ ../len }" on input and output. 
On output the infoset provides the value for element "len".
On output, the infoset data for "val" is padded to the length according to 
the rules for dfdl:lengthKind='explicit'/'implicit'.

Use case 3
Element "val" is really variable length, length only known once the data 
is serialised, and provided by dfdl:length="{ ../len }" on input. 
On output the value of element "len" is set only once the length of "val" 
is known. 
On output, the infoset data for "val" is not padded to the length.

You've added a variation to use case 3 in your example, where there is a 
need to add some padding. Let's call it use case 4.

Alan and I have explored an alternative, where dfdl:length is always used 
for all use cases. The difference for use case 3 & 4 is that the value of 
element "len" is only set during the processing of "val". Instead of using 
a flag, with accompanying output length property, to signal case 3 & 4, we 
use an extra parameter on dfdl:length() that says whether to use padding 
or not when dfdl:lengthKind="explicit"/"implicit". Note that any escape 
scheme must and will be taken into account (to answer your question). 

For use case 3, when no padding is needed, your example simplifies to the 
following. When "len" is encountered, its outputValueCalc references 
"val", so the unparser defers setting the value of "len". When it gets 
to "val", it knows it must work out the unpadded length and set that in 
"len" before doing any length-related processing for "val".

<sequence>
  <!-- second argument of dfdl:length(): false => no pad -->
  <element name="len" type="int" 
    dfdl:outputValueCalc=
      "{
           dfdl:length(../val, false)
       }" />

  ... many elements in between ....

  <element name="val" type="string" 
     dfdl:encoding="utf-8"
     dfdl:lengthKind="explicit" 
     dfdl:lengthUnits="bytes" 
     dfdl:length="{ ../len }"
     dfdl:textTrimKind="padChar"
     dfdl:textStringJustification="left"
     dfdl:textPadCharacter="%#r0;" 
    />
</sequence>
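
The deferral for use case 3 can be sketched in Python. This only 
illustrates the ordering of the steps, not how any DFDL processor is 
built. The function name unparse_case3, the 4-byte big-endian 
representation of "len", and the "between" placeholder are all assumptions 
made for the example.

```python
import struct


def unparse_case3(val: str, between: bytes = b"") -> bytes:
    """Illustrative unparse for use case 3 (no padding).

    Assumptions (not from the thread): "len" is written as a 4-byte
    big-endian integer, and `between` stands in for the elements that
    sit between "len" and "val".
    """
    out = bytearray()

    # "len" is reached first, but its outputValueCalc refers to "val",
    # so only reserve its slot and remember where it is.
    len_offset = len(out)
    out += b"\x00\x00\x00\x00"

    out += between

    # On reaching "val": work out the unpadded length in bytes (utf-8),
    # set "len", and only then do the length processing for "val".
    encoded = val.encode("utf-8")
    struct.pack_into(">i", out, len_offset, len(encoded))
    out += encoded  # no padding: "val" occupies exactly "len" bytes
    return bytes(out)
```

The point of the sketch is that "len" is filled in after the fact, once 
the unparser has seen "val", even though it appears earlier in the output.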

For use case 4, when some padding might be needed, your example simplifies 
to the following. When the unparser starts to process "val", it works out 
the unpadded length, uses it in the expression and generates the value for 
"len". When it does the length processing for "val", it pads to the value 
of "len". 

<sequence>
  <element name="len" type="int" 
    dfdl:outputValueCalc=
      "{
           fn:ceiling(dfdl:length(../val, false) div 4) * 4 
       }" />

  ... many elements in between ....

  <element name="val" type="string" 
     dfdl:encoding="utf-8"
     dfdl:lengthKind="explicit" 
     dfdl:lengthUnits="bytes" 
     dfdl:length="{ ../len }"
     dfdl:textTrimKind="padChar"
     dfdl:textStringJustification="left"
     dfdl:textPadCharacter="%#r0;" 
    />
</sequence>
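
As a worked example of the arithmetic in use case 4, here is a small 
Python sketch. The helper names are hypothetical. It assumes utf-8 bytes, 
left justification and a zero pad byte, matching the properties on "val" 
above.

```python
def padded_length(unpadded: int, multiple: int = 4) -> int:
    # Same result as fn:ceiling(unpadded div 4) * 4 for non-negative
    # integers, done in integer arithmetic to avoid float rounding.
    return (unpadded + multiple - 1) // multiple * multiple


def unparse_val(val: str):
    """Return (value for "len", bytes written for "val")."""
    encoded = val.encode("utf-8")          # unpadded representation
    length = padded_length(len(encoded))   # value generated for "len"
    # Left-justified: pad on the right with zero bytes up to "len".
    return length, encoded.ljust(length, b"\x00")
```

For instance, a 5-byte value yields len = 8 and three trailing zero bytes, 
while an 8-byte value is written unchanged with len = 8.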

A variation on use case 4 is when we need to pad to a minimum length (20 
bytes in this example). 

<sequence>
  <element name="len" type="int" 
    dfdl:outputValueCalc=
      "{
           fn:max((dfdl:length(../val, false), 20)) 
       }" />

  ... many elements in between ....

  <element name="val" type="string" 
     dfdl:encoding="utf-8"
     dfdl:lengthKind="explicit" 
     dfdl:lengthUnits="bytes" 
     dfdl:length="{ ../len }"
     dfdl:textTrimKind="padChar"
     dfdl:textStringJustification="left"
     dfdl:textPadCharacter="%#r0;" 
    />
</sequence>
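
Padding to a minimum length means "len" is the larger of the unpadded 
length and the minimum. A minimal Python sketch, with hypothetical helper 
names and the same utf-8 and zero-pad assumptions as the schema fragment:

```python
def min_padded_length(unpadded: int, minimum: int = 20) -> int:
    # "len" is never smaller than the minimum, and never smaller than
    # the data itself, so nothing is truncated.
    return max(unpadded, minimum)


def unparse_val_min(val: str, minimum: int = 20):
    """Return (value for "len", bytes written for "val")."""
    encoded = val.encode("utf-8")
    length = min_padded_length(len(encoded), minimum)
    return length, encoded.ljust(length, b"\x00")
```

A 5-byte value is padded out to 20 bytes. A 25-byte value is written as is 
with len = 25.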

You might be tempted to ask why the minimum is stated explicitly in the 
expression. It's because, as currently spec'd, the xs:minLength facet (and 
dfdl:outputMinLength for non-strings) is not used when 
dfdl:lengthKind="explicit". We could change this, but it would make the 
padding rules more complicated. We opted to keep the padding rules simpler.

Yesterday we also discussed whether implicit/explicit needed to change. 
With the above scheme we think a change is not necessary.

Regards

Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848

"Mike Beckerle" <mbeckerle.dfdl at gmail.com> 
Sent by: dfdl-wg-bounces at ogf.org
09/06/2009 04:30
Please respond to: mbeckerle.dfdl at gmail.com
To: <dfdl-wg at ogf.org>
Subject: [DFDL-WG] outputValueCalc and unparse example

I did not get as far as I wanted to on this issue. I would like to discuss 
this example:

<sequence>
  <element name="len" type="int" 
    dfdl:fillByte="%#r0;"
    dfdl:outputValueCalc=
      "{
           dfdl:representation-output-length(../val) 
       }" />

  ... many elements in between ....

  <element name="val" type="string" 
     dfdl:encoding="utf-8"
     dfdl:lengthKind="explicit" 
     dfdl:lengthUnits="bytes" 
     dfdl:useLengthForOutput="false"
     dfdl:length="{ ../len }"
     dfdl:outputLength="{ 
         fn:ceiling( 
           dfdl:representation-inherent-length(.) div 4 
           ) * 4 
     }"
     dfdl:textTrimKind="padChar"
     dfdl:textStringJustification="left"
     dfdl:textPadCharacter="%#r0;" 
    />
</sequence>

You will notice I added a dfdl:outputLength property, and a 
dfdl:representation-output-length() function and 
dfdl:representation-inherent-length().

I am accepting candidates for better names for these properties and 
functions. We need to distinguish these 3 concepts:

1) inherent length – of the infoset item without reference to any facets, 
and without regard to escape sequences, padding or truncation. 

(TBD: think about escape sequences? Is this right)

2) output target length – the length of the box we’re filling in with the 
data value representation. The box can be bigger or smaller than the 
inherent length, which implies use of padding/filling, or truncation.

3) input length – length of the box we’re getting when parsing. The 
inherent length of the value after parsing can be smaller than the length 
of the box due to removal of escape characters, and the trimming of 
padding.
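
To make the three lengths concrete, here is a rough Python sketch. The 
function names are hypothetical. It assumes utf-8, a zero pad byte and 
left justification, and it ignores escape sequences entirely (the open 
TBD above).

```python
def inherent_length(value: str) -> int:
    # 1) inherent length: bytes needed for the value itself,
    #    before any padding or truncation.
    return len(value.encode("utf-8"))


def fill_box(value: str, target_length: int, pad: bytes = b"\x00") -> bytes:
    # 2) output target length: the size of the box being filled.
    #    A larger box means padding, a smaller box means truncation.
    #    Note: byte-level truncation here could split a multi-byte
    #    utf-8 character; a real processor would have to handle that.
    encoded = value.encode("utf-8")
    return encoded[:target_length].ljust(target_length, pad)


def parse_box(box: bytes, pad: bytes = b"\x00") -> str:
    # 3) input length is len(box); trimming the padding recovers a
    #    value whose inherent length may be smaller than the box.
    return box.rstrip(pad).decode("utf-8")
```

With these definitions, a 3-byte value written into an 8-byte box has 
inherent length 3, output target length 8, and parses back from an input 
length of 8 to a value of inherent length 3.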
