[DFDL-WG] dfdl-wg Digest, Vol 43, Issue 2

Steve Hanson smh at uk.ibm.com
Mon Mar 8 06:55:39 CST 2010


Tim 

Thanks for your comments on the choices section. 

The dfdl:choiceKind property is intended to be used when the space 
occupied by the choice is implicitly defined by the children but the space 
occupied must always be that of the longest branch.  The primary use case 
is, as you say, COBOL REDEFINES and C Unions, where a compiler is 
allocating memory for the language 'choice' construct.

The main issues you have highlighted are:

a) The calculation of the length of the longest branch.
b) The length units to use - the dfdl:lengthUnits property does not exist 
on a choice
c) The name could be better

 Let's have a look at a COBOL example.

01 DATA.
  05 HEADER           PIC X(10).
  05 BODY             PIC X(10).
  05 DETAIL REDEFINES BODY.
     10 KEY           PIC X(3).
     10 CONTENT       PIC X(7).
  05 TRAILER          PIC X(10).
 
What we would like to see for the logical structure, to preserve the COBOL 
naming hierarchy into the DFDL infoset, is:

<xs:element name="DATA">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="HEADER" type="xs:string"/> 
      <xs:choice>
        <xs:element name="BODY" type="xs:string"/> 
        <xs:element name="DETAIL">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="KEY" type="xs:string"/> 
              <xs:element name="CONTENT" type="xs:string"/> 
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
      <xs:element name="TRAILER" type="xs:string"/> 
    </xs:sequence>
  </xs:complexType>
</xs:element>
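
To make the REDEFINES semantics concrete, here is a minimal Python sketch 
(not DFDL processor code) that reads one 30-character record laid out like 
the COBOL above. The sample data values are invented for illustration. It 
shows that BODY and DETAIL are two views of the same 10 characters, so the 
choice occupies the same space whichever branch is selected.

```python
# Hypothetical 30-character record: HEADER(10) + choice(10) + TRAILER(10).
# The sample values are made up purely for illustration.
record = "HDR".ljust(10) + "K01" + "PAYLOAD" + "TRL".ljust(10)

header = record[0:10]
choice = record[10:20]    # storage shared by BODY and DETAIL
trailer = record[20:30]

# BODY branch: the whole 10 characters as one string, PIC X(10).
body = choice

# DETAIL branch: the same 10 characters split as KEY X(3) + CONTENT X(7).
detail = {"KEY": choice[0:3], "CONTENT": choice[3:10]}

print(repr(body), detail)
```

Either branch leaves TRAILER starting at offset 20, which is the behaviour 
the property is meant to guarantee.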

You suggested wrapping the xs:choice in an xs:element to carry a length 
computed by COBOL -> DFDL tooling, or by the user manually. The problem is 
that this forces an extra level and name into the infoset, which does not 
match the COBOL.  Users will not expect that. Further, 
existing IBM COBOL -> XSD tooling creates the above logical structure with 
no wrapping, so any wrapping will not be compatible. For users without 
COBOL -> DFDL tooling, you are forcing them to compute the length 
manually. I don't think your suggestion will work.

When the xs:choice is included directly in an xs:sequence or another 
xs:choice, there is no dfdl:lengthKind and no dfdl:lengthUnits, because 
those properties no longer exist on xs:choice; they exist only on 
xs:element. We can't solve this by wrapping the choice in an element, as 
just shown, so the solution is to decouple dfdl:choiceKind from its parent 
altogether.
 
You are correct in pointing out that the length calculation is not always 
easy.  That can be alleviated by restricting the cases in which 
dfdl:choiceKind='fixedLength' is allowed. Any violation is detected at 
static validation time and results in a schema definition error. The 
restrictions can, and likely will, be severe, as we are supporting a 
specific use case here.

We can debate the name/enums for the property.  For example, 
dfdl:choicePadKind='none'/'longest' or dfdl:choicePadToLongest='yes'/'no' 
would convey the semantics to me. 

My proposal is therefore to retain the property but to:

i) State the conditions that must apply to use this property, and enforce 
them in the validator => schema definition error otherwise
ii) Decouple the choice from its parent by calculating the length of each 
branch based solely on the properties of the branch's components, 
irrespective of any parent dfdl:lengthKind
iii) Choose a better name for the property
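
As a rough illustration of points i) and ii), the sketch below computes 
each branch length from the branch's own components and takes the longest. 
It is Python, not DFDL, and the names Element, branch_length, 
choice_length and SchemaDefinitionError are hypothetical. It assumes every 
simple element carries a fixed length in a single unit and ignores 
alignment and markup.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class SchemaDefinitionError(Exception):
    """Raised when a branch length cannot be determined statically."""


@dataclass
class Element:
    name: str
    length: Optional[int] = None          # fixed length of a simple element
    children: List["Element"] = field(default_factory=list)


def branch_length(elem: Element) -> int:
    """Length of one branch, derived only from its own components."""
    if elem.children:
        return sum(branch_length(child) for child in elem.children)
    if elem.length is None:
        raise SchemaDefinitionError(f"{elem.name}: no fixed length")
    return elem.length


def choice_length(branches: List[Element]) -> int:
    """The choice occupies the space of its longest branch."""
    return max(branch_length(branch) for branch in branches)


body = Element("BODY", length=10)
detail = Element("DETAIL", children=[Element("KEY", 3), Element("CONTENT", 7)])
print(choice_length([body, detail]))
```

Nothing here looks at the parent of the choice, which is the decoupling 
proposed in ii). A branch containing an element without a fixed length 
raises the error, which stands in for the validator check in i).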


Regards

Steve Hanson
Programming Model Architect, WebSphere Message Broker,
Co-Chair, OGF DFDL WG
Hursley, UK,
Internet: smh at uk.ibm.com,
Phone (+44)/(0) 1962-815848



From:
Tim Kimber/UK/IBM at IBMGB
To:
dfdl-wg at ogf.org
Date:
04/03/2010 11:31
Subject:
Re: [DFDL-WG] dfdl-wg Digest, Vol 43, Issue 2
Sent by:
dfdl-wg-bounces at ogf.org




I have a bunch of questions/issues relating to dfdl:choiceKind. I'm not 
asking for changes in v0.40, but I expect there will be changes required. 

The issues that I want to raise are: 
a) The description of the property in v0.39 contains several typos and 
inaccuracies. 
- 'implicit' is being used where 'fixedLength' was intended. 
- nothing is said about the units in which the length is calculated. 
- there's no need to discuss how the choice is resolved when discussing 
the 'variableLength' enum 
- we should use standard phraseology when indicating whether a property 
can be computed from a DFDL expression. 

b) Property name should be 'choiceLengthKind' to accurately reflect its 
meaning 

c) There is a need for a related property 'choiceLengthUnits' 
Consider the recursive algorithm for calculating the length of each 
branch. It needs to know whether it is calculating a length in bytes or 
characters. If the length is in bytes, then the length cannot be 
calculated for variable-width encodings. If the length is in characters, 
then the length cannot always be calculated reliably if there are raw byte 
values in the markup. 
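
The variable-width problem can be seen with a short Python sketch. The 
helper byte_length and the sample strings are invented for illustration. 
Two 3-character values have different byte lengths in UTF-8 but the same 
byte length in a single-byte encoding, so a length in bytes cannot be 
derived from a length in characters when the encoding is variable-width.

```python
def byte_length(text: str, encoding: str) -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


plain_key = "ABC"
accented_key = "\u00c4BC"   # A-umlaut followed by B and C, still 3 characters

for enc in ("latin-1", "utf-8"):
    print(enc, byte_length(plain_key, enc), byte_length(accented_key, enc))
```

In latin-1 both values occupy 3 bytes. In UTF-8 the first occupies 3 bytes 
and the second 4, because the accented character needs two bytes.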

d) The rules for calculating the max length of the choice are not 
provided. They are complex, and not at all obvious. Consider these issues: 

The length of a branch cannot be calculated if 
- there are any optional elements or variable-length arrays anywhere in 
the branch 
- any field in the branch has dfdl:alignment > "1" ( at least, I can't 
work out what the rules would be. The alignment of the parent element 
would need to be factored in ) 
- any element or group in the branch specifies its initiator, terminator 
or separator as a DFDL expression 
- any element or group in the branch specifies its length as a DFDL 
expression 

if choiceLengthUnits='characters' then the length cannot be calculated if 
- any element or group in the branch specifies a DFDL string literal 
containing DFDL mnemonics %NL; %WSP*; or %WSP+; 
- any element or group in the branch uses a DFDL string literal that 
contains a sequence of raw byte values whose length differs from the fixed 
character width 

if choiceLengthUnits='bytes' then the length cannot be calculated if 
- any element in the branch specifies a variable-width encoding, or 
specifies its encoding as a DFDL expression. 
  
There are probably other rules which need to be applied, but the above 
should illustrate the point. Calculating the length is only possible under 
some *very* restrictive conditions. 
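
To show how a static check over such rules might be organised, here is a 
hedged Python sketch. The dictionary keys (optional, variable_occurs, 
alignment, markup_is_expression, length_is_expression, 
markup_has_nl_or_wsp, variable_width_encoding) are hypothetical flags, not 
DFDL property names, and only some of the rules listed above are covered. 
The variable-width rule is applied to byte lengths, following the remark 
in c) that a length in bytes cannot be calculated for variable-width 
encodings.

```python
from typing import Any, Dict, List


def blocking_reasons(component: Dict[str, Any], units: str) -> List[str]:
    """Collect reasons why a branch length cannot be computed statically."""
    name = component["name"]
    reasons: List[str] = []
    if component.get("optional") or component.get("variable_occurs"):
        reasons.append(f"{name}: optional element or variable-length array")
    if component.get("alignment", 1) > 1:
        reasons.append(f"{name}: alignment greater than 1")
    if component.get("markup_is_expression"):
        reasons.append(f"{name}: markup given as an expression")
    if component.get("length_is_expression"):
        reasons.append(f"{name}: length given as an expression")
    if units == "characters" and component.get("markup_has_nl_or_wsp"):
        reasons.append(f"{name}: markup contains %NL;, %WSP*; or %WSP+;")
    if units == "bytes" and component.get("variable_width_encoding"):
        reasons.append(f"{name}: variable-width encoding")
    # Recurse, since a problem anywhere in the branch blocks the calculation.
    for child in component.get("children", []):
        reasons.extend(blocking_reasons(child, units))
    return reasons


detail = {"name": "DETAIL",
          "children": [{"name": "KEY"}, {"name": "CONTENT"}]}
risky = {"name": "DETAIL",
         "children": [{"name": "KEY", "optional": True},
                      {"name": "CONTENT", "length_is_expression": True}]}

print(blocking_reasons(detail, "bytes"))
print(blocking_reasons(risky, "bytes"))
```

An empty list means none of the modelled rules blocks the calculation. Any 
entry would correspond to a condition under which the length of the branch 
cannot be calculated.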

e) I think the property may not be required 
As far as I am aware, this property was introduced to provide support for 
COBOL REDEFINES, and to allow MRM message sets to be migrated to DFDL. If 
true, the problem gets a lot simpler: 
- COBOL does not use initiators/terminators. 
- The COBOL compiler contains code that calculates the length of the 
structure ( it must, because COBOL has a rule that a REDEFINES cannot be 
longer than the record that it is redefining ). 
Presumably, it takes alignment into account in some way, and handles 
issues relating to character width as well. 
- COBOL does not allow an anonymous REDEFINES. If imported, a REDEFINES 
will always produce a complex element whose content is a fixed-length 
choice. 
Note: This means that the same will be true of any MRM message set 
created by Message Broker's COBOL importer. 

If those assumptions are correct, then in all cases the same effect could 
be achieved by putting the precalculated length of the REDEFINES onto the 
parent element. I think this merits serious consideration. The cost of 
implementing choiceKind='fixedLength' is quite high because of the 
complexity of the rules, and the fact that groups, as well as complex 
elements, can have a fixed length. But it's not really an implementation 
issue; it's a complexity issue. DFDL should not contain a property with 
such complex implementation requirements unless there's a strong case for 
it - otherwise potential implementers are going to be put off. 

The existing COBOL importer probably does not set the precalculated length 
of a REDEFINES on the parent element. That would be required if we wanted 
to remove the property - so we would have to discuss that with the group 
that provides the importer technology. 

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 246742





Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 





--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  http://www.ogf.org/mailman/listinfo/dfdl-wg











