[DFDL-WG] Fw: [DFDL] zero length (was Re: Fw: TDS length reference) ** updated **

Steve Hanson smh at uk.ibm.com
Wed Jan 13 06:49:05 CST 2010


For discussion on today's call.

Regards

Steve Hanson
Programming Model Architect, WebSphere Message Brokers,
OGF DFDL WG Co-Chair,
Hursley, UK,
Internet: smh at uk.ibm.com,
Phone (+44)/(0) 1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 13/01/2010 12:46 -----



** Complex element design added. Please review **

------------------------------------------------------------------------------------------------------

This proposal extends the earlier work done in this area, described in spec 
sections 5.7 and 15.13.  To paraphrase those sections:

5.7
The 'default' attribute is used to provide the logical value of a required 
element while parsing when the representation is empty (content length is 
zero).
15.13
When we get 'empty content' from an element, and the element is optional, 
then it is not present and is not added to the infoset.

When we get empty content from an element, and the element is required, 
then we start to look at nil handling and default handling properties.
- If the properties are such that the empty string is a nil value then the 
infoset value is the special value nil. 
- If the properties are such that there is a default value specified then 
the infoset value is the default value. 
- Otherwise, if the empty string is valid for the type (i.e., the type is 
xs:string or derived from it), then the infoset value is a zero-length 
string.

So we know what empty content is and how it is applied to simple elements. 
We need to define when it is possible to get empty content and what it 
means to elements of complex type or of non-string simple type. 

Proposal: 

1. Parsing

Simple elements

1) It is not a schema definition error nor a processing error if a length 
is being used to extract data and it is zero. This covers dfdl:lengthKind 
implicit, explicit, prefixed and endOfParent (when parent length is 
known). The result is 'empty content'. (Note that for implicit, XSDL 
allows the maxLength/length facet to be 0, so disallowing a zero length for 
the other kinds would be inconsistent.)

2) It is not a processing error if scanning for data and the length of the 
returned bytes is zero. This applies to dfdl:lengthKind delimited, pattern 
and endOfParent (when parent length is not known). The result is 'empty 
content'. (This is just stating the obvious).

(The above two rules ensure that it is possible to apply empty content to 
trigger optional, nil value or default value processing regardless of data 
type and dfdl:lengthKind). 

3) Optional, nil and default processing are applied as per spec.

4) If the element is required, and nil value or default value is not used, 
and empty string is not in the lexical space of the element's type, then 
it is a processing error. 
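
[Illustration: a minimal Python sketch of rules 1) and 2) above, not taken 
from any DFDL implementation. The helper names extract_by_length, 
extract_delimited and is_empty_content are hypothetical. The only point is 
that both the length-driven path and the scanning path may return zero 
bytes without raising an error, so the caller can go on to apply optional, 
nil or default processing.]

```python
from typing import Tuple


def extract_by_length(data: bytes, pos: int, length: int) -> Tuple[bytes, int]:
    """Rule 1: take `length` bytes starting at `pos`; zero is a legal length."""
    if length < 0 or pos + length > len(data):
        raise ValueError("processing error: not enough data")
    return data[pos:pos + length], pos + length


def extract_delimited(data: bytes, pos: int, delimiter: bytes) -> Tuple[bytes, int]:
    """Rule 2: scan up to `delimiter`; finding it immediately is not an error."""
    end = data.find(delimiter, pos)
    if end == -1:
        raise ValueError("processing error: delimiter not found")
    # Return the content and the position just past the delimiter.
    return data[pos:end], end + len(delimiter)


def is_empty_content(content: bytes) -> bool:
    """Empty content means the content length is zero."""
    return len(content) == 0
```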

The two initiator-related properties dfdl:nilValueInitiatorPolicy and 
dfdl:defaultValueInitiatorPolicy define whether nils and defaults are 
applied when initiated empty content is found. They do not affect the 
definition of empty content or what it means for the type.

[Note: If you recall, this discussion was triggered by a customer who was 
using an expression to calculate the length of a standard text decimal. He 
wanted a length of 0 to mean that the value 0 ended up in the infoset. He 
can achieve this by making the element required with a default value of 0.]
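
[Illustration: the simple element rules 3) and 4), including the customer 
case in the note, can be sketched in Python as below. This is only a 
sketch under stated assumptions: the SimpleElement class, its field names 
and the NIL/ABSENT markers are hypothetical, initiator policies are 
ignored, and the order nil, then default, then empty string is assumed 
from the order of the bullets in the 15.13 paraphrase.]

```python
from dataclasses import dataclass
from typing import Optional

NIL = object()     # stand-in for the special infoset value nil
ABSENT = object()  # stand-in for "not added to the infoset"


class ProcessingError(Exception):
    pass


@dataclass
class SimpleElement:
    required: bool
    is_string: bool                # type is xs:string or derived from it
    empty_is_nil: bool = False     # nil properties make the empty string a nil value
    default: Optional[str] = None  # value of the 'default' attribute, if any


def resolve_empty_content(elem: SimpleElement):
    """Decide the infoset outcome when a simple element has empty content."""
    if not elem.required:
        return ABSENT              # optional: not present
    if elem.empty_is_nil:
        return NIL
    if elem.default is not None:
        return elem.default
    if elem.is_string:
        return ""                  # empty string is in the lexical space
    raise ProcessingError("empty content is not valid for this type")
```

With this sketch, a required decimal with default "0" resolves to "0", 
while the same element without a default raises ProcessingError.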

Complex elements

It is possible to get returned empty content for a complex element for 
cases 1) and 2) above. 

1) If the complex element is optional then it is not added to the infoset. 
 

2) If the complex element does not have an initiator specified & is 
required then it is added to the infoset.

3) If the element has an initiator specified then 
dfdl:defaultValueInitiatorPolicy applies
        - required => element is added to infoset only if initiator is 
present (processing error if no initiator & empty content)
        - prohibited => element is added to infoset only if initiator is 
not present (initiator implies real content follows so processing error if 
initiator & empty content)

4) If the complex element is added to the infoset, then the parser 
processes the child content of the complex type. This may or may not cause 
a processing error.  If it doesn't then default value processing applies 
for required child elements. If we don't do this then we will not create 
default values for all missing required simple elements, and that would be 
wrong.

5) If the contained sequence or choice has an initiator or terminator then 
it is a processing error.
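
[Illustration: a minimal Python sketch of complex element rules 1), 2), 3) 
and 5) for empty content. The ComplexElement class and its field names are 
hypothetical. Two points are assumptions of the sketch rather than 
statements of the proposal: the optional check is made first, and rule 5) 
is only checked once the element is otherwise going to be added. Rule 4), 
processing the child content, is left to the caller.]

```python
from dataclasses import dataclass
from typing import Optional


class ProcessingError(Exception):
    pass


@dataclass
class ComplexElement:
    required: bool
    has_initiator: bool = False             # an initiator is specified
    initiator_policy: Optional[str] = None  # dfdl:defaultValueInitiatorPolicy
    group_has_markup: bool = False          # contained sequence/choice has
                                            # an initiator or terminator


def add_empty_complex_to_infoset(elem: ComplexElement, initiator_found: bool) -> bool:
    """Return True if a complex element with empty content joins the infoset."""
    if not elem.required:
        return False                                   # rule 1
    if elem.has_initiator:                             # rule 3
        if elem.initiator_policy == "required":
            if not initiator_found:
                raise ProcessingError("no initiator and empty content")
        elif elem.initiator_policy == "prohibited":
            if initiator_found:
                raise ProcessingError("initiator and empty content")
        else:
            raise ValueError("policy not handled by this sketch")
    if elem.group_has_markup:                          # rule 5
        raise ProcessingError("group initiator/terminator cannot be found")
    return True                                        # rule 2, or rule 3 satisfied
```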


2. Unparsing

Simple elements

Data in the infoset can result in empty content being added to the bit 
stream (i.e., nothing), with an accompanying 0 value in any length prefix or 
length expression field, if appropriate to the dfdl:lengthKind.
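
[Illustration: a minimal Python sketch of the prefixed case. The function 
name unparse_prefixed is hypothetical, and a 1-byte unsigned binary length 
prefix is assumed purely for the example. An empty value contributes no 
content bytes, only the prefix carrying 0.]

```python
def unparse_prefixed(value: bytes) -> bytes:
    """Write an assumed 1-byte length prefix followed by the content."""
    if len(value) > 255:
        raise ValueError("value too long for a 1-byte prefix")
    # For empty content this yields just the prefix byte 0x00.
    return bytes([len(value)]) + value
```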

Complex elements

The absence from the infoset of a required complex element will cause any 
specified initiator to be output, plus if there are required children then 
default values will be output for those children. If we don't do this then 
we will not create default values for nested missing required simple 
elements, and that would be wrong. This enables creation of a sparse 
infoset containing just the elements with explicit values, with the rest 
defaulting regardless of nesting. 
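
[Illustration: a minimal Python sketch of filling a sparse infoset before 
output. The nested-dict schema model and the function name fill_defaults 
are hypothetical. The sketch covers only the defaulting of missing 
required elements at any depth. It does not output initiators, and a 
missing required simple element with no default is left absent for the 
caller to report.]

```python
from typing import Any, Dict


def fill_defaults(schema: Dict[str, dict], infoset: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `infoset` with missing required elements defaulted."""
    result = dict(infoset)
    for name, decl in schema.items():
        required = decl.get("required", True)
        if "children" in decl:
            # Complex element: recurse into it if present, or create it
            # from defaults alone if it is required but absent.
            if name in result:
                result[name] = fill_defaults(decl["children"], result[name])
            elif required:
                result[name] = fill_defaults(decl["children"], {})
        elif name not in result and required and "default" in decl:
            # Simple element: apply the default value.
            result[name] = decl["default"]
    return result
```

For example, a schema with a required complex element "header" whose 
required child "version" defaults to "1" turns the infoset {} into 
{"header": {"version": "1"}}.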


3. Choices

It is worth noting that the concept of 'required' does not apply to the 
elements of a choice, even if minOccurs > 0.


4. Outstanding Issues

Is it ok to reuse dfdl:defaultValueInitiatorPolicy for complex elements? 
Should it be renamed? Should we add a separate property for complex 
elements?













