[DFDL-WG] Fw: [DFDL] zero length (was Re: Fw: TDS length reference) ** updated **
Steve Hanson
smh at uk.ibm.com
Wed Jan 13 06:49:05 CST 2010
For discussion on today's call.....
Regards
Steve Hanson
Programming Model Architect, WebSphere Message Brokers,
OGF DFDL WG Co-Chair,
Hursley, UK,
Internet: smh at uk.ibm.com,
Phone (+44)/(0) 1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 13/01/2010 12:46 -----
** Complex element design added. Please review **
------------------------------------------------------------------------------------------------------
The proposal extends the earlier work done in this area and described by
spec sections 15.13 and 5.7. To paraphrase those sections:
5.7
The 'default' attribute is used to provide the logical value of a required
element while parsing when the representation is empty (content length is
zero).
15.13
When we get 'empty content' from an element, and the element is optional,
then it is not present and is not added to the infoset.
When we get empty content from an element, and the element is required,
then we start to look at nil handling and default handling properties.
- If the properties are such that the empty string is a nil value then the
infoset value is the special value nil.
- If the properties are such that there is a default value specified then
the infoset value is the default value.
- Otherwise if empty string is valid for the type (ie, is derived from
xs:string) then the infoset value is a zero length string.
So we know what empty content is and how it is applied to simple elements.
We need to define when it is possible to get empty content and what it
means to elements of complex type or of non-string simple type.
Proposal:
1. Parsing
Simple elements
1) It is not a schema definition error nor a processing error if a length
is being used to extract data and it is zero. This covers dfdl:lengthKind
implicit, explicit, prefixed and endOfParent (when parent length is
known). The result is 'empty content'. (Note that for implicit, XSDL
allows maxLength/length facet to be 0, so disallowing it for others is not
consistent).
2) It is not a processing error if scanning for data and the length of the
returned bytes is zero. This applies to dfdl:lengthKind delimited, pattern
and endOfParent (when parent length is not known). The result is 'empty
content'. (This is just stating the obvious).
(The above two rules ensure that it is possible to apply empty content to
trigger optional, nil value or default value processing regardless of data
type and dfdl:lengthKind).
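Rules 1) and 2) can be illustrated with a minimal Python sketch. The two
helper names below (extract_by_length, extract_delimited) and the use of a
plain byte string as input are assumptions made for illustration only; they
are not part of the spec or of any DFDL processor.

```python
def extract_by_length(data: bytes, offset: int, length: int) -> bytes:
    """Rule 1: extract `length` bytes; a length of zero is legal."""
    if length < 0 or offset + length > len(data):
        raise ValueError("processing error: not enough data for the length")
    # A zero length simply yields b"", i.e. 'empty content'.
    return data[offset:offset + length]


def extract_delimited(data: bytes, offset: int, terminator: bytes) -> bytes:
    """Rule 2: scan up to the terminator; zero bytes found is legal."""
    end = data.find(terminator, offset)
    if end == -1:
        raise ValueError("processing error: terminator not found")
    # If the terminator is found immediately, the result is b"".
    return data[offset:end]


# Both calls return empty content instead of raising an error.
print(extract_by_length(b"ABC", 1, 0))
print(extract_delimited(b"A,,C", 2, b","))
```

In both cases the caller receives empty content and can then go on to apply
optional, nil value or default value processing.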
3) Optional, nil and default processing are applied as per spec.
4) If the element is required, and nil value or default value is not used,
and empty string is not in the lexical space of the element's type, then
it is a processing error.
The two initiator related properties dfdl:nilValueInitiatorPolicy and
dfdl:defaultValueInitiatorPolicy define whether nils and defaults are
applied when initiated empty content is found. They do not affect the
definition of empty content or what it means for the type.
[Note: If you recall, this discussion was triggered by a customer who was
using an expression to calculate the length of a standard text decimal. He
wanted a length of 0 to result in the value 0 in the infoset. He can
achieve this by making the element required with a default value of 0.]
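The decision order for empty content on a simple element (rules 3 and 4,
following the order given in section 15.13) might be sketched as below. This
is a sketch under assumptions: the function name, its parameters and the
NIL/ABSENT sentinels are hypothetical, and the two initiator policy
properties are deliberately not modelled.

```python
from typing import Optional

NIL = object()     # stand-in for the special infoset value nil
ABSENT = object()  # stand-in for "not added to the infoset"


class ProcessingError(Exception):
    """Raised where the proposal says 'processing error'."""


def resolve_empty_simple(required: bool, nil_on_empty: bool,
                         default: Optional[str], is_string: bool):
    """Return the infoset outcome for a simple element with empty content."""
    if not required:
        return ABSENT          # optional element: not present
    if nil_on_empty:
        return NIL             # empty string is a nil value
    if default is not None:
        return default         # default value is applied
    if is_string:
        return ""              # empty string is valid for the type
    raise ProcessingError("empty content is not valid for this type")


# The customer case from the note: a required decimal with default "0".
print(resolve_empty_simple(True, False, "0", False))
```

For the customer case the sketch yields the default "0", while a required
decimal with neither nil nor default configured raises ProcessingError.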
Complex elements
It is possible to get returned empty content for a complex element for
cases 1) and 2) above.
1) If the complex element is optional then it is not added to the infoset.
2) If the complex element does not have an initiator specified & is
required then it is added to the infoset.
3) If the element has an initiator specified then
dfdl:defaultValueInitiatorPolicy applies
- required => element is added to infoset only if initiator is
present (processing error if no initiator & empty content)
- prohibited => element is added to infoset only if initiator is
not present (initiator implies real content follows so processing error if
initiator & empty content)
4) If the complex element is added to the infoset, then the parser
processes the child content of the complex type. This may or may not cause
a processing error. If it doesn't then default value processing applies
for required child elements. If we don't do this then we will not create
default values for all missing required simple elements, and that would be
wrong.
5) If the contained sequence or choice has an initiator or terminator then
it is a processing error.
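Rules 1), 2), 3) and 5) for complex elements might be sketched as a single
decision function. Everything named here is hypothetical (the function, its
parameters, the string values for the policy), and two orderings are
assumptions of the sketch rather than statements in the proposal: rule 1 is
checked first, and rule 5 is only checked once the element would be added.
Rule 4 (processing the children) is not modelled.

```python
class ProcessingError(Exception):
    """Raised where the proposal says 'processing error'."""


def add_empty_complex(required: bool,
                      initiator_specified: bool = False,
                      initiator_found: bool = False,
                      policy: str = "required",
                      group_has_delimiters: bool = False) -> bool:
    """Decide whether a complex element with empty content joins the infoset."""
    if not required:
        return False  # rule 1: optional element is not added
    if initiator_specified:
        # rule 3: dfdl:defaultValueInitiatorPolicy applies
        if policy == "required" and not initiator_found:
            raise ProcessingError("no initiator and empty content")
        if policy == "prohibited" and initiator_found:
            raise ProcessingError("initiator found but content is empty")
        if policy not in ("required", "prohibited"):
            raise ValueError("policy value not covered by this sketch")
    # rule 5: an initiator or terminator on the contained sequence or choice
    if group_has_delimiters:
        raise ProcessingError("contained group has initiator or terminator")
    return True  # rules 2 and 3: the element is added


print(add_empty_complex(required=True))
print(add_empty_complex(required=False, initiator_specified=True))
```

When the function returns True, the parser would go on to process the child
content as described in rule 4.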
2. Unparsing
Simple elements
Data in the infoset can result in empty content being added to the bit
stream (ie, nothing), with an accompanying 0 value in any length prefix or
length expression field, if appropriate to the dfdl:lengthKind.
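As a small illustration for the prefixed case, the sketch below writes a
length prefix followed by the content. The 2-byte big-endian prefix and the
function name unparse_prefixed are assumptions chosen for the example; the
proposal does not fix a prefix format.

```python
import struct


def unparse_prefixed(content: bytes) -> bytes:
    """Write a 2-byte big-endian length prefix followed by the content."""
    # For empty content this emits a prefix of 0 and no content bytes.
    # Content longer than 65535 bytes would not fit this assumed prefix.
    return struct.pack(">H", len(content)) + content


print(unparse_prefixed(b""))    # prefix only
print(unparse_prefixed(b"AB"))  # prefix followed by two content bytes
```

Empty content therefore still produces output (the prefix carrying 0), but
nothing for the element's own content.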
Complex elements
The absence from the infoset of a required complex element will cause any
specified initiator to be output, plus if there are required children then
default values will be output for those children. If we don't do this then
we will not create default values for nested missing required simple
elements, and that would be wrong. This enables creation of a sparse
infoset containing just the elements with explicit values, with the rest
defaulting regardless of nesting.
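The sparse infoset idea might be sketched as a recursive fill-in step. The
dictionary-based schema layout ("required", "default", "children") and the
function name fill_defaults are hypothetical, invented for this example.
Only the defaulting of missing required elements is shown; writing
initiators to the bit stream is not modelled, and a missing required simple
element without a default is simply left out of the sketch's result.

```python
def fill_defaults(schema: dict, infoset: dict) -> dict:
    """Return a copy of the infoset with missing required elements defaulted."""
    result = {}
    for name, spec in schema.items():
        if "children" in spec:
            # Complex element: recurse if supplied, or if required but absent.
            if name in infoset or spec["required"]:
                result[name] = fill_defaults(spec["children"],
                                             infoset.get(name, {}))
        elif name in infoset:
            result[name] = infoset[name]      # explicit value wins
        elif spec["required"] and "default" in spec:
            result[name] = spec["default"]    # missing required: default
    return result


schema = {
    "order": {"required": True, "children": {
        "id": {"required": True, "default": "0"},
        "note": {"required": False},
        "address": {"required": True, "children": {
            "country": {"required": True, "default": "UK"},
        }},
    }},
}

# Only "id" is supplied; the nested required "country" is defaulted.
print(fill_defaults(schema, {"order": {"id": "42"}}))
```

Passing an entirely empty infoset to the same schema defaults both "id" and
the nested "country", which is the nesting behaviour described above.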
3. Choices
It is worth noting that the concept of 'required' does not apply to the
elements of a choice, even if minOccurs > 0.
4. Outstanding Issues
Is it ok to reuse dfdl:defaultValueInitiatorPolicy for complex elements?
Should it be renamed? Should we add a separate property for complex
elements?
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU