[DFDL-WG] clarifications needed?: dfdl:contentLength function and dfdl:valueLength function on empty and literal nil representations, and escaping
Steve Hanson
smh at uk.ibm.com
Thu May 19 16:59:15 EDT 2016
Mike,
I believe that this is all part of the subject of deferred action 242. It
sounds like we should undefer the action as it is impacting the work on
the Daffodil serializer.
I have the last email exchanges for action 242, from April 2014. I can
re-send them.
Regards
Steve Hanson
IBM Integration Bus, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date: 19/05/2016 17:25
Subject: [DFDL-WG] clarifications needed?: dfdl:contentLength
function and dfdl:valueLength function on empty and literal nil
representations, and escaping
Sent by: "dfdl-wg" <dfdl-wg-bounces at ogf.org>
The dfdl:contentLength function is defined in terms of the SimpleContent
or ComplexContent regions of the grammar.
Let's just look at Simple types for a sec.
We do not specify what the dfdl:contentLength is for an element of
SimpleType which has the SimpleLiteralNilElementRep or
SimpleEmptyElementRep.
I suggest the value should be zero for SimpleEmptyElementRep. When
parsing, an empty element by definition has no content. The fact that a
default value might be inserted because of the empty representation should
not change the fact that there was no content. When unparsing,
SimpleEmptyElementRep can occur if an empty string is the value of a
string-valued element, or an empty byte array is the value of a hexBinary
element. The grammar is just stipulating the different treatment of
initiator/terminator for these special cases of empty things. The content
is length zero.
But consider the round-trip scenario. We parse data to the infoset. During
parsing the dfdl:contentLength of an element having SimpleEmptyElementRep
is zero. A default value is inserted. Now we unparse this same infoset.
The default value's representation very well may be SimpleNormalRep, with
non-zero dfdl:contentLength.
I claim this is ok. This is just another case where some data formats
don't round-trip unchanged. It does add an implementation headache: if
the contentLength is cached on the infoset item, you need separate cache
locations for parsing and for unparsing.
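A minimal sketch of the separate-cache idea, assuming a hypothetical
InfosetElement class (the names here are illustrative and not taken from
Daffodil or any other implementation). Each direction gets its own cache
slot, so a length recorded while parsing is never returned while unparsing:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

DIRECTIONS = ("parse", "unparse")


@dataclass
class InfosetElement:
    """Hypothetical infoset item with one contentLength slot per direction."""
    name: str
    _content_length: Dict[str, int] = field(default_factory=dict)

    def cache_content_length(self, direction: str, length: int) -> None:
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {direction!r}")
        self._content_length[direction] = length

    def content_length(self, direction: str) -> Optional[int]:
        # None means "not computed yet for this direction".
        return self._content_length.get(direction)


# Parsing saw SimpleEmptyElementRep, so the parse-side length is zero.
elem = InfosetElement("amount")
elem.cache_content_length("parse", 0)

# Unparsing writes an assumed 5-character default value, so the
# unparse-side length differs without disturbing the parse-side entry.
elem.cache_content_length("unparse", 5)
```

The two slots are looked up by direction, so the round-trip case described
above yields 0 on the parse side and a non-zero length on the unparse side.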
For SimpleLiteralNilElementRep, it should be the length of the
NilLiteralCharacters or NilElementLiteralContent regions. (Note: the
word "Content" in the region name implies that we think of the nil
literal representation as content.) This applies to both parsing and
unparsing.
For elements of complex type, I think for both ComplexLiteralNilElementRep
and ComplexEmptyElementRep, the dfdl:contentLength should be zero when
parsing. When unparsing, again a complex default may be created (because
default values for interior elements of the complex type might be filled
in as part of the augmented infoset), and the dfdl:contentLength might not
be zero if these default values have non-zero content length. Again I
think this is ok.
For dfdl:contentLength, we should clarify that the length should also
include the contributions of any escape characters, escape-escape
characters, and escapeBlockStart/End characters. (This is implied, because
such characters are in the "value" regions of simple types, and value
regions are always contained in the content region, but I think the
clarification is still helpful.)
Similarly we need to clarify what dfdl:valueLength does.
For SimpleEmptyElementRep the dfdl:valueLength should be zero.
For SimpleLiteralNilElementRep, the dfdl:valueLength should be zero,
because a nilled element has no value.
The corner case is SimpleLiteralNilElementRep for a nillable simple
element of type xs:string. Because a literal nil representation and a
string value are ambiguous, it should be handled by calling
dfdl:contentLength instead of dfdl:valueLength. So a nillable string
element with literal nil nilValue="nil" should have a dfdl:valueLength of
zero but a dfdl:contentLength (in characters) of 3. The same element, not
nilled and containing the string "nil" as its value, would have a
dfdl:valueLength of 3 (characters) and a dfdl:contentLength of 3
(characters).
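The proposed rule for simple elements can be sketched as a small function.
This is only an illustration of the proposal above, assuming lengths in
characters and no padding or escaping; proposed_lengths is a hypothetical
name, not a DFDL or Daffodil API:

```python
from typing import Tuple


def proposed_lengths(representation: str, is_nilled: bool) -> Tuple[int, int]:
    """Return (valueLength, contentLength) in characters under the proposal.

    Assumes no padding and no escape characters, so a non-nilled value
    occupies the whole content region.
    """
    content_length = len(representation)
    # A nilled element has no value, so its valueLength is zero even though
    # the nil literal itself occupies the content region.
    value_length = 0 if is_nilled else len(representation)
    return value_length, content_length


nilled = proposed_lengths("nil", is_nilled=True)       # literal nil "nil"
not_nilled = proposed_lengths("nil", is_nilled=False)  # the string "nil"
empty = proposed_lengths("", is_nilled=False)          # SimpleEmptyElementRep
```

Here nilled is (0, 3), not_nilled is (3, 3), and empty is (0, 0), which
matches the three cases described above.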
For complex type elements, dfdl:valueLength is already defined to be the
same as dfdl:contentLength.
For elements that are not represented (that is, elements that have the
dfdl:inputValueCalc property on them), I believe both dfdl:valueLength and
dfdl:contentLength should cause an SDE, as this has to be an error on the
part of the schema author. (An argument can be made that these should
return zero however. See next paragraph.)
Note however that these functions can be called on elements of complex
type that contain elements that are not represented. Such contained
non-represented elements contribute zero to the content length in all
cases. (Consistency with this is why calling dfdl:valueLength or
dfdl:contentLength directly on a non-represented element might want to
return zero, instead of SDE.)
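The two options for non-represented elements can be sketched as follows.
This is a toy model, assuming a hypothetical Elem tree in which complex
content is just the sum of its children (no separators, initiators, or
alignment); the sde_on_unrepresented flag switches between the SDE and the
return-zero behaviours for a direct call:

```python
from dataclasses import dataclass, field
from typing import List


class SchemaDefinitionError(Exception):
    """Stand-in for a DFDL schema definition error (SDE)."""


@dataclass
class Elem:
    name: str
    represented: bool = True   # False models dfdl:inputValueCalc
    simple_length: int = 0     # content length of a simple element
    children: List["Elem"] = field(default_factory=list)


def content_length(elem: Elem, *, sde_on_unrepresented: bool = True,
                   _direct: bool = True) -> int:
    if not elem.represented:
        # Only a direct call is a candidate for an SDE; a contained
        # non-represented element always contributes zero.
        if _direct and sde_on_unrepresented:
            raise SchemaDefinitionError(f"{elem.name} is not represented")
        return 0
    if elem.children:
        return sum(
            content_length(c, sde_on_unrepresented=sde_on_unrepresented,
                           _direct=False)
            for c in elem.children
        )
    return elem.simple_length


calc = Elem("calc", represented=False)
record = Elem("record", children=[Elem("a", simple_length=3), calc,
                                  Elem("b", simple_length=2)])
record_length = content_length(record)  # 3 + 0 + 2
```

Calling content_length(calc) directly raises SchemaDefinitionError by
default, while content_length(calc, sde_on_unrepresented=False) returns
zero, which is the alternative raised in the parenthetical above.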
dfdl:valueLength is already specified to exclude the length of padding
characters that are trimmed/added.
I believe we should explicitly state that it *includes* the length of
escape, escape-escape, and escapeBlockStart/End characters.
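To illustrate what including escape characters means for the lengths, here
is a simplified sketch. It assumes a toy escaping scheme with a
single-character delimiter, escape character, and escape-escape character;
it is not a faithful model of the DFDL escape scheme rules, and escape is a
hypothetical helper:

```python
def escape(value: str, delimiter: str = ",", escape_char: str = "\\",
           escape_escape_char: str = "\\") -> str:
    """Toy escaping: prefix delimiters with the escape character and
    literal escape characters with the escape-escape character."""
    out = []
    for ch in value:
        if ch == escape_char:
            out.append(escape_escape_char + ch)
        elif ch == delimiter:
            out.append(escape_char + ch)
        else:
            out.append(ch)
    return "".join(out)


logical = "a,b"                # 3 characters in the infoset
represented = escape(logical)  # a, backslash, comma, b in the data stream

# Under the proposed clarification both functions count the escape
# character, so the length is taken from the escaped representation.
value_length = len(represented)
content_length = len(represented)
```

With these assumptions the logical string has 3 characters, but both
lengths come out as 4 because the escape character is counted.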
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com