[DFDL-WG] clarifications needed?: dfdl:contentLength function and dfdl:valueLength function on empty and literal nil representations, and escaping

Mike Beckerle mbeckerle.dfdl at gmail.com
Thu May 19 12:25:16 EDT 2016


The dfdl:contentLength function is defined in terms of the SimpleContent or
ComplexContent regions of the grammar.

Let's just look at Simple types for a sec.

We do not specify what the dfdl:contentLength is for an element of
SimpleType which has the SimpleLiteralNilElementRep or
SimpleEmptyElementRep.

I suggest the value should be zero for SimpleEmptyElementRep. When parsing,
an empty element by definition has no content. The fact that a default
value might be inserted because of the empty representation should not
change the fact that there was no content. When unparsing,
SimpleEmptyElementRep can occur if an empty string is the value of a
string-valued element, or an empty byte array is the value of a hexBinary
element. The grammar is just stipulating the different treatment of
initiator/terminator for these special cases of empty things. The content
is length zero.

But consider the round-trip scenario. We parse data to the infoset. During
parsing the dfdl:contentLength of an element having SimpleEmptyElementRep
is zero. A default value is inserted. Now we unparse this same infoset. The
default value's representation very well may be SimpleNormalRep, with
non-zero dfdl:contentLength.

I claim this is ok. This is just another case where some data formats don't
round trip unchanged. It does add an implementation headache: if the
contentLength is cached on the infoset item, you need separate cache
locations for parsing and for unparsing.
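
To make the caching point concrete, here is a minimal sketch in Python. It
is not taken from any DFDL implementation: the InfosetElement class, its
field names, and the parse_simple/unparse_simple helpers are all
hypothetical, and lengths are counted in characters for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InfosetElement:
    """Hypothetical infoset item with one cached length per direction."""
    value: str
    parsed_content_length: Optional[int] = None    # set by the parser
    unparsed_content_length: Optional[int] = None  # set by the unparser


def parse_simple(raw: str, default: str) -> InfosetElement:
    # Empty representation: the content length is zero even though the
    # default value is what ends up in the infoset.
    value = raw if raw else default
    return InfosetElement(value=value, parsed_content_length=len(raw))


def unparse_simple(elem: InfosetElement) -> str:
    # The infoset value (possibly a default) is written out normally, so
    # the unparse-time length is recorded in its own cache slot.
    out = elem.value
    elem.unparsed_content_length = len(out)
    return out


elem = parse_simple("", default="N/A")
text = unparse_simple(elem)
```

With a single cache slot, the unparser would either see the stale parse-time
zero or overwrite it; two slots keep both answers available.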

For SimpleLiteralNilElementRep, it should be the length of the
NilLiteralCharacters or NilElementLiteralContent regions. (Note: the word
"Content" in the latter name implies that we think of the nil literal
representation as content.) This applies to both parsing and unparsing.

For elements of complex type, I think for both ComplexLiteralNilElementRep
and ComplexEmptyElementRep, the dfdl:contentLength should be zero when
parsing. When unparsing, again a complex default may be created (because
default values for interior elements of the complex type might be filled in
as part of the augmented infoset), and the dfdl:contentLength might not be
zero if these default values have non-zero content length. Again I think
this is ok.

For dfdl:contentLength, we should clarify that the length should also
include the contributions of any escape characters, escape-escape
characters, and escapeBlockStart/End characters. (This is implied, because
such characters are in the "value" regions of simple types, and value
regions are always contained in the content region, but I think the
clarification is still helpful.)

Similarly we need to clarify what dfdl:valueLength does.

For SimpleEmptyElementRep the dfdl:valueLength should be zero.
For SimpleLiteralNilElementRep, the dfdl:valueLength should be zero,
because a nilled element has no value.

There is a corner case with SimpleLiteralNilElementRep for a nillable simple
element of type xs:string: a literal nil representation and a string value
are ambiguous. This should be handled by calling dfdl:contentLength instead
of dfdl:valueLength. So a nillable string element with literal nil
nilValue="nil" should, when nilled, have dfdl:valueLength of zero but
dfdl:contentLength (in characters) of 3. The same element, not nilled and
containing the string "nil" as its value, would have dfdl:valueLength of 3
(characters) and dfdl:contentLength of 3 (characters).
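
The proposed rule for this corner case can be sketched as a small Python
function. The function name and its boolean "nilled" flag are hypothetical,
and lengths are in characters as in the example above.

```python
from typing import Tuple

NIL_LITERAL = "nil"  # the literal nil value used in the example above


def lengths(content: str, nilled: bool) -> Tuple[int, int]:
    """Return (contentLength, valueLength) in characters."""
    # The content region holds the characters either way.
    content_length = len(content)
    # A nilled element has no value, so its value length is zero.
    value_length = 0 if nilled else len(content)
    return content_length, value_length


nilled_lengths = lengths(NIL_LITERAL, nilled=True)   # literal nil in the data
string_lengths = lengths("nil", nilled=False)        # ordinary string "nil"
```

Only dfdl:valueLength differs between the two cases, which is why
dfdl:contentLength is the one to call when the length of the representation
is what you need.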

For complex type elements, dfdl:valueLength is already defined to be the
same as dfdl:contentLength.

For elements that are not represented (that is, elements that have the
dfdl:inputValueCalc property on them), I believe both dfdl:valueLength and
dfdl:contentLength should cause an SDE, as this has to be an error on the
part of the schema author. (An argument can be made that these should
return zero, however. See next paragraph.)

Note however that these functions can be called on elements of complex type
that contain elements that are not represented. Such contained
non-represented elements contribute zero to the content length in all
cases. (Consistency with this is why calling dfdl:valueLength or
dfdl:contentLength directly on a non-represented element might want to
return zero, instead of SDE.)
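
The two options for non-represented elements can be sketched in Python as
follows. The Elem class and the "strict" flag are hypothetical, a ValueError
stands in for an SDE, and the sketch ignores separators, framing, and
alignment, so a complex element's length is just the sum of its children.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Elem:
    name: str
    content: str = ""
    represented: bool = True  # False models dfdl:inputValueCalc
    children: List["Elem"] = field(default_factory=list)


def content_length(e: Elem, strict: bool = True) -> int:
    if not e.represented:
        if strict:
            # Direct call on a non-represented element: treated as an SDE.
            raise ValueError(f"SDE: {e.name} is not represented")
        return 0
    if e.children:
        # Contained non-represented elements always contribute zero.
        return sum(content_length(c, strict=False) for c in e.children)
    return len(e.content)


calc = Elem("calc", represented=False)
record = Elem("rec", children=[Elem("a", "abc"), calc, Elem("b", "de")])
record_length = content_length(record)
```

Passing strict=False on a direct call gives the alternative behavior of
returning zero instead of raising.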

dfdl:valueLength is already specified to exclude the length of padding
characters that are trimmed/added.
I believe we should explicitly state that it *includes* the length of
escape, escape-escape, and escapeBlockStart/End characters.
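
As a rough illustration of the proposed wording, the Python sketch below
counts escape characters toward the value length and leaves padding out. The
escape helper is hypothetical: it assumes a comma delimiter and a backslash
that serves as both the escape and the escape-escape character, and it does
not model escape blocks.

```python
def escape(value: str, delimiter: str = ",", esc: str = "\\") -> str:
    """Prefix each delimiter or escape character with the escape character."""
    out = []
    for ch in value:
        if ch == esc or ch == delimiter:
            out.append(esc)
        out.append(ch)
    return "".join(out)


def value_length(value: str) -> int:
    # Escape characters are counted; padding is never added here.
    return len(escape(value))


represented = escape("a,b")            # a\,b
padded = represented.ljust(6)          # right-padded with spaces to 6
escaped_value_length = value_length("a,b")
```

Here the padded text is 6 characters long, but only the 4 characters of the
escaped value count toward the value length.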








Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>