[DFDL-WG] clarifications needed?: dfdl:contentLength function and dfdl:valueLength function on empty and literal nil representations, and escaping

Steve Hanson smh at uk.ibm.com
Thu May 19 16:59:15 EDT 2016


Mike, 

I believe that this is all part of the subject of deferred action 242.  It 
sounds like we should undefer the action as it is impacting the work on 
the Daffodil serializer.

I have the last email exchanges for action 242, from April 2014. I can 
re-send them.

Regards
 
Steve Hanson
IBM Integration Bus, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date:   19/05/2016 17:25
Subject:        [DFDL-WG] clarifications needed?: dfdl:contentLength 
function and dfdl:valueLength function on empty and literal nil 
representations, and escaping
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>




The dfdl:contentLength function is defined in terms of the SimpleContent 
or ComplexContent regions of the grammar.

Let's just look at Simple types for a sec.

We do not specify what the dfdl:contentLength is for an element of 
SimpleType which has the SimpleLiteralNilElementRep or 
SimpleEmptyElementRep. 

I suggest the value should be zero for SimpleEmptyElementRep. When 
parsing, an empty element by definition has no content. The fact that a 
default value might be inserted because of the empty representation should 
not change the fact that there was no content. When unparsing, 
SimpleEmptyElementRep can occur if an empty string is the value of a 
string-valued element, or an empty byte array is the value of a hexBinary 
element. The grammar is just stipulating the different treatment of 
initiator/terminator for these special cases of empty things. The content 
is length zero. 

But consider the round-trip scenario. We parse data to the infoset. During 
parsing the dfdl:contentLength of an element having SimpleEmptyElementRep 
is zero. A default value is inserted. Now we unparse this same infoset. 
The default value's representation very well may be SimpleNormalRep, with 
non-zero dfdl:contentLength. 

I claim this is ok. This is just another case where some data formats 
don't round trip unchanged. It does add an implementation headache: if 
the contentLength is cached on the infoset item, separate cache locations 
are needed for parsing and for unparsing. 
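
The caching concern can be sketched as follows. This is a minimal Python 
illustration, not Daffodil code: the InfosetItem class, the Direction 
enum, the element name and the default value "N/A" are all hypothetical 
names made up for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Direction(Enum):
    PARSE = "parse"
    UNPARSE = "unparse"


@dataclass
class InfosetItem:
    """Hypothetical infoset element (not Daffodil's actual class)."""
    name: str
    value: Optional[str] = None
    # One contentLength slot per direction, so a length recorded while
    # parsing is never reused when the same item is unparsed.
    cached_content_length: Dict[Direction, int] = field(default_factory=dict)

    def set_content_length(self, direction: Direction, length: int) -> None:
        self.cached_content_length[direction] = length

    def content_length(self, direction: Direction) -> Optional[int]:
        return self.cached_content_length.get(direction)


# Parsing finds an empty representation: content length is zero, and the
# (assumed) default value "N/A" is then placed in the infoset.
item = InfosetItem(name="status")
item.set_content_length(Direction.PARSE, 0)
item.value = "N/A"

# Unparsing the same infoset writes the default as a normal representation,
# so the unparse-side content length is that of the default value.
item.set_content_length(Direction.UNPARSE, len(item.value))
```

With a single shared cache slot, the second call would overwrite the 
first, and an expression evaluated against the parse-time length would 
see the wrong number.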

For SimpleLiteralNilElementRep, it should be the length of the 
NilLiteralCharacters or NilElementLiteralContent regions. (Note: the 
word "Content" appears there, implying that we think of the nil literal 
representation as content.) This applies to both parsing and unparsing. 

For elements of complex type, I think for both ComplexLiteralNilElementRep 
and ComplexEmptyElementRep, the dfdl:contentLength should be zero when 
parsing. When unparsing, again a complex default may be created (because 
default values for interior elements of the complex type might be filled 
in as part of the augmented infoset), and the dfdl:contentLength might not 
be zero if these default values have non-zero content length. Again I 
think this is ok. 

For dfdl:contentLength, we should clarify that the length also includes 
the contributions of any escape characters, escape-escape characters, and 
escapeBlockStart/End characters. (This is implied, because such characters 
are in the "value" regions of simple types, and value regions are always 
contained in the content region, but I think the clarification is still 
helpful.) 

Similarly we need to clarify what dfdl:valueLength does.

For SimpleEmptyElementRep the dfdl:valueLength should be zero.
For SimpleLiteralNilElementRep, the dfdl:valueLength should be zero, 
because a nilled element has no value. 

There is a corner case for a nillable simple element of type xs:string: 
a literal nil representation and a string value are ambiguous. This 
should be handled by calling dfdl:contentLength instead of 
dfdl:valueLength. So a nilled string element with literal nil 
nilValue="nil" should have dfdl:valueLength of zero, but 
dfdl:contentLength (in characters) of 3. The same element, not nilled and 
containing the string "nil" as its value, would have dfdl:valueLength of 
3 (characters) and dfdl:contentLength of 3 (characters).
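
The proposed results for simple elements can be summarised in a small 
Python sketch. It is only an illustration of the rules suggested in this 
note, under the assumption that there is no padding or escaping; the Rep 
enum and both function names are hypothetical, and lengths are in 
characters.

```python
from enum import Enum


class Rep(Enum):
    NORMAL = "SimpleNormalRep"
    EMPTY = "SimpleEmptyElementRep"
    LITERAL_NIL = "SimpleLiteralNilElementRep"


def content_length(rep: Rep, text: str) -> int:
    """Characters in the content region; `text` is the matched content."""
    if rep is Rep.EMPTY:
        return 0          # an empty element has no content
    return len(text)      # normal content, or the nil literal itself


def value_length(rep: Rep, text: str) -> int:
    """Characters in the value region; zero when there is no value."""
    if rep in (Rep.EMPTY, Rep.LITERAL_NIL):
        return 0          # empty and nilled elements have no value
    return len(text)


# Nilled element whose representation is the nil literal "nil".
nil_lengths = (value_length(Rep.LITERAL_NIL, "nil"),
               content_length(Rep.LITERAL_NIL, "nil"))

# Non-nilled element whose string value happens to be "nil".
str_lengths = (value_length(Rep.NORMAL, "nil"),
               content_length(Rep.NORMAL, "nil"))
```

Here nil_lengths is (0, 3) and str_lengths is (3, 3), matching the two 
cases described above.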

For complex type elements, dfdl:valueLength is already defined to be the 
same as dfdl:contentLength.

For elements that are not represented (that is, elements that have the 
dfdl:inputValueCalc property on them), I believe both dfdl:valueLength and 
dfdl:contentLength should cause an SDE, as this has to be an error on the 
part of the schema author. (An argument can be made that these should 
return zero however. See next paragraph.)

Note however that these functions can be called on elements of complex 
type that contain elements that are not represented. Such contained 
non-represented elements contribute zero to the content length in all 
cases. (Consistency with this is why calling dfdl:valueLength or 
dfdl:contentLength directly on a non-represented element might want to 
return zero, instead of SDE.)
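
The two behaviours for non-represented elements can be sketched in 
Python. This is a hedged illustration only: the Element class, the 
SchemaDefinitionError exception and the example lengths are hypothetical, 
and the sketch assumes a complex element with no separators or other 
framing between its children.

```python
from dataclasses import dataclass, field
from typing import List


class SchemaDefinitionError(Exception):
    """Stand-in for a DFDL Schema Definition Error (SDE)."""


@dataclass
class Element:
    name: str
    length: int = 0              # content length of a simple element
    represented: bool = True     # False models dfdl:inputValueCalc
    children: List["Element"] = field(default_factory=list)


def contribution(e: Element) -> int:
    """Length an element adds to an enclosing complex element's content."""
    if not e.represented:
        return 0                 # non-represented children add nothing
    if e.children:
        return sum(contribution(c) for c in e.children)
    return e.length


def content_length(e: Element) -> int:
    """Direct call; raises on a non-represented element (the SDE option)."""
    if not e.represented:
        raise SchemaDefinitionError(
            f"{e.name} is not represented, so it has no content length")
    return contribution(e)


calc = Element("total", represented=False)
record = Element("record", children=[Element("a", 4), calc, Element("b", 2)])
record_length = content_length(record)   # 4 + 0 + 2
```

Returning zero from a direct call instead would amount to replacing the 
raise with `return 0`, which is the consistency argument made above.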

dfdl:valueLength is already specified to exclude the length of padding 
characters that are trimmed/added.
I believe we should explicitly state that it *includes* the length of 
escape, escape-escape, and escapeBlockStart/End characters. 
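
The counting rule for padding and escapes can be sketched as follows. 
This is a minimal Python illustration, not DFDL or Daffodil code: the 
helper names, the comma delimiter, the backslash escape character (also 
used here as the escape-escape character) and the space pad character are 
all assumptions made for the example. It also assumes that padding is 
counted by dfdl:contentLength.

```python
def escape(value: str, delimiter: str = ",", esc: str = "\\") -> str:
    """Prefix each delimiter or escape character with the escape character."""
    out = []
    for ch in value:
        if ch == esc or ch == delimiter:
            out.append(esc)
        out.append(ch)
    return "".join(out)


def pad_right(text: str, width: int, pad: str = " ") -> str:
    """Right-pad the escaped text to a fixed width."""
    return text.ljust(width, pad)


def value_length(content: str, pad: str = " ") -> int:
    """Length with right padding trimmed; escape characters still count.

    Simplification: trailing pad characters that are really part of the
    value would also be trimmed here.
    """
    return len(content.rstrip(pad))


def content_length(content: str) -> int:
    """Length of the whole content region, padding included (assumption)."""
    return len(content)


logical = "a,b"                    # 3 characters in the infoset
escaped = escape(logical)          # a\,b  (4 characters on the wire)
content = pad_right(escaped, 8)    # escaped text plus 4 pad characters
lengths = (len(logical), value_length(content), content_length(content))
```

In this example lengths is (3, 4, 8): the escape character is counted by 
both functions, while the four pad characters are counted only in the 
content length.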

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy

