[DFDL-WG] Fw: * DFDL Errata* Clarification: Limitations on use of endOfParent

Steve Hanson smh at uk.ibm.com
Mon Jan 23 06:19:46 EST 2012


----- Forwarded by Steve Hanson/UK/IBM on 23/01/2012 11:16 -----

From:   Steve Hanson/UK/IBM
To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, Tim Kimber/UK/IBM
Date:   18/01/2012 16:26
Subject:        * DFDL Errata* Clarification: Limitations on use of endOfParent


As agreed on the extra WG call on 18th Jan.

This will be raised as a separate issue on the next DFDL WG call.

Constraints on an element with lengthKind 'endOfParent':

- element maxOccurs = 1
- no terminator on the element
- if the element is in a sequence:
    - separatorPosition of the sequence must not be 'postfix'
    - sequenceKind of the sequence must be 'ordered'
    - no floating elements in the sequence
    - the element must statically be the 'last' in the sequence **
- if the element is in a choice, it is always statically 'last' **
- parent element lengthKind must not be 'implicit' or 'delimited'
- if the element is complex, then all possible 'last' elements ** must
  also have lengthKind 'endOfParent'
- not sensitive to any in-scope markup
- not allowed on the root element
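As a rough illustration, the element-level checks in the list above could be coded as follows. This is a minimal sketch, not the DFDL spec's or any processor's implementation. The classes `Group` and `Elem`, their fields, the function `end_of_parent_violations`, and the use of `None` to mark the root are all hypothetical. The 'postfix' rule is checked against dfdl:separatorPosition, which is the DFDL property that takes the value 'postfix'. The complex-element rule and the in-scope markup rule are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Group:
    """Hypothetical, simplified model group (sequence or choice)."""
    kind: str                           # "sequence" or "choice"
    sequence_kind: str = "ordered"      # dfdl:sequenceKind
    separator_position: str = "infix"   # dfdl:separatorPosition
    has_floating: bool = False          # any child with dfdl:floating="yes"
    children: list = field(default_factory=list)


@dataclass(eq=False)
class Elem:
    """Hypothetical, simplified element declaration."""
    name: str
    length_kind: str
    max_occurs: int = 1
    terminator: Optional[str] = None
    parent_length_kind: Optional[str] = None  # None means the root element
    group: Optional[Group] = None             # enclosing model group, if any


def end_of_parent_violations(e: Elem) -> List[str]:
    """Return the constraints from the list that e breaks (empty if none)."""
    if e.length_kind != "endOfParent":
        return []
    problems = []
    if e.max_occurs != 1:
        problems.append("maxOccurs must be 1")
    if e.terminator:
        problems.append("must not have a terminator")
    if e.parent_length_kind is None:
        problems.append("not allowed on the root element")
    elif e.parent_length_kind in ("implicit", "delimited"):
        problems.append("parent lengthKind must not be implicit or delimited")
    g = e.group
    if g is not None and g.kind == "sequence":
        if g.separator_position == "postfix":
            problems.append("separatorPosition must not be postfix")
        if g.sequence_kind != "ordered":
            problems.append("sequenceKind must be ordered")
        if g.has_floating:
            problems.append("no floating elements allowed in the sequence")
        if not g.children or g.children[-1] is not e:
            problems.append("must be statically last in the sequence")
    # A choice needs no check: an element in a choice is always last.
    return problems
```

For example, an endOfParent element that is the final child of an ordered, infix-separated sequence inside an explicit-length parent yields an empty list.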

** A concise description is still needed of how to walk the content of a
complex element and build the list of 'last' elements. This involves
factoring out local sequences and coping with choices.
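One possible reading of that walk, sketched under stated assumptions: local sequences are factored out by looking only at their final term, with terms that contribute no elements skipped. Choices contribute the last elements of every branch. Optional elements and empty choice branches are ignored, so this is purely static. All class and function names are hypothetical, and this is not a definition the WG has agreed.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """Hypothetical element particle; only its lengthKind matters here."""
    name: str
    length_kind: str = "delimited"


@dataclass
class Sequence:
    """Hypothetical local sequence (ordered)."""
    children: List["Term"] = field(default_factory=list)


@dataclass
class Choice:
    """Hypothetical choice; any one branch may be taken."""
    branches: List["Term"] = field(default_factory=list)


Term = Union[Element, Sequence, Choice]


def last_elements(term: Term) -> List[Element]:
    """Statically collect every element that can be the last one in term."""
    if isinstance(term, Element):
        return [term]
    if isinstance(term, Sequence):
        # Factor out local sequences: walk back from the final term and skip
        # any term that contributes no elements (e.g. an empty sequence).
        for child in reversed(term.children):
            found = last_elements(child)
            if found:
                return found
        return []
    # Choice: the last element may come from any branch.
    result: List[Element] = []
    for branch in term.branches:
        result.extend(last_elements(branch))
    return result


def all_last_are_end_of_parent(content: Term) -> bool:
    """Check the complex-element rule against the collected list."""
    return all(e.length_kind == "endOfParent" for e in last_elements(content))
```

Handling optional trailing elements would mean returning several candidates from a sequence rather than stopping at the first non-empty term.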

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Steve Hanson/UK/IBM
To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc:     Tim Kimber/UK/IBM at IBMGB
Date:   08/11/2011 18:22
Subject:        Re: Coping with Character code U+0000 - and how to end-of-parent in an array


Hi Mike

For the record, I knew we said something about U+0000 - it's in section 
5...

        String – In DFDL a string can contain any character codes. None 
are reserved. (Including the character with character code U+0000, which 
is not permitted in XML documents.)

After discussion with Sandy Gao, this is what we wrote in the DFDL to XDM 
mapping document:
Note: SimpleElement [datavalue] values may contain characters that are 
illegal in XML, for example, DFDL strings can contain the character code 0 
(zero) within them, but XML does not allow this character code in any XML 
content even if it is represented as a character entity. Nevertheless, a 
DFDL described string is mapped to an XDM string data value.
and later for the actual mapping to XDM:
SimpleElement: If the value of [datavalue] is special value “nil”, then 
the empty string, otherwise the value of [datavalue] converted to its 
canonical lexical representation. 
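To make the XML side of this concrete: the XML 1.0 'Char' production excludes U+0000, and the Legal Character well-formedness constraint also rules out a character reference such as &#0;. A string like "ab" + U+0000 + "cd" is therefore a perfectly good string value, but it has no XML 1.0 serialization. The small check below encodes the 'Char' production. The function names are illustrative only.

```python
from typing import List


def is_xml10_char(cp: int) -> bool:
    """The 'Char' production of XML 1.0 (Fifth Edition), section 2.2."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def xml_illegal_positions(value: str) -> List[int]:
    """Indexes of characters that cannot appear in XML 1.0 content, either
    literally or as a character reference such as &#0;."""
    return [i for i, ch in enumerate(value) if not is_xml10_char(ord(ch))]
```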

On to your examples (and assuming separatorPosition is 'infix')...

You are right about the first (endOfParent) example being odd. This
example would work fine if the lengthKind were 'delimited'. Remember that
the 'explicit' length of the parent element creates a box which scopes the
delimited behaviour.

The second (delimited) example works fine.

endOfParent and delimited behave almost identically most of the time, but
arrays appear to be one of the cases where they differ. I am thinking that
endOfParent should not be allowed when maxOccurs > 1.
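A minimal sketch of the difference, assuming separatorPosition 'infix', lengthUnits of characters, and a caller that has already cut the parent's explicit-length box out of the data. It ignores escape schemes, nils and occurrence bounds, and the function name is hypothetical. In delimited mode each occurrence stops at the next separator or at the edge of the box. If every occurrence were endOfParent, the first one would swallow the separators and the rest of the box, leaving no room for further occurrences.

```python
from typing import List


def parse_array_in_box(box: str, sep: str, end_of_parent: bool) -> List[str]:
    """Parse the occurrences of a string array inside a parent whose
    explicit length has already cut out `box` (the 'box' that scopes the
    child's delimited behaviour)."""
    items: List[str] = []
    pos = 0
    while True:
        if end_of_parent:
            end = len(box)                  # occurrence runs to end of parent
        else:
            i = box.find(sep, pos)          # delimited: scan for separator...
            end = len(box) if i < 0 else i  # ...or stop at the edge of the box
        items.append(box[pos:end])
        if end >= len(box):
            return items
        pos = end + len(sep)                # step over the infix separator
```

With the box "ab" NUL "cd" NUL "ef", delimited mode gives three occurrences, while the endOfParent reading gives a single occurrence holding the whole box.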

Regards

Steve Hanson




From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson/UK/IBM at IBMGB, Tim Kimber/UK/IBM at IBMGB
Date:   08/11/2011 17:02
Subject:        Coping with Character code U+0000 - and how to end-of-parent in an array



We have this incompatibility with the XML infoset around U+0000, aka NUL.

However, one can model data containing character code U+0000 in the 
content as an array of strings with NUL termination. That is, we split the 
string on the NUL characters so as to avoid putting them in the infoset.
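A minimal sketch of that split, with hypothetical function names. On parse, the string is split at every U+0000 so that no piece contains a character XML cannot carry. On unparse, the NULs are put back between the pieces.

```python
from typing import List

NUL = "\x00"


def split_on_nul(value: str) -> List[str]:
    """Parse side: split the string at each U+0000 so that no piece
    contains a character that XML cannot represent."""
    return value.split(NUL)


def join_on_nul(pieces: List[str]) -> str:
    """Unparse side: put the NUL separators back between the pieces."""
    return NUL.join(pieces)
```

Adjacent NULs produce empty pieces, so for the round trip to be exact the array has to allow empty occurrences.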

So I tried to do this and ran into issues. For example, if the data
contains a string of length 80 in which the character code 0 can appear,
then this could be modeled as:

<element name="stringsWithNul" dfdl:lengthKind="explicit" dfdl:length="80">
  <complexType>
    <sequence dfdl:separator="%NUL;">
      <element name="substring" type="string" maxOccurs="80"
               dfdl:lengthKind="endOfParent"/>  <!-- ????  -->
    </sequence>
  </complexType>
</element>

Problem: is that use of endOfParent length kind right? It's the last thing 
in the group, but the same element decl also describes the prior elements. 
If there are no NULs in the string, then endOfParent is exactly what you 
want. There will be only one substring, it will have all 80 characters. 
But if there are NULs in the middle, then you want the earlier array
elements to be delimited by the sequence's separators, and only the last
element to run to the end of the parent.

I believe we cannot express these semantics, where the parent constrains
the length, the separator ends every occurrence but the last, and only the
last occurrence is ended by the end of the parent.

I was also unclear about this one. If the data 'string' has a terminator
of '!', then perhaps:

<element name="stringsWithNul" dfdl:lengthKind="delimited"
         dfdl:terminator="!">
  <complexType>
    <sequence dfdl:separator="%NUL;">
      <!-- delimited entirely by ancestor/enclosing-specified delimiters -->
      <element name="substring" type="string" maxOccurs="unbounded"
               dfdl:lengthKind="delimited"/>
    </sequence>
  </complexType>
</element>
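On the reading Steve gives in his reply (that this delimited example works), each occurrence ends at whichever in-scope delimiter appears first. That is either the sequence's NUL separator or the enclosing element's '!' terminator. The sketch below shows only that idea. It ignores escape schemes and the subtleties of multi-character delimiters, and `parse_terminated_array` is a hypothetical name, not part of any DFDL processor.

```python
from typing import List, Tuple


def parse_terminated_array(data: str, sep: str = "\x00",
                           term: str = "!") -> Tuple[List[str], int]:
    """Parse a delimited array whose occurrences are separated by `sep`
    and whose enclosing element is terminated by `term`.
    Returns the occurrences and the position just after the terminator."""
    items: List[str] = []
    pos = 0
    while True:
        # Scan for whichever in-scope delimiter appears first.
        i_sep = data.find(sep, pos)
        i_term = data.find(term, pos)
        if i_term < 0:
            raise ValueError("enclosing terminator not found")
        if 0 <= i_sep < i_term:
            items.append(data[pos:i_sep])   # separator ends this occurrence
            pos = i_sep + len(sep)
        else:
            items.append(data[pos:i_term])  # terminator ends the last one
            return items, i_term + len(term)
```

Parsing "ab" NUL "cd!rest" gives the occurrences "ab" and "cd", and leaves "rest" for whatever follows the enclosing element.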

Is the array element delimited? Is that the right length kind for this 
situation?

Thanks for comments

...mikeb

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU