[DFDL-WG] Fw: * DFDL Errata* Clarification: Limitations on use of endOfParent

Mike Beckerle mbeckerle.dfdl at gmail.com
Mon Jan 23 12:00:42 EST 2012


My only question is about "endOfParent" not being allowed on the root element.

If you have a DFDL implementation processing message buffers, and the root
element's content ends at the end of the buffer/end-of-data, how do we
express that?

I expected that to be endOfParent, the notion being that all content has an
implicit parent whose end is the true end-of-data.

...mikeb

On Mon, Jan 23, 2012 at 6:19 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> ----- Forwarded by Steve Hanson/UK/IBM on 23/01/2012 11:16 -----
>
> From:        Steve Hanson/UK/IBM
> To:        Mike Beckerle <mbeckerle.dfdl at gmail.com>, Tim Kimber/UK/IBM
> Date:        18/01/2012 16:26
> Subject:        * DFDL Errata* Clarification:  Limitations on use of
> endOfParent
> ------------------------------
>
>
> As agreed on WG extra call on 18th Jan.
>
> Will be raised as a separate issue on next DFDL WG call.
>
> Constraints on element lengthKind 'endOfParent'...
>
> - element maxOccurs = 1
> - no terminator on element
> - if element is in a sequence
>     - separatorPosition of sequence must not be 'postfix'
>     - sequenceKind of sequence must be 'ordered'
>     - no floating elements in the sequence
>     - must be the 'last' in the sequence statically **
> - if element is in a choice it is always 'last' statically **
> - parent element lengthKind must not be 'implicit' or 'delimited'
> - if element is complex then all possible 'last' elements ** must also be
> 'endOfParent'
> - not sensitive to any in-scope markup
> - not allowed on root element
>
> ** Need a concise description of walking the content of a complex element
> and building the list of 'last elements'. Involves factoring out local
> sequences and coping with choices.
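
A minimal sketch of the 'last elements' walk described in the note above, in Python. The model is an assumption made for illustration only: `Element`, `SequenceGroup`, `ChoiceGroup` and `last_elements` are hypothetical names, every sequence is taken to be ordered with no floating or optional members, and occurrence counts are ignored. Under those assumptions a sequence contributes the last elements of its final member, and a choice contributes the last elements of every branch.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Element:
    name: str
    # Content model of a complex element; None for a simple element.
    content: Optional[Union["SequenceGroup", "ChoiceGroup"]] = None


@dataclass
class SequenceGroup:
    # Ordered members: elements, nested local sequences, or choices.
    members: list = field(default_factory=list)


@dataclass
class ChoiceGroup:
    # Alternative branches: elements, local sequences, or nested choices.
    branches: list = field(default_factory=list)


def last_elements(term):
    """Return the elements that can be statically 'last' within `term`."""
    if isinstance(term, Element):
        return [term]
    if isinstance(term, SequenceGroup):
        # Factor out local sequences: only the final member can be last.
        return last_elements(term.members[-1]) if term.members else []
    if isinstance(term, ChoiceGroup):
        # Any branch may be taken, so collect the last elements of each.
        found = []
        for branch in term.branches:
            found.extend(last_elements(branch))
        return found
    raise TypeError(f"unsupported term: {term!r}")


content = SequenceGroup([
    Element("a"),
    ChoiceGroup([
        Element("b"),
        SequenceGroup([Element("c"), Element("d")]),
    ]),
])
print([e.name for e in last_elements(content)])  # ['b', 'd']
```

To check the rule for a complex endOfParent element, one would call `last_elements(element.content)` and repeat the check on each complex element returned.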
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        Steve Hanson/UK/IBM
> To:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> Cc:        Tim Kimber/UK/IBM at IBMGB
> Date:        08/11/2011 18:22
> Subject:        Re: Coping with Character code U+0000 - and how to
> end-of-parent in an array
> ------------------------------
>
>
> Hi Mike
>
> For the record, I knew we said something about U+0000 - it's in section
> 5...
>
> *        String – In DFDL a string can contain any character codes. None
> are reserved. (Including the character with character code U+0000, which is
> not permitted in XML documents.)*
>
> After discussion with Sandy Gao, this is what we wrote in the DFDL to XDM
> mapping document:
>
> *Note: SimpleElement [datavalue] values may contain characters that are
> illegal in XML, for example, DFDL strings can contain the character code 0
> (zero) within them, but XML does not allow this character code in any XML
> content even if it is represented as a character entity. Nevertheless, a
> DFDL described string is mapped to an XDM string data value.*
>
> and later for the actual mapping to XDM:
>
> *SimpleElement:* If the value of *[datavalue]* is special value *“nil”*,
> then the empty string, otherwise the value of *[datavalue]* converted to
> its canonical lexical representation.
>
>
> On to your examples (and assuming separatorPosition is 'infix')...
>
> You are right about the first (endOfParent) example being odd. This
> example would work fine if the lengthKind was 'delimited'. Remember that
> the 'explicit' length of the parent element creates a box which scopes the
> delimited behaviour.
>
> The second (delimited) works fine.
>
> endOfParent and delimited behave almost identically most of the time. When
> the element is an array, this looks to be one of the differences. I am
> thinking that endOfParent should not be allowed when maxOccurs > 1.
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        Steve Hanson/UK/IBM at IBMGB, Tim Kimber/UK/IBM at IBMGB
> Date:        08/11/2011 17:02
> Subject:        Coping with Character code U+0000 - and how to
> end-of-parent in an array
> ------------------------------
>
>
>
> We have this incompatibility with the XML infoset around U+0000, aka NUL.
>
> However, one can model data containing character code U+0000 in the
> content as an array of strings with NUL termination. That is, we split the
> string on the NUL characters so as to avoid putting them in the infoset.
>
> So I tried to do this and ran into issues. E.g., if data contains a string
> of length 80, but inside it the character code 0 can appear, then this
> could be modeled as:
>
> <element name="stringsWithNul" dfdl:lengthKind="explicit" dfdl:length="80">
>   <complexType>
>     <sequence dfdl:separator="%NUL;">
>       <element name="substring" type="string" maxOccurs="80"
>           dfdl:lengthKind="endOfParent" />  <!-- ????  -->
>     </sequence>
>   </complexType>
> </element>
>
> Problem: is that use of endOfParent length kind right? It's the last thing
> in the group, but the same element decl also describes the prior elements.
> If there are no NULs in the string, then endOfParent is exactly what you
> want. There will be only one substring, it will have all 80 characters. But
> if there are NULs in the middle, then you want the earlier array elements
> to be delimited by the sequence's separators, and only the last element to
> be delimited by endOfParent.
>
> These semantics, where the parent constrains the overall length but the
> separator ends every occurrence except the last, which alone runs to
> endOfParent, are not something we can express, I believe.
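
A small Python sketch of the parse result the paragraph above is asking for. It is illustrative only: `split_fixed_field` is a hypothetical helper, and it assumes the 80-unit field has already been decoded to an 80-character string. Every substring except the last ends at a NUL separator, and the last one runs to the end of the enclosing field.

```python
from typing import List

NUL = "\x00"


def split_fixed_field(data: str, length: int = 80) -> List[str]:
    """Split a fixed-length field on NUL so no NUL reaches the infoset."""
    if len(data) != length:
        raise ValueError(f"expected {length} characters, got {len(data)}")
    # Earlier items end at a separator; the final item ends where the
    # parent's fixed length ends.
    return data.split(NUL)


field_value = "abc" + NUL + "de" + NUL + "x" * 73  # 80 characters in total
parts = split_fixed_field(field_value)
print(len(parts))  # 3
```

With no NUL in the data the result is a single 80-character substring, which is the case where endOfParent alone would be sufficient.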
>
> I was actually even unclear on this one: If the data 'string' has a
> terminator of ! then perhaps:
>
> <element name="stringsWithNul" dfdl:lengthKind="delimited"
>     dfdl:terminator="!">
>   <complexType>
>     <sequence dfdl:separator="%NUL;">
>       <!-- delimited entirely by ancestor/enclosing-specified delimiters -->
>       <element name="substring" type="string" maxOccurs="unbounded"
>           dfdl:lengthKind="delimited"/>
>     </sequence>
>   </complexType>
> </element>
>
> Is the array element delimited? Is that the right length kind for this
> situation?
>
> Thanks for comments
>
> ...mikeb
>
>  ------------------------------
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
> --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>



-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412