[DFDL-WG] Fw: DFDL String Literal type

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Jul 17 15:26:27 EDT 2013


Well, it's looking to me like XML/XSD just doesn't have the right
predefined whitespace-handling concepts that DFDL needs, either for DFDL
String Literals or for DFDL Expressions. Representing a property value as a
whitespace-separated list of DFDL String Literals works, but almost by
accident.

If XML/XSD aren't going to give us the right thing, I think we should state
our own rules, and we should avoid deriving from the behavior of xs:token,
because its whitespace collapsing also applies to quoted whitespace inside
expressions, which is very undesirable.

To me, given the value "      { ../foo          eq          '  .   '  }",
the whitespace everywhere except between the single quotes is
insignificant and can be collapsed, but collapsing shouldn't mess with a
schema author's quoted strings.
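To make the xs:token problem concrete, here is a minimal Python sketch of the XML Schema whiteSpace="collapse" rule that xs:token applies, run on the example value above. The helper name `xs_token_collapse` is hypothetical, and the code models only the collapse step, not a real schema validator.

```python
import re

def xs_token_collapse(value: str) -> str:
    """Model XML Schema whiteSpace="collapse", the facet value xs:token uses."""
    # "replace" step: each tab, line feed and carriage return becomes a space (#x20)
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # "collapse" step: squeeze runs of spaces to one, then trim leading/trailing spaces
    return re.sub(r" +", " ", replaced).strip(" ")

expr = "      { ../foo          eq          '  .   '  }"
print(xs_token_collapse(expr))  # { ../foo eq ' . ' }
```

The quoted '  .   ' comes out as ' . ', which is exactly the damage to the schema author's string that a whitespace-collapsing type would cause.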

Yes, we have dfdl:decodeDFDLEntities('%SP;%SP;.%SP;%SP;'), which could be
plugged in instead, but I think this is a hack.
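The entity-based workaround works because the quoted string then contains no literal whitespace for a collapse step to touch; the spaces only reappear when entities are decoded. A minimal sketch of that decoding, assuming only a handful of named entities (the full DFDL entity set is larger); `decode_entities_subset` is a hypothetical helper, not dfdl:decodeDFDLEntities itself.

```python
import re

# Hypothetical subset of DFDL named character entities, for illustration only.
_ENTITIES = {"NUL": "\x00", "HT": "\t", "LF": "\n", "CR": "\r", "SP": " "}

def decode_entities_subset(literal):
    """Replace each %NAME; entity from the subset above with its character."""
    def repl(match):
        name = match.group(1)
        if name not in _ENTITIES:
            raise ValueError(f"entity not in this sketch's subset: %{name};")
        return _ENTITIES[name]
    return re.sub(r"%([A-Z]+);", repl, literal)

# No literal whitespace in the input, so collapsing would leave it unchanged;
# decoding afterwards restores the spaces.
print(repr(decode_entities_subset("%SP;%SP;.%SP;%SP;")))  # '  .  '
```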

So, from the point of view of the XSD schema for DFDL annotations, a DFDL
Expression is a whitespace-preserving string (i.e. xs:string, whose
whiteSpace facet is "preserve"), and so is a DFDL String Literal. The DFDL
implementation must then provide the behavior for removing insignificant
whitespace.

For DFDL Expressions, all whitespace is insignificant (and may be
collapsed) except whitespace between quotation marks, which is significant.
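One possible reading of this rule, sketched in Python: collapse runs of whitespace outside quotes to a single space and trim the ends, but copy quoted strings verbatim. This is an assumption about how an implementation might do it, not specified DFDL behavior; `collapse_outside_quotes` is a hypothetical name, and the sketch assumes quotes are balanced.

```python
def collapse_outside_quotes(expr: str) -> str:
    """Collapse whitespace outside '...' or "..." to one space; keep quoted text as-is."""
    out = []
    quote = None           # quote character we are currently inside, if any
    pending_space = False  # whitespace seen outside quotes, not yet emitted
    for ch in expr:
        if quote is not None:
            out.append(ch)          # inside a quoted string: copy verbatim
            if ch == quote:
                quote = None        # closing quote
        elif ch.isspace():
            pending_space = True    # defer, so runs become one space and ends are trimmed
        else:
            if pending_space and out:
                out.append(" ")
            pending_space = False
            out.append(ch)
            if ch in ("'", '"'):
                quote = ch          # opening quote
    return "".join(out)

print(collapse_outside_quotes("      { ../foo          eq          '  .   '  }"))
# { ../foo eq '  .   ' }
```

Collapsing to one space, rather than deleting whitespace outright, keeps tokens such as eq separated from their operands.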

For DFDL String Literals, no literal whitespace is allowed within a
literal; DFDL Character Entities must be used instead.
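This rule, together with the whitespace-separated list form mentioned at the start of this message, could be checked along the following lines. A minimal sketch: `check_string_literal` and `split_literal_list` are hypothetical helpers, and treating only #x20, #x9, #xA and #xD as whitespace (the XML definition) is an assumption.

```python
import re
from typing import List

# XML whitespace characters #x20, #x9, #xA, #xD (assumed definition for this sketch).
_XML_WS = re.compile(r"[ \t\n\r]+")

def check_string_literal(literal: str) -> str:
    """Reject a single DFDL String Literal that contains literal whitespace."""
    if _XML_WS.search(literal):
        raise ValueError(f"use entities such as %SP; instead of whitespace: {literal!r}")
    return literal

def split_literal_list(value: str) -> List[str]:
    """Split a property value into its whitespace-separated literals."""
    # Splitting on runs of whitespace guarantees each piece is whitespace-free.
    return [tok for tok in _XML_WS.split(value) if tok]

print(split_literal_list("  %CR;%LF;\t%LF; "))  # ['%CR;%LF;', '%LF;']
```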





On Wed, Jul 17, 2013 at 11:06 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> We discussed the correct XML schema type for DFDL String Literal on the
> last WG call. I read up on xs:NMTOKEN, but it is not appropriate: an
> NMTOKEN may contain only XML name characters, so it does not allow the
> full range of characters we need (for example, '%' and ';' are not name
> characters). Then I looked at restricting xs:token, but I could not work
> out from the XML Schema 1.0 spec how the whiteSpace facet is handled when
> other facets are present. So I asked Sandy, and got the very useful
> clarification below. Please review for the next call.
>
> Regards
>
> Steve Hanson
> Architect, IBM Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
> ----- Forwarded by Steve Hanson/UK/IBM on 17/07/2013 16:01 -----
>
> From:        Sandy Gao/Toronto/IBM at IBMCA
> To:        Steve Hanson/UK/IBM at IBMGB,
> Date:        17/07/2013 13:33
> Subject:        Re: DFDL String Literal type
> ------------------------------
>
>
> Hi Steve,
>
> Yes, that should work. All other facet checking, including pattern,
> happens *after* whitespace handling.
>
> This was made clearer in Schema 1.1, where whiteSpace is called a
> pre-lexical facet, pattern is called a lexical facet, and the remaining
> constraining facets are called value-based facets.
>
> Thanks,
> Sandy Gao
> Source Code Monitoring (SCMon)
> IBM Canada
> sandygao at ca.ibm.com
>
>
>
> From:        Steve Hanson/UK/IBM at IBMGB
> To:        Sandy Gao/Toronto/IBM at IBMCA,
> Date:        2013-07-17 06:07 AM
> Subject:        DFDL String Literal type
> ------------------------------
>
>
> Hi Sandy
>
> Please can I ask your advice on using the whiteSpace facet in conjunction
> with the pattern facet? I want to model the correct data type for DFDL
> String Literals, which are defined as follows:
>
>
> DFDL String Literal
> DFDL String Literals represent a sequence of literal bytes or characters
> which appear in the data stream. This presents the following challenges:
>
>   - the literal characters in the data stream might not be in the same
>     encoding as the DFDL schema
>   - it may be necessary to specify a literal character which is not valid
>     in an XML document
>   - it may be necessary to specify one or more raw byte values
>
> A DFDL string literal can describe any of the following types of literal
> data in any combination:
>
>   - a single literal character in any encoding
>   - a string of literal characters in any encoding
>   - a bi-directional character string
>   - one or more characters from a set of related characters (e.g.
>     end-of-line characters)
>   - a literal byte value
>
> A DFDL string literal is therefore able to describe any arbitrary
> sequence of bytes and characters.
>
> Empty strings: the empty string is not allowed as a DFDL string literal
> value unless explicitly stated otherwise in the description of a
> property. In that case, the empty string provides some property-specific
> behavior different from simply using the empty string as a value. When
> the empty string is to be used as a value, the entity %ES; must be used
> in the corresponding DFDL string literal.
>
> Whitespace: when whitespace must be used as part of a property value,
> the DFDL string literal must use entities (such as %WSP;) to represent
> the whitespace. (This allows a property to represent lists of DFDL string
> literals by using literal spaces to separate list elements.)
>
> The nearest match to an XSDL built-in type is xs:token, but we require
> the additional constraint that no whitespace can appear. My thought is to
> define a restriction of xs:token that applies a pattern facet to disallow
> #x20, given that the whitespace 'collapse' implied by xs:token would
> already have replaced #x9, #xA and #xD with #x20, collapsed runs of
> contiguous #x20 into a single #x20, and trimmed leading and trailing
> #x20. Does that sound right?
>
> Regards
>
> Steve Hanson
>
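Steve's proposal, with the ordering Sandy confirms (whiteSpace first, then pattern), can be modelled in a few lines of Python. A minimal sketch: `xs_token_collapse` and `is_valid_token_literal` are hypothetical names, and the Python regex `[^ ]+` checked with fullmatch stands in for an XSD pattern facet (XSD patterns must match the whole value).

```python
import re

def xs_token_collapse(value: str) -> str:
    """whiteSpace="collapse": tab/LF/CR become spaces, runs squeeze to one, ends trimmed."""
    return re.sub(r" +", " ", re.sub(r"[\t\n\r]", " ", value)).strip(" ")

# Stand-in for a pattern facet such as [^ ]+ on the restricted xs:token type.
_NO_SPACE = re.compile(r"[^ ]+")

def is_valid_token_literal(value: str) -> bool:
    normalized = xs_token_collapse(value)                # step 1: pre-lexical whiteSpace facet
    return _NO_SPACE.fullmatch(normalized) is not None  # step 2: lexical pattern facet

for candidate in ["%CR;%LF;", "\t%CR;%LF;\n", "%CR; %LF;", "   "]:
    print(repr(candidate), is_valid_token_literal(candidate))
```

Because collapsing runs first, a value with stray leading or trailing whitespace is silently accepted, while an interior space, or a value that is all whitespace, is rejected.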



-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com