[DFDL-WG] Fw: DFDL String Literal type

Steve Hanson smh at uk.ibm.com
Wed Jul 17 11:06:27 EDT 2013


We discussed the correct XML Schema type for DFDL String Literal on the 
last WG call.  I read up on xs:NMTOKEN, which is not appropriate: it is 
basically a name, so it does not allow the full range of characters we 
need.  Then I looked at restricting xs:token, but I could not work out 
from the XML Schema 1.0 spec how the whitespace facet is handled when 
other facets are present.  So I asked Sandy, and got the very useful 
clarification below.  Please review for the next call.
 
Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 17/07/2013 16:01 -----

From:   Sandy Gao/Toronto/IBM at IBMCA
To:     Steve Hanson/UK/IBM at IBMGB, 
Date:   17/07/2013 13:33
Subject:        Re: DFDL String Literal type


Hi Steve,

Yes, that should work. All other facet checking, including pattern, 
happens *after* whitespace handling.

This was made clearer in Schema 1.1, where "whitespace" is called a 
pre-lexical facet, and "pattern" etc. are called lexical facets.
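
As a rough illustration of that ordering, the following Python sketch 
applies the xs:token 'collapse' whitespace processing first and only then 
checks a pattern. It is not taken from the XML Schema specification: the 
helper names collapse_whitespace and is_valid_literal are hypothetical, 
and the pattern [^ ]+ is only an assumption about what a "no #x20" 
restriction of xs:token might look like.

```python
import re

# Assumed pattern for the restricted type: one or more characters, none of
# them #x20. XSD patterns are implicitly anchored, hence fullmatch below.
NO_SPACE_PATTERN = re.compile(r"[^ ]+")


def collapse_whitespace(value: str) -> str:
    """Apply whitespace 'collapse' processing as described for xs:token."""
    # Step 1: replace #x9, #xA and #xD with #x20.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # Step 2: collapse contiguous #x20 into a single #x20.
    collapsed = re.sub(r" +", " ", replaced)
    # Step 3: trim leading and trailing #x20.
    return collapsed.strip(" ")


def is_valid_literal(raw: str) -> bool:
    """Whitespace handling first (pre-lexical), pattern check second."""
    return NO_SPACE_PATTERN.fullmatch(collapse_whitespace(raw)) is not None
```

With this ordering, leading and trailing whitespace in the raw value is 
harmless, while interior whitespace survives collapsing as a single #x20 
and is then rejected by the pattern.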

Thanks,
Sandy Gao
Source Code Monitoring (SCMon)
IBM Canada
sandygao at ca.ibm.com



From:   Steve Hanson/UK/IBM at IBMGB
To:     Sandy Gao/Toronto/IBM at IBMCA, 
Date:   2013-07-17 06:07 AM
Subject:        DFDL String Literal type


Hi Sandy

Please can I ask your advice on use of the whitespace facet in conjunction 
with the pattern facet?  This is in order to model the correct data type 
for a DFDL String Literal, which is defined as:


DFDL String Literal
DFDL String Literals represent a sequence of literal bytes or characters 
which appear in the data stream. This presents the following challenges:
- the literal characters in the data stream might not be in the same 
  encoding as the DFDL schema
- it may be necessary to specify a literal character which is not 
  valid in an XML document
- it may be necessary to specify one or more raw byte values
A DFDL string literal can describe any of the following types of literal 
data in any combination:
- a single literal character in any encoding
- a string of literal characters in any encoding
- a bi-directional character string
- one or more characters from a set of related characters (e.g. 
  end-of-line characters)
- a literal byte value
A DFDL string literal is therefore able to describe any arbitrary sequence 
of bytes and characters.
Empty Strings: Empty string is not allowed as a DFDL string literal value 
unless explicitly stated otherwise in the description of a property. In 
this case the use of empty string provides some property specific behavior 
different from simply using the empty string as a value. When the empty 
string is to be used as a value, the entity %ES; must be used in the 
corresponding DFDL string literal.
Whitespace: When whitespace must be used as part of a property value, the 
DFDL string literal must use entities (such as %WSP;) to represent the 
whitespace. (This allows a property to represent lists of DFDL string 
literals by using literal spaces to separate list elements.)
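
To illustrate the parenthetical remark about lists, here is a small 
Python sketch, assuming a property value that holds several DFDL string 
literals separated by literal spaces. The function name split_literals 
and the sample value are hypothetical; only the %ES; and %WSP; entity 
names come from the text above.

```python
def split_literals(property_value: str) -> list[str]:
    """Split a space-separated property value into individual literals.

    Because whitespace inside a literal has to be written as an entity
    such as %WSP;, any literal whitespace can only be a list separator.
    """
    # str.split() with no argument splits on runs of whitespace and drops
    # leading/trailing whitespace, so no empty items are produced.
    # Python's notion of whitespace is broader than the four XML
    # whitespace characters; that difference is ignored in this sketch.
    return property_value.split()


# Hypothetical property value holding three literals.
items = split_literals("  ,  %WSP;|%WSP;   %ES; ")
```

Here items holds three entries: a comma, a literal built from two %WSP; 
entities around a vertical bar, and the %ES; entity standing in for the 
empty string.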

The nearest match to an XSDL built-in type is xs:token, but we require the 
additional constraint that no whitespace can appear.  My thought is to 
define a restriction of xs:token that applies a pattern facet to disallow 
use of #x20, given that the whitespace 'collapse' implied by xs:token 
would have replaced #x9, #xA, #xD with #x20, collapsed contiguous #x20, 
and trimmed leading/trailing #x20.  Does that sound right?

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
