[DFDL-WG] Fw: DFDL String Literal type
Steve Hanson
smh at uk.ibm.com
Wed Jul 17 11:06:27 EDT 2013
We discussed the correct XML schema type for DFDL String Literal on the
last WG call. I read up on xs:NMTOKEN - not appropriate as it is
basically a name so does not allow the full range of characters we need.
Then I looked at restricting xs:token, but I could not work out from the
XML Schema 1.0 spec how whitespace facets were handled when other facets
were present. So I asked Sandy, and got the very useful clarification
below. Please review for next call.
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 17/07/2013 16:01 -----
From: Sandy Gao/Toronto/IBM at IBMCA
To: Steve Hanson/UK/IBM at IBMGB,
Date: 17/07/2013 13:33
Subject: Re: DFDL String Literal type
Hi Steve,
Yes, that should work. All other facet checking, including pattern,
happens *after* whitespace handling.
This was made clearer in Schema 1.1, where "whitespace" is called a
pre-lexical facet, and "pattern" etc. are called lexical facets.
Thanks,
Sandy Gao
Source Code Monitoring (SCMon)
IBM Canada
sandygao at ca.ibm.com
From: Steve Hanson/UK/IBM at IBMGB
To: Sandy Gao/Toronto/IBM at IBMCA,
Date: 2013-07-17 06:07 AM
Subject: DFDL String Literal type
Hi Sandy
Please can I ask your advice on use of the whitespace facet in conjunction
with the pattern facet? This is in order to model the correct data type
for DFDL String Literals. A DFDL String Literal is defined as:
DFDL String Literal
DFDL String Literals represent a sequence of literal bytes or characters
which appear in the data stream. This presents the following challenges:
- the literal characters in the data stream might not be in the same
encoding as the DFDL schema
- it may be necessary to specify a literal character which is not
valid in an XML document
- it may be necessary to specify one or more raw byte values
A DFDL string literal can describe any of the following types of literal
data in any combination:
- a single literal character in any encoding
- a string of literal characters in any encoding
- a bi-directional character string
- one or more characters from a set of related characters (e.g.
end-of-line characters)
- a literal byte value
A DFDL string literal is therefore able to describe any arbitrary sequence
of bytes and characters.
Empty Strings: Empty string is not allowed as a DFDL string literal value
unless explicitly stated otherwise in the description of a property. In
this case the use of empty string provides some property specific behavior
different from simply using the empty string as a value. When the empty
string is to be used as a value, the entity %ES; must be used in the
corresponding DFDL string literal.
Whitespace: When whitespace must be used as part of a property value, the
DFDL string literal must use entities (such as %WSP;) to represent the
whitespace. (This allows a property to represent lists of DFDL string
literals by using literal spaces to separate list elements.)
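To illustrate the whitespace rule above, here is a minimal sketch of how a list-valued property could be split into individual DFDL string literals. The function name split_literal_list and the sample property values are hypothetical, and the sketch assumes that only the four XML whitespace characters (#x20, #x9, #xA, #xD) act as separators.

```python
import re
from typing import List

# Characters treated as list separators: the XML whitespace characters
# #x20, #x9, #xA and #xD (an assumption for this sketch).
_NON_WHITESPACE_RUN = re.compile(r"[^ \t\n\r]+")


def split_literal_list(property_value: str) -> List[str]:
    """Split a property value into its whitespace-separated literals.

    Entities such as %WSP; are left untouched, so a literal that needs
    to represent whitespace never contains a real space itself.
    """
    return _NON_WHITESPACE_RUN.findall(property_value)


if __name__ == "__main__":
    # Three literals: a whitespace entity, a comma and a semicolon.
    print(split_literal_list("%WSP; , ;"))
```

Because each literal spells whitespace as an entity, splitting on real spaces is unambiguous.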
The nearest match to an XSDL built-in type is xs:token, but we require the
additional constraint that no whitespace can appear. My thought is to
define a restriction of xs:token that applies a pattern facet to disallow
use of #x20, given that the whitespace 'collapse' implied by xs:token
would have replaced #x9, #xA, #xD with #x20, collapsed contiguous #x20,
and trimmed leading/trailing #x20. Does that sound right?
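The ordering described above (whitespace collapse first, pattern check second) can be sketched roughly as follows. This is only an illustration in Python, not an XML Schema validator: the function names are hypothetical, and the pattern "one or more characters other than #x20" is an assumed stand-in for whatever pattern facet the restriction would finally use.

```python
import re


def collapse_whitespace(raw: str) -> str:
    """Mimic whitespace 'collapse' as described in the question."""
    # Step 1: replace #x9, #xA and #xD with #x20.
    replaced = re.sub(r"[\t\n\r]", " ", raw)
    # Step 2: collapse contiguous #x20 into a single #x20.
    collapsed = re.sub(r" +", " ", replaced)
    # Step 3: trim leading and trailing #x20.
    return collapsed.strip(" ")


# Assumed pattern facet: one or more characters, none of them #x20.
# fullmatch() models the implicit anchoring of schema patterns.
NO_SPACE_PATTERN = re.compile(r"[^ ]+")


def is_valid_string_literal(raw: str) -> bool:
    """Apply the pattern only after whitespace handling has run."""
    normalized = collapse_whitespace(raw)
    return NO_SPACE_PATTERN.fullmatch(normalized) is not None


if __name__ == "__main__":
    for candidate in ["%WSP;", "  %WSP;\n", "a b", "a\tb", ""]:
        print(repr(candidate), is_valid_string_literal(candidate))
```

Under these assumptions, leading and trailing whitespace is harmless because it is trimmed before the pattern is checked, while any interior whitespace survives as a single #x20 and is rejected. The empty string is rejected as well, since the assumed pattern requires at least one character.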
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848