[DFDL-WG] Fw: DFDL String Literal type
Steve Hanson
smh at uk.ibm.com
Thu Jul 18 05:13:21 EDT 2013
I agree that xs:string is necessary for DFDL Expressions and DFDL Regexes,
that's what I recommended in the other thread.
But I'm not seeing what is wrong with using xs:token as the base type for
a DFDL String Literal. The replace/collapse algorithm:
a) Removes leading/trailing whitespace, which we want to happen to handle
element form
b) Does not lose the fact that whitespace was there - you just end up with
a single space, which we can then detect as illegal.
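
As a rough illustration of points (a) and (b), the collapse step can be sketched in Python as below. The function names are made up for this example and are not part of any DFDL or XSD API; the whitespace set is the four XML whitespace characters.

```python
import re

# XML Schema whitespace: space, tab, line feed, carriage return.
_XML_WS_RUN = re.compile(r"[ \t\n\r]+")


def collapse(value: str) -> str:
    """Approximate whiteSpace='collapse' as applied to xs:token."""
    # Replace tab/LF/CR with a space and squeeze each run down to one space.
    squeezed = _XML_WS_RUN.sub(" ", value)
    # Drop the single leading/trailing space left by element-form layout.
    return squeezed.strip(" ")


def has_embedded_whitespace(value: str) -> bool:
    """True if whitespace survives collapsing, i.e. it was inside the value."""
    return " " in collapse(value)
```

Under this sketch, a value laid out in element form such as "\n   %SP;abc\t " collapses to "%SP;abc", while "%CR;  %LF;" keeps one embedded space and can be reported as illegal.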
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
From: Tim Kimber/UK/IBM at IBMGB
To: dfdl-wg at ogf.org,
Date: 18/07/2013 09:06
Subject: Re: [DFDL-WG] Fw: DFDL String Literal type
Sent by: dfdl-wg-bounces at ogf.org
I agree with all of that.
The best way to specify the type of a DFDL string literal in the 'schema
for DFDL annotations' would be:
- define a global simple type called 'DFDLStringLiteral' that is a
restriction of xs:string ( not xs:token ) and contains a pattern facet
that describes its lexical space.
- define a separate global simple type 'ListOfDFDLStringLiteral' that is a
list of DFDLStringLiteral
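
A minimal sketch of how those two types might behave, written in Python rather than XSD. The pattern below is only a placeholder for the real lexical space, and both function names are hypothetical. It assumes the xs:string restriction keeps whitespace as written, while the list type splits on whitespace, as XSD list types do.

```python
import re
from typing import List

# Placeholder pattern facet: one or more non-whitespace characters.
# A real pattern would also have to describe the DFDL entity syntax.
ITEM_PATTERN = re.compile(r"[^ \t\n\r]+")


def is_string_literal(value: str) -> bool:
    """Single-valued case: whitespace is preserved, so the pattern sees it."""
    return ITEM_PATTERN.fullmatch(value) is not None


def parse_literal_list(value: str) -> List[str]:
    """List case: whitespace separates items, then each item is checked."""
    items = [item for item in re.split(r"[ \t\n\r]+", value) if item]
    for item in items:
        if not is_string_literal(item):
            raise ValueError(f"not a DFDL string literal: {item!r}")
    return items
```

Note that with whitespace preserved, a single-valued property written as " ab" fails the pattern because of the leading space, which is the element-form concern raised in the reply above.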
regards,
Tim Kimber, DFDL Team,
Hursley, UK
Internet: kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 37246742
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Steve Hanson/UK/IBM at IBMGB,
Cc: dfdl-wg at ogf.org
Date: 17/07/2013 20:35
Subject: Re: [DFDL-WG] Fw: DFDL String Literal type
Sent by: dfdl-wg-bounces at ogf.org
Well, it's looking to me like xml/xsd just doesn't have the right
pre-defined whitespace-handling concepts that DFDL needs for DFDL String
Literal nor for DFDL Expression. The whitespace-separated list of DFDL
String literals works, but this is almost by accident.
If xml/xsd aren't going to have the right thing for us, I think we should
state our own rules, and we should avoid deriving from the behavior of
xs:token because it collapses even quoted whitespace inside expressions,
which is very undesirable.
To me, given this value " { ../foo eq ' . ' } "
the whitespace everywhere except between the single quotes is
insignificant and can be collapsed, but collapsing shouldn't mess with a
schema author's quoted strings.
Yes, we have dfdl:decodeDFDLEntities('%SP;%SP;.%SP;%SP;'), which could be
plugged in instead. But I think this is a hack.
So to me, from the XSD schema of DFDL annotations point of view, DFDL
expression is a whitespace-preserving string, and DFDL String Literal is
as well. The DFDL implementation must then provide the behavior for
removal of insignificant whitespace.
For DFDL Expressions, all whitespace is insignificant except that between
quotation marks, which is significant.
For DFDL String Literals, no whitespace is allowed, and DFDL Character
Entities must be used.
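
The rule for expressions could be sketched in Python roughly as follows. This is one possible reading, not an agreed algorithm: it collapses runs of whitespace outside quotes to a single space (rather than deleting them, so that tokens such as "foo eq" do not merge), copies quoted strings verbatim, and ignores XPath comments. The function name is hypothetical.

```python
def collapse_outside_quotes(expr: str) -> str:
    """Collapse whitespace in an expression, leaving quoted strings intact."""
    out = []
    quote = None           # the quote character we are inside, or None
    pending_space = False  # whitespace seen since the last emitted character
    for ch in expr:
        if quote is not None:
            # Inside a quoted string: copy everything, watch for the close.
            out.append(ch)
            if ch == quote:
                quote = None
        elif ch in " \t\n\r":
            pending_space = True
        else:
            # Emit one separator for any run of whitespace, except at the start.
            if pending_space and out:
                out.append(" ")
            pending_space = False
            out.append(ch)
            if ch in "'\"":
                quote = ch
    return "".join(out)
```

For example, "  {  ../foo   eq '  .  '  }  " becomes "{ ../foo eq '  .  ' }": the two spaces on each side of the dot survive. A doubled quote inside a string (the XPath escape) closes and immediately reopens the quoted state, so the string is still copied unchanged.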
On Wed, Jul 17, 2013 at 11:06 AM, Steve Hanson <smh at uk.ibm.com> wrote:
We discussed the correct XML schema type for DFDL String Literal on the
last WG call. I read up on xs:NMTOKEN - not appropriate as it is
basically a name so does not allow the full range of characters we need.
Then I looked at restricting xs:token, but I could not work out from the
XML Schema 1.0 spec how whitespace facets were handled when other facets
were present. So I asked Sandy, and got the very useful clarification
below. Please review for next call.
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 17/07/2013 16:01 -----
From: Sandy Gao/Toronto/IBM at IBMCA
To: Steve Hanson/UK/IBM at IBMGB,
Date: 17/07/2013 13:33
Subject: Re: DFDL String Literal type
Hi Steve,
Yes, that should work. All other facet checking, including pattern,
happens *after* whitespace handling.
This was made clearer in Schema 1.1, where "whitespace" is called a
pre-lexical facet, and "pattern" etc. are called lexical facets.
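
To make the ordering concrete, here is a small Python sketch comparing a pattern check applied after whitespace handling (the behavior described above) with one applied to the raw text. The pattern is a hypothetical stand-in for a "no #x20" facet, and the helper names are invented for the example.

```python
import re

# Hypothetical pattern facet: no #x20 anywhere, at least one character.
NO_SPACE = re.compile(r"[^ ]+")


def normalize(raw: str) -> str:
    """Pre-lexical step: replace, collapse and trim XML whitespace."""
    return " ".join(part for part in re.split(r"[ \t\n\r]+", raw) if part)


def valid_pattern_after_whitespace(raw: str) -> bool:
    """Lexical facet checked on the normalized value, as XSD does."""
    return NO_SPACE.fullmatch(normalize(raw)) is not None


def valid_pattern_before_whitespace(raw: str) -> bool:
    """Wrong order, shown only for contrast: pattern sees the raw text."""
    return NO_SPACE.fullmatch(raw) is not None
```

With the value "\n   %NL;\t", the first check accepts and the second rejects; "%CR; %LF;" is rejected either way because the space is inside the value.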
Thanks,
Sandy Gao
Source Code Monitoring (SCMon)
IBM Canada
sandygao at ca.ibm.com
From: Steve Hanson/UK/IBM at IBMGB
To: Sandy Gao/Toronto/IBM at IBMCA,
Date: 2013-07-17 06:07 AM
Subject: DFDL String Literal type
Hi Sandy
Please can I ask your advice on use of the whitespace facet in conjunction
with the pattern facet? This is in order to model the correct data type
for DFDL String Literals. This is defined as:
DFDL String Literal
DFDL String Literals represent a sequence of literal bytes or characters
which appear in the data stream. This presents the following challenges:
- the literal characters in the data stream might not be in the
same encoding as the DFDL schema
- it may be necessary to specify a literal character which is not
valid in an XML document
- it may be necessary to specify one or more raw byte values
A DFDL string literal can describe any of the following types of literal
data in any combination:
- a single literal character in any encoding
- a string of literal characters in any encoding
- a bi-directional character string
- one or more characters from a set of related characters ( e.g.
end-of-line characters)
- a literal byte value
A DFDL string literal is therefore able to describe any arbitrary sequence
of bytes and characters.
Empty Strings: Empty string is not allowed as a DFDL string literal value
unless explicitly stated otherwise in the description of a property. In
this case the use of empty string provides some property specific behavior
different from simply using the empty string as a value. When the empty
string is to be used as a value, the entity %ES; must be used in the
corresponding DFDL string literal.
Whitespace: When whitespace must be used as part of a property value, the
DFDL string literal must use entities (such as %WSP;) to represent the
whitespace. (This allows a property to represent lists of DFDL string
literals by using literal spaces to separate list elements.)
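
As an illustration of how entities stand in for whitespace, the Python sketch below decodes a small subset of entities into characters. The table is deliberately incomplete and the function name is hypothetical; it assumes the %NAME; and %#xHH; forms and the %% escape for a literal percent sign. Class-style entities such as %WSP; are left out on purpose, since they do not stand for one fixed character.

```python
import re

# Illustrative subset only; the DFDL specification defines many more.
NAMED = {"SP": " ", "HT": "\t", "LF": "\n", "CR": "\r"}

# Matches an escaped percent, a hex code point, or a named entity.
_TOKEN = re.compile(r"%%|%#x([0-9A-Fa-f]+);|%([A-Za-z]+);")


def decode_entities(literal: str) -> str:
    """Replace the entities this sketch knows about with their characters."""
    def replace(match):
        if match.group(0) == "%%":
            return "%"
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        name = match.group(2)
        if name not in NAMED:
            raise ValueError(f"entity not handled by this sketch: %{name};")
        return NAMED[name]

    # Text that does not match a token passes through unchanged.
    return _TOKEN.sub(replace, literal)
```

For instance, "%SP;%SP;.%SP;%SP;" decodes to a dot with two spaces on each side, without any literal whitespace appearing in the property value.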
The nearest match to an XSDL built-in type is xs:token, but we require the
additional constraint that no whitespace can appear. My thought is to
define a restriction of xs:token that applies a pattern facet to disallow
use of #x20, given that the whitespace 'collapse' implied by xs:token
would have replaced #x9, #xA, #xD with #x20, collapsed contiguous #x20,
and trimmed leading/trailing #x20. Does that sound right?
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com