[DFDL-WG] Fw: DFDL String Literal type

Tim Kimber KIMBERT at uk.ibm.com
Thu Jul 18 03:59:05 EDT 2013


I agree with all of that.

The best way to specify the type of a DFDL string literal in the 'schema 
for DFDL annotations' would be:
- define a global simple type called 'DFDLStringLiteral' that is a 
restriction of xs:string (not xs:token) and contains a pattern facet 
that describes its lexical space.
- define a separate global simple type 'ListOfDFDLStringLiteral' that is a 
list of DFDLStringLiteral
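
As a rough sketch of what those two types would accept, the following Python mimics a pattern facet on a whitespace-preserving string and an XSD list split. The pattern (one or more non-whitespace characters) and the names is_string_literal and parse_literal_list are assumptions made for illustration; the actual pattern facet is not given in this thread.

```python
import re

# Assumed lexical space: one or more characters, none of them XSD whitespace
# (#x20, #x9, #xA, #xD). The real pattern facet may be stricter.
DFDL_STRING_LITERAL = re.compile(r"[^ \t\n\r]+")

XSD_WHITESPACE = " \t\n\r"


def is_string_literal(value: str) -> bool:
    """Check one value against the assumed pattern, with no whitespace
    normalization first (the base type is xs:string, which preserves)."""
    return DFDL_STRING_LITERAL.fullmatch(value) is not None


def parse_literal_list(value: str) -> list[str]:
    """Split a whitespace-separated list the way an XSD list type does:
    trim, then split on runs of XSD whitespace."""
    stripped = value.strip(XSD_WHITESPACE)
    if not stripped:
        return []
    return re.split(r"[ \t\n\r]+", stripped)


print(is_string_literal("%WSP;"))              # True
print(is_string_literal("a b"))                # False
print(parse_literal_list("  %NL;  ,\t%SP;x "))
```

Every item produced by the list split is non-empty and free of whitespace, so each one satisfies the assumed item pattern by construction.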

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742




From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson/UK/IBM at IBMGB, 
Cc:     dfdl-wg at ogf.org
Date:   17/07/2013 20:35
Subject:        Re: [DFDL-WG] Fw: DFDL String Literal type
Sent by:        dfdl-wg-bounces at ogf.org



Well, it's looking to me like xml/xsd just doesn't have the right 
pre-defined whitespace-handling concepts that DFDL needs for DFDL String 
Literal nor for DFDL Expression. The whitespace-separated list of DFDL 
String literals works, but this is almost by accident.

If xml/xsd aren't going to have the right thing for us, I think we should 
state our own rules, and we should avoid deriving from the behavior of 
xs:token, because it collapses even quoted whitespace inside expressions, 
which is very undesirable.

To me, given this value
"      { ../foo          eq          '  .   '  }    "
the whitespace everywhere except between the single quotes is 
insignificant and can be collapsed, but collapsing shouldn't mess with a 
schema author's quoted strings. 
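
A minimal Python sketch of that rule, assuming quoted strings are delimited by single or double quotes and contain no escaped quote characters. The function name collapse_outside_quotes is hypothetical, and this is not how any particular DFDL implementation does it.

```python
def collapse_outside_quotes(text: str) -> str:
    """Collapse runs of whitespace to one space and trim the ends,
    but copy everything between matching quotes unchanged."""
    out = []
    quote = None            # the open quote character, or None
    pending_space = False   # whitespace seen since the last emitted char
    for ch in text:
        if quote is not None:
            # Inside a quoted string: keep every character as written.
            out.append(ch)
            if ch == quote:
                quote = None
        elif ch in " \t\n\r":
            pending_space = True
        else:
            # Emit a single space for the preceding run, except at the start.
            if pending_space and out:
                out.append(" ")
            pending_space = False
            out.append(ch)
            if ch in "'\"":
                quote = ch
    return "".join(out)


value = "      { ../foo          eq          '  .   '  }    "
print(collapse_outside_quotes(value))
# prints: { ../foo eq '  .   ' }
```

Trailing whitespace is dropped because a pending space is only emitted when another non-whitespace character follows it.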

Yes, we have dfdl:decodeDFDLEntities('%SP;%SP;.%SP;%SP;') which could be 
plugged in instead. But I think this is a hack. 

So to me, from the XSD schema of DFDL annotations point of view, DFDL 
expression is a whitespace-preserving string, and DFDL String Literal is 
as well. The DFDL implementation must then provide the behavior for 
removal of insignificant whitespace. 

For DFDL Expressions, all whitespace is insignificant except that between 
quotation marks which is significant.

For DFDL String Literals, no whitespace is allowed, and DFDL Character 
Entities must be used. 
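
To make the String Literal rule concrete, here is a small Python sketch that rejects raw whitespace and expands a few character entities. The entity table is an illustrative subset chosen for this example (an assumption, not the full DFDL entity list), and decode_entities is a hypothetical helper, not the dfdl:decodeDFDLEntities function itself.

```python
import re

# Illustrative subset only; numeric, raw-byte, and character-class
# entities (for example %WSP;) are deliberately not handled here.
ENTITIES = {"SP": " ", "HT": "\t", "LF": "\n", "CR": "\r"}

_ENTITY = re.compile(r"%([A-Z]+);")


def decode_entities(literal: str) -> str:
    """Expand the entities in ENTITIES; refuse literals that contain
    raw whitespace or an entity this sketch does not know."""
    if re.search(r"[ \t\n\r]", literal):
        raise ValueError("raw whitespace is not allowed; use an entity")

    def expand(match: re.Match) -> str:
        name = match.group(1)
        if name not in ENTITIES:
            raise ValueError(f"entity %{name}; is not in this sketch's table")
        return ENTITIES[name]

    return _ENTITY.sub(expand, literal)


print(repr(decode_entities("%SP;%SP;.%SP;%SP;")))
# prints: '  .  '
```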





On Wed, Jul 17, 2013 at 11:06 AM, Steve Hanson <smh at uk.ibm.com> wrote:
We discussed the correct XML schema type for DFDL String Literal on the 
last WG call.  I read up on xs:NMTOKEN  - not appropriate as it is 
basically a name so does not allow the full range of characters we need. 
Then I looked at restricting xs:token, but I could not work out from the 
XML Schema 1.0 spec how whitespace facets were handled when other facets 
were present.  So I asked Sandy, and got the very useful clarification 
below. Please review for next call. 
 
Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 17/07/2013 16:01 ----- 

From:        Sandy Gao/Toronto/IBM at IBMCA 
To:        Steve Hanson/UK/IBM at IBMGB, 
Date:        17/07/2013 13:33 
Subject:        Re: DFDL String Literal type 


Hi Steve, 

Yes, that should work. All other facet checking, including pattern, 
happens *after* whitespace handling. 

This was made clearer in Schema 1.1, where "whitespace" is called a 
pre-lexical facet, and "pattern" etc. are called lexical facets.

Thanks,
Sandy Gao
Source Code Monitoring (SCMon)
IBM Canada
sandygao at ca.ibm.com 



From:        Steve Hanson/UK/IBM at IBMGB 
To:        Sandy Gao/Toronto/IBM at IBMCA, 
Date:        2013-07-17 06:07 AM 
Subject:        DFDL String Literal type 


Hi Sandy 

Please can I ask your advice on use of the whitespace facet in conjunction 
with the pattern facet?  This is in order to model the correct data type 
for a DFDL String Literal. This is defined as: 


DFDL String Literal 

DFDL String Literals represent a sequence of literal bytes or characters 
which appear in the data stream. This presents the following challenges: 
- the literal characters in the data stream might not be in the same 
encoding as the DFDL schema 
- it may be necessary to specify a literal character which is not valid 
in an XML document 
- it may be necessary to specify one or more raw byte values 

A DFDL string literal can describe any of the following types of literal 
data in any combination: 
- a single literal character in any encoding 
- a string of literal characters in any encoding 
- a bi-directional character string 
- one or more characters from a set of related characters (e.g. 
end-of-line characters) 
- a literal byte value 

A DFDL string literal is therefore able to describe any arbitrary sequence 
of bytes and characters. 

Empty Strings: Empty string is not allowed as a DFDL string literal value 
unless explicitly stated otherwise in the description of a property. In 
this case the use of empty string provides some property specific behavior 
different from simply using the empty string as a value. When the empty 
string is to be used as a value, the entity %ES; must be used in the 
corresponding DFDL string literal. 

Whitespace: When whitespace must be used as part of a property value, the 
DFDL string literal must use entities (such as %WSP;) to represent the 
whitespace. (This allows a property to represent lists of DFDL string 
literals by using literal spaces to separate list elements.)

The nearest match to an XSDL built-in type is xs:token, but we require the 
additional constraint that no whitespace can appear.  My thought is to 
define a restriction of xs:token that applies a pattern facet to disallow 
use of #x20, given that the whitespace 'collapse' implied by xs:token 
would have replaced #x9, #xA, #xD with #x20, collapsed contiguous #x20, 
and trimmed leading/trailing #x20.  Does that sound right? 
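
The processing order being asked about (whitespace collapse first, pattern check second) can be sketched in Python as follows. The function names xsd_collapse and passes_no_space_pattern are hypothetical, and the pattern shown (no #x20 anywhere) is just one way to express the proposed restriction.

```python
import re


def xsd_collapse(value: str) -> str:
    """Apply the three steps of whiteSpace='collapse' described above."""
    replaced = re.sub(r"[\t\n\r]", " ", value)    # #x9, #xA, #xD -> #x20
    collapsed = re.sub(r" {2,}", " ", replaced)   # contiguous #x20 -> one
    return collapsed.strip(" ")                   # trim leading/trailing


# Assumed pattern facet: at least one character, none of them #x20.
NO_SPACE = re.compile(r"[^ ]+")


def passes_no_space_pattern(raw: str) -> bool:
    """Check the pattern against the collapsed value, not the raw one."""
    return NO_SPACE.fullmatch(xsd_collapse(raw)) is not None


print(repr(xsd_collapse("\t a \n\n b \r")))    # 'a b'
print(passes_no_space_pattern("  %WSP;\n"))    # True: only outer whitespace
print(passes_no_space_pattern("a b"))          # False: inner space survives
print(repr(xsd_collapse("'  .   '")))          # "' . '"
```

The last call shows the side effect of collapse on quoted content: whitespace inside quotes is collapsed as well, since the facet knows nothing about quoting.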
Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg



-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com