[DFDL-WG] Fw: DFDL String Literal type

Suman Kalia kalia at ca.ibm.com
Thu Jul 18 09:41:01 EDT 2013


Based on the Notes mail chain, I think we would have DFDLStringLiteral 
derived from xsd:token and restricted with a pattern facet that does not 
allow whitespace at all. 

DFDLExpression would be derived from xsd:string, with a pattern facet that 
does not allow whitespace before and after the curly braces.

Suman Kalia
IBM Canada Lab
WMB Toolkit Architect and Development Lead
Tel: 905-413-3923 T/L 313-3923
Email: kalia at ca.ibm.com

For info on Message broker
http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.html





From:   Steve Hanson <smh at uk.ibm.com>
To:     Tim Kimber <KIMBERT at uk.ibm.com>, 
Cc:     dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
Date:   07/18/2013 05:21 AM
Subject:        Re: [DFDL-WG] Fw: DFDL String Literal type
Sent by:        dfdl-wg-bounces at ogf.org



I agree that xs:string is necessary for DFDL Expressions and DFDL Regexes; 
that's what I recommended in the other thread. 

But I'm not seeing what is wrong with using xs:token as the base type for 
a DFDL String Literal.  The replace/collapse algorithm: 
a) Removes leading/trailing whitespace, which we want in order to handle 
element form 
b) Does not lose the fact that embedded whitespace was there: you just end 
up with a single space, which we can then detect as illegal. 
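As a rough illustration of points a) and b), here is a minimal Python sketch of the collapse step followed by a check for a surviving space. The function names are hypothetical, and rejecting an empty result is an assumption based on the spec excerpt quoted further down the thread (empty string is not allowed unless a property says otherwise).

```python
import re

def xsd_collapse(value: str) -> str:
    """Approximate whiteSpace="collapse" as described for xs:token."""
    # Replace step: tab, line feed and carriage return each become #x20.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # Collapse step: runs of #x20 shrink to one, then leading/trailing #x20 go.
    return re.sub(r" +", " ", replaced).strip(" ")

def is_valid_string_literal(raw: str) -> bool:
    """Hypothetical check: after collapse, no #x20 may remain (and not empty)."""
    collapsed = xsd_collapse(raw)
    return re.fullmatch(r"[^ ]+", collapsed) is not None
```

With this sketch, a value such as "  %WSP;abc" followed by a newline is accepted because only leading/trailing whitespace is removed, while "a" followed by a tab and "b" is rejected because the tab survives as a single space.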

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 



From:        Tim Kimber/UK/IBM at IBMGB 
To:        dfdl-wg at ogf.org, 
Date:        18/07/2013 09:06 
Subject:        Re: [DFDL-WG] Fw: DFDL String Literal type 
Sent by:        dfdl-wg-bounces at ogf.org 



I agree with all of that. 

The best way to specify the type of a DFDL string literal in the 'schema 
for DFDL annotations' would be: 
- define a global simple type called 'DFDLStringLiteral' that is a 
restriction of xs:string (not xs:token) and contains a pattern facet 
that describes its lexical space. 
- define a separate global simple type 'ListOfDFDLStringLiteral' that is a 
list of DFDLStringLiteral 
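To make the consequence of that choice concrete, here is a small Python sketch. It assumes, purely for illustration, that the pattern facet is "one or more characters other than #x20, #x9, #xA, #xD"; the real lexical space would be tighter. All names are hypothetical.

```python
import re

# Hypothetical stand-in for the pattern facet of DFDLStringLiteral.
LITERAL_PATTERN = re.compile(r"[^ \t\n\r]+")

def is_string_literal(value: str) -> bool:
    # With xs:string as the base, whitespace is preserved, so the pattern
    # is applied to the value exactly as written.
    return LITERAL_PATTERN.fullmatch(value) is not None

def parse_literal_list(value: str) -> list:
    # An XSD list type splits its value on whitespace; each item is then
    # checked against the item type.
    items = re.findall(r"[^ \t\n\r]+", value)
    for item in items:
        if not is_string_literal(item):
            raise ValueError("not a DFDL string literal: %r" % item)
    return items
```

Under these assumptions a single literal with a leading space is rejected rather than trimmed, whereas the list form tolerates any amount of whitespace around and between items.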

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742




From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        Steve Hanson/UK/IBM at IBMGB, 
Cc:        dfdl-wg at ogf.org 
Date:        17/07/2013 20:35 
Subject:        Re: [DFDL-WG] Fw: DFDL String Literal type 
Sent by:        dfdl-wg-bounces at ogf.org 



Well, it's looking to me like xml/xsd just doesn't have the right 
pre-defined whitespace-handling concepts that DFDL needs for DFDL String 
Literal nor for DFDL Expression. The whitespace-separated list of DFDL 
String literals works, but this is almost by accident.

If xml/xsd aren't going to have the right thing for us, I think we should 
state our own rules, and we should avoid deriving from the behavior of 
xs:token because it collapses even quoted whitespace inside expressions, 
which is very undesirable.

To me given this value "      { ../foo          eq          '  .   '  } " 
the whitespace everywhere except between the single quotes is 
insignificant and can be collapsed, but collapsing shouldn't mess with a 
schema author's quoted strings. 

Yes, we have dfdl:decodeDFDLEntities('%SP;%SP;.%SP;%SP;') which could be 
plugged in instead. But I think this is a hack. 

So to me, from the XSD schema of DFDL annotations point of view, DFDL 
expression is a whitespace-preserving string, and DFDL String Literal is 
as well. The DFDL implementation must then provide the behavior for 
removal of insignificant whitespace. 

For DFDL Expressions, all whitespace is insignificant except that between 
quotation marks, which is significant.
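A minimal Python sketch of what such implementation-provided behavior might look like for expressions: whitespace outside quotes is collapsed to a single space and trimmed, and quoted strings are copied through untouched. The function name is hypothetical, and treating both single- and double-quoted strings as literals is an assumption.

```python
import re

# A quoted literal (group 1) or a run of XML whitespace characters.
_QUOTED_OR_WS = re.compile(r"""('[^']*'|"[^"]*")|[ \t\n\r]+""")

def collapse_expression_whitespace(expr: str) -> str:
    """Collapse whitespace outside quotes; leave quoted text as written."""
    def _sub(match):
        if match.group(1) is not None:
            return match.group(1)   # quoted string: keep exactly as is
        return " "                  # insignificant whitespace: one space
    return _QUOTED_OR_WS.sub(_sub, expr).strip(" ")
```

Applied to the example value above, this would keep the spaces around the '.' inside the single quotes while reducing every other run of whitespace to one space.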

For DFDL String Literals, no whitespace is allowed, and DFDL Character 
Entities must be used. 





On Wed, Jul 17, 2013 at 11:06 AM, Steve Hanson <smh at uk.ibm.com> wrote: 
We discussed the correct XML schema type for DFDL String Literal on the 
last WG call.  I read up on xs:NMTOKEN, which is not appropriate as it is 
basically a name and so does not allow the full range of characters we 
need. Then I looked at restricting xs:token, but I could not work out from 
the XML Schema 1.0 spec how whitespace facets were handled when other 
facets were present.  So I asked Sandy, and got the very useful 
clarification below. Please review for next call. 

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 
----- Forwarded by Steve Hanson/UK/IBM on 17/07/2013 16:01 ----- 

From:        Sandy Gao/Toronto/IBM at IBMCA 
To:        Steve Hanson/UK/IBM at IBMGB, 
Date:        17/07/2013 13:33 
Subject:        Re: DFDL String Literal type 


Hi Steve, 

Yes, that should work. All other facet checking, including pattern, 
happens *after* whitespace handling. 

This was made clearer in Schema 1.1, where "whitespace" is called a 
pre-lexical facet, and "pattern" etc. are called lexical facets.

Thanks,
Sandy Gao
Source Code Monitoring (SCMon)
IBM Canada
sandygao at ca.ibm.com 



From:        Steve Hanson/UK/IBM at IBMGB 
To:        Sandy Gao/Toronto/IBM at IBMCA, 
Date:        2013-07-17 06:07 AM 
Subject:        DFDL String Literal type 


Hi Sandy 

Please can I ask your advice on use of the whitespace facet in conjunction 
with the pattern facet?  This is in order to model the correct data type 
for DFDL String Literals. This is defined as: 


DFDL String Literal 

DFDL String Literals represent a sequence of literal bytes or characters 
which appear in the data stream. This presents the following challenges: 
- the literal characters in the data stream might not be in the same 
  encoding as the DFDL schema 
- it may be necessary to specify a literal character which is not valid 
  in an XML document 
- it may be necessary to specify one or more raw byte values 

A DFDL string literal can describe any of the following types of literal 
data in any combination: 
- a single literal character in any encoding 
- a string of literal characters in any encoding 
- a bi-directional character string 
- one or more characters from a set of related characters (e.g. 
  end-of-line characters) 
- a literal byte value 

A DFDL string literal is therefore able to describe any arbitrary sequence 
of bytes and characters. 

Empty Strings: Empty string is not allowed as a DFDL string literal value 
unless explicitly stated otherwise in the description of a property. In 
this case the use of empty string provides some property specific behavior 
different from simply using the empty string as a value. When the empty 
string is to be used as a value, the entity %ES; must be used in the 
corresponding DFDL string literal. 

Whitespace: When whitespace must be used as part of a property value, the 
DFDL string literal must use entities (such as %WSP;) to represent the 
whitespace. (This allows a property to represent lists of DFDL string 
literals by using literal spaces to separate list elements.) 

The nearest match to an XSDL built-in type is xs:token, but we require the 
additional constraint that no whitespace can appear.  My thought is to 
define a restriction of xs:token that applies a pattern facet to disallow 
use of #x20, given that the whitespace 'collapse' implied by xs:token 
would have replaced #x9, #xA, #xD with #x20, collapsed contiguous #x20, 
and trimmed leading/trailing #x20.  Does that sound right? 

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 


--
 dfdl-wg mailing list
 dfdl-wg at ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg 



-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com