[DFDL-WG] DFDL character entities in DFDL expressions

Tim Kimber KIMBERT at uk.ibm.com
Wed Dec 5 08:28:19 EST 2012


The key point for me is this: what happens when a DFDL expression (which 
looks exactly like an XPath 2.0 expression) gets lifted out of the DFDL 
xsd and used by a non-DFDL XPath processor? If we allow the DFDL entities 
to be used like XML entities then the expression will appear to be a valid 
XPath expression, but it will fail in some unpredictable way. On the 
other hand, if DFDL entities can only be used in conjunction with a DFDL 
function then (presumably) the non-DFDL XPath engine will report that an 
unknown extension function is being used.

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742




From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Andrew Coleman/UK/IBM at IBMGB, 
Cc:     Steve Hanson/UK/IBM at IBMGB, Tim Kimber/UK/IBM at IBMGB, 
dfdl-wg at ogf.org
Date:   05/12/2012 13:17
Subject:        Re: [DFDL-WG] DFDL character entities in DFDL expressions



Well, yes, I think we're discussing exactly how that should work.

No matter what, we do need to be clear that string values created in 
DFDL's expression language can contain these XML-illegal characters, since 
they are allowed in DFDL's infoset. This means that DFDL implementations 
can only re-use an existing XPath implementation to create their DFDL 
expression language implementation to the extent that it does NOT enforce 
the XML-illegal characters restrictions all over the place. 

I am currently working with standard Saxon-B XPath and will report back. 

But let's be optimistic. The question then is just what is the solution to 
creating a string-literal including these characters. That cannot be done 
without some beyond-XML mechanism. DFDL has a string-literal notation for 
expressing these characters, so we either say that string literals in the 
expression language can use the DFDL character and numeric entities, or we 
can do something more 'library like', and provide a function which 
interprets the string-literal notation, and isolate the implementation 
concerns a bit.

As a language embedded in XML schema, we already straddle the fence of two 
somewhat inconsistent language environments. 

E.g., the literals one can use as the value of the default attribute on an 
element declaration cannot use DFDL character entities, as this is a 
purely XML Schema construct. 

Similarly, the regular expressions one can use for the XML schema pattern 
facet are more restrictive than the DFDL regular expressions one can use 
in a dfdl:assert, or a dfdl:lengthKind='pattern'. 

So, it's acceptable to me to say that expressions also have some split 
where the DFDL-specific aspects, like the DFDL character and numeric 
entities notation, are isolated in a sub-construct.



On Wed, Dec 5, 2012 at 6:37 AM, Andrew Coleman <andrew_coleman at uk.ibm.com> 
wrote:
No, casting a hexBinary to a string will just write out the octets - i.e. 
the string will be '00'. 
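
As a rough analogy only (this is Python, not XPath, and the variable names 
are illustrative), the difference between the lexical form of the octets 
and the character they encode looks like this:

```python
# One octet with value zero, analogous to xs:hexBinary('00').
octets = bytes.fromhex("00")

# What the cast to string yields: the two-character hex text "00".
as_lexical = octets.hex().upper()

# What the expression actually wanted: a single NUL character.
as_char = octets.decode("latin-1")

print(repr(as_lexical), len(as_lexical))  # two characters
print(repr(as_char), len(as_char))        # one character
```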

XPath itself has no mechanism for interpreting entity references or 
character references.  Its hosting language (XQuery or XSLT/XML) provides 
this.  Since DFDL is XML, wouldn't that provide a mechanism? 

Regards, 
- Andy 

__________________________________________
Andrew Coleman
WebSphere Message Broker Development
IBM Hursley Park




From:        Steve Hanson/UK/IBM 
To:        Tim Kimber/UK/IBM at IBMGB, 
Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org, Mike Beckerle <
mbeckerle.dfdl at gmail.com>, Andrew Coleman/UK/IBM at IBMGB 
Date:        05/12/2012 11:06 
Subject:        Re: [DFDL-WG] DFDL character entities in DFDL expressions 



Aren't XPath facilities sufficient here? 

outputValueCalc="{   if (fn:string-length(../s) lt 64) then 
fn:concat(../s, xs:string(xs:hexBinary('00'))) else ../s   }" 

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 




From:        Tim Kimber/UK/IBM at IBMGB 
To:        Mike Beckerle <mbeckerle.dfdl at gmail.com>, 
Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org 
Date:        05/12/2012 10:51 
Subject:        Re: [DFDL-WG] DFDL character entities in DFDL expressions 
Sent by:        dfdl-wg-bounces at ogf.org 



I think the restriction was aimed at avoiding things like this: 

outputValueCalc="{   if (fn:string-length(../s) lt 64) then 
fn:concat(../s, '%#rFF;') else ../s   }" 

I agree that a total ban is too restrictive. My personal preference would 
be for the dfdl:string() function because it makes the usage of 
DFDL-specific features obvious in the DFDL expression. But what would be 
the return type of dfdl:string()? If it returned a sequence of characters 
then the raw byte entity ( %#rnn; ) would still need to be disallowed. 

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742  
Internal tel. 37246742




From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        dfdl-wg at ogf.org, 
Date:        04/12/2012 23:36 
Subject:        [DFDL-WG] DFDL character entities in DFDL expressions 
Sent by:        dfdl-wg-bounces at ogf.org 




We currently have this language in the spec:

"Within an expression, a string is never interpreted as a DFDL string 
literal."

To me this means one cannot use DFDL character entities in an expression.

However, I need to do this:

        outputValueCalc="{   if (fn:string-length(../s) lt 64) then 
fn:concat(../s, '%NUL;') else ../s   }"

Basically, I need to append a NUL on the end of the string in the output 
value case.

Unless I can put a %NUL; into an expression and have it interpreted as a 
DFDL String literal,  I am not sure how I can achieve this. 

At minimum I need a new DFDL function, perhaps an alternate string 
constructor such as dfdl:string('....'), which scans its argument for DFDL 
character entities and substitutes them, so that the resulting string can 
contain the characters that are disallowed in XML (like NUL).
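
To make the idea concrete, here is a minimal Python sketch of what such a 
function might do. Everything in it is an assumption for illustration: the 
names dfdl_string, NAMED and pad_with_nul are hypothetical, the entity 
table is a small subset, and rejecting the raw byte entity ( %#rnn; ) is 
one possible choice if the result has to be a sequence of characters. It 
is not taken from the spec or from any implementation.

```python
import re

# Illustrative subset of named DFDL character entities (not the full table).
NAMED = {"NUL": "\u0000", "HT": "\u0009", "LF": "\u000A", "CR": "\u000D", "SP": "\u0020"}

# %% escape | %#rHH; raw byte | %#xH...; hex | %#D...; decimal | %NAME; named
_ENTITY = re.compile(
    r"%(?:(%)|#r([0-9a-fA-F]{2});|#x([0-9a-fA-F]+);|#([0-9]+);|([A-Z][A-Z0-9]*);)"
)

def dfdl_string(literal: str) -> str:
    """Hypothetical dfdl:string(): substitute DFDL entities in a literal."""
    def repl(m: re.Match) -> str:
        pct, raw, hexv, dec, name = m.groups()
        if pct is not None:
            return "%"                    # %% is an escaped percent sign
        if raw is not None:
            # A raw byte is not a character, so this sketch refuses it.
            raise ValueError("raw byte entity not allowed: " + m.group(0))
        if hexv is not None:
            return chr(int(hexv, 16))     # hexadecimal code point
        if dec is not None:
            return chr(int(dec))          # decimal code point
        if name not in NAMED:
            raise ValueError("unknown entity: " + m.group(0))
        return NAMED[name]
    return _ENTITY.sub(repl, literal)

def pad_with_nul(s: str) -> str:
    """Mirror of the outputValueCalc expression: append NUL if shorter than 64."""
    return s + dfdl_string("%NUL;") if len(s) < 64 else s

print(repr(pad_with_nul("abc")))
```

Keeping the substitution inside one named function means the DFDL-specific 
notation is visible at the point of use in the expression.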

-- 
Mike Beckerle | OGF DFDL WG Co-Chair | Tresys Technologies
Tel:  781-330-0412

--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg 


Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU