[DFDL-WG] String literal syntax for hexBinary ?? - need DFDL string literal constructor

Suman Kalia kalia at ca.ibm.com
Mon Jun 11 10:12:44 EDT 2012


We should make this enhancement; it improves the usability of the function.

Suman Kalia
IBM Canada Lab
WMB Toolkit Architect and Development Lead
Tel: 905-413-3923 T/L 313-3923
Email: kalia at ca.ibm.com

For info on Message broker
http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.html





From:   Steve Hanson <smh at uk.ibm.com>
To:     Suman Kalia/Toronto/IBM at IBMCA, 
Cc:     dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org, Tim Kimber 
<KIMBERT at uk.ibm.com>
Date:   06/11/2012 08:47 AM
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - 
need DFDL string literal constructor



I see what you are saying.  The function should be able to detect if a 
well-formed DFDL entity is present in the data and leave it alone. Well, 
yes, it could do. But in practical terms that will make no difference, as 
no real-world format will contain DFDL entities in the data. 

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 



From:        Suman Kalia <kalia at ca.ibm.com> 
To:        Steve Hanson/UK/IBM at IBMGB 
Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org, Tim 
Kimber/UK/IBM at IBMGB 
Date:        11/06/2012 13:14 
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - 
need DFDL string literal constructor 



Use this function when the value of a DFDL delimiter property (initiator, 
terminator, separator) is obtained from the data stream using an 
expression, and the data might contain '%' or space  characters. Do not 
use if the input string already contains DFDL entities. 

>> SKK   - Tim, I think the function needs to be enhanced. The user should 
not have to scan the string to determine whether it contains % or space 
characters before calling this function. If the input string contains 
DFDL entities then they should be left unchanged. 
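
One way the suggested enhancement could behave, as a rough Python sketch. 
The regular expression is only an approximation of DFDL entity syntax 
(assumed forms: %NAME;, %#ddd;, %#xhh;, %#rhh;), the function names are 
hypothetical, and a '%%' in the input is treated as two literal percent 
characters rather than as an existing escape. 

```python
import re

# Assumption: an approximate pattern for DFDL entities such as
# %NUL;, %WSP*;, %#65;, %#x41; and %#rFF;. The real specification
# restricts the named entities to a fixed list.
_ENTITY = re.compile(
    r"%(?:[A-Z][A-Z0-9]*[*+]?|#[0-9]+|#x[0-9A-Fa-f]+|#r[0-9A-Fa-f]{2});"
)


def _escape(text: str) -> str:
    # '%' must be doubled first so the '%' in '%SP;' is not doubled again.
    return text.replace("%", "%%").replace(" ", "%SP;")


def string_literal_preserving_entities(arg: str) -> str:
    """Escape '%' and spaces, leaving well-formed entities unchanged."""
    pieces = []
    pos = 0
    for match in _ENTITY.finditer(arg):
        pieces.append(_escape(arg[pos:match.start()]))  # plain text before the entity
        pieces.append(match.group(0))                   # the entity itself, untouched
        pos = match.end()
    pieces.append(_escape(arg[pos:]))                   # trailing plain text
    return "".join(pieces)
```

With this sketch, a value such as "a b%CR;%LF;" keeps its two entities, 
while "%done%" has both percent signs doubled because neither starts a 
well-formed entity. 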


Suman Kalia 
IBM Canada Lab 
WMB Toolkit Architect and Development Lead 
Tel: 905-413-3923 T/L 313-3923 
Email: kalia at ca.ibm.com 

For info on Message broker 
http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.html 






From:        Steve Hanson <smh at uk.ibm.com> 
To:        Tim Kimber <KIMBERT at uk.ibm.com>, 
Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org 
Date:        06/11/2012 06:33 AM 
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - 
need DFDL string literal constructor 
Sent by:        dfdl-wg-bounces at ogf.org 



Here's an attempt at putting Tim's description in the form used in the 
spec. Changed the name to drop the 'DFDL'. 
dfdl:stringLiteralFromString ($arg) 
Returns a DFDL string literal constructed from the $arg string argument. 
If $arg contains any '%' and/or space characters, then the return value 
replaces each '%' with '%%' and each space with '%SP;', otherwise $arg is 
returned unchanged. 
Use this function when the value of a DFDL delimiter property (initiator, 
terminator, separator) is obtained from the data stream using an 
expression, and the data might contain '%' or space characters. Do not use 
if the input string already contains DFDL entities.
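
To make the proposed behaviour concrete, here is a minimal Python sketch 
of the substitution described above. It is an illustration only, not a 
DFDL implementation; the Python name string_literal_from_string is an 
assumption chosen to mirror dfdl:stringLiteralFromString. 

```python
def string_literal_from_string(arg: str) -> str:
    """Return a DFDL string literal for arg, per the description above.

    Each '%' becomes '%%' and each space becomes '%SP;'. The '%'
    replacement runs first; otherwise the '%' introduced by '%SP;'
    would itself be doubled.
    """
    return arg.replace("%", "%%").replace(" ", "%SP;")


if __name__ == "__main__":
    for sample in ["abc", "a b", "100%", "% %"]:
        print(repr(sample), "->", repr(string_literal_from_string(sample)))
```

A string with no '%' or space characters comes back unchanged, which 
matches the "otherwise $arg is returned unchanged" wording. 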


Also, I have just noticed that our DFDL function names are not in keeping 
with the style of XPath's own function names.  XPath uses a hyphen to link 
words, instead of camel case.  Should we change DFDL function names to 
match? 

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 



From:        Tim Kimber/UK/IBM at IBMGB 
To:        Steve Hanson/UK/IBM at IBMGB 
Cc:        dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org 
Date:        25/05/2012 10:59 
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - 
need DFDL string literal constructor 
Sent by:        dfdl-wg-bounces at ogf.org 



Proposed function signature : 

/** 
* Given a string, returns a DFDL String literal that matches that string. 
* Intended to be used when a delimiter (initiator, terminator, separator) 
* has been extracted from the data stream, and the value might contain 
* the % or space character. 
* Each occurrence of '%' will be replaced by '%%'. 
* Each space character will be replaced by '%SP;'. 
* Do not use if the input string already contains DFDL entities. 
*/ 
String DFDLStringLiteralFromString( String delimiter ) 

Note that my proposed description of behaviour omits any mention of %ES; 
because it is allowed only in the nilValue property, and that property 
cannot be set via a DFDL expression. 

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 246742




From:        Steve Hanson/UK/IBM at IBMGB 
To:        dfdl-wg at ogf.org 
Date:        25/05/2012 07:07 
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - 
need DFDL string literal constructor 
Sent by:        dfdl-wg-bounces at ogf.org 



Agreed today that such a function is needed, errata taken. Proposals for 
function name welcome. 

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 



From:        Steve Hanson/UK/IBM 
To:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
Cc:        dfdl-wg at ogf.org 
Date:        20/04/2012 09:04 
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - 
need DFDL string literal constructor 


I think what we need is a function that takes a string and returns the 
equivalent DFDL string literal. It looks for the characters that 
cause problems (%, space) and replaces them (with '%%' and '%SP;' 
respectively), and if the string is the empty string replaces it with 
'%ES;'. I think it's just those that are problematic, as all other 
characters will be interpreted correctly (the entity syntax is primarily 
just for convenience of data entry). 

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 




From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        Steve Hanson/UK/IBM at IBMGB 
Cc:        dfdl-wg at ogf.org 
Date:        19/04/2012 14:27 
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re: 
String literals - various usage patterns thereof 



Ok, these are very good reasons.

So we need the function(s) in our xpath library to make doing this same 
substitution easy (e.g., the replace function discussed elsewhere in this 
thread would do it I believe because all we have to do is replace "%" with 
"%%".)


On Thu, Apr 19, 2012 at 9:20 AM, Steve Hanson <smh at uk.ibm.com> wrote: 
The reason that % needs escaping is that most entities start with just % - 
eg %NUL;  - and it means we have simple rules: if you want to use a literal 
% then escape it, and if you use just % then we expect to see an entity 
next.  If we don't do it this way, then we will not be able to extend the 
list of entities in the future without breaking existing expressions, and 
we won't detect the very common error of leaving off the trailing 
semi-colon by mistake. 

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 



From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        Steve Hanson/UK/IBM at IBMGB 
Cc:        Tim Kimber/UK/IBM at IBMGB, dfdl-wg at ogf.org 
Date:        19/04/2012 14:01 
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re: 
String literals - various usage patterns thereof 




I don't think % by itself requires any escaping. There is only a need for 
escaping when the characters after the % match the syntax for one of our 
entities. 
I don't expect dfdl:terminator="%done%" to require any escaping. 
On Apr 19, 2012 7:16 AM, "Steve Hanson" <smh at uk.ibm.com> wrote: 
Tim, thinking some more on this:

- A DFDL expression is sometimes allowed to *return* a DFDL String Literal.
In this case, the returned value is an xs:string that conforms to the DFDL
String Literal syntax

That is indeed how properties like initiator, terminator and separator
behave today.  But there is a problem. Let's say I have dynamically defined
a separator at the start of my data. The value in the data is %. My
dfdl:separator expression therefore returns %. That will give an error as a
badly formed DFDL entity. DFDL string literal rules say that you must use
%% to represent a single % character. The expression itself can work around
this by checking for % and if so substituting %%, but that's a bit
unfriendly especially as fn:replace() is not in the DFDL XPath subset - I
think this is because it comes under this category
http://www.w3.org/TR/xpath-functions/#string.match and not
http://www.w3.org/TR/xpath-functions/#substring.functions.  Perhaps we
should include fn:replace() or provide a DFDL function that handles %?

I started wondering whether these properties' expression should return a
list of String. I can envisage no format that has in its data a value that
contains DFDL entity syntax and intends it to mean a DFDL entity!  That is
too contrived to be real.  However I can certainly envisage an expression
like this:

   dfdl:separator = "{if (../version eq 1) then '%CR;%LF;' else '%LF;'}"

So I think String is not sufficient and DFDL String Literal must be
allowed.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   Tim Kimber/UK/IBM
To:     Steve Hanson/UK/IBM at IBMGB
Cc:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, dfdl-wg at ogf.org
Date:   19/04/2012 11:32
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re:
       String literals - various usage patterns thereof


I agree with all of that.  One refinement, though. I don't think it's
necessary to *require* an implementation to auto-cast the result of a DFDL
Expression into the target type. If an implementation wants to be picky
about the return type AND issue a clearly-worded Schema Definition Error
stating what the problem is then I think we should allow it.

Arguably, this would reduce the portability of DFDL schemas, but there is
precedent for defining a portable subset  of a language while allowing
conveniences for users who don't need portability ( e.g. ANSI 'C' ). We
already take that line for the regular expression syntax, so there is
precedent for this in the DFDL specification too.

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 246742





From:   Steve Hanson/UK/IBM
To:     Tim Kimber/UK/IBM
Cc:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, dfdl-wg at ogf.org
Date:   19/04/2012 11:04
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re:
       String literals - various usage patterns thereof


Hi Tim

Firstly, both your bulleted assertions are correct, but your conclusion is
not.

Secondly, let me flesh out my earlier reply about constructor functions.

The last paragraph of Section 23 says: "The result of evaluating the
expression must be a single atomic value of the type expected by the
context, and it is a schema definition error otherwise".

This is where the XPath constructors come into play. Eg:
<element name="myHexBin" type="xs:hexBinary"
dfdl:inputValueCalc="{ xs:hexBinary(...) }"/>
These xs: constructors, plus the special fn:dateTime() constructor that
DFDL adds, allow the correct types to be created.

Note that you don't always need the constructors.  An expression that
returns a quoted value is returning an XPath string literal so that is
automatically xs:string.  An expression that returns an unquoted number is
returning an XPath number literal so that can be xs:decimal, xs:integer,
xs:double (depends whether the number contains a '.' or 'e' or 'E').
This is described here: http://www.w3.org/TR/xpath20/#id-literals
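
As an illustration of that literal typing rule, here is a small Python
sketch that classifies an XPath 2.0 numeric literal by its lexical form.
The function name is hypothetical, and the patterns are my transcription
of the IntegerLiteral, DecimalLiteral and DoubleLiteral grammar
productions rather than anything taken from the DFDL spec.

```python
import re

# Lexical forms of XPath 2.0 numeric literals.
_INTEGER = re.compile(r"[0-9]+")
_DECIMAL = re.compile(r"\.[0-9]+|[0-9]+\.[0-9]*")
_DOUBLE = re.compile(r"(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)[eE][+-]?[0-9]+")


def xpath_numeric_literal_type(literal: str) -> str:
    """Return the XML Schema type of an unquoted XPath numeric literal."""
    if _INTEGER.fullmatch(literal):
        return "xs:integer"   # digits only
    if _DECIMAL.fullmatch(literal):
        return "xs:decimal"   # contains '.', no exponent
    if _DOUBLE.fullmatch(literal):
        return "xs:double"    # contains 'e' or 'E'
    raise ValueError(f"not an XPath numeric literal: {literal!r}")
```

So 10000 is typed xs:integer, 1.5 is xs:decimal, and 1e3 is xs:double.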

So simply returning the literal 'DEADBEEF' will return an xs:string and if
the context requires xs:hexBinary that is a schema definition error
according to DFDL spec.

A clarification is worthwhile though.  Take the following expression:
{if (../type eq 'A') then 10000 else 20000}.  That returns xs:integer.
- What if my context was xs:decimal?  xs:integer is a restriction of
xs:decimal so the value will always be in range, so is that 'auto-cast'
allowed?
- What if my context was xs:long or another restriction of xs:integer? The
value may or may not be in range, so is that 'auto-cast' iff value in
range?
I think that we should auto-cast when type restrictions are involved, and
clarify that in the spec.

We *could* change the spec to say that the result of the expression is
always automatically cast to the type expected by the context. That takes
some of the burden off the modeler and makes it much more likely that
expressions written by XPath novices will return the correct results. But
it could also hide accidental errors. Note this proposal I shall call (d)
as it is not the same as Mike's (c). If we made this change, then returning
the literal 'DEADBEEF' for xs:hexBinary would succeed. I don't think it
affects the desire for expressions to be statically type checkable -
because it is known whether type X can be cast to type Y, so a cast
mismatch can be statically detected.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848




From:   Tim Kimber/UK/IBM
To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>
Cc:     dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org, Steve
       Hanson/UK/IBM at IBMGB
Date:   19/04/2012 09:35
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re:
       String literals - various usage patterns thereof


I'm pretty sure that the rules are:
- DFDL expressions must not *contain* DFDL String Literals. They must be
valid XPath 2.0 expressions except that the list of allowable function
names includes the DFDL extension functions.
- A DFDL expression is sometimes allowed to *return* a DFDL String Literal.
In this case, the returned value is an xs:string that conforms to the DFDL
String Literal syntax. But that does not apply to your example because the
dfdl:inputValueCalc must return a value ( an XML value ) that is valid for
the type of the element.

I think that corresponds to your answer a) ; 'DEADBEEF' is a valid
xs:hexBinary lexical value.

regards,

Tim Kimber, Common Transformation Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 246742





From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson/UK/IBM at IBMGB
Cc:     dfdl-wg at ogf.org
Date:   19/04/2012 07:42
Subject:        [DFDL-WG] String literal syntax for hexBinary ?? - Re:
       String literals - various usage patterns thereof
Sent by:        dfdl-wg-bounces at ogf.org



What is the DFDL string literal syntax for a hexBinary type value?

E.g.,  I want a hex binary whose value is the 4 bytes described by this
hex: DE AD BE EF.

<element name="myHexBin" type="xs:hexBinary"
dfdl:inputValueCalc="{ ... }"/>

So, what can one syntactically put, for literal constant values, in the
input value calculation expression?

Note that this is legal pure (non-DFDL) XSD (I think)

<element name="aHexBin" type="xs:hexBinary" fixed="DeadBeef"/>

That is, the fixed/default are allowed and one specifies these values as
just strings of hex digits. Notice no special escaping or anything. You
just use a string that just so happens to contain hex digits.

I think there are three possibilities:
(a) we allow "DEADBEEF" i.e., because the type of the expression is
hexBinary, a string is cast to hexBinary by interpreting it as hex nibbles.

(b) we require a special kind of string literal - a bytes-only string
literal, so for example: "%#rDE;%#rAD;%#rBE;%#rEF;" is the way you create 4
bytes. If you just put characters, then that's a processing error - like a
cast failure. Only raw-bytes allowed.

(c) Anything you return from the expression is converted to a hexBinary by
unparsing it to bytes (using current properties), then using the bytes as
the hexBinary data. So you could have an expression that returns a double,
and that would create 8 bytes if representation="binary".  In this case the
decimal number 3735928559 (hex 0xdeadbeef) as a binary bigEndian int would
produce the 4 bytes I want.
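
To check the arithmetic behind (a) and (c), here is a short Python sketch.
It only illustrates the byte values involved; the helper names are
hypothetical and nothing here reflects how a DFDL processor would do the
conversion.

```python
def hex_digits_to_bytes(text: str) -> bytes:
    """Option (a): read a string of hex digits as pairs of nibbles."""
    return bytes.fromhex(text)


def unsigned_int_to_big_endian(value: int, length: int) -> bytes:
    """Option (c): lay out an unsigned integer as big-endian binary."""
    return value.to_bytes(length, byteorder="big", signed=False)


if __name__ == "__main__":
    # Both lines print the same four bytes: DEADBEEF
    print(hex_digits_to_bytes("DEADBEEF").hex().upper())
    print(unsigned_int_to_big_endian(3735928559, 4).hex().upper())
```

Both routes give the same four bytes, DE AD BE EF, so the options differ in
what the schema author has to write, not in the data produced.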



--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg





Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 




-- 
Mike Beckerle | OGF DFDL WG Co-Chair 
Tel:  781-330-0412




