[DFDL-WG] String literal syntax for hexBinary ?? - Re: String literals - various usage patterns thereof

Mike Beckerle mbeckerle.dfdl at gmail.com
Thu Apr 19 09:27:09 EDT 2012


Ok, these are very good reasons.

So we need a function in our XPath library that makes this substitution
easy. The replace function discussed elsewhere in this thread would do it,
I believe, because all we have to do is replace "%" with "%%".


On Thu, Apr 19, 2012 at 9:20 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> The reason that % needs escaping is that most entities start with just %,
> e.g. %NUL;, and it gives us simple rules: if you want a literal % then you
> escape it, and if you use just % then we expect to see an entity next.  If
> we don't do it this way, we will not be able to extend the list of entities
> in the future without breaking existing expressions, and we won't detect
> the very common error of leaving off the trailing semi-colon by mistake.
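To make the "simple rules" concrete, here is a rough Python sketch of a checker that flags every % which neither is escaped nor starts a complete entity. The entity grammar in the regular expression is a simplified assumption, not the full DFDL definition (it does not check names against the real entity list), and check_literal is a hypothetical name.

```python
import re
from typing import List

# Assumed, simplified token grammar (not the full DFDL definition):
#   %%         escaped percent
#   %NAME;     named entity such as %NUL; %CR; %LF; (optionally NAME* or NAME+)
#   %#ddd;     decimal character code
#   %#xhh;     hexadecimal character code
#   %#rhh;     raw byte
_TOKEN = re.compile(
    r"%%"
    r"|%[A-Z][A-Z0-9]*[*+]?;"
    r"|%#[0-9]+;"
    r"|%#x[0-9A-Fa-f]+;"
    r"|%#r[0-9A-Fa-f]{2};"
)


def check_literal(text: str) -> List[int]:
    """Return the offsets of every '%' that does not start a valid token."""
    bad = []
    i = 0
    while i < len(text):
        if text[i] == "%":
            match = _TOKEN.match(text, i)
            if match is None:
                bad.append(i)      # bare % or entity missing its ';'
                i += 1
            else:
                i = match.end()    # skip the whole escape or entity
        else:
            i += 1
    return bad
```

With this rule, "%CR;%LF;" and "100%%" pass, "%NUL" is flagged because the semi-colon is missing, and "%done%" is flagged at both % characters.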
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        Steve Hanson/UK/IBM at IBMGB
> Cc:        Tim Kimber/UK/IBM at IBMGB, dfdl-wg at ogf.org
> Date:        19/04/2012 14:01
> Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? -
> Re: String literals - various usage patterns thereof
> ------------------------------
>
>
>
> I don't think % by itself requires any escaping. There is only a need for
> escaping when the characters after the % match the syntax for one of our
> entities.
>
> I don't expect dfdl:terminator="%done%" to require any escaping.
>
> On Apr 19, 2012 7:16 AM, "Steve Hanson" <smh at uk.ibm.com> wrote:
> Tim, thinking some more on this:
>
> - A DFDL expression is sometimes allowed to *return* a DFDL String Literal.
> In this case, the returned value is an xs:string that conforms to the DFDL
> String Literal syntax
>
> That is indeed how properties like initiator, terminator and separator
> behave today.  But there is a problem. Let's say I have dynamically defined
> a separator at the start of my data. The value in the data is %. My
> dfdl:separator expression therefore returns %. That will give an error as a
> badly formed DFDL entity. DFDL string literal rules say that you must use
> %% to represent a single % character. The expression itself can work around
> this by checking for % and if so substituting %%, but that's a bit
> unfriendly, especially as fn:replace() is not in the DFDL XPath subset. I
> think this is because it comes under this category:
> http://www.w3.org/TR/xpath-functions/#string.match
> and not this one:
> http://www.w3.org/TR/xpath-functions/#substring.functions
> Perhaps we should include fn:replace() or provide a DFDL function that
> handles %?
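A minimal sketch of the substitution being asked for, written in Python only for illustration; escape_percent is a hypothetical helper name, not an existing DFDL or XPath function.

```python
def escape_percent(raw: str) -> str:
    """Double every '%' so the result is read as literal text, not an entity."""
    return raw.replace("%", "%%")


# A separator read from the data as a single '%' becomes the literal '%%'.
separator_from_data = "%"
separator_literal = escape_percent(separator_from_data)
```

Values without a % pass through unchanged, so the helper could be applied unconditionally to any delimiter taken from the data.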
>
> I started wondering whether these properties' expression should return a
> list of String. I can envisage no format that has in its data a value that
> contains DFDL entity syntax and intends it to mean a DFDL entity!  That is
> too contrived to be real.  However I can certainly envisage an expression
> like this:
>
>        dfdl:separator = "{ if (../version eq 1) then '%CR;%LF;' else '%LF;' }"
>
> So I think String is not sufficient and DFDL String Literal must be
> allowed.
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:   Tim Kimber/UK/IBM
> To:     Steve Hanson/UK/IBM at IBMGB
> Cc:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, dfdl-wg at ogf.org
> Date:   19/04/2012 11:32
> Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re:
>            String literals - various usage patterns thereof
>
>
> I agree with all of that.  One refinement, though. I don't think it's
> necessary to *require* an implementation to auto-cast the result of a DFDL
> Expression into the target type. If an implementation wants to be picky
> about the return type AND issue a clearly-worded Schema Definition Error
> stating what the problem is then I think we should allow it.
>
> Arguably, this would reduce the portability of DFDL schemas, but there is
> precedent for defining a portable subset  of a language while allowing
> conveniences for users who don't need portability ( e.g. ANSI 'C' ). We
> already take that line for the regular expression syntax, so there is
> precedent for this in the DFDL specification too.
>
> regards,
>
> Tim Kimber, Common Transformation Team,
> Hursley, UK
> Internet:  kimbert at uk.ibm.com
> Tel. 01962-816742
> Internal tel. 246742
>
>
>
>
>
> From:   Steve Hanson/UK/IBM
> To:     Tim Kimber/UK/IBM
> Cc:     Mike Beckerle <mbeckerle.dfdl at gmail.com>, dfdl-wg at ogf.org
> Date:   19/04/2012 11:04
> Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re:
>            String literals - various usage patterns thereof
>
>
> Hi Tim
>
> Firstly, both your bulleted assertions are correct, but your conclusion is
> not.
>
> Secondly, let me flesh out my earlier reply about constructor functions.
>
> The last paragraph of Section 23 says: "The result of evaluating the
> expression must be a single atomic value of the type expected by the
> context, and it is a schema definition error otherwise".
>
> This is where the XPath constructors come into play. E.g.:
> <element name="myHexBin" type="xs:hexBinary"
> dfdl:inputValueCalc="{ xs:hexBinary(...) }"/>
> These xs: constructors, plus the special fn:dateTime() constructor that
> DFDL adds, allow the correct types to be created.
>
> Note that you don't always need the constructors.  An expression that
> returns a quoted value is returning an XPath string literal so that is
> automatically xs:string.  An expression that returns an unquoted number is
> returning an XPath number literal so that can be xs:decimal, xs:integer,
> xs:double (depends whether the number contains a '.' or 'e' or 'E').
> This is described here: http://www.w3.org/TR/xpath20/#id-literals
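The literal-typing rule above can be pictured with a small Python classifier. This is only a sketch: the patterns follow the XPath 2.0 literal grammar as I understand it, the quoted-string test is deliberately naive (it ignores embedded quotes), and literal_type is a hypothetical name.

```python
import re

_INTEGER = re.compile(r"[0-9]+")
_DECIMAL = re.compile(r"\.[0-9]+|[0-9]+\.[0-9]*")
_DOUBLE = re.compile(r"(\.[0-9]+|[0-9]+(\.[0-9]*)?)[eE][+-]?[0-9]+")


def literal_type(token: str) -> str:
    """Return the XPath type a literal token would have."""
    # Quoted value: an XPath string literal, so xs:string.
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return "xs:string"
    # Unquoted number: the type depends on '.' and 'e'/'E'.
    if _INTEGER.fullmatch(token):
        return "xs:integer"
    if _DECIMAL.fullmatch(token):
        return "xs:decimal"
    if _DOUBLE.fullmatch(token):
        return "xs:double"
    raise ValueError("not a literal: " + token)
```

So 'DEADBEEF' in quotes is an xs:string, 10000 is an xs:integer, 1.5 is an xs:decimal and 1e3 is an xs:double.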
>
> So simply returning the literal 'DEADBEEF' will return an xs:string and if
> the context requires xs:hexBinary that is a schema definition error
> according to DFDL spec.
>
> A clarification is worthwhile though.  Take the following expression:
> { if (../type eq 'A') then 10000 else 20000 }.  That returns xs:integer.
> - What if my context was xs:decimal?  xs:integer is a restriction of
> xs:decimal so the value will always be in range, so is that 'auto-cast'
> allowed?
> - What if my context was xs:long or another restriction of xs:integer? The
> value may or may not be in range, so is that 'auto-cast' iff value in
> range?
> I think that we should auto-cast when type restrictions are involved, and
> clarify that in the spec.
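One way to picture the "auto-cast iff value in range" idea, as a hedged Python sketch. The function name, the small table of types and the use of ValueError are all assumptions made for illustration; nothing here is taken from the DFDL spec, which would decide what kind of error an out-of-range value produces.

```python
from decimal import Decimal

# Value ranges of a few XML Schema types derived from xs:integer.
RANGES = {
    "xs:long": (-2**63, 2**63 - 1),
    "xs:int": (-2**31, 2**31 - 1),
    "xs:short": (-2**15, 2**15 - 1),
    "xs:byte": (-2**7, 2**7 - 1),
    "xs:unsignedInt": (0, 2**32 - 1),
}


def auto_cast_integer(value: int, target: str):
    """Cast an xs:integer result to the type the context expects."""
    if target == "xs:integer":
        return value
    if target == "xs:decimal":
        # xs:integer is a restriction of xs:decimal, so this always succeeds.
        return Decimal(value)
    low, high = RANGES[target]
    if not low <= value <= high:
        raise ValueError(f"{value} is out of range for {target}")
    return value
```

Under this sketch 10000 and 20000 cast cleanly to xs:decimal, xs:long or xs:short, while a value such as 40000 would be rejected for xs:short.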
>
> We *could* change the spec to say that the result of the expression is
> always automatically cast to the type expected by the context. That takes
> some of the burden off the modeler and makes it much more likely that
> expressions written by XPath novices will return the correct results. But
> it could also hide accidental errors. Note that I shall call this proposal
> (d), as it is not the same as Mike's (c). If we made this change, then returning
> the literal 'DEADBEEF' for xs:hexBinary would succeed. I don't think it
> affects the desire for expressions to be statically type checkable -
> because it is known whether type X can be cast to type Y, so a cast
> mismatch can be statically detected.
>
> Regards
>
> Steve Hanson
> Architect, Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
>
> From:   Tim Kimber/UK/IBM
> To:     Mike Beckerle <mbeckerle.dfdl at gmail.com>
> Cc:     dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org,
>            Steve Hanson/UK/IBM at IBMGB
> Date:   19/04/2012 09:35
> Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re:
>            String literals - various usage patterns thereof
>
>
> I'm pretty sure that the rules are:
> - DFDL expressions must not *contain* DFDL String Literals. They must be
> valid XPath 2.0 expressions except that the list of allowable function
> names includes the DFDL extension functions.
> - A DFDL expression is sometimes allowed to *return* a DFDL String Literal.
> In this case, the returned value is an xs:string that conforms to the DFDL
> String Literal syntax. But that does not apply to your example because the
> dfdl:inputValueCalc must return a value ( an XML value ) that is valid for
> the type of the element.
>
> I think that corresponds to your answer a) ; 'DEADBEEF' is a valid
> xs:hexBinary lexical value.
>
> regards,
>
> Tim Kimber, Common Transformation Team,
> Hursley, UK
> Internet:  kimbert at uk.ibm.com
> Tel. 01962-816742
> Internal tel. 246742
>
>
>
>
>
> From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:     Steve Hanson/UK/IBM at IBMGB
> Cc:     dfdl-wg at ogf.org
> Date:   19/04/2012 07:42
> Subject:        [DFDL-WG] String literal syntax for hexBinary ?? - Re:
>            String literals - various usage patterns thereof
> Sent by:        dfdl-wg-bounces at ogf.org
>
>
>
> What is the DFDL string literal syntax for a hexBinary type value?
>
> E.g.,  I want a hex binary whose value is the 4 bytes described by this
> hex: DE AD BE EF.
>
> <element name="myHexBin" type="xs:hexBinary"
> dfdl:inputValueCalc="{ ... }"/>
>
> So, what can one syntactically put, for literal constant values, in the
> input value calculation expression?
>
> Note that this is legal pure (non-DFDL) XSD (I think)
>
> <element name="aHexBin" type="xs:hexBinary" fixed="DeadBeef"/>
>
> That is, the fixed/default are allowed and one specifies these values as
> just strings of hex digits. Notice no special escaping or anything. You
> just use a string that just so happens to contain hex digits.
>
> I think there are three possibilities:
> (a) we allow "DEADBEEF" i.e., because the type of the expression is
> hexBinary, a string is cast to hexBinary by interpreting it as hex nibbles.
>
> (b) we require a special kind of string literal - a bytes-only string
> literal, so for example: "%#rDE;%#rAD;%#rBE;%#rEF;" is the way you create 4
> bytes. If you just put characters, then that's a processing error - like a
> cast failure. Only raw-bytes allowed.
> (c) Anything you return from the expression is converted to a hexBinary by
> unparsing it to bytes (using current properties), then using the bytes as
> the hexBinary data. So you could have an expression that returns a double,
> and that would create 8 bytes if representation="binary".  In this case the
> decimal number 3735928559 (hex 0xdeadbeef) as a binary bigEndian int would
> produce the 4 bytes I want.
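The three possibilities can be compared side by side with a short Python sketch. It only mimics the byte-level outcome; the helper name from_raw_entities and the regular expression for %#r entities are assumptions, and option (c) is modelled with plain integer/float packing rather than real DFDL unparsing. Note that 3735928559 only fits in four bytes as an unsigned integer, since it exceeds the signed 32-bit maximum of 2147483647.

```python
import re
import struct

wanted = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# (a) Cast the string by interpreting it as hex nibbles.
option_a = bytes.fromhex("DEADBEEF")


# (b) Accept only raw-byte entities; anything else is a cast failure.
def from_raw_entities(literal: str) -> bytes:
    if not re.fullmatch(r"(%#r[0-9A-Fa-f]{2};)*", literal):
        raise ValueError("only raw-byte entities are allowed")
    pairs = re.findall(r"%#r([0-9A-Fa-f]{2});", literal)
    return bytes(int(pair, 16) for pair in pairs)


option_b = from_raw_entities("%#rDE;%#rAD;%#rBE;%#rEF;")

# (c) Unparse the returned value to bytes: a big-endian unsigned 4-byte
# integer gives the wanted bytes, while a binary double gives 8 bytes.
option_c = (3735928559).to_bytes(4, "big")
double_bytes = struct.pack(">d", 3735928559.0)

assert option_a == option_b == option_c == wanted
assert len(double_bytes) == 8
```

All three routes reach the same four bytes here; they differ in what the modeler has to write and in which mistakes are caught as errors.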
>
>
>
> --
>  dfdl-wg mailing list
>  dfdl-wg at ogf.org
>  https://www.ogf.org/mailman/listinfo/dfdl-wg
>
>
>
>
>  ------------------------------
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>
>
>
>
>
>


-- 
Mike Beckerle | OGF DFDL WG Co-Chair
Tel:  781-330-0412