[DFDL-WG] String literal syntax for hexBinary ?? - Re: String literals - various usage patterns thereof

Thu Apr 19 13:18:40 EDT 2012

I have responded in-line. Agree with nearly everything.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Tim Kimber/UK/IBM at IBMGB
Cc:     dfdl-wg at ogf.org
Date:   19/04/2012 16:07
Subject:        Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re: 
String literals - various usage patterns thereof
Sent by:        dfdl-wg-bounces at ogf.org

On Thu, Apr 19, 2012 at 4:35 AM, Tim Kimber <KIMBERT at uk.ibm.com> wrote:

- DFDL expressions must not *contain* DFDL String Literals. They must be 
valid XPath 2.0 expressions except that the list of allowable function 
names includes the DFDL extension functions. 

I'm pretty sure the above statement isn't right, or doesn't mean to me 
what you intended.
SMH: I read this as meaning that DFDL string literals do not make sense 
within the expression, only when you interpret the value returned by the 
expression. I agree with that. 

Some expressions return string literals, and so their component parts must 
be able to contain string literal syntax or fragments thereof. What we 
don't want is for the semantics to require that those string literal 
syntaxes be interpreted by the xpath processor.

Let me analyze this by cases. Below are what I think are the right 
behaviors.

Case X1:

Appearing in dfdl:initiator="{ '%#234;' }". The result for the initiator 
is one character, exactly as if one had written dfdl:initiator="%#234;". 
That is, the return value of the expression is subsequently treated as a 
string literal. So I could also return a whitespace-separated list of 
initiators if I wanted to.
SMH: Agree

The implications of this are that a few things one might want to return 
from an expression will cause issues. Ex: suppose dfdl:separator="{...}" 
and the expression wants to return a space character. In that case one 
must check for that and return "%SP;" instead. Whitespace generally will 
cause issues. Similarly "%" has to be "%%".  This is a headache, but I 
feel it is preferable to having different sets of rules for expression and 
non-expression cases. Doing this escapifying does require a replace 
function on strings, as has been pointed out elsewhere. Just a basic 
replace might not be sufficient. We might want a dfdl:escapify(...) 
function to deal with the all-varieties-of-whitespace issue. 
SMH: Agree. Yes it makes some things awkward and we need to make it easier 
to work round.
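
A minimal sketch of what such an escaping helper might do, written in 
Python purely for illustration. The name escapify and the whitespace 
table are assumptions, not anything defined by the spec, and the table 
covers only four whitespace characters:

```python
# Hypothetical table: a real dfdl:escapify would have to cover every
# variety of whitespace, not just these four characters.
WHITESPACE_ENTITIES = {" ": "%SP;", "\t": "%HT;", "\n": "%LF;", "\r": "%CR;"}


def escapify(raw: str) -> str:
    """Rewrite raw text so it survives being read as a DFDL string literal."""
    out = []
    for ch in raw:
        if ch == "%":
            # A literal percent sign must be doubled.
            out.append("%%")
        else:
            # Whitespace becomes a named entity; everything else is unchanged.
            out.append(WHITESPACE_ENTITIES.get(ch, ch))
    return "".join(out)


space_separator = escapify(" ")
percent_text = escapify("50% off")
```

An expression that wants to return a space as the separator would then 
return the escaped form rather than the raw character.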

Case X2:
Appearing in dfdl:initiator="{ fn:concat('%#23', '4;') }" also represents 
one character, as it is the result of the xpath evaluation that we analyze 
to see what it means. 

I'm expecting this to be controversial. But again it is the result of the 
expression that is a string 'literal'.
SMH: Agree
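
Cases X1 and X2 can be modelled as a two-step process: the expression 
is evaluated first, and only its result is scanned for entities. The 
sketch below is an assumption-laden illustration in Python, not an 
implementation: resolve_entities is a hypothetical helper, and it 
handles only %%, decimal and hex character references, and a handful of 
named entities.

```python
import re

# Small illustrative subset of the DFDL named character entities.
NAMED = {"NUL": "\x00", "HT": "\t", "LF": "\n", "CR": "\r", "SP": " "}

# Matches %%, %#xhh;, %#ddd; or %NAME;
ENTITY = re.compile(r"%(?:(%)|#x([0-9A-Fa-f]+);|#([0-9]+);|([A-Z]+);)")


def resolve_entities(literal: str) -> str:
    """Replace the supported DFDL entities with the characters they denote."""
    def repl(m):
        if m.group(1):
            return "%"                        # %% is a literal percent sign
        if m.group(2):
            return chr(int(m.group(2), 16))   # hex code point
        if m.group(3):
            return chr(int(m.group(3)))       # decimal code point
        return NAMED[m.group(4)]              # named entity (KeyError if unknown)
    return ENTITY.sub(repl, literal)


# Case X1: the expression returns the literal directly.
direct = resolve_entities("%#234;")
# Case X2: the concat happens first; entities are resolved on its result.
concatenated = resolve_entities("%#23" + "4;")
```

Both values come out as the same single character, because the entity 
scan never sees the two fragments separately.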

Case X2.5:

Suppose I have a header field. If the value is N, it means terminator is 
ASCII null. So I want to write 

dfdl:terminator="{ if (headerIndicator = 'N') then '%NUL;' else ';' }"  

In that case I really do need to post-process the result of the 
expression to find the %NUL; and convert it to a zero codepoint value. I 
can't see any other way to get the zero codepoint into the terminator 
expression in this case. This case X2.5 doesn't introduce anything new, 
it's just amplifying the point of case X2.
SMH: Agree

Case X3:

Appearing in <element name="foo" type="xs:string" dfdl:inputValueCalc="{ 
'%#234;' }"/>  

I am pretty sure this is 6 characters. It's a string value. There is 
nothing said about string literals here.
SMH: Agree

Case X4:

Appearing in <sequence dfdl:separator="{ if ('%#x2c;' = ',') then ';' else 
'!' }">....</sequence>  

The above would appear to need to interpret the dfdl string literals as 
soon as they are created down within the expression. That is the right 
thing, but I suggest we could live without this.
SMH: We should not interpret DFDL string literals in the middle of 
expressions. Which is the position you take in Rule 3.

If we want to say that only the result of an XPath expression is ever 
interpreted for DFDL entities, and then only for certain properties, we 
need to be very clear about it. 
SMH: We should say this clearly. The spec implies this today.

Case X4.5:

Ouch, check this out:

<sequence dfdl:initiator="{ '%#x2c;' }" dfdl:terminator="," 
dfdl:separator="{ if (dfdl:property('initiator', '.') eq 
dfdl:property('terminator', '.')) then ';' else '!' }"> .... </sequence>
SMH: Syntax corrected to add 2nd argument (path)

Does dfdl:property return the value after or before entities have been 
replaced? 

I'm assuming here it returns the "value" of the property, i.e., any 
expressions have been evaluated. But has the entity substitution been 
done?

I believe the right answer here is that the value of the property is the 
value before DFDL entities have been replaced. That prevents a referential 
transparency gap, and a bunch of totally bizarre stuff like people using 
delimiters just to get the entity substitution done, asking for the 
value of them with dfdl:property(...), and then redefining the delimiter 
back to, say, "". (Basically, we want to avoid exposing the 
implementation's entity processing behavior as a user-visible behavior.)
SMH: Agree. But are we all happy that the result of dfdl:property, when 
the property value is an expression, is the result of executing that 
expression? How complicated is that going to make implementations?
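
A minimal Python model of Case X4.5 under the proposed answer (property 
values are returned before entity replacement). The names raw_properties 
and dfdl_property are stand-ins invented for this sketch, the path 
argument is ignored, and the '!' outcome is a deduction from the proposal 
rather than something stated in this thread:

```python
# Property values after any expression has been evaluated, but before
# DFDL entities are replaced (the state the proposal says to expose).
raw_properties = {"initiator": "%#x2c;", "terminator": ","}


def dfdl_property(name: str) -> str:
    """Stand-in for dfdl:property: return the value before entity replacement."""
    return raw_properties[name]


# The separator expression compares the two raw values as plain strings.
separator = ";" if dfdl_property("initiator") == dfdl_property("terminator") else "!"
```

Although both delimiters denote a comma once entities are resolved, the 
raw strings differ, so under this model the comparison is false.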

Case X5:

Appearing in <element name="bar" type="xs:string" default="{ '%#234;' }"/> 
it's 12 characters, because it's not even an expression when it appears in 
XSD string literal context.

I'm not expecting any controversy here. This seems weird, but it is part 
of being embedded properly in XSD.
SMH: Agree.

Summary: 

I think there are rules we need to articulate.

Rule 1: if a DFDL property takes an expression in addition to other 
literal syntax (enum, or string literal of some kind), then the expression 
can return a string containing the same syntax as the enum or string 
literal that the property accepts, and it is interpreted the same way.
SMH: Agree. And all property descriptions in the spec say this, I believe.

Unfortunately, we already have one exception to this: we don't allow an 
expression to return "" in the case of delimiters (which would 
dynamically shut off the use of the delimiter). 

(Side note: I no longer require this restriction. I asked for this, and I 
still think it's probably a good idea, but my concern when I asked for 
this restriction was based on implementation concerns. Much more 
implementation thought has gone into this now, and the planned 
implementation technique can handle this, so I don't see a requirement 
here anymore. Apologies for flip-flopping on this issue.)
SMH: We should keep this restriction please. 

Rule 2: in a DFDL xpath expression that returns a schema typed value 
(inputValueCalc - is this the only case?) the value is not examined for 
DFDL entities.
SMH: Agree, with correction. Also outputValueCalc (type of element), 
discriminators and asserts (boolean), defineVariable, newVariableInstance, 
setVariable (type of variable).
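
The contrast between Rule 1 and Rule 2 (and between Cases X1 and X3) can 
be sketched in Python as below. This is illustrative only: the set of 
property names is a partial, assumed list, property_value is a 
hypothetical helper, and only decimal %#ddd; entities are handled.

```python
import re

# Properties whose expression result is read as a DFDL string literal.
# Illustrative subset only; the real list comes from the spec.
LITERAL_PROPERTIES = {"initiator", "terminator", "separator"}


def resolve_decimal_entities(text: str) -> str:
    """Resolve only %#ddd; entities; other entity forms are omitted for brevity."""
    return re.sub(r"%#([0-9]+);", lambda m: chr(int(m.group(1))), text)


def property_value(name: str, expression_result: str) -> str:
    """Apply entity resolution only where the property accepts string literals."""
    if name in LITERAL_PROPERTIES:
        return resolve_decimal_entities(expression_result)
    # Schema-typed results (e.g. inputValueCalc) are left exactly as returned.
    return expression_result


initiator = property_value("initiator", "%#234;")        # one character
calculated = property_value("inputValueCalc", "%#234;")  # six characters
```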

Rule 3: dfdl:property returns the value of a property before any DFDL 
entity replacements have been done.
SMH: Agree. But I think we should discuss whether it evaluates an 
expression.

So dfdl:textStandardDecimalSeparator="{ fn:concat('%#x2', 'c;') }" works, 
creates a %#x2c; which is the codepoint for a comma I believe.
SMH: Agree.

but... with dfdl:textStandardDecimalSeparator="{ if (fn:concat('%#x2', 
'c;') = ',') then ',' else '%SP;' }" the predicate fails because the 
intermediate result of the concat is not examined for DFDL entities, so 
the result is %SP;. That entity is, however, interpreted correctly as a 
space character because the final result of the expression IS examined 
for entities.
SMH: Agree.
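
The same behaviour, modelled as a small Python sketch. The function names 
are hypothetical, the else branch is taken to be '%SP;', and 
resolve_entities handles only the two entity forms that appear in the 
example:

```python
import re


def resolve_entities(text: str) -> str:
    """Resolve %#xhh; and %SP; only, which is all this example needs."""
    text = re.sub(r"%#x([0-9A-Fa-f]+);", lambda m: chr(int(m.group(1), 16)), text)
    return text.replace("%SP;", " ")


def separator_expression() -> str:
    """Mimic the XPath: entities are NOT resolved inside the expression."""
    intermediate = "%#x2" + "c;"   # fn:concat('%#x2', 'c;') gives '%#x2c;'
    # Plain string comparison: '%#x2c;' is not equal to ','.
    return "," if intermediate == "," else "%SP;"


raw_result = separator_expression()          # value returned by the expression
final_value = resolve_entities(raw_result)   # entities resolved on the result only
```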

- A DFDL expression is sometimes allowed to *return* a DFDL String 
Literal. In this case, the returned value is an xs:string that conforms to 
the DFDL String Literal syntax. But that does not apply to your example 
because the dfdl:inputValueCalc must return a value ( an XML value ) that 
is valid for the type of the element. 

Agreed. I had to argue myself into it, but I do think this is right now.
SMH: Agree.

I think that corresponds to your answer a) ; 'DEADBEEF' is a valid 
xs:hexBinary lexical value. 

This issue seems orthogonal to me now. I do agree that if XSD allows 
"DEADBEEF" as a literal for the default value of a hexBinary, then DFDL 
expressions should do the same. 
SMH: Disagree. XPath 2.0 does not have the same rules as XSDL in this 
regard. I specifically checked this point. 

...mikeb