[DFDL-WG] Further progress on action 235: Request clarifications of the Escape Schemes.

Steve Hanson smh at uk.ibm.com
Tue Nov 12 07:16:02 EST 2013


Some extra notes added by Steve to Mike's original answers.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 12/11/2013 12:14 -----

From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     Steve Hanson/UK/IBM at IBMGB, Tim Kimber/UK/IBM at IBMGB, 
Date:   23/10/2013 17:02
Subject:        Partial progress on action 235: Request clarifications of 
the Escape Schemes.



Questions from Taylor Wise:

1.      Does any character effectively escape the block start, or is a 
block start inside the data a syntax error? (Is a valid escape block start 
only one that appears at the beginning of the data?)
2.      Does an end block have to be followed by the delimiter (optionally 
padding first) or does the absence of a delimiter mean that it is not an 
end block?
3.      Without an escape block start, are the escape escape characters 
still interpreted?
4.      Do extra escaped characters require escapeKind = 
escapeCharacter?
5.      What is the appropriate behavior for the following:
Assuming escapeBlockStart="Start", escapeBlockEnd="END", 
escapeEscapeCharacter="!"
,4StartStart1!!23END4!ENDEND, (comma is the delimiter)

----------------------------------------------------------------

Answers:

1. Yes

2. Yes - no lookahead. To be clear:

There may not be a delimiter. When following a block start, the block end, 
not preceded by an escape escape character, is always interpreted as 
ending the content region. It may be followed by a delimiter if that is 
what is expected in the model; however, there is no lookahead for the 
delimiter or anything else. 

For an element with dfdl:lengthKind='delimited', it is a processing error 
if the block end is not followed by optional padding and a delimiter. 

3. No - without a block start nothing will be interpreted as an escape 
escape character nor as a block end.

4. No - For escapeKind="escapeBlock", the presence of any of the extra 
escaped characters in the data implies that the data must be surrounded by 
the block start and block end when unparsing. This is stated in the spec. 
See dfdl:generateEscapeBlock. 
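
As a rough illustration of the unparsing rule in answer 4, here is a minimal Python sketch. It is not taken from the spec or from any DFDL processor: the function name, the parameter names and the exact set of triggers are assumptions made for this example. In particular, treating a value that begins with the block start as needing a block is an assumption added so that the value round-trips, and a value that ends with the escape escape character is not handled.

```python
def unparse_escape_block(value, start="Start", end="END", esc_esc="!",
                         extra_escaped=(), delimiters=(",",)):
    """Hypothetical sketch of 'when needed' escape block generation."""
    # The block is needed if the value contains a delimiter or any of the
    # extra escaped characters.
    needs_block = any(tok in value for tok in (*delimiters, *extra_escaped))
    # Assumption (not confirmed by the thread): a value that begins with
    # the block start must also be wrapped, or a parser would misread it.
    needs_block = needs_block or value.startswith(start)
    if not needs_block:
        return value
    # Inside the block, only the block end is escaped, using the escape
    # escape character. The block start is never escaped.
    body = value.replace(end, esc_esc + end)
    return start + body + end


print(unparse_escape_block("plain"))       # plain
print(unparse_escape_block("a,b"))         # Starta,bEND
print(unparse_escape_block("xENDy,z"))     # Startx!ENDy,zEND
```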

5. <SMH>Taking the example above: ,4StartStart1!!23END4!ENDEND, ( comma is 
delimiter). 
a) If the leading '4' is not trimmed as a padding character, then the 
escape block start is not treated as such because it is not at the start 
of the data, so the infoset contains '4StartStart1!!23END4!ENDEND' - no 
escaping is applied.
b) If the leading '4' is trimmed as a padding character, then the first 
'Start' is treated as escape block start, and the first unescaped 'END' is 
treated as escape block end. The '4' after 'END' may also be trimmed as a 
padding character if justification is 'center'. But the first '!' will 
cause a processing error, because the next character is expected to be the 
',' delimiter.
c) If the data were instead ',4StartStart1!!23!END!ENDEND,' and the leading 
'4' is trimmed as per b), then the first two occurrences of 'END' are 
escaped by the '!' and the last 'END' is treated as the escape block end. 
The infoset contains 'Start1!!23ENDEND' (because the spec says the escape 
escape character is not removed when it does not precede the escape block 
end **).</SMH>
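
The three cases in answer 5 can be traced with a small Python sketch. This is only an illustration of the rules as described in this thread, not the behaviour of any particular DFDL processor. The names parse_field and ProcessingError are hypothetical, padding is modelled as a single character trimmed on both sides (as with 'center' justification), and the escape escape character is kept when it does not precede the block end, which is the reading questioned at **.

```python
class ProcessingError(Exception):
    """Stands in for a DFDL processing error (hypothetical name)."""


def parse_field(data, delimiter=",", start="Start", end="END",
                esc_esc="!", pad=None):
    """Parse one delimited field that may use an escape block.

    `data` is the text after the initiating delimiter. `pad` is a padding
    character trimmed on both sides, or None for no trimming.
    """
    pos = 0
    if pad:
        while data.startswith(pad, pos):
            pos += len(pad)

    if not data.startswith(start, pos):
        # No block start at the start of the data: nothing is treated as
        # an escape escape character or as a block end (answer 3).
        stop = data.find(delimiter, pos)
        if stop == -1:
            stop = len(data)
        value = data[pos:stop]
        return value.rstrip(pad) if pad else value

    pos += len(start)
    out = []
    while True:
        if pos >= len(data):
            raise ProcessingError("escape block end not found")
        if data.startswith(esc_esc + end, pos):
            # Escaped block end: keep the block end, drop the escape.
            out.append(end)
            pos += len(esc_esc) + len(end)
        elif data.startswith(end, pos):
            # First unescaped block end closes the content, no lookahead.
            pos += len(end)
            break
        else:
            # Any other character is kept, including a block start and an
            # escape escape character that does not precede the block end.
            out.append(data[pos])
            pos += 1

    if pad:
        while data.startswith(pad, pos):
            pos += len(pad)
    if not data.startswith(delimiter, pos):
        raise ProcessingError(f"expected delimiter at offset {pos}")
    return "".join(out)


# a) no padding trimmed: no escaping is applied
print(parse_field("4StartStart1!!23END4!ENDEND,"))
# 4StartStart1!!23END4!ENDEND

# c) '4' trimmed as padding, both inner 'END's escaped
print(parse_field("4StartStart1!!23!END!ENDEND,", pad="4"))
# Start1!!23ENDEND

# b) '4' trimmed as padding: '!' found where the delimiter is expected
try:
    parse_field("4StartStart1!!23END4!ENDEND,", pad="4")
except ProcessingError as err:
    print("processing error:", err)
```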

The definition of escapeKind for escapeBlock needs clarification, because 
it implies one can isolate the data without interpreting the block start 
and block end. For delimited formats, the block start and block end are 
integral to identification of the delimiter. 

<SMH> Agree. And it's not just delimited formats. The text needs to be 
processed from start to finish to handle the escape escape character. 
</SMH> 

Need to clarify that the escape escape character never applies to the 
block start.

Consider expressing this with a small grammar. 

** <SMH>Is this really correct, or should the escape escape character 
always be removed? </SMH>
