[DFDL-WG] A gap in the behaviour for when escapeKind=escapeBlock

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue Sep 9 10:41:10 EDT 2014


Now that's a really good catch.

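Is it sufficient to say that the escapeEscapeCharacter (EEC) is itself
escaped when it is the last character in the data, so that it cannot combine
with the closing escapeBlockEnd to form an escaped block end?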



Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>


On Mon, Sep 8, 2014 at 12:05 PM, Andrew Edwards <andy.edwards at uk.ibm.com>
wrote:

> While working in the escapeBlock area, I seem to have found a gap in the
> definition for DFDL's behaviour when using escape blocks.
>
> From section 13.2.1, relevant extracts are:
>
> *When 'escapeBlock': On unparsing the entire data are escaped by adding
> dfdl:escapeBlockStart to the beginning and dfdl:escapeBlockEnd to the end
> of the data. The data is either always escaped or escaped when needed as
> specified by dfdl:generateEscapeBlock. If the data is escaped and contains
> the dfdl:escapeBlockEnd then first character of each appearance of the
> dfdl:escapeBlockEnd is escaped by the dfdl:escapeEscapeCharacter. *
>
> and
>
> *On parsing the dfdl:escapeBlockStart string must be the first characters
> in the (trimmed) data in order to activate the escape scheme. The
> dfdl:escapeBlockStart string is removed from the beginning of the data.
> Until a matching dfdl:escapeBlockEnd string (that is, one not preceded by
> the dfdl:escapeEscapeCharacter) is found in the data, any in-scope
> terminating delimiter encountered in the data is not interpreted as such,
> and any dfdl:escapeEscapeCharacters are removed when they precede an
> dfdl:escapeBlockEnd string.*
>
>
> Now consider a model where:
> escapeBlockStart="start"
> escapeBlockEnd="end"
> escapeEscapeCharacter="#"
>
> Then take a logical value of:
> A hash is a #
>
> When we serialize the value, we wrap the value with the escapeBlockStart
> and escapeBlockEnd, and we precede any instance of the escapeBlockEnd *within
> the data* with an escapeEscapeCharacter.  This then gives us the physical
> value "startA hash is a #end".  If we were to parse that data, we see the
> "#end" as an escaped escapeBlockEnd and report that there is no
> escapeBlockEnd.
>
> The gap in the behavioural definition seems to be that the specification
> says nothing about escaping an instance of the escapeEscapeCharacter
> itself when serializing. There is nothing to catch the case of an
> escapeEscapeCharacter that is not intended to escape an escapeBlockEnd but
> ends up doing so because it happens to sit immediately before one.
>
>
>
> Andy
> Andy Edwards - IBM Integration Bus - DFDL
>
> --
>   dfdl-wg mailing list
>   dfdl-wg at ogf.org
>   https://www.ogf.org/mailman/listinfo/dfdl-wg
>
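
The failure Andy describes can be reproduced with a small sketch. This is
only an illustration of the two quoted paragraphs from section 13.2.1, not
code from any DFDL implementation: the names unparse and parse are
hypothetical, the delimiters are the ones from Andy's example, and trimming,
dfdl:generateEscapeBlock and any data following the block end are ignored.

```python
# Delimiters taken from the example in the quoted message.
START, END, EEC = "start", "end", "#"


def unparse(value: str) -> str:
    """Wrap the value in the escape block, escaping each embedded END.

    Follows the quoted unparsing text: the first character of every
    escapeBlockEnd in the data is preceded by the escapeEscapeCharacter.
    Nothing is done about escapeEscapeCharacters already in the data.
    """
    return START + value.replace(END, EEC + END) + END


def parse(data: str) -> str:
    """Recover the logical value from escape-block data.

    Follows the quoted parsing text: an END preceded by EEC is literal
    data (the EEC is removed); the first END not preceded by EEC closes
    the block.
    """
    if not data.startswith(START):
        return data  # escape scheme not activated
    body = data[len(START):]
    out = []
    i = 0
    while i < len(body):
        if body.startswith(EEC + END, i):
            out.append(END)  # escaped block end: drop EEC, keep END
            i += len(EEC) + len(END)
        elif body.startswith(END, i):
            return "".join(out)  # matching escapeBlockEnd found
        else:
            out.append(body[i])
            i += 1
    raise ValueError("no matching escapeBlockEnd found")


if __name__ == "__main__":
    # An embedded block end round-trips as the spec intends.
    assert parse(unparse("the end is near")) == "the end is near"

    # A value ending in the EEC does not: its last character merges
    # with the real block end and is read as an escaped block end.
    physical = unparse("A hash is a #")
    print(physical)
    try:
        parse(physical)
    except ValueError as err:
        print("parse failed:", err)
```

Under these assumptions a value that merely contains "end" round-trips,
while "A hash is a #" serializes to "startA hash is a #end" and cannot be
parsed back, which is the gap being reported.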