[DFDL-WG] clarification on when escape characters are needed

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed Jun 19 10:04:15 EDT 2013


I would state it this way:

Your format requires that the separator, '=', be allowed to appear unescaped
and unquoted inside the data value of an element.

That is inconsistent with how separators work. (Admittedly, there is debate
about whether it should be OK for an element that is statically known to be
last when there are only infix separators, but depending on that subtlety
makes the format fragile. E.g., what if a subsequent field is added later?)
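
To make the fragility concrete, here is a minimal Python sketch of the two
readings of an infix separator discussed in this thread. It is a simplified
model, not the behaviour of any DFDL processor: the names ProcessingError,
parse_strict and parse_last_takes_rest are hypothetical, and the strict
variate reports an error up front rather than after producing a and b.

```python
class ProcessingError(Exception):
    """Raised when the data does not fit the declared sequence."""


def parse_strict(data, names, sep="="):
    """Every separator occurrence is significant, even in the last element."""
    regions = data.split(sep)
    if len(regions) != len(names):
        raise ProcessingError(
            f"{len(regions)} content regions for {len(names)} elements"
        )
    return dict(zip(names, regions))


def parse_last_takes_rest(data, names, sep="="):
    """The separator goes out of scope once the last declared element starts."""
    regions = data.split(sep, len(names) - 1)
    if len(regions) != len(names):
        raise ProcessingError(
            f"{len(regions)} content regions for {len(names)} elements"
        )
    return dict(zip(names, regions))


data = "password=f82+=7&%q"

# Two declared elements: b keeps the embedded '='.
print(parse_last_takes_rest(data, ["a", "b"]))

# Add a third element and the same data silently parses differently:
# b is now cut at the '=' that used to be part of its value.
print(parse_last_takes_rest(data, ["a", "b", "c"]))

# The strict reading rejects the two-element case outright.
try:
    parse_strict(data, ["a", "b"])
except ProcessingError as err:
    print("processing error:", err)
```

Under the lenient reading, the value of b depends on how many elements are
declared after it, which is the sense in which the format becomes fragile.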

On Wed, Jun 19, 2013 at 9:53 AM, Steve Hanson <smh at uk.ibm.com> wrote:

> James
>
> Escape schemes work by specifying special character(s) that indicate that
> other characters in the data are not to be treated as delimiters.
>
> The line in question is:
>
>         password=f82+=7&%q
>
> There are no quotes around the data part of it. That's why you can't use an
> escape scheme here.
>
> Your other example:
>
>         boundary="----=_Part_150709_149622714.1370937621731"
>
> There are quotes around the data part of it. That's why you can use an
> escape scheme (that specifies quotes as start/end characters) and it works.
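
The difference between the two lines can be sketched in Python as a scanner
that ignores separators inside a quoted block. This is only an illustration
under simplifying assumptions: split_outside_quotes is a hypothetical helper,
it does not handle an escapeEscapeCharacter, and it does not require the
block to start at the beginning of the field.

```python
def split_outside_quotes(data, sep="=", quote='"'):
    """Split on sep, ignoring any sep that falls inside a quoted block."""
    fields, current, in_block = [], [], False
    for ch in data:
        if ch == quote:
            # Block start/end characters toggle the state and are dropped.
            in_block = not in_block
        elif ch == sep and not in_block:
            # An unprotected separator ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# Quoted value: the '=' inside the quotes is data, so two fields result.
print(split_outside_quotes('boundary="----=_Part_150709_149622714.1370937621731"'))

# Unquoted value: nothing protects the second '=', so three fields result.
print(split_outside_quotes("password=f82+=7&%q"))
```

With no quotes in the data, the scanner has nothing to act on, so the second
'=' is still seen as a separator.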
>
> Regards
>
> Steve Hanson
> Architect, IBM Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        "Garriss Jr., James P." <jgarriss at mitre.org>
> To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
> Date:        19/06/2013 14:12
> Subject:        Re: [DFDL-WG] clarification on when escape characters are
> needed
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
> The origin of this issue is the Content-Type header in email, where the
> parameters can be quoted, but sometimes are not:
>
> Content-Type: text/html; charset="UTF-8"
> Content-Type: text/html; charset=UTF-8
>
> This was not a big deal until I ran into a parameter that included an = in
> the value:
>
> Content-Type: multipart/alternative;
> boundary="----=_Part_150709_149622714.1370937621731"
>
> When confronted with this issue, I was told:
>
> >> there's a pretty simple
> >> fix: specify an escape scheme that says that anything inside quotes is
> >> not a delimiter. And fortunately your DefaultProperties.xsd file
> >> actually comes with an escape scheme that does exactly that.
> >>
> >> So all you have to do is add this:
> >>
> >>       dfdl:escapeSchemeRef="DefaultPropertiesEscapeScheme"
> >>
> >> to this:
> >>
> >>        <xsd:element name="value" type="xsd:string" />
> >>
>
> You may well recognize this scheme, as it’s yours:
>
>     <dfdl:defineEscapeScheme name="DefaultPropertiesEscapeScheme">
>         <dfdl:escapeScheme escapeBlockEnd="&quot;" escapeBlockStart="&quot;"
>             escapeCharacter="&quot;" escapeEscapeCharacter="&quot;"
>             escapeKind="escapeBlock"
>             extraEscapedCharacters=", %#x0D; %#x0A;"
>             generateEscapeBlock="whenNeeded">
>         </dfdl:escapeScheme>
>     </dfdl:defineEscapeScheme>
>
> I used this solution for the parameters of the Content-Type header, which
> are key/value pairs.
>
>         <xsd:sequence dfdl:separator="=">
>             <!-- this init is a workaround for Daffodil 0.10 bug (see ContentType element above) -->
>             <xsd:element name="key" dfdl:initiator="%WSP*;">
>                 <xsd:annotation>
>                     <xsd:appinfo source="http://www.ogf.org/dfdl/dfdl-1.0/">
>                         <dfdl:assert test="{ dfdl:checkConstraints(.) }"
>                             message="The parameter key must match one of the values on the enumerated list."/>
>                     </xsd:appinfo>
>                 </xsd:annotation>
>                 <xsd:simpleType>
>                     <xsd:restriction base="xsd:string">
>                         <xsd:enumeration value="charset"/>
>                         <xsd:enumeration value="name"/>
>                         <xsd:enumeration value="boundary"/>
>                     </xsd:restriction>
>                 </xsd:simpleType>
>             </xsd:element>
>             <!-- Daffodil 0.10.1 fails here if there's an = in the value. -->
>             <xsd:element name="value" type="xsd:string"
>                 dfdl:escapeSchemeRef="DefaultPropertiesEscapeScheme"/>
>         </xsd:sequence>
>
> Without the scheme, I get an error.  With it, it works great.
>
> So is this an inappropriate use of an escape scheme?
>
> From: Steve Hanson [mailto:smh at uk.ibm.com]
> Sent: Wednesday, June 19, 2013 8:47 AM
> To: Garriss Jr., James P.
> Cc: dfdl-wg at ogf.org; Mike Beckerle
> Subject: RE: [DFDL-WG] clarification on when escape characters are needed
>
> James
>
> I don't see how an escape scheme helps here.  The "f82+=7&%q" is all
> data, there's no escape character.
>
> Regards
>
> Steve Hanson
> Architect, IBM Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        "Garriss Jr., James P." <jgarriss at mitre.org>
> To:        Steve Hanson/UK/IBM at IBMGB, Mike Beckerle <mbeckerle.dfdl at gmail.com>,
> Cc:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date:        19/06/2013 12:33
> Subject:        RE: [DFDL-WG] clarification on when escape characters are
> needed
> ------------------------------
>
>
>
>
> > The DFDL 1.0 spec implies the behaviour where you get…
>
> If this is the direction the WG goes, can you please make this explicit
> rather than implicit?  Using Mike’s excellent example below would go a long
> way to making the issue clear.
>
> As for a solution, would it not be better to use an escape scheme, like
> this?
>
> <sequence dfdl:separator="=" dfdl:separatorPosition="infix">
> <element name="a" type="xs:string"/>
> <element name="b" type="xs:string"
>  dfdl:escapeSchemeRef="DefaultPropertiesEscapeScheme"/>
> </sequence>
>
> (Cred to Taylor)
>
> If so, it would be helpful to include that in the example.
>
> From: dfdl-wg-bounces at ogf.org [mailto:dfdl-wg-bounces at ogf.org] On Behalf Of Steve Hanson
> Sent: Wednesday, June 19, 2013 5:29 AM
> To: Mike Beckerle
> Cc: dfdl-wg at ogf.org
> Subject: Re: [DFDL-WG] clarification on when escape characters are needed
>
> The DFDL 1.0 spec implies the behaviour where you get:
>
> <a>password</a>
> <b>f82+</b>
>
> followed by a processing error.  There is no special casing of the last
> element in the group.
>
> Changing the model to the following achieves the desired infoset:
>
> <sequence dfdl:separator="=" dfdl:separatorPosition="infix">
> <element name="a" type="xs:string"/>
> <sequence dfdl:separator="">
>   <element name="b" type="xs:string"/>
> </sequence>
> </sequence>
>
>
> Regards
>
> Steve Hanson
> Architect, IBM Data Format Description Language (DFDL)
> Co-Chair, OGF DFDL Working Group <http://www.ogf.org/dfdl/>
> IBM SWG, Hursley, UK
> smh at uk.ibm.com
> tel:+44-1962-815848
>
>
>
> From:        Tim Kimber/UK/IBM at IBMGB
> To:        dfdl-wg at ogf.org,
> Date:        19/06/2013 09:37
> Subject:        Re: [DFDL-WG] clarification on when escape characters are
> needed
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
>
>
> In the IBM implementation we have taken the view that the separator
> defines the format for all of the group's content. That means that all
> separators are counted as being significant, even if they occur within the
> content region of the final group member.
> I agree that other interpretations are possible - the MRM parser in
> earlier versions of WebSphere Message Broker takes an infix separator out
> of scope when it encounters the final declared child of a group.
>
> I intend to address this point when I write up the rules for matching
> string literals and delimiters.
>
> regards,
>
> Tim Kimber, DFDL Team,
> Hursley, UK
> Internet:  kimbert at uk.ibm.com
> Tel. 01962-816742
> Internal tel. 37246742
>
>
>
>
> From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
> To:        dfdl-wg at ogf.org,
> Date:        19/06/2013 03:52
> Subject:        [DFDL-WG] clarification on when escape characters are
> needed
> Sent by:        dfdl-wg-bounces at ogf.org
> ------------------------------
>
>
>
>
>
>
> Suppose I have a sequence. It has an infix separator which is "=".
>
> <sequence dfdl:separator="=" dfdl:separatorPosition="infix">
> <element name="a" type="xs:string"/>
> <element name="b" type="xs:string"/>
> </sequence>
>
> Now, consider this data:
>
> password=f82+=7&%q
>
> I want
>
> <a>password</a>
> <b>f82+=7&%q</b>
>
> Notice how the b element contains an '=' which was not escaped in any way
> in the sequence. Element b is statically known to be last, the separator is
> infix; hence, things are unambiguous even if there is no escaping.
>
> However, there is an alternative interpretation, which is that the above
> data should fail, because it produces <a>password</a><b>f82+</b> but then
> does not find the expected stuff next. Rather it finds the '=7&%q' data. In
> other words, the sequence separator divides the sequence content into 3
> content regions, but there aren't 3 things to consume those, so it is a
> processing error.
>
> Which is correct?
>
> --
> Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
> www.tresys.com <http://www.tresys.com/>
> --
> dfdl-wg mailing list
> dfdl-wg at ogf.org
> https://www.ogf.org/mailman/listinfo/dfdl-wg
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>



-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com