[DFDL-WG] Action 226 (was: First draft of appendix describing string literal matching)

Wed Sep 11 09:19:30 EDT 2013

New action 226 raised to cover comment #3.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From:   Steve Hanson/UK/IBM
To:     Tim Kimber/UK/IBM at IBMGB, 
Cc:     dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
Date:   03/09/2013 14:49
Subject:        Re: [DFDL-WG] First draft of appendix describing string 
literal matching

Thinking about point 1 again, the combined table should be in section 6.3.1. 
Appendices are usually non-normative, and, by analogy, the syntax table for 
DFDL expressions is part of the main spec.

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848

From:   Tim Kimber/UK/IBM at IBMGB
To:     dfdl-wg at ogf.org, 
Date:   03/09/2013 09:49
Subject:        Re: [DFDL-WG] First draft of appendix describing string 
literal matching
Sent by:        dfdl-wg-bounces at ogf.org

Thanks for reviewing. 

1. Let's drop tables 2 and 4 and replace with refs to the appendix, as 
suggested 
2. Agreed 
3. Good point. I think the intention of %ES; was that it should be used on 
its own. I don't see any point in allowing it to be a part of a 
non-zero-length DFDL string literal. So I think your modification to the 
grammar should be put into the spec. 

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742

From:        Steve Hanson/UK/IBM 
To:        Mike Beckerle <mbeckerle.dfdl at gmail.com>, 
Cc:        Tim Kimber/UK/IBM at IBMGB 
Date:        02/09/2013 17:46 
Subject:        Re: First draft of appendix describing string literal 
matching 

Good description.  My comments: 

1) Apart from the first three rows, the grammar table is pretty much 
duplicating existing tables 2 and 4 in section 6.3.1. Suggest either that 
the table is dropped from the appendix and anything that is missing is 
added back into 6.3.1, or tables 2 and 4 are dropped and replaced by refs 
to appendix. I think the latter is preferable as everything is then in a 
single table. 

2) There is a bug in the grammar for DfdlStringLiteral - there should not 
be '{' and '}' - that's expression syntax. 

3) For recognising ES, you say "The string part is recognized if the data 
available for matching is zero-length". That's true if we insist that ES, 
if present, must be present on its own. I'm not sure we actually say that. 
If that is the intent, we should police this in the grammar. (Note that IBM 
DFDL does not give an error if it finds '%ES;abc'.) 

For 2) and 3) that would give: 
DfdlStringLiteral 
::= 
 (DfdlStringLiteralPart)+ | DfdlESEntity 

DfdlCharClassName 
::= 
DfdlNLEntity | DfdlWSPEntity | DfdlWSPStarEntity | DfdlWSPPlusEntity
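
To make the proposed rule concrete, here is a minimal Python sketch that checks a literal against it, accepting %ES; only when it is the entire literal. The tokenizer, the names `tokenize` and `check_string_literal`, and the set of entity shapes it accepts are assumptions for illustration; this is not the spec's grammar and not IBM DFDL's implementation.

```python
import re

# Hypothetical, simplified tokenizer: it recognises the general entity shapes
# (named entities, %#nnn;, %#xhh;, %#rhh;, %%) but does not check entity names
# against the list in the spec, and it ignores DFDL expression syntax entirely.
_PART = re.compile(
    r"%(?:WSP\*|WSP\+|[A-Za-z]+);"  # named entity, e.g. %NL; %WSP*; %ES;
    r"|%#[0-9]+;"                   # decimal character entity
    r"|%#x[0-9A-Fa-f]+;"            # hex character entity
    r"|%#r[0-9A-Fa-f]{2};"          # raw byte entity
    r"|%%"                          # literal percent sign
    r"|[^%]"                        # any other single character
)


def tokenize(literal: str) -> list[str]:
    """Split a literal into parts; raise ValueError on a malformed '%'."""
    parts, pos = [], 0
    while pos < len(literal):
        m = _PART.match(literal, pos)
        if m is None:
            raise ValueError(f"malformed entity at offset {pos} in {literal!r}")
        parts.append(m.group())
        pos = m.end()
    return parts


def check_string_literal(literal: str) -> None:
    """Enforce DfdlStringLiteral ::= (DfdlStringLiteralPart)+ | DfdlESEntity,
    i.e. %ES; is legal only when it is the whole literal."""
    parts = tokenize(literal)
    if not parts:
        raise ValueError("a DFDL string literal must not be empty")
    if "%ES;" in parts and parts != ["%ES;"]:
        raise ValueError(f"%ES; must appear on its own, not in {literal!r}")
```

Under this check, '%ES;' and 'abc%NL;' are accepted, while '%ES;abc' (which IBM DFDL currently accepts without error) is rejected.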

It still needs an erratum, as it is a change to the spec document. 

Needs references from 6.3.1. 

Regards

Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 

From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        Tim Kimber/UK/IBM at IBMGB, Steve Hanson/UK/IBM at IBMGB, 
Date:        30/08/2013 00:16 
Subject:        Re: First draft of appendix describing string literal 
matching 

I added this in its current form as appendix D.

Will be in draft r14.4.

I did not create an erratum for this. It's a whole new section, not an 
error correction or clarification. But we can add one if we think it 
useful to point out this section.

There are no cross references to this section currently in the document. 
We might find a few places we want to reference this from.

Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com 
Please note: Contributions to the DFDL Workgroup's email discussions are 
subject to the OGF Intellectual Property Policy 

On Wed, Aug 28, 2013 at 10:43 AM, Tim Kimber <KIMBERT at uk.ibm.com> wrote: 
Thanks Mike. 

I agree that the wording could be misinterpreted. Revised draft attached: 

regards,

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742

From:        Mike Beckerle <mbeckerle.dfdl at gmail.com> 
To:        Tim Kimber/UK/IBM at IBMGB, 
Cc:        Steve Hanson/UK/IBM at IBMGB 
Date:        20/08/2013 17:33 
Subject:        Re: First draft of appendix describing string literal 
matching 

I'm not sure I agree with the algorithm in section 1.3 for the string 
literal part "LiteralString".

I believe this algorithm is independent of the encoding the schema itself 
is written in, i.e., what is on the <?xml encoding="..."?> line at 
the top of the schema file.

What you write in the schema file is read into memory, and all characters 
are converted to Unicode code points by that reading process.

So these two statements in the Recognition Algorithm for LiteralString are 
of concern:

"The characters in the DFDL schema will be encoded using the defined 
encoding for the schema in which they appear."

I think this just muddies the waters. Elsewhere we should state that the 
encoding used when authoring a DFDL schema file does not affect the 
behavior of the schema. All schemas behave as if authored in utf-8, etc. 

"The recognition algorithm must be able to compare character sequences 
that are encoded using different encodings." 

To me that says that if I write my schema in EBCDIC but 
dfdl:encoding="ascii", some algorithm other than mapping both into 
Unicode code points first and then comparing them is needed. I don't think 
this is, or should be, true. 

I think the division of things into what you call string literal parts is 
needed because of raw byte entities and character class entities. Outside 
of that, I think translating everything to Unicode should be sufficient.
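
A minimal Python sketch of the approach described above, assuming a plain-text literal part (no raw byte or character class entities): the literal comes out of the XML parser as Unicode code points, so the schema file's own encoding never enters into it, and the data is decoded with dfdl:encoding before comparison. The attribute name `terminator` and the helpers `literal_from_schema` and `match_at` are hypothetical.

```python
import codecs
import xml.etree.ElementTree as ET


def literal_from_schema(xml_bytes: bytes) -> str:
    """Read a (hypothetical) 'terminator' attribute from a schema document.
    The XML parser honours the encoding declaration, so the result is a
    Unicode string whatever encoding the file was written in."""
    return ET.fromstring(xml_bytes).get("terminator")


def match_at(data: bytes, literal: str, encoding: str) -> bool:
    """Return True if the data, decoded with `encoding` (the dfdl:encoding),
    starts with the code points of `literal`."""
    decoder = codecs.getincrementaldecoder(encoding)()
    decoded = ""
    for i in range(len(data)):
        decoded += decoder.decode(data[i:i + 1])
        if len(decoded) >= len(literal):
            break  # enough characters decoded to decide
    return decoded.startswith(literal)


# The same literal authored in two different schema-file encodings.
utf8_schema = '<?xml version="1.0" encoding="utf-8"?><e terminator="é;"/>'.encode("utf-8")
latin1_schema = '<?xml version="1.0" encoding="iso-8859-1"?><e terminator="é;"/>'.encode("iso-8859-1")
assert literal_from_schema(utf8_schema) == literal_from_schema(latin1_schema) == "é;"

# Matching depends only on the data's dfdl:encoding, here EBCDIC (cp500).
print(match_at("é;rest".encode("cp500"), "é;", "cp500"))  # True
```

The design choice is to decode the data incrementally, one byte at a time, so a multi-byte encoding such as UTF-8 is compared only once whole characters are available; no special cross-encoding comparison is needed.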

...mike 

On Thu, Aug 15, 2013 at 7:19 PM, Tim Kimber <KIMBERT at uk.ibm.com> wrote: 
Steve, Mike, 

Please take a look. Comments on high-level stuff like structure/level of 
detail are welcome. 

regards

Tim Kimber, DFDL Team,
Hursley, UK
Internet:  kimbert at uk.ibm.com
Tel. 01962-816742 
Internal tel. 37246742

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU 

-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com

--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg
