[DFDL-WG] delimited binary data - clarifications

Steve Hanson smh at uk.ibm.com
Wed Nov 8 12:35:47 EST 2017


1. Yes, all delimiters are allowed: initiators, terminators, and separators.
2. No, there is no restriction on the encoding of such delimiters.
3. No, escape schemes apply only to text representation.

The dfdl:escapeSchemeRef property appears only in section 13.2, "Properties 
Common to All Simple Types with Text representation".

When creating a DFDL schema that involves delimited binary data, you must 
make sure that the data cannot contain any byte sequence that matches an 
in-scope delimiter, because no escape scheme is available to protect it.
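
As a rough illustration of that risk, here is a minimal Python sketch that 
reports where the encoded bytes of a delimiter occur inside a binary 
payload. The function name find_delimiter_collisions is hypothetical and is 
not part of any DFDL implementation; the payload is the packed decimal 
value +12 (digits 0, 1, 2 followed by sign nibble C), whose second byte 
0x2C happens to be an ASCII comma.

```python
def find_delimiter_collisions(payload, delimiters, encoding):
    """Return (delimiter, offset) for each place a delimiter's bytes occur.

    Hypothetical helper for illustration only: each delimiter string is
    encoded with the given encoding and searched for in the raw payload.
    """
    hits = []
    for delim in delimiters:
        pattern = delim.encode(encoding)
        if not pattern:
            continue  # an empty delimiter can never be matched meaningfully
        start = payload.find(pattern)
        while start != -1:
            hits.append((delim, start))
            start = payload.find(pattern, start + 1)
    return hits


# Packed decimal +12 is stored as the two bytes 0x01 0x2C.
packed_plus_12 = bytes([0x01, 0x2C])
collisions = find_delimiter_collisions(packed_plus_12, [",", "|"], "ascii")
print(collisions)
```

Here a "," separator collides with the data at offset 1, while "|" (0x7C) 
does not, although "|" would collide with a field whose last byte is 
digit 7 plus sign C.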

I believe that IBM DFDL's byte scanner converts in-scope delimiters into 
the equivalent bytes using the dfdl:encoding of the object, then matches 
the bytes.
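
The approach described above can be sketched in Python as follows. This is 
not IBM DFDL's code, only an illustration under stated assumptions: the 
function name scan_for_delimiter is hypothetical, and breaking ties in 
favour of the longest delimiter is a choice made for this sketch.

```python
def scan_for_delimiter(data, delimiters, encoding):
    """Return (offset, delimiter) for the earliest delimiter match, or None.

    Illustrative sketch: each delimiter string is first converted to bytes
    using the supplied encoding, then matched against the raw data bytes.
    """
    best = None  # (offset, delimiter string, encoded pattern)
    for delim in delimiters:
        pattern = delim.encode(encoding)
        if not pattern:
            continue  # skip empty delimiters
        pos = data.find(pattern)
        if pos == -1:
            continue
        earlier = best is None or pos < best[0]
        longer_tie = (
            best is not None and pos == best[0] and len(pattern) > len(best[2])
        )
        if earlier or longer_tie:
            best = (pos, delim, pattern)
    return None if best is None else (best[0], best[1])


data = bytes([0x12, 0x3C, 0x00, 0x3B, 0x45, 0x6C])
# ";" is 0x00 0x3B in UTF-16BE but the single byte 0x3B in ASCII, so the
# same delimiter is found at different offsets depending on the encoding.
print(scan_for_delimiter(data, [";"], "utf-16-be"))
print(scan_for_delimiter(data, [";"], "ascii"))
```

The example shows why the encoding of the object matters: the match offset 
is 2 under UTF-16BE and 3 under ASCII for the same delimiter and data.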

Regards
 
Steve Hanson
IBM Hybrid Integration, Hursley, UK
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
smh at uk.ibm.com
tel:+44-1962-815848
mob:+44-7717-378890



From:   Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, Josh Adams 
<jadams at tresys.com>
Date:   08/11/2017 13:49
Subject:        [DFDL-WG] delimited binary data - clarifications
Sent by:        "dfdl-wg" <dfdl-wg-bounces at ogf.org>



The Daffodil project is implementing various packed formats, and is looking 
at the TLOG schema on the DFDL Schemas site.

The DFDL spec is clear that lengthKind delimited is allowed for packed 
formats (all variants of packed) and hexBinary.

My question is whether there is any restriction on the generality of this 
that was intended, but not stated in the spec, where we should be issuing 
a clarification.

E.g.,

1. Can binary data have all of initiators, terminators, and separators?
2. Is there a restriction on the charset encoding used to specify these, 
   e.g., SBCS? Or do the byte patterns being used to scan for these 
   require conversion of the specified delimiter to bytes from any 
   supported encoding?
3. Do escape schemes apply to delimited binary?

If, in fact, all these things are allowed, then I believe we should add a 
one-liner to section 12.3.2.2 specifying that all aspects of delimited 
parsing, including the above, are specifically allowed.
