[DFDL-WG] How to determine the length of an element which has text representation
Stephanie Fetzer
sfetzer at us.ibm.com
Thu Nov 19 09:39:41 CST 2009
Yes, agreed. It makes sense that, when parsing with delimiters in scope, we
'turn off scanning' if we hit an element whose length is not delimited. If
everyone is agreed on that, then the decision to be made here is how we
will handle elements with length requirements while parsing when delimiters
are in scope:
1. We can allow dfdl:length for components with lengthKind="delimited" and
use it in a check that occurs after the element is initially parsed (via
its delimiter).
2. We can disallow dfdl:length for components with lengthKind="delimited"
and require that any length constraints be placed on such components via an
assert. An error or a warning will be generated if dfdl:length is defined
explicitly on a component with lengthKind="delimited".
3. We can ignore dfdl:length for components with lengthKind="delimited" and
require that any length constraints be placed on such components via an
assert.
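To make the three options concrete, here is a rough Python sketch of how each one would treat a declared length on a delimited element. The names LengthPolicy and parse_delimited are made up for illustration and are not taken from the spec or from any implementation; the sketch also assumes a single delimiter and a length counted in characters.

```python
from enum import Enum


class LengthPolicy(Enum):
    """Hypothetical names for the three options listed above."""
    CHECK = 1     # option 1: scan first, then verify against dfdl:length
    DISALLOW = 2  # option 2: dfdl:length on a delimited element is an error
    IGNORE = 3    # option 3: dfdl:length is ignored


def parse_delimited(data, delimiter, declared_length, policy):
    """Scan `data` up to `delimiter`, then apply one of the three policies."""
    if policy is LengthPolicy.DISALLOW and declared_length is not None:
        raise ValueError("dfdl:length not allowed with lengthKind='delimited'")

    # The value is always found by scanning; the length never drives extraction.
    end = data.find(delimiter)
    value = data if end == -1 else data[:end]

    if policy is LengthPolicy.CHECK and declared_length is not None:
        if len(value) != declared_length:
            raise ValueError(
                f"expected {declared_length} characters, scanned {len(value)}"
            )
    return value
```

With the input "abc,def" and a comma delimiter, CHECK accepts a declared length of 3 and rejects 5, IGNORE returns "abc" whatever the declared length is, and DISALLOW rejects any declared length before scanning starts.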
Any other options? Which way are we leaning on this?
Cheers,
-Steph
WebSphere Transformation Extender
Industry Packs - Software Engineer
From: DFDL <mbeckerle.dfdl at gmail.com>
To: Tim Kimber <KIMBERT at uk.ibm.com>
Cc: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
Date: 11/18/2009 08:54 PM
Subject: Re: [DFDL-WG] How to determine the length of an element which has
text representation
Sent by: dfdl-wg-bounces at ogf.org
I support Tim's view here. There needs to be an idiomatic way to shut off
scanning. Rep='binary' is much too obscure.
Question: which other length kinds should switch off scanning? Prefix?
Implicit? None of these?
...mikeb
On Nov 18, 2009, at 12:05 PM, Tim Kimber <KIMBERT at uk.ibm.com> wrote:
I'd like to record what was discussed and raise another point which Alan
pointed out after the meeting.
Discussions in the meeting
- dfdl:lengthKind applies only to the element on which it is specified. It
has no effect whatever on the parsing of child elements/groups.
- there may be some value in tolerating simple elements of type xs:string
with dfdl:representation="binary". Might be useful for schemas where
dfdl:representation="binary" throughout.
- Currently, the position of the WG is that parsers should *always* scan
to extract the text representation if there is any terminating markup in
scope. Even if lengthKind='explicit'.
- TK proposed the scheme outlined in his previous email, in which
dfdl:lengthKind alone specifies how the parser should extract the text
representation.
If lengthKind="explicit", scanning is switched off and dfdl:length is
used. If lengthKind="delimited" the text rep is extracted by scanning and
length is ignored.
- A refinement was discussed whereby dfdl:length would be checked after a
scan has been performed if dfdl:lengthKind="delimited". This would make
the modeling of some common formats simpler, and avoid the need for a
dfdl:assert to enforce the length constraint.
- MB raised the possibility that we could actually disallow dfdl:length if
lengthKind='delimited'. This is the most conservative position, but
general opinion was that it would be too restrictive. There still might be
some value in disallowing dfdl:length for other lengthKinds.
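As a rough illustration of the scheme TK proposed above, the following Python sketch lets lengthKind alone decide how the text is extracted. The function extract_text and its parameters are hypothetical, only the 'explicit' and 'delimited' cases are covered, and lengths are assumed to be counted in characters.

```python
def extract_text(data, pos, length_kind, length=None, delimiters=()):
    """Return (value, new_pos) under the proposed scheme (hypothetical helper)."""
    if length_kind == "explicit":
        # Scanning is switched off: take exactly `length` characters,
        # even if they contain something that looks like a delimiter.
        if length is None or pos + length > len(data):
            raise ValueError("not enough data for explicit length")
        return data[pos:pos + length], pos + length

    if length_kind == "delimited":
        # dfdl:length is ignored; scan for the nearest in-scope delimiter.
        hits = [i for i in (data.find(d, pos) for d in delimiters) if i != -1]
        end = min(hits) if hits else len(data)
        return data[pos:end], end

    raise NotImplementedError(
        f"lengthKind {length_kind!r} is not covered by this sketch"
    )
```

For the data "a,b,c;rest" with "," and ";" in scope, an explicit length of 5 yields "a,b,c", while the delimited case stops at the first comma and yields "a". The refinement in the next bullet would add a length check after the delimited scan.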
Discussions after the meeting
- Alan pointed out that lengthKind="explicit" does not necessarily mean
that the length of the field is fixed. dfdl:length might be specified as a
DFDL expression. A common reason for doing that would be to obtain the
element's length from an earlier integer field. As currently specified, if
there is any markup in scope, the text rep would still be extracted by
scanning, not by using that length.
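A small Python sketch of Alan's case may help. It assumes a record made of a 2-digit length field followed by a text field, with a separator in scope from an enclosing group. The function parse_record and the sample data are invented for illustration, and the sketch assumes the scan is not bounded by the declared length.

```python
def parse_record(data, separator, scan_when_markup_in_scope):
    """Parse a 2-digit length field followed by a text field (illustrative)."""
    declared = int(data[:2])  # length taken from the earlier integer field
    body_start = 2

    if not scan_when_markup_in_scope:
        # Proposed behaviour: trust the declared length, no scanning.
        return data[body_start:body_start + declared]

    # Behaviour as currently specified: markup in scope forces a scan,
    # so the declared length plays no part in extraction.
    end = data.find(separator, body_start)
    return data[body_start:] if end == -1 else data[body_start:end]


record = "05a,b,c;next"
print(parse_record(record, ",", scan_when_markup_in_scope=False))  # a,b,c
print(parse_record(record, ",", scan_when_markup_in_scope=True))   # a
```

The length field says 5 characters, but because the value happens to contain the in-scope separator, scanning returns only the first character.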
Restatement of my position after today's meeting:
I'm now even more convinced that dfdl:lengthKind="explicit" should switch
off scanning. Here's why:
a) The enumerations of lengthKind are explicit, implicit, prefixed,
delimited, pattern, endOfParent. The presence of 'delimited' in that list
means that in some users' minds, the other enumerations are going to be
interpreted as *alternatives* to 'delimited'.
b) If there's markup in scope, scanning cannot be switched off by any
means. Not even by setting lengthKind='explicit' AND obtaining dfdl:length
from a previous integer field. I think that's very counter-intuitive.
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
--
dfdl-wg mailing list
dfdl-wg at ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg