[DFDL-WG] How to determine the length of an element which has text representation

Tue Nov 24 14:52:50 CST 2009

I like to use enums instead of booleans, so I suggest this property be
dfdl:textScanningMode, an enum with initial values "scanned" and
"notScanned". As an enum it leaves room to add some intelligent mixed mode
in the future (like "scanExceptFixedLength", if that proves useful).

One thought: we might try to think up terminology that is more declarative,
less parse-centric. These properties about "scanning" would affect the output
direction also, instructing the unparser not to bother inserting escape
characters if the logical element contains, say, the parent delimiter.
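
A rough Python sketch of that output-side effect, purely illustrative. The
function name unparse_text, the mode strings as plain Python strings, and the
single-character escape scheme are all assumptions made for the example;
none of this is defined by the DFDL drafts.

```python
def unparse_text(value: str, delimiter: str, escape: str = "\\",
                 mode: str = "scanned") -> str:
    """Render a logical string for output (hypothetical helper).

    mode="scanned": the data will be re-read by scanning, so any escape
    character or delimiter inside the value is prefixed with the escape.
    mode="notScanned": the value is written raw; its extent must then be
    conveyed some other way (for example by an explicit length).
    """
    if mode == "notScanned":
        return value
    if mode != "scanned":
        raise ValueError(f"unknown mode: {mode!r}")
    # Escape the escape character first so the escapes added for the
    # delimiter below are not themselves doubled.
    escaped = value.replace(escape, escape + escape)
    return escaped.replace(delimiter, escape + delimiter)
```

With a comma as the parent delimiter, the logical value a,b goes out as a\,b
in "scanned" mode and unchanged as a,b in "notScanned" mode.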

I am currently proceeding under the assumption that not scanning turns off the
whole lexical analyzer, so anything that looks like an escape sequence would
also be treated as raw string content. You would still convert code points to
logical characters, but characters would not be interpreted as delimiters,
escapes, quotation marks, and so on.
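
To make the two behaviours concrete, here is a minimal Python sketch under
that assumption. The function extract_text and its parameters are
hypothetical, and it handles only one delimiter and one escape character;
a real DFDL lexical analyzer would deal with much more.

```python
from typing import Optional, Tuple


def extract_text(data: bytes, *, scanning: bool, length: Optional[int] = None,
                 delimiter: str = ",", escape: str = "\\",
                 encoding: str = "ascii") -> Tuple[str, int]:
    """Return (logical value, number of characters consumed)."""
    # Decoding happens in both modes: code points still become characters.
    chars = data.decode(encoding)

    if not scanning:
        if length is None:
            raise ValueError("a length is required when scanning is off")
        if length > len(chars):
            raise ValueError("not enough data for the requested length")
        # No lexical analysis: delimiters and escapes are ordinary content.
        return chars[:length], length

    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch == escape and i + 1 < len(chars):
            out.append(chars[i + 1])  # escaped character is taken literally
            i += 2
        elif chars.startswith(delimiter, i):
            break  # an unescaped delimiter ends the element
        else:
            out.append(ch)
            i += 1
    return "".join(out), i
```

Given the input ab\,c,rest, the scanning branch yields the value ab,c (escape
removed, stopping at the second comma), while the non-scanning branch with
length 5 yields the raw five characters ab\,c.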

There's lots of potential for schema definition errors here of course. E.g.,
lengthKind="delimited" combined with textScanningMode="notScanned" clearly
does not work, since the delimiter can only be found by scanning.
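
A sketch of how such a check might look, in Python. The enum mirrors the
property values suggested above; the class and function names are invented
for the example, and only the one conflict mentioned here is checked.

```python
from enum import Enum


class TextScanningMode(Enum):
    SCANNED = "scanned"
    NOT_SCANNED = "notScanned"
    # A later mixed mode such as "scanExceptFixedLength" could be added
    # here without changing the property's type.


class SchemaDefinitionError(Exception):
    """Raised when property values contradict each other (hypothetical)."""


def check_scanning_properties(length_kind: str,
                              scanning_mode: str) -> TextScanningMode:
    try:
        mode = TextScanningMode(scanning_mode)
    except ValueError:
        raise SchemaDefinitionError(
            f"unknown textScanningMode: {scanning_mode!r}") from None
    if length_kind == "delimited" and mode is TextScanningMode.NOT_SCANNED:
        # Finding a delimiter requires scanning, so this pair is contradictory.
        raise SchemaDefinitionError(
            "lengthKind='delimited' cannot be used with "
            "textScanningMode='notScanned'")
    return mode
```

Other contradictory combinations would be added as further checks in the same
place once the working group settles which ones exist.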

...mike

On Tue, Nov 24, 2009 at 11:48 AM, Alan Powell <alan_powell at uk.ibm.com> wrote:

>
> Stephanie
>
> 4. Have a separate property to 'turn off scanning' for
> dfdl:representation='text'
> 5. Introduce a new lengthKind, 'fixedLengthDelimited'
>
> Alan Powell
>
> MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
> Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com
> Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898
>
>
>
> From: Stephanie Fetzer <sfetzer at us.ibm.com>
> To: DFDL <mbeckerle.dfdl at gmail.com>
> Cc: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, Tim Kimber/UK/IBM at IBMGB,
> dfdl-wg-bounces at ogf.org
> Date: 19/11/2009 15:40
> Subject: Re: [DFDL-WG] How to determine the length of an element which
> has text representation
> ------------------------------
>
>
>
>
> Yes - agreed. It makes sense that for parsing when delimiters are in scope
> that if we hit a non-delimited length that we 'turn off scanning'.  If
> everyone is agreed on that then..
>
> The decision to be made here is how we will handle elements with length
> requirements while parsing when delimiters are in scope:
>
> 1. We can allow and use dfdl:length for components with
> lengthKind="delimited"...in a check that will occur after the element is
> initially parsed (via delimiter)
> 2. We can disallow the use of dfdl:length for components with
> lengthKind="delimited"...and require that any length constraints be placed
> on such components via an assert.  An error or a warning will be generated
> if dfdl:length is defined explicitly on a component with
> lengthKind="delimited"
> 3. We can ignore the use of dfdl:length for components with
> lengthKind="delimited"...and require that any length constraints be placed
> on such components via an assert.
>
> Any other options? Which way are we leaning on this?
>
> Cheers,
> -Steph
>
> WebSphere Transformation Extender
> Industry Packs - Software Engineer
>
>
> From: DFDL <mbeckerle.dfdl at gmail.com>
> To: Tim Kimber <KIMBERT at uk.ibm.com>
> Cc: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>
> Date: 11/18/2009 08:54 PM
> Subject: Re: [DFDL-WG] How to determine the length of an element
> which has text representation
> Sent by: dfdl-wg-bounces at ogf.org
>
>  ------------------------------
>
>
>
> I support Tim's view here. There needs to be an idiomatic way to shut off
> scanning. Rep='binary' is much too obscure.
>
> Question: which other length kinds should switch off scanning? Prefix?
> Implicit? None of these?
>
> ...mikeb
>
>
> On Nov 18, 2009, at 12:05 PM, Tim Kimber <KIMBERT at uk.ibm.com> wrote:
>
>
> I'd like to record what was discussed and raise another point, which Alan
> pointed out after the meeting.
>
> Discussions in the meeting
> - dfdl:lengthKind applies only to the element on which it is specified. It
> has no effect whatever on the parsing of child elements/groups.
> - there may be some value in tolerating simple elements of type xs:string
> with dfdl:representation="binary". Might be useful for schemas where
> dfdl:representation="binary" throughout.
> - Currently, the position of the WG is that parsers should *always* scan to
> extract the text representation if there is any terminating markup in scope.
> Even if lengthKind='explicit'.
> - TK proposed the scheme outlined in his previous email, in which
> dfdl:lengthKind alone specifies how the parser should extract the text
> representation:
>   If lengthKind="explicit", scanning is switched off and dfdl:length is used.
>   If lengthKind="delimited", the text rep is extracted by scanning and
>   dfdl:length is ignored.
> - A refinement was discussed whereby dfdl:length would be checked after a
> scan has been performed if dfdl:lengthKind="delimited". This would make the
> modeling of some common formats simpler, and avoid the need for a
> dfdl:assert to enforce the length constraint.
> - MB raised the possibility that we could actually disallow dfdl:length if
> lengthKind='delimited'. This is the most conservative position, but general
> opinion was that it would be too restrictive. There still might be some
> value in disallowing dfdl:length for other lengthKinds.
>
> Discussions after the meeting
> - Alan pointed out that lengthKind="explicit" does not necessarily mean
> that the length of the field is fixed. dfdl:length might be specified as a
> DFDL expression. A common reason for doing that would be to obtain the
> element's length from an earlier integer field. As currently specified, if
> there was any markup in scope, the text rep would be extracted by scanning.
>
> Restatement of my position after today's meeting:
> I'm now even more convinced that dfdl:lengthKind="explicit" should switch
> off scanning. Here's why:
> a) The enumerations of lengthKind are *explicit, implicit, prefixed,
> delimited,  pattern, endOfParent*. The presence of 'delimited' in that
> list means that in some users' minds, the other enumerations are going to be
> interpreted as *alternatives* to 'delimited'.
> b) If there's markup in scope, scanning cannot be switched off by any
> means. Not even by setting lengthKind='explicit' AND obtaining dfdl:length
> from a previous integer field. I think that's very counter-intuitive.
>
> regards,
>
> Tim Kimber, Common Transformation Team,
> Hursley, UK
> Internet: kimbert at uk.ibm.com
> Tel. 01962-816742
> Internal tel. 246742
>
>
>
>  ------------------------------
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>
>
>
> --
> dfdl-wg mailing list
> dfdl-wg at ogf.org
> http://www.ogf.org/mailman/listinfo/dfdl-wg