[DFDL-WG] terminate by next field's initiator aka lengthKind="endAtStartOfNext" or something like that

Mike Beckerle mbeckerle.dfdl at gmail.com
Tue Jun 4 09:47:50 EDT 2013


I know we omitted this from DFDL v1.0 (I am quite sure I advocated that
position), and it is too late to add it back now. Although it was
theoretically possible, I had never actually seen this idiom before. Now I
have seen it, and I'm wondering if it is more common than I originally
thought.

The situation is this. I have an element that wants to be delimited: it has
an escape scheme, and something does delimit it in the common sense of the
word, but that terminator is actually what one thinks of as the initiator
of the next element.

It comes up in Internet Message Format headers as one example:

Reply-To: joe@foo.com
Reply-To: <joe@foo.com>
Reply-To: joe smith<joe@foo.com>
Reply-To: "joe <Mr. XML> smith"<joe@foo.com>
Reply-To: <>

In the third and fourth cases there is no terminator, just the required "<"
that begins the next field.

Modeling this whole Reply-To construct requires a choice among several
different elements, one for each format. For example, I see no way to model
a format that accepts either line 1 or line 2 above without using a choice.
That said, my real concern is with lines 3 and 4.
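As a rough illustration of that choice, here is a Python sketch that tries
one branch per Reply-To layout in order and keeps the first branch that
matches the whole value, loosely in the spirit of an ordered DFDL choice.
The branch names, the parse_reply_to function, and the deliberately loose
address pattern are all hypothetical; this is not the RFC 5322 grammar. The
display-name branch is intentionally naive (the name is everything up to the
first "<"), so it handles line 3 but rejects line 4, whose quoted display
name itself contains "<".

```python
import re

# Hypothetical, deliberately loose address pattern (NOT the RFC 5322 grammar).
ADDR = r"[^<>\s@]+@[^<>\s@]+"

# One branch per Reply-To layout, tried in order; the first full match wins.
BRANCHES = [
    ("bare_address", re.compile(rf"(?P<addr>{ADDR})")),         # line 1
    ("angle_address", re.compile(rf"<(?P<addr>{ADDR})?>")),     # lines 2 and 5
    # Line 3. Naive on purpose: the display name stops at the first "<",
    # so a quoted name containing "<" (line 4) matches no branch.
    ("display_name_and_address",
     re.compile(rf"(?P<name>[^<]+)<(?P<addr>{ADDR})>")),
]


def parse_reply_to(value):
    """Return (branch_name, fields) for the first branch matching all of value."""
    for branch_name, pattern in BRANCHES:
        m = pattern.fullmatch(value)
        if m:
            return branch_name, m.groupdict()
    raise ValueError(f"no branch matches {value!r}")


for v in ["joe@foo.com", "<joe@foo.com>", "joe smith<joe@foo.com>", "<>"]:
    print(parse_reply_to(v))
```

Order matters here just as it does in a choice: the bare-address branch is
tried first, and the angle-address branch covers both line 2 and the empty
"<>" of line 5.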

The natural model for lines 3 and 4 (and perhaps 5) seems to be a
display-name field followed by an email-address field. I really do not want
"<" to serve as the terminator of the prior field in some situations and as
the initiator of the next field in others, because that hurts reuse of the
validation regexes, etc.
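To make the desired semantics concrete, here is a hedged Python sketch of
what a length rule like the hypothetical lengthKind="endAtStartOfNext" from
the subject line might do for the display-name field. It is not a DFDL v1.0
feature. The sketch scans forward, treats a double-quoted block as an escape
block (with backslash escapes inside it, an assumption borrowed from RFC 5322
quoted strings), and stops at the first unquoted "<" without consuming it.
The function name, its parameters, and the None-on-failure convention are
illustrative assumptions.

```python
def scan_to_next_initiator(text, initiator="<", quote='"', escape="\\"):
    """Split text where the NEXT field's initiator first appears unquoted.

    Hypothetical 'endAtStartOfNext'-style rule: returns (field, rest), where
    rest still begins with the initiator (it is not consumed), or None if no
    unquoted initiator exists.
    """
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == escape:
                i += 2  # skip the escape character and the character it escapes
                continue
            if ch == quote:
                in_quotes = False  # closing quote ends the escape block
        elif ch == quote:
            in_quotes = True  # opening quote starts the escape block
        elif text.startswith(initiator, i):
            # Stop before the initiator and leave it for the next field.
            return text[:i], text[i:]
        i += 1
    return None


name, rest = scan_to_next_initiator('"joe <Mr. XML> smith"<joe@foo.com>')
print(name)  # "joe <Mr. XML> smith"
print(rest)  # <joe@foo.com>
```

Because the "<" is left at the front of rest, the address field can keep
"<" as its own initiator in every case, rather than having it double as the
display name's terminator.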

Right now the only way to model this is for the display-name field to use a
regex that re-invents the escape-scheme-like behavior of the optional
surrounding quotation marks, and that uses regex lookahead to detect an
unescaped "<" without consuming it.
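A minimal Python sketch of that workaround, assuming backslash escapes
inside the quotes and assuming the header value has already been split off
after "Reply-To: ". The pattern and the match_display_name helper are
hypothetical. In a DFDL schema the same expression would presumably go in a
dfdl:lengthPattern with dfdl:lengthKind="pattern", subject to whatever regex
dialect the processor actually supports.

```python
import re

# Hypothetical display-name pattern: any mix of double-quoted segments
# (backslash escapes allowed inside) and unquoted characters other than
# '"' and '<', followed by a lookahead (?=<) that requires, but does not
# consume, the "<" that initiates the address field.
DISPLAY_NAME = re.compile(r'(?:"(?:[^"\\]|\\.)*"|[^"<])*(?=<)')


def match_display_name(text):
    """Return the display name at the start of text, or None if no unquoted '<' follows."""
    m = DISPLAY_NAME.match(text)
    return m.group(0) if m else None


print(match_display_name('"joe <Mr. XML> smith"<joe@foo.com>'))  # "joe <Mr. XML> smith"
print(match_display_name("joe smith<joe@foo.com>"))               # joe smith
print(match_display_name("joe@foo.com"))                           # None
```

The lookahead is what keeps "<" out of the display name so the address
field can still declare it as an initiator; the cost is that the quoting
rules live inside the regex instead of in a reusable escape scheme.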

That's not too bad really, but I am curious whether others have seen data
out in the world that also has this idiom, where a string is delimited by a
unique structure at the beginning of the next element.

Do we have collective knowledge of several more such formats, or have we
all just seen this same IMF header example as the motivation?

-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com

