[DFDL-WG] can DFDL model this? (initiators, but no separators or terminators, plus optional elements)

Garriss Jr., James P. jgarriss at mitre.org
Tue Mar 5 15:16:35 EST 2013


Good point, thank you.

This is a good solution if your data follows nice, easily discerned patterns that can be captured with a regex.

But what do you do if there's no pattern?  What do you do if the only way to know you're at the next element is to find the next initiator?

From: Mike Beckerle [mailto:mbeckerle.dfdl at gmail.com]
Sent: Tuesday, March 05, 2013 3:06 PM
To: Garriss Jr., James P.
Cc: dfdl-wg at ogf.org
Subject: Re: [DFDL-WG] can DFDL model this? (initiators, but no separators or terminators, plus optional elements)


This is what lengthKind='pattern' is for: it gives you the ability to use a regex with a (non-consuming) lookahead.
On Tue, Mar 5, 2013 at 2:52 PM, Garriss Jr., James P. <jgarriss at mitre.org> wrote:
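A rough emulation of the idea in Python, not DFDL itself. After each element's initiator is consumed, a regex (standing in for the DFDL length pattern, dfdl:lengthPattern) is matched at the current position, and its lookahead stops the value just before the next initiator without consuming it. The element table, the regexes, the single-space handling around initiators, and the name `parse_record` are assumptions made up for this sample data.

```python
import re
from typing import Dict

# Hypothetical schema: (element name, initiator, length-pattern regex, required?).
# The regexes are assumptions written for this sample data; the zero-width
# lookahead (?=...) ends the value just before the next initiator.
ELEMENTS = [
    ("FirstName", "FirstName ", r".*?(?= LastName |$)", True),
    ("LastName", " LastName ", r".*?(?= Hometown | Company |$)", True),
    ("Hometown", " Hometown ", r".*?(?= Company |$)", False),
    ("Company", " Company ", r".*", False),
]


def parse_record(record: str) -> Dict[str, str]:
    """Emulate initiator + length-pattern parsing on one record
    (line terminator already removed)."""
    result: Dict[str, str] = {}
    pos = 0
    for name, initiator, pattern, required in ELEMENTS:
        if not record.startswith(initiator, pos):
            if required:
                raise ValueError(f"missing required element {name} at offset {pos}")
            continue  # optional element absent: its initiator is not here
        pos += len(initiator)
        # The regex is anchored at the current position; the match length
        # becomes the element's length.
        m = re.compile(pattern).match(record, pos)
        if m is None:
            raise ValueError(f"pattern for {name} did not match at offset {pos}")
        result[name] = m.group()
        pos = m.end()
    if pos != len(record):
        raise ValueError(f"unparsed data left at offset {pos}")
    return result


print(parse_record("FirstName Bob LastName Brown Company IBM"))
# {'FirstName': 'Bob', 'LastName': 'Brown', 'Company': 'IBM'}
```

In this sketch each lookahead lists only the initiators that may follow that element, so the values themselves need no regular shape; the optional Hometown is skipped because its initiator is simply not found.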
Suppose I have this input data:

  FirstName James LastName Garriss Hometown Raleigh Company The MITRE Corporation CRLF

To the human eye, this is simple.  We have four elements, each of which has an initiator.  But to make things more interesting:

1. The elements are all strings, and they do not have fixed lengths, set values, or terminators.  The only way to tell them apart is by the initiator.  (This implies that the initiators cannot appear inside the elements.)

2. There are no separators (spaces can be in the data).

3. The third and fourth elements are optional.

So these are both valid data:

  FirstName John Mark LastName Smith
  FirstName Bob LastName Brown Company IBM
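For illustration only, the constraints above can be restated as one Python regular expression over a whole record (line terminator already stripped): lazy groups stop at the next initiator, and optional non-capturing groups cover the optional Hometown and Company elements. This is not DFDL syntax, and like any initiator-based approach it would misparse a value that itself contains text such as " Company ".

```python
import re

# Illustrative whole-record regex (an assumption, not a DFDL construct).
RECORD = re.compile(
    r"FirstName (?P<FirstName>.+?)"
    r" LastName (?P<LastName>.+?)"
    r"(?: Hometown (?P<Hometown>.+?))?"  # optional third element
    r"(?: Company (?P<Company>.+?))?"    # optional fourth element
)

for line in ["FirstName John Mark LastName Smith",
             "FirstName Bob LastName Brown Company IBM"]:
    print(RECORD.fullmatch(line).groupdict())
# {'FirstName': 'John Mark', 'LastName': 'Smith', 'Hometown': None, 'Company': None}
# {'FirstName': 'Bob', 'LastName': 'Brown', 'Hometown': None, 'Company': 'IBM'}
```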

How do we model this?

Attempt #1:

I have four elements each with a unique initiator (FirstName, LastName, Hometown, Company).  The problem is that there's no way to know when the first element terminates, so everything after the "FirstName" initiator ends up in the FirstName element.  Oops.

Attempt #2:

I got funky with the terminators.  The first element has LastName as a terminator.  The second element has Hometown or Company as a terminator.  The third element has Company or %NL; as a terminator.  And the fourth one uses %NL;.  Works great, unless the optional third element isn't there.  IOW, if I have this input:

  FirstName Bob LastName Brown Company IBM

Then "IBM" winds up in the Hometown element.  Oops.

So, what to do?  I don't know.  I don't know how to solve this.  Hopefully you're going to teach me about some feature I don't yet know.

If not, then I have a potential solution, an addition to the spec.  Add this option as a terminator:  "This element terminates when you find the initiator to the next element."  That's probably easier said than done, but it seems to make sense in this context.
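A minimal sketch of the proposed rule, outside DFDL and under stated assumptions: elements appear in a fixed order, each element's initiator is its name, initiators are delimited by single spaces, and initiator text never appears inside a value. Only initiators of later elements end the current value, so an absent optional element is simply skipped. `parse_by_next_initiator` is a hypothetical name, not part of any DFDL implementation.

```python
from typing import Dict, List, Optional


def parse_by_next_initiator(record: str, initiators: List[str]) -> Dict[str, Optional[str]]:
    """Hypothetical rule: a value ends where the initiator of any later
    element starts, or at the end of the record."""
    text = " " + record + " "  # pad so every initiator looks like " Name "
    result: Dict[str, Optional[str]] = {name: None for name in initiators}
    pos, current, value_start = 0, -1, 0  # current = index of element being read
    while True:
        # Nearest occurrence of an initiator that belongs to a *later* element.
        hits = [(text.find(f" {initiators[i]} ", pos), i)
                for i in range(current + 1, len(initiators))]
        hits = [hit for hit in hits if hit[0] != -1]
        end = min(hits)[0] if hits else len(text) - 1
        if current < 0:
            if end != 0:
                raise ValueError("record does not start with a known initiator")
        else:
            result[initiators[current]] = text[value_start:end]
        if not hits:
            return result
        at, current = min(hits)
        value_start = at + len(initiators[current]) + 2  # skip " Name "
        pos = value_start - 1  # back up one so an empty value is still found


fields = ["FirstName", "LastName", "Hometown", "Company"]
print(parse_by_next_initiator("FirstName Bob LastName Brown Company IBM", fields))
# {'FirstName': 'Bob', 'LastName': 'Brown', 'Hometown': None, 'Company': 'IBM'}
```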

--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg



--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com