[DFDL-WG] can DFDL model this? (initiators, but no separators or terminators, plus optional elements)

Steve Hanson smh at uk.ibm.com
Wed Mar 6 11:08:09 EST 2013


What the lengthPattern property consumes is taken to be the content of the 
element. So your approach B) is the right idea, provided the pattern does 
not consume the next initiator. The regex you need uses 'lookahead' syntax:

.+(?=LastName)
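
The difference between the candidate patterns can be checked outside DFDL. 
The sketch below uses Python's re module purely as an illustration; it 
assumes the DFDL processor's regex dialect handles lookahead the same way. 
The sample string is the data left after the FirstName initiator has been 
consumed.

```python
import re

# Data remaining after the "FirstName" initiator has been consumed
# (sample taken from the original question).
data = " John Mark LastName Smith"

# Option A: matches only the next initiator, not the content before it.
option_a = re.search(r"(LastName)", data)

# Option B as written: [.] is a character class holding a literal dot,
# so it cannot match the letters of the name at all.
option_b = re.match(r"[.]+(LastName)", data)

# Consuming match: the initiator becomes part of the element content.
consuming = re.match(r".+LastName", data)

# Lookahead: stops just before the initiator and leaves it in the data.
lookahead = re.match(r".+(?=LastName)", data)

print(repr(option_a.group(0)))   # 'LastName'
print(option_b)                  # None
print(repr(consuming.group(0)))  # ' John Mark LastName'
print(repr(lookahead.group(0)))  # ' John Mark '
```

Only the lookahead form leaves "LastName" in place for the next element's 
initiator to match.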

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   "Garriss Jr., James P." <jgarriss at mitre.org>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:   06/03/2013 13:37
Subject:        Re: [DFDL-WG] can DFDL model this? (initiators, but no 
separators or terminators, plus optional elements)
Sent by:        dfdl-wg-bounces at ogf.org



So obviously, Mike, you gave me exactly the right answer previously, but I 
just didn’t get it.  With the extra info that you and Steve supplied, I 
think I’m getting it.  Thank you both!
 
Question:  What goes in the regex in the lengthPattern property? 
 
A) Is it just the next initiator, something like this? 
 
<element name="FirstName" lengthKind="pattern" lengthPattern="(LastName)" 
initiator="FirstName"/>
 
B) Is it the entire contents of the element along with the next initiator, 
something like this?
 
<element name="FirstName" lengthKind="pattern" 
lengthPattern="[.]+(LastName)" initiator="FirstName"/>
 
 
From: Steve Hanson [mailto:smh at uk.ibm.com] 
Sent: Wednesday, March 06, 2013 4:16 AM
To: Garriss Jr., James P.
Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org
Subject: Re: [DFDL-WG] can DFDL model this? (initiators, but no separators 
or terminators, plus optional elements)
 
To find the next initiator, you must know it, so you should be able to 
express this in a regex. 

A good example of a format like this is RTF.  The start of an embedded 
sequence is indicated by '{'. The field prior to that has no terminator, 
so you use lengthKind 'pattern' and a regex that consumes everything up to 
but not including a '{'. 
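
As a rough illustration of "everything up to but not including a '{'", a 
negated character class is enough. The sketch uses Python's re module and a 
made-up RTF-like fragment; neither comes from a real RTF schema.

```python
import re

# Hypothetical RTF-like fragment: plain text followed by an embedded group.
data = r"Hello world{\b bold text}"

# Negated character class: consume everything up to, but not including, '{'.
field = re.match(r"[^{]*", data)

content = field.group(0)
rest = data[field.end():]

print(repr(content))  # 'Hello world'
print(rest[0])        # {
```

Because '{' is never consumed, it is still available as the start marker of 
the embedded sequence that follows.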

Adding initiators to the list of in-scope terminating delimiters has been 
discussed in the DFDL WG, but was rejected on complexity grounds. Knowing 
the full list of all possible initiators gets hairy when you have lots of 
optionality or unordered behaviour. 
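
To give a feel for the bookkeeping involved, here is a minimal Python sketch 
that works out which initiators may directly follow each element in a 
simple ordered sequence. The element list mirrors the example in this 
thread; the function name and data layout are hypothetical, and nothing 
here reflects how a DFDL processor is actually implemented.

```python
# (name, optional) pairs in schema order, taken from the example record.
elements = [
    ("FirstName", False),
    ("LastName", False),
    ("Hometown", True),
    ("Company", True),
]

def following_initiators(elements, index):
    """Return the initiators that may directly follow elements[index].

    Walks forward past optional elements and stops at the first required
    one. The second value is True when the record may also end here.
    """
    result = []
    for name, optional in elements[index + 1:]:
        result.append(name)
        if not optional:
            return result, False
    return result, True

for i, (name, _) in enumerate(elements):
    print(name, following_initiators(elements, i))
```

Even this small case needs a forward scan per element; nested sequences, 
choices and unordered groups would each add further cases.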

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848 



From:        "Garriss Jr., James P." <jgarriss at mitre.org> 
To:        "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:        05/03/2013 20:28 
Subject:        Re: [DFDL-WG] can DFDL model this? (initiators, but no 
separators or terminators, plus optional elements) 
Sent by:        dfdl-wg-bounces at ogf.org 




Good point, thank you. 
  
This is a good solution if your data follows nice, easily discerned 
patterns that can be captured with a regex. 
  
But what do you do if there’s no pattern?  What do you do if the only way 
to know you’re at the next element is to find the next initiator? 
  
From: Mike Beckerle [mailto:mbeckerle.dfdl at gmail.com] 
Sent: Tuesday, March 05, 2013 3:06 PM
To: Garriss Jr., James P.
Cc: dfdl-wg at ogf.org
Subject: Re: [DFDL-WG] can DFDL model this? (initiators, but no separators 
or terminators, plus optional elements) 
 

This is what lengthKind='pattern' is for: it lets you use a regex with 
lookahead, which matches up to the next initiator without consuming it. 
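
A minimal Python sketch of the idea, applied to the sample records from the 
original question. It emulates "match the initiator, then take the content 
with a lookahead pattern" for each element in turn. The table layout and 
the parse_record helper are hypothetical, and Python's re module stands in 
for whatever regex dialect the DFDL processor uses.

```python
import re

# (initiator, lengthPattern-style regex, optional) in schema order.
# Each regex uses lookahead so that the next possible initiator, or the
# end of the data, is not consumed.
ELEMENTS = [
    ("FirstName", r".+?(?=LastName)", False),
    ("LastName", r".+?(?=Hometown|Company|$)", False),
    ("Hometown", r".+?(?=Company|$)", True),
    ("Company", r".+", True),
]

def parse_record(data):
    result, pos = {}, 0
    for initiator, pattern, optional in ELEMENTS:
        if not data.startswith(initiator, pos):
            if optional:
                continue  # optional element absent: move on
            raise ValueError(f"expected {initiator!r} at offset {pos}")
        pos += len(initiator)
        match = re.compile(pattern).match(data, pos)
        if match is None:
            raise ValueError(f"no content for {initiator!r}")
        # Surrounding spaces are trimmed here for readability only.
        result[initiator] = match.group(0).strip()
        pos = match.end()
    return result

print(parse_record("FirstName Bob LastName Brown Company IBM"))
print(parse_record("FirstName John Mark LastName Smith"))
```

As the original post notes, this only works if the initiator words never 
appear inside the element values.
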

On Tue, Mar 5, 2013 at 2:52 PM, Garriss Jr., James P. <jgarriss at mitre.org> 
wrote: 

Suppose I have this input data: 
 
  FirstName James LastName Garriss Hometown Raleigh Company The MITRE Corporation CRLF 
 
To the human eye, this is simple.  We have four elements, each of which 
has an initiator.  But to make things more interesting: 
 
1.     The elements are all strings, and they do not have fixed lengths, 
set values, or any other terminator.  The only way you know them apart is 
by the initiator.  (And this implies that the initiators cannot be part of 
the elements.) 
2.     There are no separators (spaces can be in the data). 
3.     The third and fourth elements are optional. 
 
So these are both valid data: 
 
  FirstName John Mark LastName Smith 
  FirstName Bob LastName Brown Company IBM 
 
How do we model this? 
 
Attempt #1: 
 
I have four elements each with a unique initiator (FirstName, LastName, 
Hometown, Company).  The problem is that there’s no way to know when the 
first element terminates, so everything after the “FirstName” initiator 
ends up in the FirstName element.  Oops. 
 
Attempt #2: 
 
I got funky with the terminators.  The first element has LastName as a 
terminator.  The second element has Hometown or Company as a terminator.  
The third element has Company or %NL; as a terminator.  And the fourth one 
uses %NL;.  Works great, unless the optional third element isn’t there.  
IOW, if I have this input: 
 
  FirstName Bob LastName Brown Company IBM 
 
Then “IBM” winds up in the Hometown element.  Oops. 
 
So, what to do?  I don’t know.  I don’t know how to solve this.  Hopefully 
you’re going to teach me about some feature I don’t yet know. 
 
If not, then I have a potential solution, an addition to the spec.  Add 
this option as a terminator:  “This element terminates when you find the 
initiator to the next element.”  That’s probably easier said than done, 
but it seems to make sense in this context. 

--
 dfdl-wg mailing list
 dfdl-wg at ogf.org
 https://www.ogf.org/mailman/listinfo/dfdl-wg 



-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU