[DFDL-WG] Representing multiple spaces

Steve Hanson smh at uk.ibm.com
Fri Mar 1 09:22:51 EST 2013


An alternative is to specify the terminator as a single %SP; and then use 
DFDL's justification and trimming properties to remove the excess %SP;s 
before adding to the infoset. 

<xs:element name="Data" type="xs:string"
            dfdl:terminator="%SP;"
            dfdl:textStringJustification="left"
            dfdl:textTrimKind="padChar"
            dfdl:textStringPadCharacter="%SP;" />


Allowing %SP*; is a slightly slippery slope. It can be argued that the * 
and + could be offered for any DFDL entity, or even bracketed group of 
entities. The danger is that DFDL entities become their own matching 
language. The WG has discussed in the past allowing a regular expression 
for delimiters, and this would be a candidate feature for a future DFDL 
2.0. 

(In case you are now wondering how come %WSP*; is allowed, it is because 
an existing IBM modelling language that DFDL supersedes had such a 
facility, and this enables a smooth migration to DFDL. Hence it is an 
exception, albeit a very useful one, rather than the norm.)

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:   "Garriss Jr., James P." <jgarriss at mitre.org>
To:     "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>, 
Date:   01/03/2013 13:40
Subject:        Re: [DFDL-WG] Representing multiple spaces
Sent by:        dfdl-wg-bounces at ogf.org



It seems like %SP*; and %SP+; would be useful additions to the spec.  Can 
that be considered?
 
(And yes, I meant %SP*; not %ES*;.  Good catch, thank you.)
 
From: Mike Beckerle [mailto:mbeckerle.dfdl at gmail.com] 
Sent: Thursday, February 28, 2013 7:48 PM
To: Garriss Jr., James P.
Cc: dfdl-wg at ogf.org
Subject: Re: [DFDL-WG] Representing multiple spaces
 
%WSP+; is one or more whitespace characters. That might be what you want.

The only way to do one or more %SP; (I think you meant %SP*; not %ES*; - 
ES is empty string) is like this:

dfdl:terminator="%SP; %SP;%SP; %SP;%SP;%SP; ..."

i.e., a whitespace-separated list of one space, two spaces, three spaces, 
etc., up to as high as you would like to go.
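
A long list like that is tedious to type by hand. A minimal sketch in Python of generating the attribute value up to a chosen bound follows; the helper name space_terminator_list and the bound are hypothetical and not part of DFDL. The generated list starts at one space, so the zero-space case is not covered by it.

```python
def space_terminator_list(max_spaces):
    """Build a dfdl:terminator value listing runs of 1..max_spaces %SP; entities."""
    if max_spaces < 1:
        raise ValueError("max_spaces must be at least 1")
    # Each list entry is n copies of %SP; with no separator inside the entry;
    # entries are separated by a single space, as in the example above.
    return " ".join("%SP;" * n for n in range(1, max_spaces + 1))


print(space_terminator_list(3))  # %SP; %SP;%SP; %SP;%SP;%SP;
```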

If that just won't cut it, then you have to go to something I call 
modeling syntax as data:

You create a group

<group name="spaces">
  <sequence>
    <element name="spaces" type="xs:string" dfdl:lengthKind="pattern"
             dfdl:lengthPattern="\s*"/>
  </sequence>
</group>

This spaces group is a model for something that is just a syntactic 
feature of your data.
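
One thing to check against the original requirement (spaces only, no other whitespace): \s is a whitespace class, not just the space character. The sketch below uses Python's re module purely as an illustration; the regex dialect a DFDL processor uses may differ in details, and the names here are made up for the example. A pattern of a literal space followed by * is the spaces-only variant.

```python
import re

ANY_WHITESPACE = re.compile(r"\s*")  # spaces, tabs, newlines, ...
ONLY_SPACES = re.compile(r" *")      # the space character only


def matched_prefix(pattern, data):
    """Return the (possibly empty) prefix of data that pattern consumes."""
    # Both patterns above can match the empty string, so match() never fails.
    return pattern.match(data).group(0)


sample = "  \t\nrest"
print(repr(matched_prefix(ANY_WHITESPACE, sample)))  # '  \t\n'
print(repr(matched_prefix(ONLY_SPACES, sample)))     # '  '
```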

Then you keep the group out of your logical infoset by using a hidden 
group ref like so

<sequence>
  <element name="beforeSpaces" type="xs:string" dfdl:terminator="%SP;"/> 
  <sequence dfdl:hiddenGroupRef="tns:spaces"/>
  <element name="afterSpaces" type="xs:string" dfdl:terminator="%SP;"/>
</sequence>
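
To make the intended effect concrete, here is a rough Python approximation of that sequence. It is only a sketch of the idea, not how a DFDL processor works: each string element is assumed to end at its first space, the hidden element consumes the whitespace that follows, and the hidden value is dropped from the result. The function name parse_record and the sample input are hypothetical.

```python
import re

# beforeSpaces, its single-space terminator, the hidden run of whitespace,
# afterSpaces, and its single-space terminator.
SEQUENCE = re.compile(
    r"(?P<beforeSpaces>[^ ]*) (?P<spaces>\s*)(?P<afterSpaces>[^ ]*) "
)


def parse_record(data):
    """Return the visible fields; the hidden 'spaces' value is discarded."""
    match = SEQUENCE.match(data)
    if match is None:
        raise ValueError("data does not match the modelled sequence")
    fields = match.groupdict()
    fields.pop("spaces")  # consumed from the data but kept out of the result
    return fields


print(parse_record("abc    def "))
# {'beforeSpaces': 'abc', 'afterSpaces': 'def'}
```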

Here's what I'm not sure of....

In XSD, it would be ok to have multiple groups like this between elements, 
because the elements aren't named "spaces", so the various instances of 
the "spaces" element can't be confused. (There is no UPA problem.)

In DFDL, I'm not sure if we allow this:

<sequence dfdl:sequenceKind="ordered">
    <element name="foo" .../>
    <element name="spaces" .../>
    <element name="bar" .../>
    <element name="spaces" .../>
     ....
</sequence>

I.e., more than one child element named "spaces" in the same sequence, but 
it's not an array using minOccurs/maxOccurs and dfdl:occursCountKind, etc.

XML Schema would not have a problem with this, so long as those elements 
are all required. (minOccurs >= 1). 

I've sent a separate email to the dfdl-wg to see what others' opinions are 
on this.


On Thu, Feb 28, 2013 at 3:03 PM, Garriss Jr., James P. <jgarriss at mitre.org> wrote:
Suppose I have a terminator that can be multiple spaces, whether 0 spaces, 
1 space, 2 spaces, or more spaces.  No other types of whitespace allowed, 
just spaces.
 
Because there’s this entity:  %WSP*;
 
I assumed there would also be this entity:  %ES*;
 
But there’s not.  Why not?  How would I represent this terminator?

TIA

--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg



-- 
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | 
www.tresys.com
