[DFDL-WG] Representing multiple spaces

Garriss Jr., James P. jgarriss at mitre.org
Fri Mar 1 09:00:56 EST 2013


> IBM DFDL does not yet support dfdl:hiddenGroupRef.

Does Daffodil?

From: Steve Hanson [mailto:smh at uk.ibm.com]
Sent: Friday, March 01, 2013 3:09 AM
To: Mike Beckerle
Cc: dfdl-wg at ogf.org; dfdl-wg-bounces at ogf.org; Garriss Jr., James P.
Subject: Re: [DFDL-WG] Representing multiple spaces

Mike, you can hit XSD UPA problems whether an element appears directly in a sequence or within a contained sequence. UPA is about unambiguously interpreting what you get in the data, so sequences are invisible to the rules. The technique(s) you suggest below will work only if none of the 'data' elements has minOccurs '0', since that is what allows XSD to unambiguously match an instance of the 'spaces' element.

Note to James - IBM DFDL does not yet support dfdl:hiddenGroupRef.

Regards

Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group (http://www.ogf.org/dfdl/)
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848



From:        Mike Beckerle <mbeckerle.dfdl at gmail.com>
To:        "Garriss Jr., James P." <jgarriss at mitre.org>
Cc:        dfdl-wg at ogf.org
Date:        01/03/2013 00:56
Subject:        Re: [DFDL-WG] Representing multiple spaces
Sent by:        dfdl-wg-bounces at ogf.org
________________________________



%WSP+; matches one or more whitespace characters. That might be what you want.
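Note that %WSP+; accepts any whitespace character, while the question asks for spaces only. A minimal Python sketch of the difference follows; it assumes Python's str.isspace() as a stand-in for the DFDL WSP character class (the DFDL spec defines its own list, which this does not reproduce exactly), and both helper names are hypothetical.

```python
def match_wsp_plus(data: str, pos: int) -> int:
    """Length of a %WSP+;-style match at pos, or -1 if there is none.

    Approximation: str.isspace() stands in for the DFDL WSP class.
    """
    end = pos
    while end < len(data) and data[end].isspace():
        end += 1
    return end - pos if end > pos else -1


def match_spaces_only(data: str, pos: int) -> int:
    """Length of a run of U+0020 spaces at pos; zero is a valid result."""
    end = pos
    while end < len(data) and data[end] == " ":
        end += 1
    return end - pos


sample = "abc \t def"
print(match_wsp_plus(sample, 3))       # 3: space, tab, space
print(match_spaces_only(sample, 3))    # 1: stops at the tab
print(match_wsp_plus("abcdef", 3))     # -1: %WSP+; needs at least one
print(match_spaces_only("abcdef", 3))  # 0: zero spaces is allowed
```

The tab in the sample is what separates the two: the WSP-style matcher runs straight through it, which is why %WSP+; (or %WSP*;) is looser than "spaces only".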

The only way to allow one or more spaces, and nothing but spaces, is like this (I think you meant %SP*;, not %ES*;, since ES is the empty string):

dfdl:terminator="%SP; %SP;%SP; %SP;%SP;%SP; ..."
i.e., a whitespace-separated list of one space, two spaces, three spaces, etc., up to as many as you would like to allow.
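The following minimal Python sketch shows why the list has to be spelled out and where it falls short. It assumes that when several listed terminators match at the same position, the longest one wins, and it expands only the %SP; entity by hand rather than using a real DFDL processor. `build_space_terminators` and `match_terminator` are hypothetical helpers.

```python
def build_space_terminators(max_spaces: int) -> str:
    """Build a dfdl:terminator value: "%SP; %SP;%SP; ..." up to max_spaces."""
    return " ".join("%SP;" * n for n in range(1, max_spaces + 1))


def match_terminator(data: str, pos: int, terminator_list: str) -> int:
    """Return the length of the longest listed terminator found at pos, or -1.

    Only the %SP; entity is expanded (to a single space) in this sketch.
    """
    candidates = [t.replace("%SP;", " ") for t in terminator_list.split(" ")]
    best = -1
    for literal in candidates:
        if data.startswith(literal, pos) and len(literal) > best:
            best = len(literal)
    return best


terms = build_space_terminators(3)
print(terms)                                     # %SP; %SP;%SP; %SP;%SP;%SP;
print(match_terminator("ab  cd", 2, terms))      # 2
print(match_terminator("ab     cd", 2, terms))   # 3: longer runs are cut short
print(match_terminator("abcd", 2, terms))        # -1: zero spaces never matches
```

The last two calls show the two limits of this approach: a run longer than the longest listed terminator is only partly consumed, and the zero-spaces case from the original question cannot be expressed this way.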

If that just won't cut it, then you have to resort to something I call modeling syntax as data.

You create a group:

<xs:group name="spaces">
  <xs:sequence>
    <xs:element name="spaces" type="xs:string" dfdl:lengthKind="pattern" dfdl:lengthPattern="\s*"/>
  </xs:sequence>
</xs:group>

This spaces group is a model for something that is just a syntactic feature of your data.
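With dfdl:lengthKind="pattern", the element's length is whatever the dfdl:lengthPattern regular expression matches at the current position, so a pattern like \s* can presumably match zero characters and cover the "no spaces" case. Below is a minimal Python sketch using the re module as a stand-in for the processor's regex engine, assuming the two dialects agree for patterns this simple. `pattern_length` is a hypothetical helper. It also shows that \s* accepts tabs; if only spaces are allowed, a stricter pattern such as [ ]* may be the better fit (a suggestion, not something from the thread).

```python
import re


def pattern_length(data: str, pos: int, length_pattern: str) -> int:
    """Length of the match of length_pattern anchored at pos (may be 0).

    Returns -1 if the pattern cannot match at all; how a real DFDL
    processor handles that case is not modeled here.
    """
    match = re.compile(length_pattern).match(data, pos)
    return match.end() - pos if match else -1


data = "abc \t  def"
print(pattern_length(data, 3, r"\s*"))      # 4: space, tab, two spaces
print(pattern_length(data, 3, r"[ ]*"))     # 1: stops at the tab
print(pattern_length("abcdef", 3, r"\s*"))  # 0: zero-length match
```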

Then you keep the group out of your logical infoset by using a hidden group reference, like so:

<xs:sequence>
  <xs:element name="beforeSpaces" type="xs:string" dfdl:terminator="%SP;"/>
  <xs:sequence dfdl:hiddenGroupRef="tns:spaces"/>
  <xs:element name="afterSpaces" type="xs:string" dfdl:terminator="%SP;"/>
</xs:sequence>
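To illustrate what parsing this sequence would do, here is a hand-written Python simulation (not a DFDL processor). The first space is consumed as beforeSpaces' terminator, the hidden "spaces" element absorbs any further whitespace, and its value is discarded rather than added to the infoset. `read_terminated` and `parse_record` are hypothetical names, and the single-space terminator and \s* pattern mirror the schema fragments above.

```python
import re

# dfdl:lengthPattern of the hidden "spaces" element
SPACES_PATTERN = re.compile(r"\s*")


def read_terminated(data: str, pos: int, terminator: str = " ") -> tuple[str, int]:
    """Read up to the terminator; return (value, position after the terminator)."""
    end = data.find(terminator, pos)
    if end == -1:
        raise ValueError(f"terminator not found after position {pos}")
    return data[pos:end], end + len(terminator)


def parse_record(data: str) -> dict[str, str]:
    infoset: dict[str, str] = {}
    infoset["beforeSpaces"], pos = read_terminated(data, 0)
    # Hidden group: "spaces" is parsed (the position advances past it),
    # but its value is dropped instead of being added to the infoset.
    pos = SPACES_PATTERN.match(data, pos).end()
    infoset["afterSpaces"], pos = read_terminated(data, pos)
    return infoset


print(parse_record("abc    def "))  # {'beforeSpaces': 'abc', 'afterSpaces': 'def'}
print(parse_record("abc def "))     # {'beforeSpaces': 'abc', 'afterSpaces': 'def'}
```

Both inputs yield the same infoset, which is the point of hiding the group: the number of extra spaces is syntax, not data.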

Here's what I'm not sure of:

In XSD, it would be ok to have multiple groups like this between elements, because the elements aren't named "spaces", so the various instances of the "spaces" element can't be confused. (There is no UPA problem.)

In DFDL, I'm not sure if we allow this:

<xs:sequence dfdl:sequenceKind="ordered">
  <xs:element name="foo" .../>
  <xs:element name="spaces" .../>
  <xs:element name="bar" .../>
  <xs:element name="spaces" .../>
  ....
</xs:sequence>

I.e., more than one child element named "spaces" in the same sequence, but not as an array using minOccurs/maxOccurs, dfdl:occursCountKind, etc.

XML Schema would not have a problem with this, so long as those elements are all required (minOccurs >= 1).

I've sent a separate email to the dfdl-wg to see what others' opinions are on this.



On Thu, Feb 28, 2013 at 3:03 PM, Garriss Jr., James P. <jgarriss at mitre.org> wrote:
Suppose I have a terminator that can be multiple spaces: 0 spaces, 1 space, 2 spaces, or more. No other types of whitespace are allowed, just spaces.

Because there's this entity: %WSP*;

I assumed there would also be this entity: %ES*;

But there's not. Why not? How would I represent this terminator?

TIA

--
  dfdl-wg mailing list
  dfdl-wg at ogf.org
  https://www.ogf.org/mailman/listinfo/dfdl-wg



--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology | www.tresys.com

