[DFDL-WG] Grammar issue - simple and complex asymmetry

Mike Beckerle mbeckerle.dfdl at gmail.com
Wed May 13 14:04:23 CDT 2009


 

The draft 034 grammar productions do not allow for a separate prefix/suffix
for a simple type as distinguished from the element having that type. 

 

Draft 034 does allow for an element of complex type to have a separate
prefix and suffix for the element itself and another one for the sequence or
choice inside it. 

 

I've come to believe this is a mistake and I suggest a fix below.

 

Right now the grammar is:

 

Element  = SimpleElement | ComplexElement

 

SimpleElement = Prefix SimpleContent Suffix 

 

SimpleContent = StringText // terminal. No more prefixes/suffixes

 

ComplexElement = Prefix ComplexContent Suffix

 

ComplexContent = Sequence | Choice 

 

Sequence = Prefix SequenceContent Suffix

Choice = Prefix ChoiceContent Suffix

 

So, if I do:

 

<complexType dfdl:initiator="[" dfdl:terminator="]">
  ...
  <element name="y">
    <complexType>
      <sequence dfdl:separator=",">
        <element name="x" type="int"/>
        <element name="z" type="int"/>
      </sequence>
    </complexType>
  </element>
  ...
</complexType>

 

I have two prefix opportunities. I can flatten the productions above to:

 

ComplexElement = Prefix Prefix SequenceContent Suffix Suffix

 

An instance of this type would look like [[[5],[6]]]. That is, for complex
types, there are separate prefix and suffix regions for the element, and for
the model-group which makes up its content.

 

The first [ initiates element y.

The second [ initiates the sequence.

The third [ initiates element x.
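
To make the bracket counting concrete, here is a minimal Python sketch of how the draft 034 productions would lay out an instance. The function names are made up for illustration and are not part of any DFDL implementation. It assumes, as the example implies, that element y, its sequence, and the elements x and z all pick up "[" and "]" as initiator and terminator.

```python
# Hypothetical sketch of the draft 034 productions; names are illustrative only.
# Assumption: every element and the sequence use "[" / "]" as initiator/terminator.
INIT, TERM, SEP = "[", "]", ","

def simple_element(value):
    # SimpleElement = Prefix SimpleContent Suffix
    return INIT + str(value) + TERM

def sequence(children):
    # Sequence = Prefix SequenceContent Suffix
    return INIT + SEP.join(children) + TERM

def complex_element_034(children):
    # ComplexElement = Prefix ComplexContent Suffix (draft 034)
    return INIT + sequence(children) + TERM

y = complex_element_034([simple_element(5), simple_element(6)])
print(y)  # [[[5],[6]]]
```

The two nested calls in complex_element_034 are where the two prefix opportunities come from: one pair of brackets for the element and one for the sequence.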

 

This same behavior is not true for simple types:

 

<complexType dfdl:initiator="[" dfdl:terminator="]">
  ...
  <element name="y">
    <simpleType>
      <restriction base="int"/>
    </simpleType>
  </element>
  ...
</complexType>

 

This can only mean [5]. The grammar, as formulated in draft 034, does not
allow for more than one prefix or suffix.

The [ is the initiator of element y. 

 

 

I believe we should fix this as follows. New grammar:

 

Element  = SimpleElement | ComplexElement

 

SimpleElement = Prefix SimpleContent Suffix

 

SimpleContent = StringText 

 

ComplexElement = ComplexContent // Note: no more surrounding prefix suffix.

 

ComplexContent = Sequence | Choice 

 

Sequence = Prefix SequenceContent Suffix

Choice = Prefix ChoiceContent Suffix

 

With the above grammar, an element of complex type and its model group,
taken together, specify a single prefix and a single suffix.

 

Revisiting our example (just repeating it here):

 

<complexType dfdl:initiator="[" dfdl:terminator="]">
  ...
  <element name="y">
    <complexType>
      <sequence dfdl:separator=",">
        <element name="x" type="int"/>
        <element name="z" type="int"/>
      </sequence>
    </complexType>
  </element>
  ...
</complexType>

 

An instance would now look like [[5],[6]].

 

The first [ is the initiator of element y, which is the same as the
initiator of the sequence that is its type.

The second [ is the initiator of element x (which is the same as the
initiator of the int that is its type).
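
As a rough check of the proposed productions, the following Python sketch lays out the same instance. The function names are hypothetical and purely illustrative. It assumes that "[" and "]" apply as initiator and terminator to the sequence and to the simple elements, and that the complex element contributes no brackets of its own.

```python
# Hypothetical sketch of the proposed productions; names are illustrative only.
# Assumption: the sequence and the simple elements use "[" / "]" as
# initiator/terminator; the complex element adds nothing around its content.
INIT, TERM, SEP = "[", "]", ","

def simple_element(value):
    # SimpleElement = Prefix SimpleContent Suffix
    return INIT + str(value) + TERM

def sequence(children):
    # Sequence = Prefix SequenceContent Suffix
    return INIT + SEP.join(children) + TERM

def complex_element_proposed(children):
    # ComplexElement = ComplexContent (no surrounding prefix/suffix)
    return sequence(children)

y = complex_element_proposed([simple_element(5), simple_element(6)])
print(y)  # [[5],[6]]
```

Here the element and its model group share one pair of brackets, which mirrors how a simple element and its type share one pair.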

 

I believe this is more sensible, as it makes the behavior of simple and
complex types more similar.

 

It raises the question of how one combines conflicting properties on an
element with the properties on the type, and even on the model group inside
the type in the complex case, because all these properties describe the
same syntax fields in the grammar.

 

That's a separate topic in a subsequent email.

 
