[DFDL-WG] Grammar issue - simple and complex asymetry

Alan Powell alan_powell at uk.ibm.com
Tue May 19 10:49:54 CDT 2009


Mike

That looks reasonable.

However, since it must still be possible to specify 
dfdl:initiator/terminator on the complexType for scoping, we need to make 
it clear that the grammar describes where the properties APPLY, not where 
they are SPECIFIED. 


Do any properties APPLY to a complexType?

Alan Powell

 MP 211, IBM UK Labs, Hursley,  Winchester, SO21 2JN, England
 Notes Id: Alan Powell/UK/IBM     email: alan_powell at uk.ibm.com 
 Tel: +44 (0)1962 815073                  Fax: +44 (0)1962 816898




From: "Mike Beckerle" <mbeckerle.dfdl at gmail.com>
To: <dfdl-wg at ogf.org>
Date: 13/05/2009 20:09
Subject: [DFDL-WG] Grammar issue - simple and complex asymetry



 
The draft 034 grammar productions do not allow for a separate 
prefix/suffix for a simple type as distinguished from the element having 
that type. 
 
Draft 034 does allow for an element of complex type to have a separate 
prefix and suffix for the element itself and another one for the sequence 
or choice inside it. 
 
I've come to believe this is a mistake and I suggest a fix below.
 
Right now the grammar is:
 
Element  = SimpleElement | ComplexElement
 
SimpleElement = Prefix SimpleContent Suffix 
 
SimpleContent = StringText // terminal. No more prefixes/suffixes
 
ComplexElement = Prefix ComplexContent Suffix
 
ComplexContent = Sequence | Choice 
 
Sequence = Prefix SequenceContent Suffix
Choice = Prefix ChoiceContent Suffix
 
So, if I do:
 
<complexType dfdl:initiator="[" dfdl:terminator="]">
...
<element name="y">
  <complexType>
    <sequence dfdl:separator=",">
      <element name="x" type="int"/>
      <element name="z" type="int"/>
    </sequence>
  </complexType>
</element>
...
</complexType>
 
I have two prefix opportunities. I can flatten the productions above to:
 
ComplexElement = Prefix Prefix SequenceContent Suffix Suffix
 
An instance of this type would look like [[[5],[6]]]. That is, for complex 
types, there are separate prefix and suffix regions for the element, and 
for the model-group which makes up its content.
 
The first [ initiates element y.
The second [ initiates the sequence
The third [ initiates element x.
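
The flattened draft 034 production can be traced with a minimal Python sketch. It assumes every prefix is the initiator "[", every suffix is the terminator "]", and the sequence separator is ","; the helper names are hypothetical and only illustrate the productions, not any DFDL implementation.

```python
# Sketch of the draft 034 productions (hypothetical helper names).
# Assumption: the scoped initiator/terminator apply at every prefix/suffix.
PREFIX, SUFFIX, SEPARATOR = "[", "]", ","

def simple_element(text):
    # SimpleElement = Prefix SimpleContent Suffix
    return PREFIX + text + SUFFIX

def sequence(children):
    # Sequence = Prefix SequenceContent Suffix
    return PREFIX + SEPARATOR.join(children) + SUFFIX

def complex_element_034(children):
    # ComplexElement = Prefix ComplexContent Suffix
    return PREFIX + sequence(children) + SUFFIX

# Element y containing elements x = 5 and z = 6.
y = complex_element_034([simple_element("5"), simple_element("6")])
print(y)  # [[[5],[6]]]
```

The outermost bracket pair comes from the element, the next pair from the sequence, and the innermost pairs from x and z.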
 
This same behavior is not true for simple types:
 
<complexType dfdl:initiator="[" dfdl:terminator="]">
...
 
<element name="y" >
  <simpleType>
    <restriction base="int"/>
  </simpleType>
</element>
...
</complexType>
 
This can only mean [5]. The grammar, as formulated in draft 034, does not 
allow for more than one prefix or suffix.
The [ is the initiator of element y. 
 
 
I believe we should fix this as follows. New grammar:
 
Element  = SimpleElement | ComplexElement
 
SimpleElement = Prefix SimpleContent Suffix
 
SimpleContent = StringText 
 
ComplexElement = ComplexContent // Note: no more surrounding prefix/suffix.
 
ComplexContent = Sequence | Choice 
 
Sequence = Prefix SequenceContent Suffix
Choice = Prefix ChoiceContent Suffix
 
The above grammar arranges for an element of complex type and its model 
group, taken together, to specify a single prefix and suffix.
 
Revisiting our example (just repeating it here):
 
<complexType dfdl:initiator="[" dfdl:terminator="]">
...
<element name="y">
  <complexType>
    <sequence dfdl:separator=",">
      <element name="x" type="int"/>
      <element name="z" type="int"/>
    </sequence>
  </complexType>
</element>
...
</complexType>
 
An instance now would look like [[5],[6]]
 
The first [ is the initiator of element y, which is the same as the 
initiator of the sequence that is its type.
The second [ is the initiator of element x. (which is the same as the 
initiator of the int that is its type)
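
The effect of the proposed productions can be checked with a minimal Python sketch. It assumes every prefix is the initiator "[", every suffix is the terminator "]", and the sequence separator is ","; the helper names are hypothetical and are not taken from any DFDL implementation.

```python
# Sketch of the proposed productions (hypothetical helper names).
# Assumption: the scoped initiator/terminator apply at every prefix/suffix.
PREFIX, SUFFIX, SEPARATOR = "[", "]", ","

def simple_element(text):
    # SimpleElement = Prefix SimpleContent Suffix
    return PREFIX + text + SUFFIX

def sequence(children):
    # Sequence = Prefix SequenceContent Suffix
    return PREFIX + SEPARATOR.join(children) + SUFFIX

def complex_element(children):
    # ComplexElement = ComplexContent (no surrounding prefix/suffix)
    return sequence(children)

y_complex = complex_element([simple_element("5"), simple_element("6")])
y_simple = simple_element("5")
print(y_complex)  # [[5],[6]]
print(y_simple)   # [5]
```

In both cases element y now contributes exactly one prefix and one suffix, whether its type is simple or complex.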
 
I believe this is more sensible, as it makes the behavior of simple and 
complex types more similar.
 
It raises the question of how one combines conflicting properties on an 
element with the properties on its type, and even on the model group 
inside the type in the complex case, because all these properties 
describe the same syntax fields in the grammar.
 
That's a separate topic in a subsequent email.











