[dfdl-wg] DFDL subset of XML schema

Steve Hanson smh at uk.ibm.com
Wed Apr 13 12:26:12 CDT 2005





Hi Mike

Replies in-lined below thus >>SMH>>.

Regards, Steve

Steve Hanson
WebSphere Business Integration Brokers,
IBM Hursley, England
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848


                                                                           
From: mike.beckerle at ascentialsoftware.com
Sent by: owner-dfdl-wg at ggf.org
Date: 13/04/2005 15:30
To: Steve Hanson/UK/IBM at IBMGB, dfdl-wg at gridforum.org
Subject: RE: [dfdl-wg] DFDL subset of XML schema

Hi Steve,

Yes, we're not really proposing that subset for the "final" language;
rather, we're trying to be as minimalist as we can up front to facilitate
the prototype.

I think it's fine if the prototype is a subset of what we specify for v1.0.
I just don't want it to be inconsistent with it. The prototype is
"non-normative" in W3C lingo, so there will be two documents: one
describing what the prototype does and implements, and the draft standard
document, which can be different.

>>SMH>>: OK - it was not clear that this applied to the prototype only.

Per your point 2 below: let's split this into "single top level global
element" and the others. The others cause no issue as far as I can tell. In
fact in our prototype it turns out that we don't even see any difference
between an element with anonymous type and an element reference. The code
just sees an element with name, type, etc. So those are easy to support.
Attributes are easily supported also. We need to add a flag bit to our
prototype, that's all.

>>SMH>> Good.

The single top level issue is a tiny bit deeper. In XML you can get away
with more than one global top level element because the documents are
always tagged to make it unambiguous which one of the possible global top
level elements describes the file. In DFDL we need specific information
about which of the possible global element declarations applies to the
actual file since there may be nothing in the data which makes it clear.
We could do this with an annotation that indicates "this is the one that
actually applies to the file". What we did in the prototype is just require
there be only one global element declaration to make this unambiguous.

>>SMH>> I would expect that one of the inputs to a DFDL parser would be the
name of a top-level global element as well as the name of a DFDL schema.
That way a single schema can cope with multiple different files.
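
A minimal sketch of that idea, assuming a parser entry point that takes
both the schema and the name of the global element describing the file.
The schema content, the element names and the select_root helper are all
hypothetical and are not taken from the prototype or from any DFDL draft.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema with two global elements (names are illustrative).
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="invoiceFile" type="xs:string"/>
  <xs:element name="orderFile" type="xs:string"/>
</xs:schema>"""


def select_root(schema_text, root_name):
    """Return the global element declaration that describes the file."""
    schema = ET.fromstring(schema_text)
    # Only direct children of xs:schema are global declarations.
    global_elements = {
        e.get("name"): e for e in schema.findall(f"{{{XS}}}element")
    }
    if root_name not in global_elements:
        raise KeyError(
            f"no global element {root_name!r}; "
            f"schema declares {sorted(global_elements)}"
        )
    return global_elements[root_name]


# The caller, not the data, says which declaration applies.
root = select_root(SCHEMA, "orderFile")
print(root.get("name"), root.get("type"))
```

With the root name supplied by the caller, the same schema can describe
several different files without any annotation marking one declaration as
"the" root.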

The primary reason we left out element references is to be minimal. There's
nothing you can do with them that you can't do with a type definition and
an ordinary element declaration, so they seem simply unnecessary, and that
solved our ambiguity problem too.

>>SMH>> Not sure how this ties in with your answer re element references
above?

Reusable groups - yes, these are easily supported also, modulo that they
can have separate minOccurs/maxOccurs at point of use. Again I think our
prototype never even sees them. The EMF XSD library essentially forward
substitutes them for us so our code never deals with them. Right now we'd
miss any additional min/max occurs information though, so that is a bug.

>>SMH>> minOccurs and maxOccurs are also allowed on local groups, not just
group references. The fact that the EMF XSD library loses embedded groups,
whether local or named, is a big problem. When folk model data, we have
found that embedded groups are used a lot as a convenient way to change
group-level rep properties, without the need to create elements. For our
existing parser, this means the EMF XSD library is insufficient and an
alternative is needed at runtime that preserves group structure. I would
say the same would be true for any DFDL parser too.
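
To illustrate what "preserves group structure" could mean, here is a rough
sketch that walks a schema and keeps every group (local or referenced)
together with its minOccurs/maxOccurs. It uses Python's standard
xml.etree.ElementTree rather than the EMF XSD library; the schema and the
outline helper are hypothetical and only show the shape of the information
a parser would need to retain.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: one named group used by reference with its own
# occurrence bounds, and one local (embedded) repeating sequence.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:group name="address">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:group>
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:int"/>
        <xs:group ref="address" minOccurs="0" maxOccurs="3"/>
        <xs:sequence maxOccurs="unbounded">
          <xs:element name="item" type="xs:string"/>
        </xs:sequence>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

STRUCTURAL = {XS + t for t in ("sequence", "choice", "all", "group", "element")}


def outline(node, depth=0, out=None):
    """List (depth, kind, label, (minOccurs, maxOccurs)) for each particle."""
    out = [] if out is None else out
    for child in node:
        if child.tag in STRUCTURAL:
            kind = child.tag[len(XS):]
            label = child.get("name") or child.get("ref") or "(local)"
            # Both attributes default to 1 when absent in XML Schema.
            occurs = (child.get("minOccurs", "1"), child.get("maxOccurs", "1"))
            out.append((depth, kind, label, occurs))
            outline(child, depth + 1, out)
        else:
            # Wrappers such as complexType: descend without adding a level.
            outline(child, depth, out)
    return out


record = ET.fromstring(SCHEMA).find(XS + "element")
for depth, kind, label, occurs in outline(record):
    print("  " * depth + f"{kind} {label} occurs={occurs}")
```

The group reference and the embedded sequence both stay visible as nodes
with their own occurrence bounds, instead of being flattened into the
enclosing element's content.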

Re: hexbinary - I don't understand your use of hexbinary. Can you clarify?

>>SMH>> Suppose a data structure includes a BLOB of known length: real
binary data that is not subject to code page conversion. We would model
that as having a type of xsd:hexBinary.
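
As a small illustration of that mapping, the sketch below cuts a
fixed-length BLOB out of a record and renders it in the xsd:hexBinary
lexical form (two hex digits per byte), with no character decoding
applied. The record layout, the offsets and the blob_to_hexbinary helper
are assumptions made up for the example.

```python
def blob_to_hexbinary(data: bytes, offset: int, length: int) -> str:
    """Return the hexBinary text for a fixed-length binary field."""
    blob = data[offset:offset + length]
    if len(blob) != length:
        raise ValueError("input shorter than declared BLOB length")
    # Raw bytes go straight to hex digits; no code page is involved.
    return blob.hex().upper()


# Hypothetical record: 3-byte text header, 4-byte BLOB, 3-byte trailer.
record = b"HDR" + bytes([0x00, 0xFF, 0x10, 0x80]) + b"TRL"
hex_text = blob_to_hexbinary(record, 3, 4)
print(hex_text)  # 00FF1080

# The mapping is lossless: the original bytes can be recovered.
assert bytes.fromhex(hex_text) == record[3:7]
```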

Other simple types: yes we could put all of the date types in. We just
chose to keep it minimal. I left out the obscure 'date fragment' types
because I've never seen data containing things like that, but it's a very
minor thing. If you think we need them, then we need them. However I would
argue against putting in things for the sake of having more of XSD
"covered".

>>SMH>> Even if data were in 'date fragment' form most users would be happy
to treat it as string data. I agree that it could be excluded in 1.0.

Substitution groups - I agree these could be a pseudo choice construct, but
I prefer to make an XSD subset and be explicit about it being a subset
rather than go for a way to assign meaning to everything in XSD.

>>SMH>> I am ok with substitution groups being omitted in 1.0.



-----Original Message-----
From: owner-dfdl-wg at ggf.org [mailto:owner-dfdl-wg at ggf.org] On Behalf Of
Steve Hanson
Sent: Wednesday, April 13, 2005 6:59 AM
To: dfdl-wg at gridforum.org
Subject: [dfdl-wg] DFDL subset of XML schema





Mike, looking at your proposal working draft, I don't agree with the DFDL
subset you are proposing. I think it is too restrictive. Specifically:

1) xsd:all - we have discussed this as part of the unordered mail exchange
last week so I think we now agree this is needed.

2) Single top-level global element, global attributes, element references,
attribute references. This prevents re-use.

3) Reusable groups. Ditto.

4) Simple type hexBinary. This is the MRM model's default mapping for
binary data.

5) Other simple types. Some of these could be discussed - e.g., the date
restrictions.

6) Substitution groups. We basically treat these as a choice in non-XML
data. But I would be ok with deferring support post 1.0.

Regards, Steve

Steve Hanson
WebSphere Business Integration Brokers,
IBM Hursley, England
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848
