[DFDL-WG] XML Schema do type references have to be qualified?

Mike Beckerle mbeckerle.dfdl at gmail.com
Mon Mar 19 09:07:25 EDT 2012


>
> You said "They all have the default unprefixed namespace as XML Schema's
> namespace." Technically your schema doesn't, it is using a different
> namespace.
>
>         xmlns="http://www.ogf.org/dfdl/dfdl-1.0/XMLSchemaSubset"
>

Ah, you can ignore that. Should be the standard XML Schema URI. This
subset URI is a trick I've been using to get ordinary XSD tooling in
the public version of Eclipse to validate the DFDL annotation content,
and to enforce the subset.  Another trick is to shut off standard XSD
schema validation and set up Eclipse so that it validates XSD as if it
were a regular XML file being validated by a schema. Then it does the
"right thing" and highlights all your errors.

We have, for now, and probably for a while anyway, made Daffodil
accept this URI (I guess we should add a warning) so that we can use
standard Eclipse tooling and get interactive validation.
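
A minimal sketch of what "accept this URI, but warn" could look like.
This is not Daffodil's actual code; the function name and the warning
text are made up for illustration. Only the two namespace URIs come
from this thread.

```python
import warnings

XSD_NS = "http://www.w3.org/2001/XMLSchema"
DFDL_SUBSET_NS = "http://www.ogf.org/dfdl/dfdl-1.0/XMLSchemaSubset"


def normalize_schema_namespace(uri):
    """Treat the subset URI as the standard XSD namespace, with a warning.

    Hypothetical helper: any other URI is returned unchanged.
    """
    if uri == DFDL_SUBSET_NS:
        warnings.warn(
            "non-standard XML Schema subset namespace; "
            "treating it as " + XSD_NS,
            stacklevel=2,
        )
        return XSD_NS
    return uri
```

A processor would call this on the namespace of the schema document
element before doing anything else, so the rest of the code only ever
sees the standard namespace.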

> I assume this is the standard 2001 XMLSchema namespace but cut-down so as to
> include just the constructs DFDL uses in its subset?

Exactly. I cut down the Schema for XML Schema. I changed it to strictly
validate the DFDL annotations, and made a few other changes that make
the interactive validation work better.

> Your namespace is not formally defined in the DFDL spec, and no such xsd is
> freely available at that URL, so your schema is not portable and fails to
> validate.
> It also means that you can't strip out all the DFDL stuff and leave a pure
> XML Schema that any schema processor can handle.


> Should we make your schema generally available at that URL, so it is
> resolved by schema processor?

No, actually, I'm thinking of changing all the internally-used "URIs"
to things that cannot be mistaken for internet URLs. I.e., more like
"--/ogf.org/dfdl/dfdl-1.0/XMLSchemaSubset". This would prevent the
ongoing disaster that most XML processors can't work when disconnected
from the Internet. Rather, they work, but are slowed down terribly by
timeouts when trying to connect to these resources. Most people
consider this "not working". We don't want that for DFDL. We can
provide any schemas in the form of documents, or in files that require
a login to retrieve or something.
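
A minimal sketch of the idea, assuming a processor that treats
namespace identifiers purely as lookup keys into a local table and
never touches the network. The table, the file paths, and both function
names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical local catalog: namespace identifier -> bundled schema file.
LOCAL_SCHEMAS = {
    "--/ogf.org/dfdl/dfdl-1.0/XMLSchemaSubset": "schemas/XMLSchemaSubset.xsd",
    "http://www.w3.org/2001/XMLSchema": "schemas/XMLSchema.xsd",
}


def looks_like_url(identifier):
    """True if the identifier has a scheme and a host, so that tooling
    might be tempted to fetch it."""
    parts = urlsplit(identifier)
    return bool(parts.scheme and parts.netloc)


def locate_schema(namespace):
    """Resolve a namespace to a local file; never fall back to the network."""
    try:
        return LOCAL_SCHEMAS[namespace]
    except KeyError:
        raise LookupError(
            "no local schema registered for %r" % namespace
        ) from None
```

With this, looks_like_url() is false for the "--/ogf.org/..." form and
true for the "http://www.ogf.org/..." form, and an unknown namespace
fails immediately instead of waiting on a timeout.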

The w3c has a massive set of servers on the internet just to serve up
schemas for things that actually probe the URIs for schemas, when they
were supposed to just be unique identifiers, and were not supposed to
be URLs for retrieval. They're (w3c) careful now and for any new
schemas, they don't put them on the web at those URLs at all. This
whole usage of URIs that are supposed to be just unique IDs, but get
interpreted as URLs to actual files has proven to be a huge disaster
for w3c.

There are a few famous outages of these w3c servers, and people
complain that it feels like the whole Internet grinds to a halt
anytime those servers go down or slow down, because so many pieces of
software suddenly wait for a timeout on the schema retrieval over the
net.

We should consider changing the official URI for DFDL schema to
something that cannot be mistaken for a URL. Let's face it, the ogf's
poor server is not going to hold up to any volume of retrieval traffic
on the DFDL schema.

>
> The IBM implementation does not define such a subset, it just uses the
> standard  2001 XMLSchema namespace "http://www.w3.org/2001/XMLSchema", and
> then does extra checking to flag constructs and types that are not in the
> DFDL subset. More work, but with all DFDL stuff removed the result is a pure
> XML Schema.
>
> If I change your schema below to use the standard  2001 XMLSchema namespace
> then the IBM schema validator gives the following error...
>
>         CTDX1100E : XSD: Type reference
> 'http://www.w3.org/2001/XMLSchema#bar' is unresolved
>
> ...because it is looking in the 2001 XMLSchema namespace xsd for "bar".
>

Excellent.  Except for the fact that I have a bunch of Daffodil
project test schemas to fix... but at least we agree on what the
behavior should be.
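
To make the agreed behavior concrete, here is a rough sketch of how an
unprefixed type reference resolves when the default namespace is the
XML Schema namespace. The sample schema, the "tns" prefix, and the
example.com target namespace are invented for illustration, and the
sketch assumes all namespace declarations sit on the root element.

```python
import io
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: "bar" is defined in the target namespace, but
# element "a" refers to it without a prefix.
SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:tns="http://example.com/hypothetical"
        targetNamespace="http://example.com/hypothetical">
  <simpleType name="bar">
    <restriction base="string"/>
  </simpleType>
  <element name="a" type="bar"/>
  <element name="b" type="tns:bar"/>
</schema>"""


def resolve_type_refs(schema_text):
    """Map each element name to the (namespace, local name) of its type."""
    in_scope = {}  # prefix -> URI; "" is the default namespace
    refs = {}
    source = io.BytesIO(schema_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            in_scope[prefix] = uri
        elif payload.tag == "{%s}element" % XSD_NS and "type" in payload.attrib:
            prefix, _, local = payload.attrib["type"].rpartition(":")
            refs[payload.attrib["name"]] = (in_scope.get(prefix), local)
    return refs
```

Here element "a" resolves to "bar" in the XML Schema namespace, where
no such type exists, which is the unresolved reference the IBM
validator reports. Element "b" resolves to "bar" in the target
namespace, where the type is actually defined.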

Apache Xerces isn't giving this same error by the way, at least the
way I have it configured. I think that's a bug.  I suspect there's an
option to turn on the expensive checks like this referential integrity
stuff that we're not using.

I will be installing Message Broker myself soon for this kind of cross
checking, as I now have a computer where I can run a decent virtual
machine image to install it onto.

...mikeb

