[Nml-wg] Multiple namespaces
Jason Zurawski
zurawski at internet2.edu
Tue Aug 23 08:01:06 CDT 2011
Hi Freek;
Answers inline:
On 8/23/11 5:36 AM, thus spake Freek Dijkstra:
> Jason Zurawski wrote:
>
>>> Last week's mail conversation drifted from XML syntax for NML relations
>>> to the use of namespaces in NML messages.
>>>
>>> An important difference in view was identified.
>>> Jason assumed that a single NML messages would only contain one namespace.
>>
>> I never said nor implied this in any way
>
> Sorry if you feel I jumped to conclusions. You indeed only wrote:
>
>> to my knowledge a parser can only verify against a
>> single schema at any given time.
>
> Perhaps we still need to take a few steps back.
>
> Do you think that a NML messages may contain multiple namespaces?
>
> Do you agree with the following requirement I wrote earlier:
> 1. Be extensible
> 2. It should be possible to create a specific validator for each
> relation type.
> 3. Parsers should be able to recognise an unknown relation type as a
> relation subclass (rather then simply an unknown element)
>
>
> If you have time to phone today, that would be great.
You are conflating several concepts and using them interchangeably; I
believe this is what is causing the confusion. To be clear, I am going
to ask once again that you please (*please*) attempt to read some of the
prior art from NMC/perfSONAR. The reason I keep bringing this up is
twofold:
a) the examples are short and easy to understand. Instead of going
around and around on email, we could make up a lot of ground by starting
from known examples.
b) it is working in practice today, and mirrors the needs of NML in the
extensibility space
Consider this "schema file":
https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/schema/SetupDataRequest-utilization.rnc
It represents the construction of one type of message (e.g. the
"SetupDataRequest" message, specifically for utilization data). Note
some interesting things about it:
- It represents a single 'schema', e.g. it is one file that contains
the definitions to verify one specific message type only.
- It incorporates several other 'schema' definitions through the
methods of inclusion (e.g. 'include xxx { ... }' )
- It features *several* namespaces, and elements in this same 'schema'
file (or the other files) may use these namespaces
- Example instances that can be verified against this schema can be
found here:
https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/requests/SetupDataRequest-utilization-1.xml
https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/requests/SetupDataRequest-utilization-2.xml
https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/requests/SetupDataRequest-utilization-3.xml
https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/requests/SetupDataRequest-utilization-4.xml
https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/requests/SetupDataRequest-utilization-5.xml
To address your concerns above: a parser, and when I say parser I am
imagining something like libxml, is allowed to verify an instance
against one "schema file" at a time. This schema file may feature
'includes', thus expanding the available definition space, but there are
no options (at least in my experience) that allow the programmer to
give the parser some set of files and let the parser know that *"any"*
of the possible files may contain the correct definition. In my opinion,
giving the parser multiple options would really defeat the purpose of
syntactic checking.
If we are going to play the 'cut and paste' game using prior statements,
here is the entire context of what I said regarding this topic, so that
you can see that this is what I said before as well:
> On 8/16/11 4:54 PM, thus spake Jason Zurawski:
> [snip]
>>> to my knowledge a parser can only verify against a
>>> single schema at any given time.
>> To my knowledge it is possible for a parser to validate against multiple
>> schema at the same time.
> In my experience (libxml, some older Java libraries) a single schema is
> loaded into the parser. It is possible to reference schema from each
> other, e.g. in relax:
>
>> include "something.rnc" {
>> # include things ...
>> }
> Trying to validate the same instance against different schemata
> simultaneously does not seem like a very fruitful exercise for a parser,
> unless there are multiple parsing passes being applied. If the latter
> is true, I would argue that more time is being spent in syntax checking
> than in the real guts of semantic evaluation.
To address your final concerns:
> 1. Be extensible
Yes, and the methods of NMC/perfSONAR we have been talking about all
along enable this.
> 2. It should be possible to create a specific validator for each
> relation type.
Schema is schema; you can construct whatever type of validation system
you wish to implement. I would question how far you would want to take
this exercise because there are tradeoffs that sacrifice other desirable
qualities. My statement from prior conversation still stands - if you
wish to do strict syntactic validation, to the point of trying to use
the parser as a semantic analyzer as well, you give up a portion of #1;
this is the tradeoff that must be considered.
For example:
a)
<relation type="something">
  <link />
  <link />
</relation>
vs.
b)
<somethingrelation>
  <link />
  <link />
</somethingrelation>
vs.
c)
<something:relation type="something">
  <link />
  <link />
</something:relation>
I would argue that a) is our base, it is generic and minimal. It allows
the construction of any number of relationship types that are required
for most situations. Someone who needs something different/special,
that cannot be done in the base, has 2 choices: b) or c).
The b) option is the creation of a new element, something that *does
not* derive from the base, and therefore cannot be cast into something
different. This is not extension. For the simultaneous strict
syntactic/semantic checking done by the parser alone, this allows
someone to claim that the 'somethingrelation' is very much different
than the 'relation', and perhaps this is what they need.
The c) option is an extension namespace of the a) element. There is the
opportunity to try and downcast this into the original element and the
ability to add 'new' things that were not thought of in the base.
Syntactic checking has the ability to add *some* semantics in this case,
perhaps not as much as b). This is much more extensible, and I would
claim desirable, for NML. It is what is used in NMC/pS today.
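To make the difference between b) and c) concrete, here is a minimal sketch in Python (standard library only) of how a consumer might downcast an extension-namespace 'relation' to the base element. The namespace URIs, the 'topology' wrapper element, and the helper names are assumptions made up for this illustration; they are not taken from the NML or NMC/perfSONAR schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this sketch.
BASE_NS = "http://example.org/nml/base/"
EXT_NS = "http://example.org/nml/base/something/"

DOC = f"""
<topology xmlns="{BASE_NS}" xmlns:something="{EXT_NS}">
  <relation type="generic"><link/><link/></relation>
  <something:relation type="something"><link/><link/></something:relation>
  <somethingrelation><link/><link/></somethingrelation>
</topology>
"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def as_base_relation(elem):
    """View elem as a base relation, or return None if it is not one.

    Only the local name is compared, so option c) (same local name,
    different namespace) can be downcast, while option b) (a brand new
    element name) cannot.
    """
    uri, local = split_tag(elem.tag)
    if local != "relation":
        return None
    links = [child for child in elem if split_tag(child.tag)[1] == "link"]
    return {
        "type": elem.get("type"),
        "extension": uri != BASE_NS,  # True when an extension namespace was used
        "links": len(links),
    }


if __name__ == "__main__":
    for child in ET.fromstring(DOC):
        print(child.tag, "->", as_base_relation(child))
```

The 'somethingrelation' element yields None because nothing in its name ties it back to 'relation'; a consumer that only knows the base has no way to cast it.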
> 3. Parsers should be able to recognise an unknown relation type as a
> relation subclass (rather then simply an unknown element)
Every parser is different in this respect, and I am not going to be able
to give you a concrete answer.
This is the exact reason why perfSONAR does not do strict syntactic
checking at the parser level, and favors the use of semantic checks in
the service itself. Relying on a strict schema that mandates syntax
does not foster extensibility.
There are two outcomes when an 'unknown' element comes in:
a) Strict syntactic checking in most cases will reject the entire
instance without comment. E.g. you have constructed your schema, and
the parser knows of some number of elements, each having a possible
namespace (or namespaces, depending on how the schema is constructed).
If an unknown element comes in, many parsers will simply reject the
entire document. Certain types of event driven parsers may be able to
panic parse around something like this, but I do not have much
experience with them. I would estimate more time will be spent
constructing a special parser in this case just to work with the strict
schema than is healthy.
b) Semantic checking, what we have the most experience with, takes all
documents as is, does some combination of syntactic/semantic checking
within the service itself, and can be made as permissive as required for
certain situations. E.g. an unknown namespace on a common element (e.g.
relation) can be rejected, or it can be downcast into the base schema
(we normally do the latter).
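As a rough illustration of outcome b), the sketch below (Python, standard library only) accepts a document as is and applies the namespace policy inside the service. This is not the perfSONAR implementation; the namespace URIs, element names, and the 'check_message' function are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical values, invented for this sketch.
KNOWN_NS = {"http://example.org/nml/base/"}
KNOWN_ELEMENTS = {"relation", "link"}

MESSAGE = """
<message xmlns="http://example.org/nml/base/"
         xmlns:x="http://example.org/nml/ext/">
  <relation type="a"/>
  <x:relation type="b"/>
  <x:widget/>
</message>
"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def check_message(xml_text, downcast_unknown_ns=True):
    """Return (accepted, notes) for the top-level children of a message.

    The document is never rejected as a whole. Unknown elements are
    skipped, and a known element in an unknown namespace is either
    downcast to the base element or rejected, depending on the flag.
    """
    accepted, notes = [], []
    for elem in ET.fromstring(xml_text):
        uri, local = split_tag(elem.tag)
        if local not in KNOWN_ELEMENTS:
            notes.append(f"skipped unknown element {local}")
        elif uri in KNOWN_NS:
            accepted.append(local)
        elif downcast_unknown_ns:
            accepted.append(local)
            notes.append(f"downcast {local} from {uri}")
        else:
            notes.append(f"rejected {local} from unknown namespace {uri}")
    return accepted, notes


if __name__ == "__main__":
    print(check_message(MESSAGE))                             # permissive
    print(check_message(MESSAGE, downcast_unknown_ns=False))  # stricter
```

The point of the flag is that permissiveness becomes a service decision made per situation, rather than something fixed by the schema handed to the parser.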
Hope this all helps;
-jason