[Nml-wg] Multiple namespaces

Tue Aug 23 08:01:06 CDT 2011

Hi Freek;

Answers inline:

On 8/23/11 5:36 AM, thus spake Freek Dijkstra:
> Jason Zurawski wrote:
>
>>> Last week's mail conversation drifted from XML syntax for NML relations
>>> to the use of namespaces in NML messages.
>>>
>>> An important difference in view was identified.
>>> Jason assumed that a single NML messages would only contain one namespace.
>>
>> I never said nor implied this in any way
>
> Sorry if you feel I jumped to conclusions. You indeed only wrote:
>
>> to my knowledge a parser can only verify against a
>> single schema at any given time.
>
> Perhaps we still need to take a few steps back.
>
> Do you think that a NML messages may contain multiple namespaces?
>
> Do you agree with the following requirement I wrote earlier:
> 1. Be extensible
> 2. It should be possible to create a specific validator for each
> relation type.
> 3. Parsers should be able to recognise an unknown relation type as a
> relation subclass (rather then simply an unknown element)
>
>
> If you have time to phone today, that would be great.

You are conflating several concepts and using them interchangeably.  I
believe this is what is causing the confusion.  To be clear, I am going
to ask once again that you please (*please*) attempt to read some of the
prior art from NMC/perfSONAR.  The reason I keep bringing this up is
twofold:

  a) the examples are short, and easy to understand.  Instead of going 
around and around on email we could make up a lot of ground starting 
from known examples.

  b) it is working in practice today, and mimics the needs of NML in the 
extensibility space

Consider this "schema file":

https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/schema/SetupDataRequest-utilization.rnc

It represents the construction of one type of message (e.g. the 
"SetupDataRequest" message, specifically for utilization data).  Note 
some interesting things about it:

  - It represents a single 'schema', e.g. it is one file that contains 
the definitions to verify one specific message type only.

  - It incorporates several other 'schema' definitions through the 
methods of inclusion (e.g. 'include xxx { ... }' )

  - It features *several* namespaces, and elements in this same 'schema' 
file (or the other files) may use these namespaces

  - Example instances that can be verified against this schema can be 
found here:

https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/requests/SetupDataRequest-utilization-1.xml
https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/requests/SetupDataRequest-utilization-2.xml
https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/requests/SetupDataRequest-utilization-3.xml
https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/requests/SetupDataRequest-utilization-4.xml
https://svn.internet2.edu/svn/perfSONAR-PS/trunk/perfSONAR_PS-SNMPMA/etc/requests/SetupDataRequest-utilization-5.xml

To address your concerns above: a parser, and when I say parser I am
imagining something like libxml, is allowed to verify an instance
against one "schema file" at a time.  This schema file may feature
'includes', thus expanding the available definition space, but there are
no options (at least in my experience) that allow the programmer to
give the parser some set of files and let the parser know that *"any"*
of the possible files may contain the correct definition.  In my opinion
offering multiple options would really defeat the purpose of syntactic
checking.

If we are going to play the 'cut and paste' game using prior statements, 
here is the entire context of what I said regarding this topic, so that 
you can see that this is what I said before as well:

> On 8/16/11 4:54 PM, thus spake Jason Zurawski:
> [snip]
>>> to my knowledge a parser can only verify against a
>>> single schema at any given time.
>> To my knowledge it is possible for a parser to validate against multiple
>> schema at the same time.
> In my experience (libxml, some older Java libraries) a single schema is
> loaded into the parser.  It is possible to reference schema from each
> other, e.g. in relax:
>
>> include "something.rnc" {
>> # include things ...
>> }
> Trying to validate the same instance against different schemata
> simultaneously does not seem like a very fruitful exercise for a parser,
> unless there are multiple parsing passes being applied.  If the latter
> is true, I would argue that more time is being spent in syntax checking
> than in the real guts of semantic evaluation.

To address your final concerns:

> 1. Be extensible

Yes, and the methods of NMC/perfSONAR we have been talking about all 
along enable this.

> 2. It should be possible to create a specific validator for each
> relation type.

Schema is schema, you can construct whatever type of validation system 
you wish to implement.  I would question how far you would want to take 
this exercise because there are tradeoffs that sacrifice other desirable 
qualities.  My statement from prior conversation still stands - if you 
wish to do strict syntactic validation, to the point of trying to use 
the parser as a semantic analyzer as well, you give up a portion of #1; 
this is the tradeoff that must be considered.

For example:

a)
<relation type="something">
   <link />
   <link />
</relation>

vs.

b)
<somethingrelation>
   <link />
   <link />
</somethingrelation>

vs.

c)
<something:relation type="something">
   <link />
   <link />
</something:relation>

I would argue that a) is our base, it is generic and minimal.  It allows 
the construction of any number of relationship types that are required 
for most situations.  Someone who needs something different/special, 
that cannot be done in the base, has 2 choices: b) or c).

The b) option is the creation of a new element, something that *does 
not* derive from the base, and therefore cannot be cast into something 
different.  This is not extension.  For the simultaneous strict 
syntactic/semantic checking done by the parser alone, this allows 
someone to claim that the 'somethingrelation' is very much different 
than the 'relation', and perhaps this is what they need.

The c) option is an extension namespace of the a) element.  There is the 
opportunity to try and downcast this into the original element and the 
ability to add 'new' things that were not thought of in the base. 
Syntactic checking has the ability to add *some* semantics in this case, 
perhaps not as much as b).  This is much more extensible, and I would 
claim desirable, for NML.  It is what is used in NMC/pS today.
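
To make the difference between b) and c) concrete, here is a minimal
sketch of how a service could decide whether an element can be treated
as a base 'relation'.  It uses Python's standard xml.etree.ElementTree;
the namespace URIs, the 'topology' wrapper element, and the helper names
are made up for illustration and are not taken from NML or perfSONAR.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace URIs and the <topology> wrapper
# are assumptions for illustration only.
DOC = """
<topology xmlns="http://example.org/nml/base"
          xmlns:something="http://example.org/nml/ext/something">
  <relation type="something"><link/><link/></relation>
  <somethingrelation><link/><link/></somethingrelation>
  <something:relation type="something"><link/><link/></something:relation>
</topology>
"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag


def as_relation(elem):
    """Return the relation type if elem can be downcast to a base
    relation, otherwise None."""
    _ns, local = split_tag(elem.tag)
    if local != "relation":
        # Option b): a brand-new element name, nothing to downcast to.
        return None
    # Options a) and c): same local name, whatever the namespace.
    return elem.get("type")


root = ET.fromstring(DOC)
results = [(split_tag(child.tag)[1], as_relation(child)) for child in root]
```

In this sketch the a) and c) elements both come back as relations of
type "something", while the b) element is not recognised as a relation
at all, which is the loss of extensibility described above.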

> 3. Parsers should be able to recognise an unknown relation type as a
> relation subclass (rather then simply an unknown element)

Every parser is different in this respect, and I am not going to be able 
to give you a concrete answer.

This is the exact reason why perfSONAR does not do strict syntactic 
checking at the parser level, and favors the use of semantic checks in 
the service itself.  Relying on a strict schema that mandates syntax 
does not foster extensibility.

There are two outcomes when an 'unknown' element comes in:

a) Strict syntactic checking in most cases will reject the entire 
instance without comment.  E.g. you have constructed your schema, and 
the parser knows of some number of elements, each having a possible 
namespace (or namespaces, depending on how the schema is constructed). 
If an unknown element comes in, many parsers will simply reject the 
entire document.  Certain types of event driven parsers may be able to 
panic parse around something like this, but I do not have much 
experience with them.  I would estimate more time will be spent 
constructing a special parser in this case just to work with the strict 
schema than is healthy.

b) Semantic checking, what we have the most experience with, takes all 
documents as is, does some combination of syntactic/semantic checking 
within the service itself, and can be made as permissive as required for 
certain situations.  E.g. an unknown namespace on a common element (e.g.
relation) can be rejected, or it can be downcast into the base schema
(we normally do the latter).
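
As a rough illustration of the two outcomes, the sketch below applies
either policy to the same instance.  It is not the perfSONAR code: the
base namespace URI, the set of known element names, and the check()
function are assumptions for the sake of the example, and it only uses
Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

BASE_NS = "http://example.org/nml/base"   # hypothetical base namespace
KNOWN = {"topology", "relation", "link"}  # hypothetical base element names

# Hypothetical instance with an extension namespace on <relation>.
DOC = """
<topology xmlns="http://example.org/nml/base"
          xmlns:x="http://example.org/nml/ext/x">
  <x:relation type="something"><link/><link/></x:relation>
</topology>
"""


def namespace(tag):
    """Namespace part of an ElementTree '{namespace}local' tag."""
    return tag[1:].partition("}")[0] if tag.startswith("{") else ""


def local_name(tag):
    """Local part of an ElementTree '{namespace}local' tag."""
    return tag.rpartition("}")[2]


def check(xml_text, strict):
    """Return (accepted, notes) for one instance document."""
    notes = []
    for elem in ET.fromstring(xml_text).iter():
        ns, name = namespace(elem.tag), local_name(elem.tag)
        if ns == BASE_NS and name in KNOWN:
            continue  # element is part of the base schema
        if strict:
            # Outcome a): one unknown element rejects the whole instance.
            return False, ["rejected: unknown element " + elem.tag]
        if name == "relation":
            # Outcome b): unknown namespace on a common element, downcast.
            notes.append("downcast " + elem.tag + " to base relation")
        else:
            notes.append("ignored unknown element " + elem.tag)
    return True, notes


strict_result = check(DOC, strict=True)
permissive_result = check(DOC, strict=False)
```

With strict=True the whole instance is refused because of the one
extension element; with strict=False the same instance is accepted and
the extension relation is handled as a base relation.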

Hope this all helps;

-jason