[glue-wg] XSD aligned to draft 27

Thu Mar 13 11:44:30 CDT 2008

Hi JP,

On Thursday 13 March 2008 15:52:56 JP Navarro wrote:
> I'm less familiar with the XML terminology you use, but I would
> second your suggestion using commoner terminology: we should be able to
> independently publish subsets of the GLUE schema hierarchy.  The ability to
> develop and run independent info-providers for subsets of information is a
> very useful design. Did I understand your proposal correctly?

More or less.

What I would like to avoid is storage-service info-providers publishing
information like:

<Grid>
	<AdminDomain>
		<AdminDomain>
			<StorageService>
				<ImplementationName>foo</ImplementationName>
				<ImplementationVersion>1.0</ImplementationVersion>
				<!-- ...etc...  -->
			</StorageService>
		</AdminDomain>
	</AdminDomain>
</Grid>

as this requires the SE publisher to know it's part of a distributed Tier-2 
site (hence the two levels of AdminDomain elements).

That said, the current XSD doesn't seem to support nested AdminDomain 
elements, which would be needed to describe distributed sites.

An alternative would be for the SE to publish information like:

<StorageService>
	<ImplementationName>foo</ImplementationName>
	<ImplementationVersion>1.0</ImplementationVersion>
	<!-- ...etc...  -->
</StorageService>

and have the aggregation happen at the site level, with the site-level
publisher producing information like:

<AdminDomain>
	<Name>Example Site</Name>
	<Services>
		<ComputingService>
			<!-- CE information goes here -->
		</ComputingService>

		<StorageService>
			<ImplementationName>foo</ImplementationName>
			<ImplementationVersion>1.0</ImplementationVersion>
			<!-- ...etc...  -->
		</StorageService>
	</Services>
</AdminDomain>
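To make the site-level step concrete, here is a minimal Python sketch using
the standard library's ElementTree. The function name aggregate_site and the
sample documents are hypothetical (not part of any existing GLUE tool); the
point is only that the aggregator, not the info-provider, adds the
AdminDomain and Services wrappers:

```python
import xml.etree.ElementTree as ET


def aggregate_site(site_name, primary_docs):
    """Hypothetical site-level aggregator (not an existing GLUE API).

    Wraps the primary documents published by each info-provider
    (e.g. <StorageService>, <ComputingService>) in an <AdminDomain>.
    """
    admin_domain = ET.Element("AdminDomain")
    ET.SubElement(admin_domain, "Name").text = site_name
    services = ET.SubElement(admin_domain, "Services")
    for doc in primary_docs:
        # Each provider's output is parsed and appended unchanged; the
        # providers themselves never see the AdminDomain wrapper.
        services.append(ET.fromstring(doc))
    return admin_domain


# Hypothetical primary output from a CE and an SE info-provider.
ce_doc = "<ComputingService/>"
se_doc = """<StorageService>
  <ImplementationName>foo</ImplementationName>
  <ImplementationVersion>1.0</ImplementationVersion>
</StorageService>"""

site = aggregate_site("Example Site", [ce_doc, se_doc])
print(ET.tostring(site, encoding="unicode"))
```

Because each primary document is appended as-is, the SE info-provider's
output is the same whichever site ends up publishing it.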

The final aggregation would then wrap multiple sites in a single Grid element:

<Grid>
	<AdminDomain>
		<!-- Site 1 info -->
	</AdminDomain>

	<AdminDomain>
		<!-- Site 2 info -->
	</AdminDomain>
</Grid>
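The Grid-level step is the same idea one level up. Again a hypothetical
Python/ElementTree sketch: aggregate_grid is an illustrative name, and the
check that only AdminDomain documents are accepted is my own assumption
about what a Grid aggregator should do:

```python
import xml.etree.ElementTree as ET


def aggregate_grid(site_docs):
    """Hypothetical Grid-level aggregator: wraps site-level documents in <Grid>."""
    grid = ET.Element("Grid")
    for doc in site_docs:
        site = ET.fromstring(doc)
        # Assumed sanity check: only site-level (AdminDomain) documents belong here.
        if site.tag != "AdminDomain":
            raise ValueError(f"expected <AdminDomain>, got <{site.tag}>")
        grid.append(site)
    return grid


site1 = "<AdminDomain><Name>Site 1</Name><Services/></AdminDomain>"
site2 = "<AdminDomain><Name>Site 2</Name><Services/></AdminDomain>"
grid = aggregate_grid([site1, site2])
print(ET.tostring(grid, encoding="unicode"))
```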

The disadvantage of this approach is that one cannot query the primary SE info
(the XML provided by the SE info-provider) with exactly the same query one
would use against the top-level aggregation.  For example, to extract all
StorageEndpoints for the site "Example Site", one could use the XPath:

/Grid//AdminDomain[Name='Example Site']/Services/StorageService/StorageEndpoint

or, wrapped in an XSLT stylesheet, something like:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
		version="1.0">
	<!-- URI of the aggregated GLUE document to query (string literal) -->
	<xsl:variable name="href" select="'http://glue.example.org/grid-glue'"/>
	<!-- Site whose StorageEndpoints we want (string literal) -->
	<xsl:variable name="site" select="'Example Site'"/>

	<xsl:template match="/">
		<xsl:copy-of
			select="document($href)/Grid//AdminDomain[Name=$site]/Services/StorageService/StorageEndpoint"/>
	</xsl:template>
</xsl:stylesheet>

However, if one substituted the URI of the primary information (via the href
variable), this particular query would select nothing: the primary XML has
no Grid or AdminDomain elements.
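For anyone who wants to try this without an XSLT processor, here is a rough
Python equivalent of the query above. ElementTree supports only a subset of
XPath, so the sketch checks the root for Grid explicitly and uses './/' in
place of '//'. The helper site_endpoints and the sample documents are
hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: the top-level aggregation and the primary output
# of an SE info-provider.
AGGREGATED = """<Grid>
  <AdminDomain>
    <Name>Example Site</Name>
    <Services>
      <StorageService>
        <StorageEndpoint/>
      </StorageService>
    </Services>
  </AdminDomain>
</Grid>"""

PRIMARY = """<StorageService>
  <StorageEndpoint/>
</StorageService>"""


def site_endpoints(root, site):
    """Approximate /Grid//AdminDomain[Name=$site]/Services/StorageService/StorageEndpoint.

    Assumes `site` contains no single quote, since it is spliced into the path.
    """
    if root.tag != "Grid":
        # The leading /Grid step matches nothing, so the whole path is empty.
        return []
    return root.findall(
        f".//AdminDomain[Name='{site}']/Services/StorageService/StorageEndpoint")


aggregated = ET.fromstring(AGGREGATED)
primary = ET.fromstring(PRIMARY)
print(len(site_endpoints(aggregated, "Example Site")))  # 1
print(len(site_endpoints(primary, "Example Site")))     # 0
```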

I don't think this is a big deal, though: it makes sense for that query to
return nothing when run against the SE info-provider directly, and other
queries would still work (e.g., selecting all StorageEndpoints).
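A hierarchy-independent query such as //StorageEndpoint behaves the same on
both documents. A small Python sketch of that, with the same hypothetical
samples as before (all_endpoints is an illustrative name):

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: top-level aggregation vs. primary SE output.
AGGREGATED = """<Grid>
  <AdminDomain>
    <Name>Example Site</Name>
    <Services>
      <StorageService>
        <StorageEndpoint/>
      </StorageService>
    </Services>
  </AdminDomain>
</Grid>"""

PRIMARY = """<StorageService>
  <StorageEndpoint/>
</StorageService>"""


def all_endpoints(root):
    """Approximate the XPath //StorageEndpoint.

    Element.iter() visits the root itself and every descendant, so the
    result does not depend on how many wrapper levels sit above the
    StorageEndpoint elements.
    """
    return list(root.iter("StorageEndpoint"))


for label, doc in [("aggregated", AGGREGATED), ("primary", PRIMARY)]:
    print(label, len(all_endpoints(ET.fromstring(doc))))
```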

The advantage of publishing with StorageService as the top-level element is
that the SE info-provider need know nothing about the GLUE hierarchy above
it.  This should simplify the info-provider and, at the same time, allow the
same information to be published easily under different GLUE hierarchies;
for example, when a site is a member of more than one Grid.

To me, this advantage outweighs the disadvantage.

> One question this raises is how one binds or links these separately
> published subset documents to each other?  Would we need to introduce
> attributes in each subset that binds it to other related subsets?

As far as I know, how the documents are merged is currently not defined.

One approach is to use XSLT to do the merging.  There's a (working) toy
implementation demonstrating this here:

http://www.ogf.org/pipermail/glue-wg/2007-December/000249.html

HTH,

Paul.