[glue-wg] Some questions... [WAS: choosing XML Document structure for GLUE 2.0 rendering]
Paul Millar
paul.millar at desy.de
Mon Dec 17 09:10:23 CST 2007
Hi Sergio,
I've interleaved my comments below.
For the most part, the comments are mildly in favour of using xml:id; but I'm
concerned that the primary-information XML will be unnecessarily hard for the
information providers to produce.
On Wednesday 12 December 2007 01:11:15 Sergio Andreozzi wrote:
> Paul Millar ha scritto:
> > First, I see that one of the rules is ID is an element. [....]
> > [3] http://www.w3.org/TR/2005/REC-xml-id-20050909
>
> the previous version of the XML rendering proposal had the ID as
> attribute, then after a discussion in the last telecon we agreed to
> change it to element.
> I actually do not have a strong opinion on this.
Yes, I too don't feel this is a big issue. I think there's an opportunity to
use an existing standard. We might get some leverage if GLUE/XML uses the
attribute-based xml:id ID, but it's certainly not essential.
> As regards your
> references, the most interesting to me is [3]. I would say that we are
> not reinventing the wheel because we are doing something different.
> [3] defines a way to attach a unique ID (unique within an XML document)
> to an XML element.
True, although I think the emphasis with xml:id is providing a unique
reference point in a schema-type invariant way.
GLUE could define an attribute within its namespace (via XSD, as glue:ID, for
example). But, by using xml:id, an XML parser (that supports xml:id) can
infer the attribute has type unique-ID without having to understand the
definition in the DTD / XSD / RelaxNG etc. Because of this, simple
(non-validating) parsers can still identify xml:id as a "global document
identifier" and treat it accordingly.
The benefit for us is we don't have to provide DTD and XSD and RelaxNG and ...
for an XML parser to understand what xml:id "means". The GLUE/XML
implementation(s) may choose to provide these, but it's optional.
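
A minimal Python sketch of the point above, assuming hypothetical element
names and NCName-style ID values: the xml:id attribute can be recognised by
its reserved name alone, with no DTD, XSD or RelaxNG loaded. Python's
xml.etree.ElementTree is not itself xml:id-aware, so the uniqueness check is
done by hand here.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix is always bound to this namespace, so the
# attribute can be recognised without reading any schema.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical GLUE/XML fragment; element names and ID values are illustrative.
DOC = """
<Grid>
  <AdminDomain xml:id="scotgrid-gla">
    <StorageService xml:id="scotgrid-gla-se"/>
  </AdminDomain>
</Grid>
"""


def index_by_xml_id(root):
    """Map each xml:id value to its element, rejecting duplicates."""
    index = {}
    for element in root.iter():
        ident = element.get(XML_ID)
        if ident is None:
            continue
        if ident in index:
            raise ValueError(f"duplicate xml:id: {ident}")
        index[ident] = element
    return index


root = ET.fromstring(DOC)
by_id = index_by_xml_id(root)
print(by_id["scotgrid-gla-se"].tag)
```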
> We are defining a property of a Grid concept (ID)
> which is supposed to be globally unique and is a URI. From a semantical
> viewpoint they are different. They sit in different namespaces,
> therefore there should be no problem for that (if you see problems,
> please let me know).
I'm not sure I completely follow you here. The two are separate namespaces,
but have similar properties. So, isn't mapping the GLUE ID as xml:id a
choice GLUE/XML is free to make?
The GLUE Schema's ID attribute ("GLUE-ID" to prevent confusion) is a globally
unique URI: a unique "name" within any aggregation of valid GLUE.
(Presumably it's a URI to allow easy delegation of the namespace within a
distributed community.) The ID attribute is a required value for certain
GLUE components (Service, UserDomain, AdminDomain, ...).
The current GLUE/XML mapping (as far as I understand it) provides an XML
element for each major GLUE grid component; in particular, those components that
require a GLUE-ID are represented as XML elements.
The XML attribute xml:id describes a globally unique string: a unique "name"
within any aggregation of valid XML. So, one can injectively map GLUE-ID
into xml:id; i.e., any valid GLUE-ID can be written as a unique, valid
xml:id.
Whilst there is no requirement for GLUE/XML to use xml:id (as you say, the two
are separate), there's also no reason not to. GLUE/XML mapping is free to
define that xml:id is to be used or (as currently) to use a schema-specific
declaration: the ID element.
Here is a list of the advantages and disadvantages I could think of:
Advantages of using xml:id
o it's the W3C recommended way of doing "this sort of thing."
o ID-like semantics are built into parsers that support xml:id (which might
not support more general validation),
o potential "reuse" of GLUE-ID with other XML software and standards,
o there is no GLUE-specific behavior when combining different GLUE XML
files: no need to hard-code the value or derive behavior from some
DTD/XSD/...
o ..others? ..
Disadvantages of using xml:id:
o the mapping between GLUE-ID and xml:id is not surjective: there are valid
xml:id values that are not valid GLUE-ID values (does this matter?)
o xml:id is an attribute rather than an element.
o some issues with Canonical XML (although xml:id considers xml-c14n to be
broken in this and some other respects)
o .. others? ..
> > Is the plan to render (nearly) everything as elements rather than
> > attributes?
>
> in the last telecon, we agreed that we'll use attributes only for
> metadata-like properties (basically CreationTime and Validity, see Sec.
> 4.1 of the spec), while all the rest will be mapped to XML elements.
[Maybe section 4.2 ("metadata"), rather than 4.1.]
> > GLUE has many items that have "required" (1) or "optional" (0..1) cardinality
> > and contain no further markup, so I feel they would, for the most part,
> > be better rendered as XML attributes.
>
> given my experience, this choice is mainly a matter of style. Attributes
> can be only of simple types and single-value.
> Going for elements gives more flexibility for future changes and also is
> probably more usable (people don't have to remember which properties are
> single-value, i.e. attributes, or multi-value, i.e. elements, when writing
> queries).
Sure, this isn't a big deal and is largely a matter of style. Always using
elements does tend to inflate the document size, which may matter when
providing a large amount of information.
There are some GLUE attributes that could probably be rendered as XML
attributes, but it's no big deal.
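
As a rough illustration of the size difference, here is a small Python sketch
that serialises the same two properties both ways. The property values are
borrowed from the StorageResource example later in this mail; rendering them
as attributes is purely hypothetical and not part of the current proposal.

```python
import xml.etree.ElementTree as ET

# Two single-valued properties, used only to compare the two renderings.
props = {"Name": "ScotGrid-GLA DPM instance", "ImplementationName": "DPM"}

# Rendering 1: every property as a child element.
as_elements = ET.Element("StorageResource")
for key, value in props.items():
    ET.SubElement(as_elements, key).text = value

# Rendering 2: every property as an attribute (hypothetical alternative).
as_attributes = ET.Element("StorageResource", props)

size_elements = len(ET.tostring(as_elements))
size_attributes = len(ET.tostring(as_attributes))
print(size_elements, size_attributes)
```

Each element costs an opening and a closing tag, so the overhead grows with
the length of the property name.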
> > [Problem with primary producer having to know too much]
>
> the proposal is intended to be used by both primary services (e.g.,
> OGSA-BES, SRM) which want to advertise their characteristics and by
> information services (both primary publishers and aggregators).
> For primary services, the only constraint is to know the ID of their
> AdminDomain. That's all. They are not supposed to publish other
> AdminDomain attributes.
OK, but the example primary document "A" (P.A option, when voting) contained
more information than this: it showed a complete hierarchy, as if the service
were alone in the Grid.
> The AdminDomain ID will be used to perform the aggregation at the
> higher-level.
>
> The reason for which I prefer Option A is because it looks easier to
> make queries by AdminDomain (no need for join). And at the aggregation
> level, you have all info under a certain AdminDomain aggregated under a
> single element.
N.B. Here, I'm referring to my option P.O [4], where the primary information
is presented as a sub-tree of the full GLUE/XML. This is analogous to how
DocBook provides aggregation where files may (individually) contain a Book
(or Article), Part, Chapter, and so on. Aggregation happens through "other
means" (with DocBook this is typically via XInclude; with the toy example [4]
it is included in the XSLT).
[4] http://www.ogf.org/pipermail/glue-wg/2007-December/000249.html
I'm not sure I follow how it is easier to make queries: the queries (against
the complete, aggregated GLUE/XML infoset) are just as easy.
However, the problem I see is this: if the storage element were to provide
information that is directly queryable (with the same queries as the final
GLUE/XML), the info-provider would need to know its ancestor hierarchy
(parent, parent's parent, etc.); specifically, how many domains (and of what
type) are "above" it.
For example, suppose a Tier-2 site has three AdminDomains within their
combined Domain, the final (aggregated) published XML would look like:
<Grid>
  <Domain>
    <Name>SCOTGRID</Name>
    <Description>Scotland's distributed grid site</Description>
    <!-- Further Domain-level information here -->
    <AdminDomain>
      <Name>SCOTGRID-GLA</Name>
      <Description>The ScotGrid site at University of Glasgow</Description>
      <!-- Further AdminDomain-level information here -->
      <StorageService>
        <!-- Further StorageService information here -->
        <StorageResource>
          <ID>glue://gla.scotgrid.ac.uk/SE</ID>
          <Name>ScotGrid-GLA DPM instance</Name>
          <ImplementationName>DPM</ImplementationName>
          <!-- ...etc... -->
        </StorageResource>
      </StorageService>
    </AdminDomain>
  </Domain>
</Grid>
So, if I've understood the primary information "A" option (P.A.) correctly,
the storage service would publish XML like:
<Grid>
  <Domain>
    <AdminDomain>
      <StorageService>
        <!-- Further StorageService information here -->
        <StorageResource>
          <ID>glue://gla.scotgrid.ac.uk/SE</ID>
          <Name>ScotGrid-GLA DPM instance</Name>
          <ImplementationName>DPM</ImplementationName>
        </StorageResource>
      </StorageService>
    </AdminDomain>
  </Domain>
</Grid>
What's bad here is that the info-provider must know its hierarchy: that it
is inside an AdminDomain, which is in turn inside a Domain. This is ugly; it
should not need to know this!
In contrast, a Tier-1 site might have no containing Domain. A storage service
must then publish information like:
<Grid>
  <AdminDomain>
    <StorageService>
      <!-- Storage Service info here -->
    </StorageService>
  </AdminDomain>
</Grid>
An alternative (option P.O, see [4]) avoids this: services provide only the
information they know (by directly examining the software) plus a hint
(the "site-level" GLUE-ID).
In fact the "parent" back-link isn't needed: it just makes configuring the
site-level aggregation a little easier. One could configure parent-child
links explicitly (e.g. Services within AdminDomains) and avoid having to
specify the Parent within the child.
To me, this makes much more sense: each service is (genuinely) providing only
the information it knows.
Admin sites would aggregate (as with site-level BDIIs currently) and Domains
then aggregate from multiple AdminSites, as necessary.
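
To make the idea concrete, here is a minimal Python sketch of site-level
aggregation under stated assumptions: the <Parent> element name, the
urn:admindomain:example-site identifier and the aggregate() helper are all
hypothetical, and this is not the XSLT from [4]. The service publishes only
its own sub-tree plus the hint; the aggregator decides where it goes.

```python
import xml.etree.ElementTree as ET

# Site-level skeleton held by the aggregator (hypothetical AdminDomain ID).
SITE = """
<Grid>
  <AdminDomain>
    <ID>urn:admindomain:example-site</ID>
  </AdminDomain>
</Grid>
"""

# What a storage service might publish: its own sub-tree and a parent hint.
# The <Parent> element name is an assumption made for this sketch.
FRAGMENT = """
<StorageService>
  <Parent>urn:admindomain:example-site</Parent>
  <StorageResource>
    <ID>glue://gla.scotgrid.ac.uk/SE</ID>
    <ImplementationName>DPM</ImplementationName>
  </StorageResource>
</StorageService>
"""


def aggregate(site_xml, fragments):
    """Attach each fragment under the AdminDomain named by its Parent hint."""
    root = ET.fromstring(site_xml)
    domains = {d.findtext("ID"): d for d in root.iter("AdminDomain")}
    for text in fragments:
        fragment = ET.fromstring(text)
        hint = fragment.find("Parent")
        if hint is None or hint.text not in domains:
            raise ValueError("fragment has no usable Parent hint")
        fragment.remove(hint)  # the hint is only needed during aggregation
        domains[hint.text].append(fragment)
    return root


grid = aggregate(SITE, [FRAGMENT])
print(ET.tostring(grid, encoding="unicode"))
```

The same function could be fed a skeleton with or without an enclosing
Domain; the service's published fragment would not change.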
> I don't know how MDS 4 performs aggregations at higher
> level and if this is compatible with its strategies. This is something
> to be investigated.
Yes, it would be interesting to compare: I don't know too much about MDS-4.
> > As an alternative, suppose One-to-Many relationships be represented as
> > either an XML element hierarchy [...]
>
> yep, this is an option as well. Many options are available. Probably, we
> should make one step back and clarify what we want to optimize.
> In my opinion, we should concentrate on giving the final user the easiest
> and more intuitive way to query the properties.
OK. I've two additional (friendly) amendments:
a. adjust this to:
"[easiest and most intuitive way to query] the final, aggregated GLUE/XML
Schema."
b. also add:
"make it easy for components to provide the necessary information."
> For sure, we need more experience on this with a number of queries to be
> written for different approaches.
> One advantage that I like of option A. is that a query would remain
> valid if you query either the primary source of information or the
> aggregated layer.
Whilst I agree this would be nice, do we have a use-case for users querying
the primary source of information directly?
I skimmed through the use-case document and searched for keywords
("primary", "source", "provider", etc..), but couldn't find any requirement
for end-users to query information providers directly.
Given the flexible hierarchy created by (potentially) nesting an AdminDomain
within multiple Domains, this could be difficult to achieve without requiring
that primary sources of information know something of the global structure.
> Consider this for instance. A simple XPath to ask for a service which
> type is org.glite.wms part of a certain adminDomain:
>
> /glue:Grid/AdminDomain[ID='urn:admindomain:t1.infn.it']/Service[Type='org.glite.wms']
[Sorry, a very minor point: assuming GLUE provides an XML namespace, wouldn't
the query have to specify the namespace prefix at each level?
/glue:Grid/glue:AdminDomain[glue:ID='urn:...']/glue:Service[glue:Type=...
]
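
A quick way to check this is the Python sketch below. It assumes a
hypothetical namespace URI (http://example.org/glue/2.0), since no actual
GLUE namespace has been named in this thread, and uses ElementTree, which
supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this sketch.
NS = {"glue": "http://example.org/glue/2.0"}

DOC = """
<glue:Grid xmlns:glue="http://example.org/glue/2.0">
  <glue:AdminDomain>
    <glue:ID>urn:admindomain:t1.infn.it</glue:ID>
    <glue:Service>
      <glue:Type>org.glite.wms</glue:Type>
    </glue:Service>
  </glue:AdminDomain>
</glue:Grid>
"""

root = ET.fromstring(DOC)

# ElementTree paths are relative to the root element, so the leading
# /glue:Grid step of the original XPath is implied here.
qualified = root.findall(
    "glue:AdminDomain[glue:ID='urn:admindomain:t1.infn.it']"
    "/glue:Service[glue:Type='org.glite.wms']",
    NS,
)

# The same path without prefixes looks for elements in no namespace.
unqualified = root.findall(
    "AdminDomain[ID='urn:admindomain:t1.infn.it']/Service[Type='org.glite.wms']"
)

print(len(qualified), len(unqualified))
```

Only the fully qualified path matches; the unprefixed steps select nothing
once the elements live in a namespace.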
> this query works both at the primary source level and aggregated level
> and is also quite simple to me.
Again, do we really need to provide a service where end-users can query the
information provided by the primary sources in an identical fashion to the
complete (aggregated) resource?
I understand it would be nice (mostly for debugging reasons), but I don't see
how this can be done without *every* primary info-provider within a Grid
knowing (at least something of) the grid structure, in order to provide the
correct XML documents. I feel this would be quite an inflexible solution.
> Of course, we need a larger set of queries to be used for evaluation.
I suspect that XPath will be sufficient to query the aggregated GLUE/XML: once
you get your head around XPath, it's pretty intuitive and friendly.
> > [XML Schema balance...]
>
> we are trying to find the right balance and mainly preserving ease of
> use. In the rules, I mentioned the option of SubstitutionGroups for
> completeness, but this is not the current selected option.
> At the moment, we prefer to go for the annotation option
>
[snip: agreement on simple XML design over complicated, strongly validating
design]
> Thanks for your constructive feedback. I hope we can dedicate one more
> call before XMas to XML rendering so that we can refine all these
> choices and align about the rationale behind them.
> Please, keep contributing as opinion from different perspectives help us
> to make better choices.
I'll do my best!
Cheers,
Paul.