[glue-wg] Comments on GLUE Schema 2.0
Paul Millar
paul.millar at desy.de
Tue May 6 04:33:03 CDT 2008
Hi Parag,
On Friday 02 May 2008 20:24:21 Parag Mhashilkar wrote:
> Gabriele and I went through the GLUE document.
Thanks for your comments; they are all appreciated.
I'm concentrating on comment 6, as Stephen has replied to the others.
> We have following comments/questions
> [...]
> 6. Appendix A: UNKNOWN data
> Is there a particular reason to have multiple UNDEFINED types?
Yes.
The overall idea is to have well-defined "unknown" values that are specific
to each attribute type (URI, integer, email address, etc.) and that are both
valid (so any conforming client software can parse the data) and carry this
precise additional semantic meaning. This is to satisfy the two specific
use-cases mentioned in the Appendix, but other use-cases may well exist.
There are three main reasons for having multiple "unknown" values:
1. to allow the "unknown" value to propagate within the information system.
This is desirable as it prevents a site (or a service) from simply
disappearing when a single attribute is "unknown". It also lowers the
barrier to getting something working and allows "intelligent agent" software
to check for problems grid-wide (which has several advantages over deploying
site-local checks).
Information systems may (and many *do*) implement validation of incoming data;
this requires that any "unknown" value be valid for its attribute type.
Since no single string is valid for all attribute types, there must be more
than one "unknown" value.
2. to provide a hint of the correct form of the missing data. In the first
scenario (no sane default configuration value), this hints to the site-admin
what the correct value should be; for example, if the unknown value looks like
an FQDN rather than a URI, the site-admin knows what is expected.
3. to allow a standard way of encoding additional meaning within the
unknown value. People might want to specify why a particular value is
unavailable (indicating what is causing the problem) or to provide additional
hints to the site-admin when configuring an info-provider. Trivially,
encoding additional information in this way requires there to be multiple
unknown values.
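As an illustration of reason 3, here is a minimal Python sketch of one possible convention for carrying a reason inside an unknown value. The suffix format, the reason codes and the helper names (`unknown_path`, `unknown_string`, `explain`) are all hypothetical and not taken from Appendix A; the point is only that each reason yields a distinct value that is still valid for its attribute type.

```python
from typing import Optional

# Hypothetical reason codes; GLUE does not define these.
REASONS = {"no-default": "NODEFAULT", "provider-failed": "PROVIDERFAILED"}
CODES = {code: reason for reason, code in REASONS.items()}

# Hypothetical prefixes: each keeps the value valid for its attribute type.
PATH_PREFIX = "/UNDEFINEDPATH/"    # result is still an absolute path
STRING_PREFIX = "UNDEFINEDVALUE-"  # result is still a plain string


def unknown_path(reason: str) -> str:
    """Build an 'unknown' absolute path that records why it is unknown."""
    return PATH_PREFIX + REASONS[reason]


def unknown_string(reason: str) -> str:
    """Build an 'unknown' string value that records why it is unknown."""
    return STRING_PREFIX + REASONS[reason]


def explain(value: str) -> Optional[str]:
    """Return the reason encoded in a placeholder, or None for ordinary values."""
    for prefix in (PATH_PREFIX, STRING_PREFIX):
        if value.startswith(prefix):
            return CODES.get(value[len(prefix):])
    return None


print(unknown_path("no-default"))                  # /UNDEFINEDPATH/NODEFAULT
print(explain(unknown_string("provider-failed")))  # provider-failed
print(explain("/opt/glite/bin"))                   # None
```

A grid-wide checking agent could then report not just that a value is missing but why, without any change to how clients parse the attribute.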
> Can't we
> just have UNDEFINED instead of UNDEFINEDVALUE, UNDEFINEDPATH,
> UNDEFINEDUSER, etc? Are we really buying anything having multiple
> UNDEFINED types?
"UNDEFINED" is fine for an ASCII/UTF-8 string, but is invalid if referring
to:
a. an absolute path (doesn't start with "/" or "\")
b. an FQDN
(or, at least, not very helpful. See the RFC 2606 discussion of "example"
FQDNs and the "invalid" TLD)
c. an email address
d. a URI
e. an IPv4 (or v6) address
f. an integer value
g. longitude or latitude
etc.
(we went with "UNDEFINEDVALUE" to remain consistent
with "UNDEFINEDUSER", "UNDEFINEDPATH", etc.)
The information system may simply reject some (or all) of the provided
information because it does not validate correctly. This will lead to
difficult-to-debug situations where it isn't clear what is wrong, only that
information isn't getting through (perhaps with a baffling validation error
message).
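A minimal Python sketch of that failure mode, assuming an information system that checks each attribute against its type before accepting it. The validators are hypothetical and deliberately simplified (they are not the rules of any particular GLUE implementation); they only show that a single catch-all string such as "UNDEFINED" is acceptable as free text but is rejected for every other attribute type.

```python
import ipaddress
from urllib.parse import urlsplit


# Hypothetical, simplified per-type validators; a real information system
# would apply the full rules (e.g. RFC 3986 for URIs).
def valid_string(v: str) -> bool:
    return isinstance(v, str)


def valid_abs_path(v: str) -> bool:
    # An absolute path must start with "/" (or "\" for Windows-style paths).
    return v.startswith(("/", "\\"))


def valid_uri(v: str) -> bool:
    # An absolute URI needs a scheme, e.g. "https://..." or "urn:...".
    return urlsplit(v).scheme != ""


def valid_ip(v: str) -> bool:
    try:
        ipaddress.ip_address(v)  # accepts IPv4 and IPv6
        return True
    except ValueError:
        return False


def valid_int(v: str) -> bool:
    try:
        int(v)
        return True
    except ValueError:
        return False


def valid_email(v: str) -> bool:
    local, sep, domain = v.partition("@")
    return bool(local) and sep == "@" and "." in domain


VALIDATORS = {
    "string": valid_string,
    "path": valid_abs_path,
    "uri": valid_uri,
    "ip": valid_ip,
    "int": valid_int,
    "email": valid_email,
}

# Which attribute types would accept a single catch-all "UNDEFINED"?
accepted = {name: check("UNDEFINED") for name, check in VALIDATORS.items()}
print(accepted)
# {'string': True, 'path': False, 'uri': False, 'ip': False, 'int': False, 'email': False}
```

Every `False` here corresponds to an attribute that would silently vanish from a validating information system.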
Using "UNKNOWN" for a URI attribute as a specific example, some GLUE
implementations might allow this value to propagate whilst others would not
(cf. URI as a datatype in SQL and XML-Schema). If the value is propagated
to the client software then the observed behaviour will be
implementation-specific: a strict URI parser should reject the "UNKNOWN"
string as an absolute URI (as per RFC 3986). Client software would then have
to handle invalid entries separately from its main code, increasing its
complexity.
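The alternative can be sketched in Python: each attribute type gets its own placeholder that still passes that type's syntax check, so a client parses it with its normal code and recognises "unknown" with a single comparison. Apart from the UNDEFINEDVALUE, UNDEFINEDPATH and UNDEFINEDUSER tokens mentioned above, the exact placeholder values and the simplified checks are assumptions for illustration, not the values defined in Appendix A.

```python
from urllib.parse import urlsplit

# Hypothetical type-specific placeholders (not the Appendix A values).
PLACEHOLDERS = {
    "string": "UNDEFINEDVALUE",
    "path": "/UNDEFINEDPATH",
    "email": "UNDEFINEDUSER@example.org",  # example.org is reserved by RFC 2606
    "fqdn": "undefined.invalid",           # .invalid TLD is reserved by RFC 2606
    "uri": "urn:UNDEFINEDVALUE",
    "int": "999999999",
}

# Simplified syntax checks standing in for each type's real parser.
CHECKS = {
    "string": lambda v: True,
    "path": lambda v: v.startswith("/"),
    "email": lambda v: v.count("@") == 1 and "." in v.split("@")[1],
    "fqdn": lambda v: "." in v and " " not in v,
    "uri": lambda v: urlsplit(v).scheme != "",
    "int": lambda v: v.lstrip("-").isdigit(),
}


def is_unknown(attr_type: str, value: str) -> bool:
    """Client-side test: the value is syntactically valid but marks 'unknown'."""
    return value == PLACEHOLDERS[attr_type]


# Every placeholder passes the check for its own attribute type ...
assert all(CHECKS[t](v) for t, v in PLACEHOLDERS.items())
# ... yet stays distinguishable from real data.
print(is_unknown("uri", "urn:UNDEFINEDVALUE"))            # True
print(is_unknown("uri", "gsiftp://se.example.org/data"))  # False
```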
A related issue is that, for some attribute types, "UNKNOWN" is simply
unrepresentable; for example, if the attribute is a counter (represented as a
32-bit integer), how does one represent the UTF-8 string "UNKNOWN"?
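A short Python illustration of the counter case, assuming the attribute is stored as a big-endian signed 32-bit integer (the storage format and the sentinel value are hypothetical): the packer accepts any in-range number, including a numeric sentinel, but has no way to accept the string "UNKNOWN".

```python
import struct

# Hypothetical storage format: big-endian signed 32-bit integer.
COUNTER_FORMAT = ">i"
# Hypothetical numeric "unknown" sentinel; GLUE's actual value may differ.
UNKNOWN_COUNTER = 2**31 - 1


def pack_counter(value) -> bytes:
    """Encode a counter attribute; raises struct.error for non-integers."""
    return struct.pack(COUNTER_FORMAT, value)


print(pack_counter(42).hex())               # 0000002a
print(pack_counter(UNKNOWN_COUNTER).hex())  # 7fffffff

try:
    pack_counter("UNKNOWN")
except struct.error as exc:
    print("cannot store 'UNKNOWN':", exc)
```

Only a value drawn from the integer range itself can play the "unknown" role for such an attribute.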
I hope this helps explain the motivation for Appendix A.
Cheers,
Paul.