[glue-wg] Comments on GLUE Schema 2.0

Tue May 6 04:33:03 CDT 2008

Hi Parag,

On Friday 02 May 2008 20:24:21 Parag Mhashilkar wrote:
> Gabriele and I went through the GLUE document.

Thanks for your comments; they are all appreciated.

I'm concentrating on comment 6, as Stephen has replied to the others.

> We have following comments/questions 
> [...] 
> 6. Appendix A: UNKNOWN data
> Is there a particular reason to have multiple UNDEFINED types?

Yes.

The overall idea is to have well-defined "unknown" values that are specific 
to each attribute type (URI, integer, email address, etc.).  These values are 
both valid (so any conforming client software can parse the data) and carry 
this precise additional semantic meaning.  This is to satisfy the 
two specific use-cases mentioned in the Appendix, but other use-cases may 
well exist.

There are three main reasons for having multiple "unknown" values:

1.  to allow the "unknown" value to propagate within the information system.  
This is desirable as it prevents a site (or a service) from simply 
disappearing when a single attribute is "unknown".  It also reduces the 
barriers to getting something working and allows "intelligent agent" software to 
check for problems grid-wide (which has several advantages over deploying 
site-local checks).

Information systems may (and many *do*) implement validation of incoming data; 
this requires that any "unknown" value must be valid for that attribute type.  
Since no simple string is valid for all attribute types, there must be more 
than one "unknown" value.

2. to provide a hint of the correct form of the missing data.  With the first 
scenario (no sane default config. value), this provides a hint to the 
site-admin what the correct value should be; for example, if the unknown 
value looks like a FQDN, instead of a URI, the site-admin knows what is 
expected.

3. to allow a standard way of encoding additional meaning within the 
unknown value.  People might want to specify why a particular value is 
unavailable (indicating what is causing the problem) or to provide additional 
hints to site-admin when configuring an info-provider.  Trivially, this 
encoding of additional information requires there to be multiple unknown 
values.

> Can't we 
> just have UNDEFINED instead of UNDEFINEDVALUE, UNDEFINEDPATH,
> UNDEFINEDUSER, etc? Are we really buying anything having multiple
> UNDEFINED types?

"UNDEFINED" is fine for an ASCII/UTF-8 string, but is invalid if referring 
to:
	a. an absolute path (doesn't start with "/" or "\")
	b. a FQDN
		(or, at least, not very helpful.  See the RFC 2606 discussion of "example" 
		FQDNs and the ".invalid" TLD)
	c. an email address
	d. a URI
	e. an IPv4 (or v6) address
	f. an integer value
	g. longitude or latitude
	etc

(we went with "UNDEFINEDVALUE" to remain consistent 
with "UNDEFINEDUSER", "UNDEFINEDPATH", etc.)
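
As a rough illustration of the list above, here is a minimal Python sketch 
of per-type validation.  The validators and the placeholder values are 
hypothetical, chosen only for this example; they are not the checks any 
particular information system performs, nor the values defined in Appendix A.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# One DNS label: letters, digits, hyphens; no leading/trailing hyphen.
LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_abs_path(v):
    return v.startswith(("/", "\\"))

def is_fqdn(v):
    labels = v.split(".")
    return len(labels) >= 2 and all(LABEL.fullmatch(l) for l in labels)

def is_email(v):
    local, sep, domain = v.partition("@")
    return bool(local) and sep == "@" and is_fqdn(domain)

def is_uri(v):
    # Deliberately simplified: only checks that a scheme is present.
    return bool(urlsplit(v).scheme)

def is_ip(v):
    try:
        ipaddress.ip_address(v)
        return True
    except ValueError:
        return False

def is_int(v):
    try:
        int(v)
        return True
    except ValueError:
        return False

# Simplified, hypothetical validators keyed by attribute type.
VALIDATORS = {
    "absolute path": is_abs_path, "fqdn": is_fqdn, "email": is_email,
    "uri": is_uri, "ip address": is_ip, "integer": is_int,
}

# Illustrative placeholders only (assumptions, not the Appendix A values).
PLACEHOLDERS = {
    "absolute path": "/UNDEFINEDPATH", "fqdn": "unknown.invalid",
    "email": "unknown@unknown.invalid", "uri": "http://unknown.invalid/",
    "ip address": "0.0.0.0", "integer": "999999999",
}

def failures(value):
    """Names of the attribute types for which `value` does not validate."""
    return sorted(name for name, ok in VALIDATORS.items() if not ok(value))
```

With these simplified checks, failures("UNDEFINED") names every attribute 
type, whereas each type-specific placeholder passes its own validator.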

The information system may simply reject some (or all) of the provided 
information as it does not validate correctly.  This will lead to 
difficult-to-debug situations where it isn't clear what is wrong, only that 
information isn't getting through (perhaps with a baffling validation error 
message).

Using "UNKNOWN" for a URI attribute as a specific example, some GLUE 
implementations might allow this value to propagate whilst others would not 
(cf. URI as a datatype in SQL and XML-Schema).  If the value is propagated 
to the client software then the observed behaviour will be 
implementation-specific: URI parsers should reject the "UNKNOWN" string 
as invalid (as per RFC 3986).  This would require client software also to 
handle invalid entries independently of its main code, increasing the 
complexity of the client software.
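
To make the "implementation-specific" point concrete, here is a small 
Python sketch.  Python's urllib.parse.urlsplit is lenient: it accepts 
"UNKNOWN" and treats it as a relative reference, so a client that needs an 
absolute URI has to add its own check.  The function name 
parse_service_uri is hypothetical, used only for this example.

```python
from urllib.parse import urlsplit

def parse_service_uri(value):
    """Return the split URI, insisting on a scheme (hypothetical helper)."""
    parts = urlsplit(value)
    if not parts.scheme:
        # Extra handling the client must carry for non-URI strings.
        raise ValueError(f"not an absolute URI: {value!r}")
    return parts

lenient = urlsplit("UNKNOWN")   # no error: scheme is empty, path is "UNKNOWN"

try:
    parse_service_uri("UNKNOWN")
    rejected = False
except ValueError:
    rejected = True
```

A stricter parser in another language may instead fail at the parsing step 
itself, which is exactly the variation in behaviour described above.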

A related issue is that, for some attribute types, "UNKNOWN" is simply 
unrepresentable; for example, if the attribute is a counter (represented as a 
32-bit integer), how does one represent the UTF-8 string "UNKNOWN"?
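
A minimal Python sketch of the same problem, assuming the counter is 
serialised as a signed 32-bit integer.  The sentinel value used here is a 
hypothetical choice for illustration, not necessarily the one given in 
Appendix A.

```python
import struct

def pack_counter(value):
    """Encode a counter as a signed 32-bit little-endian integer."""
    return struct.pack("<i", value)

# Hypothetical numeric sentinel standing in for "unknown".
HYPOTHETICAL_UNKNOWN_COUNTER = 999_999_999

try:
    pack_counter("UNKNOWN")
    string_fits = True
except struct.error:
    # A string cannot be stored in an integer field at all.
    string_fits = False

# The numeric sentinel, by contrast, occupies the same 4 bytes as real data.
encoded = pack_counter(HYPOTHETICAL_UNKNOWN_COUNTER)
```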

I hope this helps explain the motivation for Appendix A.

Cheers,

Paul.