[glue-wg] Values with additional semantic meaning.
Paul Millar
paul.millar at desy.de
Wed Dec 5 06:18:36 CST 2007
Hi all,
One of the complaints I've heard from people is that GLUE schema doesn't have
a mechanism for knowing that certain results are invalid. In my "spare
time", I put together some material that might be of use here.
Of course, this is just a first draft; comments gratefully appreciated.
Cheers,
Paul.
PS.
my thanks to Graeme Steward for his comments and suggestions.
-------------
GLUE proposal on values with additional semantic meaning.
----
Introduction
---
GLUE Schema provides a common method of providing information about a
Grid such that specific knowledge of that Grid is not needed to
appreciate its current state.
The current GLUE schema assumes that all information is available and
that any static information has been correctly configured. This is not
always true: for whatever reason, information may be temporarily or more
permanently unavailable, or some static information may be left unconfigured.
These cases imply that some additional (semantic) meaning of values is
needed. This is most naturally accommodated by returning a special (or
"illegal") value: a value that should not occur during normal
operation.
This is already done in practice; however, it is not specified within GLUE, so
different implementations can choose different values. Knowledge of whether
a particular value is valid then becomes information-provider specific, going
against the spirit of the GLUE Schema.
This document aims to rationalise and standardise values that have some
additional semantic meaning. It is split into two further sections: use-cases
and proposed values.
Use-cases:
---
Two scenarios are considered:
Scenario 1. a static value has no (good) default value and has not
been configured.
Some static values have no good default value. This may be because
the configuration requires some knowledge of a site's configuration or
for some other reason. Whilst this value should be altered to reflect
the site's configuration this might not happen[1], so exposing the
application's default configuration. Given there is no (good) default
value, what should this value be?
[1] We note the availability of tools such as YAIM to assist in
configuration of EGEE sites, which will reduce the likelihood of this
scenario. However, 1. not all sites are configured with YAIM, 2. not
all Grids have a tool like YAIM, 3. upgrading components may result in
changes in configuration, so increasing the likelihood of this
problem.
Scenario 2. information provider is unable to obtain a dynamic value.
A dynamic value is provided by an information provider by querying the
Grid resource. This query will use a number of ancillary resources
(e.g., DNS, network hardware) that might fail; the service itself
might fail. Given a lack of information[2], what value should the
information provider return?
[2] If caching of previous results is available, temporary failures
may be mitigated. However, it is an open question for how long any
such cached information should be permitted.
Proposed illegal values:
---
This section describes a number of values that can be represented
within a given address space (e.g., UTF-8, Integers, FQDNs, IPv4
address space).
With Scenario 1, configuration SHOULD use these default values
wherever possible; likewise, dynamic information providers SHOULD use
these default values whenever they wish to indicate a problem
gathering information.
The semantic meaning SHOULD be limited to simply that the value is
invalid and is not to be relied upon. A client using this information
SHOULD NOT draw any conclusions as to why the information is invalid
from the information presented here.
1. All GLUE-specific enumerated types:
SHOULD use the designated "unknown" value.
Rationale:
If a value is unknown, it should be specified as such. Either this,
or the GLUE schema should clearly state that, should some values be
unknown, the whole entry should not be reported.
2. Simple strings (UTF-8): SHOULD use "ILLEGALVALUE".
Rationale:
Upper-case letters make it easier to spot and a single word avoids
white-space issues.
This is a default option for strings that have no widely-known
structure. If a value is from a more restrictive sub-type
(e.g. FQDNs), then the rules for the more restrictive form SHOULD be
used.
3. Fully qualified domain names: SHOULD use a hostname ending
"invalid." for scenario 2. and either "example.org."
or "example.com." for scenario 1.
Rationale:
RFC 2606 reserves the ".invalid" TLD as always invalid and clearly
so. For dynamic information a value of "unknown.invalid." MAY be
used. If an alternative is used, it SHOULD use the TLD ".invalid".
RFC 2606 also defines the ".example" TLD and three second-level
domains: "example.com", "example.net" and "example.org". These domains have the
advantage of ending with a recognisable TLD, so looking like a DNS
name. Default configuration SHOULD use DNS names that end
".example.com." or ".example.org."
The final dot at the end of the DNS name SHOULD be included as it
prevents local DNS expansion.
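As a rough illustration of how a client might recognise these
placeholder hostnames, here is a minimal Python sketch. The function
name is_illegal_fqdn is an assumption made for this example and is not
defined by GLUE. In keeping with the guidance above, it only reports
whether the value is illegal, not why.

```python
def is_illegal_fqdn(name):
    """Return True if name is one of the placeholder FQDNs proposed above.

    Hypothetical helper for illustration only; not part of GLUE.
    """
    # Compare case-insensitively and ignore the optional trailing dot.
    labels = name.lower().rstrip(".").split(".")
    # Any name under the ".invalid" TLD (RFC 2606).
    if labels[-1] == "invalid":
        return True
    # Names that are, or end in, example.org or example.com.
    return labels[-2:] in (["example", "org"], ["example", "com"])
```

For example, is_illegal_fqdn("unknown.invalid.") and
is_illegal_fqdn("se.example.org.") both return True.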
4. IPv4 addr: SHOULD use 192.0.2.250
Rationale:
There are several portions of the IPv4 address space that should not
appear on a network, but none that are documented as being illegal.
Using an arbitrary address leads to the risk of side-effects.
The best option is an IP address from the 192.0.2.0/24 subnet. This
subnet is defined in RFC 3330 as "TEST-NET" for use in documentation
and example code. Although any IPv4 address from 192.0.2.0/24 MAY
be used, the above address SHOULD be used for consistency.
5. IPv6 addr: SHOULD use 2001:DB8::FFFF
Rationale:
There is no documented illegal IPv6 address. RFC 3849 reserves the
address prefix 2001:DB8::/32 for documentation. For consistency,
the address SHOULD be the one noted above.
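A client could test for the IPv4 and IPv6 placeholder ranges described
above along the following lines. This is only a sketch using Python's
standard ipaddress module; the function name is_illegal_address is an
assumption for this example. It accepts any address in the reserved
ranges, since the text allows alternatives to the recommended address.

```python
import ipaddress

TEST_NET = ipaddress.ip_network("192.0.2.0/24")     # TEST-NET (RFC 3330)
DOC_PREFIX = ipaddress.ip_network("2001:db8::/32")  # documentation prefix (RFC 3849)

def is_illegal_address(text):
    """Return True if text is an address from a documentation range.

    Hypothetical helper; raises ValueError if text is not an IP address.
    """
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        return addr in TEST_NET
    return addr in DOC_PREFIX
```

Both is_illegal_address("192.0.2.250") and
is_illegal_address("2001:DB8::FFFF") return True.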
6. Counting/Natural numbers:
SHOULD use 0
Rationale:
Counting (also known as natural) numbers exclude the number zero,
so information providers SHOULD use this value to indicate an illegal
value.
7. Integers:
SHOULD use MAXINT (maximum value representable in the domain).
For int32 this is 2,147,483,647
For uint32 this is 4,294,967,295
For int64 this is 9,223,372,036,854,775,807
For uint64 this is 18,446,744,073,709,551,615
Rationale:
For non-negative integers, all numbers expressible within the
encoding (int32/uint32/etc...) are valid so there is no safe choice.
Although any value may be chosen, the value least likely to be
encountered in any given domain is that domain's maximum value.
If an unsigned integer is encoded as a signed integer, it is
possible to use negative numbers safely. However, these numbers
will be unrepresentable if the number is stored as an unsigned
integer.
For values that might be a positive, zero or negative integer, all
numbers expressible within the encoding (signed int) are valid so
there is no safe choice. For consistency, the MAXINT value SHOULD
be used.
Information providers SHOULD NOT attempt to convey further
semantic distinction by using more than one illegal number.
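The MAXINT values listed above follow directly from the width of each
encoding. As a small sketch (the names MAXINT and is_illegal_integer
are assumptions for this example), they can be computed and checked in
Python as follows:

```python
# Maximum representable value for each encoding named in the text.
MAXINT = {
    "int32": 2**31 - 1,
    "uint32": 2**32 - 1,
    "int64": 2**63 - 1,
    "uint64": 2**64 - 1,
}

def is_illegal_integer(value, encoding):
    """Return True if value is the MAXINT placeholder for encoding.

    Hypothetical helper; encoding must be a key of MAXINT.
    """
    return value == MAXINT[encoding]
```

Note that the caller has to know the encoding of the attribute:
4,294,967,295 is the placeholder for uint32, but an ordinary value
for int64.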
8. Filenames:
SHOULD use "ILLEGALPATH".
Rationale:
As with the simple string, a single upper-case word is recommended.
9. Uniform Resource Identifier (URI): scheme-specific
Rationale:
RFC 3986 defines URIs as a "federated and extensible naming system."
All URIs start with a scheme name and no scheme name has been
reserved for illegal or example values. For any given URI scheme,
it may be possible to define an illegal value within that
name-space. If a value has only one valid scheme, the illegal value
should be taken from that scheme. If several schemes are possible,
one should be chosen.
For schemes that include a reference to an Internet host
(e.g. http, httpg, mailto), an illegal value SHOULD be derived by
using an illegal FQDN (see above).
For other schemes, some element should indicate that the value is
illegal. This is subject to further work.
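For URIs that carry a host in an authority part (the "//host" form
used by http and httpg), a client might check the host against the
illegal FQDN rules. The following is a sketch only, using Python's
standard urllib.parse; the function name uri_has_illegal_host is an
assumption for this example. It does not handle forms such as mailto,
where the host is not in an authority part.

```python
from urllib.parse import urlsplit

def uri_has_illegal_host(uri):
    """Return True if the URI's authority host is a placeholder FQDN.

    Hypothetical helper; only URIs of the form scheme://host/... are
    examined, so a mailto URI always yields False here.
    """
    host = urlsplit(uri).hostname
    if host is None:
        return False
    # Ignore the optional trailing dot before splitting into labels.
    labels = host.lower().rstrip(".").split(".")
    if labels[-1] == "invalid":
        return True
    return labels[-2:] in (["example", "org"], ["example", "com"])
```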
10. X509 Distinguished Names:
SHOULD include a RDN of CN=ILLEGALUSER
Rationale:
X509 uses an X500 namespace, represented as several RDNs concatenated
by commas. The final RDN is usually a single common name (CN).
It is possible for more than one CN to be present, allowing
inclusion of additional semantic meaning. This is outwith the scope
of the document.
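A simple check for the proposed RDN could look like the sketch below.
It assumes the comma-separated string form described above and splits
naively on commas, so it does not cope with escaped commas inside an
RDN value; the function name has_illegal_user is an assumption for
this example.

```python
def has_illegal_user(dn):
    """Return True if the DN contains the RDN CN=ILLEGALUSER.

    Hypothetical helper; naive parsing that splits on every comma.
    """
    rdns = [rdn.strip() for rdn in dn.split(",")]
    return "CN=ILLEGALUSER" in rdns
```

For instance, has_illegal_user("C=DE, O=Example Org, CN=ILLEGALUSER")
returns True.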
Definition of words:
---
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are meant as described in RFC 2119. A brief summary is given
here.
1. MUST (or "REQUIRED" or "SHALL") means that no deviation is allowed
from conforming software.
2. MUST NOT (or "SHALL NOT") means complete prohibition of this
behaviour with conforming software.
3. SHOULD (or "RECOMMENDED") means that there may be reasons why
conforming software does not adopt this behaviour, but all the
effects of an alternative behaviour must be understood and
considered before choosing a different course.
4. SHOULD NOT (or "NOT RECOMMENDED") means that there may be reasons
why conforming software adopts this behaviour, but all the
effects of an alternative behaviour must be understood and
considered before choosing a different course.
5. MAY (or "OPTIONAL") means an item is completely optional.