[glue-wg] Updated version of appendix D.
Paul Millar
paul.millar at desy.de
Thu Jan 24 06:50:28 CST 2008
Hi all,
I've tried to incorporate the comments from the meeting in the new version of
Appendix D. Specifically:
o included Stephen's ideas on how to embed additional information,
o updated the section on whether metrics are required or not (para 2 of
intro),
o added the initial slash to the filepath,
o included multiple examples,
o removed the section on counting numbers,
o added a new section on email addresses (previously missing),
o tidied up the sections on URIs and DNs,
o changed the integer section so it's using "all nines" instead
(inc. a mention of Benford's law).
Please let me know if I've missed anything.
Cheers,
Paul.
---
Appendix D : place-holder values for unknown data (v1.0)
----
Introduction
---
Whilst people endeavour to provide accurate information, there may be
situations where specific GLUE values may be assigned place-holder (or
dummy) values. These place-holder values carry some additional
semantic meaning; specifically, that the correct value is currently
unknown and the presented value should be ignored. This appendix
describes a recommended set of place-holder values to use.
Some metrics within the GLUE schema are required whilst others are
optional. If the metric is optional and the corresponding information
is unavailable, the information provider may choose to publish a
place-holder or it may choose not to publish the metric. If the
metric is required, then the information provider must either publish a
place-holder value or refrain from publishing the GLUE object.
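The choices in the paragraph above can be sketched as a small decision function. This is only an illustration: the names Action and allowed_actions are hypothetical and are not part of GLUE or of any information provider.

```python
from enum import Enum


class Action(Enum):
    PUBLISH_VALUE = "publish the real value"
    PUBLISH_PLACEHOLDER = "publish a place-holder"
    OMIT_METRIC = "do not publish the metric"
    OMIT_OBJECT = "do not publish the GLUE object"


def allowed_actions(required, value_known):
    """Return the set of actions open to an information provider."""
    if value_known:
        return {Action.PUBLISH_VALUE}
    if required:
        # A required metric cannot simply be left out of the object.
        return {Action.PUBLISH_PLACEHOLDER, Action.OMIT_OBJECT}
    # An optional metric may be dropped on its own.
    return {Action.PUBLISH_PLACEHOLDER, Action.OMIT_METRIC}
```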
If a place-holder value is published, it must conform to the scheme
described in this appendix. This is to increase the likelihood that
software will understand the nature of the information it receives.
To avoid confusion, these place-holder values have been chosen so they
are obviously "wrong" to humans, unlikely to occur under normal
operation and valid within the metric type. This also allows for
detection of failing information provider components.
Use-cases:
---
There are two principal use-cases for place-holder values, although
others may exist.
Scenario 1. a static value has no good default value and has not been
configured for a particular site.
Some provisions for GLUE Schema provide templates. These templates
may contain static values that have no good default value; for
example, a value may require some detailed knowledge of a site.
Whilst there may be the expectation that the value be configured, it is
possible that this did not happen, so exposing the application's
default configuration.
Scenario 2. information provider is unable to obtain a dynamic value.
A dynamic value is provided by an information provider by querying the
underlying grid resources. This query will use a number of ancillary
resources (e.g., DNS, network hardware) that might fail; the grid
services might also fail. If a metric is required and the current
value is unobtainable, a place-holder must be used.
Place-holder values:
---
This section describes a number of values that can be represented
within a given address space (e.g., UTF-8, Integers, FQDNs, IPv4
address space). Each of the different types is introduced along with
the proposed value and a brief discussion of the rationale and any
other considerations.
1. Simple strings (ASCII/UTF-8) should use "UNDEFINEDVALUE" or should
start "UNDEFINEDVALUE:"
Upper-case letters make it easier to spot and a single word avoids
any white-space issues.
A short error message can be incorporated into the message by
appending the message after the colon.
Examples:
UNDEFINEDVALUE
UNDEFINEDVALUE: Unable to contact torque daemon.
Using UNDEFINEDVALUE is a default option for strings that have no
widely-known structure. If a value is of a more restrictive
sub-type (e.g., FQDNs, FQANs) described below, then the rules for
the more restrictive form must be used.
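A minimal sketch of how software might build and recognise these string place-holders. The function names are hypothetical, and placing a single space after the colon simply follows the example above.

```python
PLACEHOLDER = "UNDEFINEDVALUE"


def make_undefined_string(message=None):
    """Build a place-holder string, optionally with a short error message."""
    if message is None:
        return PLACEHOLDER
    return f"{PLACEHOLDER}: {message}"


def is_undefined_string(value):
    """True for the bare place-holder or anything starting 'UNDEFINEDVALUE:'."""
    return value == PLACEHOLDER or value.startswith(PLACEHOLDER + ":")
```

Note that a value such as "UNDEFINEDVALUES" is not treated as a place-holder here, since it neither equals the keyword nor continues with a colon.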
2. Fully qualified domain names: must use a hostname ending either
"example.org" for scenario 1, or "invalid" for scenario 2.
RFC 2606 reserves the "invalid" Top-Level-Domain (TLD) as always
invalid and clearly so. For dynamic information gathering, a value
ending "invalid" must be used. It is recommended that
"unknown.invalid" be used unless the class of machine is known.
RFC 2606 also defines two second-level domains: "example.org" and
"example.com". These domains have the advantage of ending with a
recognisable TLD, so are recognisable as a DNS name. Default
configuration (scenario 1, above) must use DNS names that end
"example.com" or "example.org".
Additional information can be included by specifying a prefix to the
more broad part; for example, "your-CE" can be prepended to
"example.org" in a configuration file to form "your-CE.example.org".
This may be used to specify the class of machine that should be
present.
Examples:
www.example.org
your-CE.example.org
unknown.invalid
site-local-BDII.invalid
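As an illustration, a consumer could map a published host name back to the two scenarios as follows. The function name and the returned labels are hypothetical; the comparison is done label by label so that a real name such as "notexample.org" is not mistaken for a place-holder.

```python
def classify_fqdn(name):
    """Return 'scenario 1', 'scenario 2', or None for a real-looking name."""
    labels = name.rstrip(".").lower().split(".")
    if labels[-1] == "invalid":
        # Reserved TLD: value could not be obtained dynamically.
        return "scenario 2"
    if labels[-2:] in (["example", "org"], ["example", "com"]):
        # Reserved second-level domain: default configuration left in place.
        return "scenario 1"
    return None
```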
3. IPv4 addr: should use 192.0.2.250
There are several portions of IPv4 addresses that should not appear
on a network, but none that are reserved for documentation or to
specify a non-existent address. Using any address leads to the risk
of side-effects, should this value be used.
The best option is an IP address from the 192.0.2.0/24 subnet. This
subnet is defined in RFC 3330 as "TEST-NET" for use in documentation
and example code. For consistency, the value 192.0.2.250 must be
used.
5. IPv6 addr: should use 2001:DB8::FFFF
There is no documented undefined IPv6 address. RFC 3849 reserves the
address prefix 2001:DB8::/32 for documentation. For consistency,
the address 2001:DB8::FFFF must be used.
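A short sketch, using Python's standard ipaddress module, of how the IPv4 and IPv6 place-holders might be recognised. The function names are hypothetical. Comparing parsed addresses rather than strings means that "2001:DB8::FFFF" and "2001:db8::ffff" are treated as the same value.

```python
import ipaddress

IPV4_PLACEHOLDER = ipaddress.ip_address("192.0.2.250")
IPV6_PLACEHOLDER = ipaddress.ip_address("2001:DB8::FFFF")

# Documentation ranges from which the two place-holders are taken.
TEST_NET = ipaddress.ip_network("192.0.2.0/24")
DOC_PREFIX = ipaddress.ip_network("2001:db8::/32")


def is_undefined_address(text):
    """True if text is exactly one of the two recommended place-holders."""
    addr = ipaddress.ip_address(text)
    return addr in (IPV4_PLACEHOLDER, IPV6_PLACEHOLDER)


def is_documentation_address(text):
    """True if text lies anywhere in the documentation ranges."""
    addr = ipaddress.ip_address(text)
    return addr in TEST_NET or addr in DOC_PREFIX
```

The second function is looser than the appendix requires; it might be useful for flagging providers that picked some other address from the same ranges.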
6. Integers: must use "all nines"
For uint32/int32 this is 999,999,999
For uint64/int64 this is 999,999,999,999,999,999
For integers, all numbers expressible within the encoding
(int32/uint32/etc...) are valid so there is no safe choice.
If an unsigned integer is encoded as a signed integer, it is
possible to use negative numbers safely. However, these numbers
will be unrepresentable if the number is stored as an unsigned
integer. For this reason a negative number place-holder must not be
used.
The number was chosen for three reasons. First, metric scales are
often chosen to reduce the likelihood of overflow: numbers towards
MAXINT (the largest number representable in an integer domain) are
less likely to appear. Second, repeated numbers stand out more
clearly to humans. Finally, the statistical frequency of measured
values often follows Benford's law, which indicates that numbers
starting with "1" occur far more frequently than those starting with
"9" (about six times the probability). For these reasons,
information providers must use all-nines to indicate an unknown
value.
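The two values quoted above, and the Benford's law figure, can be checked with a short calculation. The sketch below assumes the place-holder is the longest run of decimal nines that fits the signed maximum for a given width, so that the same value is representable whether the metric is stored signed or unsigned. The function names are hypothetical.

```python
import math


def all_nines(bits):
    """Longest run of decimal nines that fits a signed integer of this width."""
    signed_max = 2 ** (bits - 1) - 1
    digits = len(str(signed_max))
    candidate = int("9" * digits)
    if candidate > signed_max:
        # One digit too many: drop a nine so the value fits.
        candidate = int("9" * (digits - 1))
    return candidate


def benford_probability(first_digit):
    """Benford's law: probability that a value starts with this digit."""
    return math.log10(1 + 1 / first_digit)


# How much more likely a leading "1" is than a leading "9".
ratio = benford_probability(1) / benford_probability(9)
```

Here all_nines(32) gives 999,999,999 and all_nines(64) gives 999,999,999,999,999,999, and the ratio comes out between six and seven.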
7. Filepath: must start either "/UNDEFINEDPATH" or "\UNDEFINEDPATH".
As with the simple string, a single upper-case word is recommended.
The initial slash indicates that the value is a path.
Implementations must use whichever slash is most appropriate for the
corresponding system (Unix-like systems use a forward-slash).
Software should accept either value as an unknown-value
place-holder.
Additional information can be encoded as data beyond the initial
UNDEFINEDPATH, separated by the same slash as started the value.
Additional comments should not use any of the following characters:
\ [ ] ; = " \ : | , * .
Examples:
/UNDEFINEDPATH
\UNDEFINEDPATH
/UNDEFINEDPATH/Broker unavailable
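A minimal sketch of building and recognising path place-holders. The function names are hypothetical, and rejecting a comment that contains a discouraged character is a stricter reading than the "should not" above; an implementation might prefer to strip or replace such characters instead.

```python
# Characters that additional comments should not contain.
DISCOURAGED = set('\\[];=":|,*.')


def make_undefined_path(comment=None, sep="/"):
    """Build a place-holder path using the given slash."""
    if sep not in ("/", "\\"):
        raise ValueError("separator must be a forward- or back-slash")
    path = sep + "UNDEFINEDPATH"
    if comment is not None:
        if DISCOURAGED & set(comment):
            raise ValueError("comment contains a discouraged character")
        path += sep + comment
    return path


def is_undefined_path(value):
    """Accept either slash variant, with or without extra information."""
    for sep in ("/", "\\"):
        prefix = sep + "UNDEFINEDPATH"
        if value == prefix or value.startswith(prefix + sep):
            return True
    return False
```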
8. Email addresses: must use an undefined FQDN for the domain.
RFC 2822 defines email addresses to have the form:
<local-part> '@' <domain>
The <domain> must be an undefined FQDN; see above for a complete
description. For email addresses, information providers should use
"example.org" for scenario 1 and "unknown.invalid" for scenario 2.
The <local-part> may be used to encode a small amount of additional
information, for example, the class of user to whom the email
address should be delivered. If no such information is to be
encoded the value "user" should be used.
Examples:
site-local-contact@example.org
local-admin@example.org
user@unknown.invalid
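A sketch of how the email rules might be applied, following the <local-part> '@' <domain> form given above. The function names are hypothetical; the domain test repeats the undefined-FQDN rules so that the example is self-contained.

```python
def make_undefined_email(scenario, local_part="user"):
    """Build a place-holder address for scenario 1 or scenario 2."""
    domain = {1: "example.org", 2: "unknown.invalid"}[scenario]
    return f"{local_part}@{domain}"


def is_undefined_email(address):
    """True if the domain part is an undefined FQDN."""
    local, at, domain = address.rpartition("@")
    if not at or not local:
        return False
    domain = domain.rstrip(".").lower()
    return (
        domain == "invalid"
        or domain.endswith(".invalid")
        or domain in ("example.org", "example.com")
        or domain.endswith((".example.org", ".example.com"))
    )
```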
9. Uniform Resource Identifier (URI): schema-specific
RFC 3986 defines URIs as a "federated and extensible naming system."
All URIs start with a schema-name part and no schema-name has been
reserved for undefined or documenting example values.
For any given URI schema ("http", for example), it may be possible
to define an undefined value within that name-space. If a GLUE
value has only one valid schema, the undefined value must be taken
from that schema. If several schemata are possible, one must be
chosen from the available options, which should be the most commonly
used.
Take care with the URI encoding. All unknown URI values must be
valid URIs. If additional information is included, it must be
encoded so the resulting URI is valid.
For schemata that include a FQDN (e.g., a reference to an Internet
host), an undefined URI must use an undefined FQDN; see above for
details on undefined FQDNs.
For URI schemata that reference a remote file (e.g., "http", "https"),
additional information may be included as the path. The FQDN
indicates that the value is a place-holder, indicating an unknown
value, so information providers need not specify "UNDEFINEDPATH".
For "file" URIs, the path part must identify the value as unknown
and must use the forward-slash variant; see above for details on
undefined paths.
"mailto" URIs [RFC 2368] encapsulate valid email addresses with
additional information (such as email headers and message body).
Unknown mailto URIs must use an unknown email address (see above).
Any additional information must be included in the email body.
There may be other schemata in use that are not explicitly covered
in this section. A place-holder value should be agreed upon within
whichever domain such schemata are used. This place-holder value
should be in the spirit of the place-holder values described so far.
Examples:
http://www.example.org/
httpg://your-CE.example.org/path/to/end-point
mailto:site-admin@example.org
mailto:user@maildomain.invalid?body=Problem%20connecting%20to%20RB
file:///UNDEFINEDPATH
file:///UNDEFINEDPATH/path%20to%20some%20directory
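The URI rules can be sketched with Python's standard urllib.parse module. All function names here are hypothetical. Percent-encoding is applied when building a URI, so that additional information containing spaces still yields a valid URI, and is undone before a "file" path is inspected.

```python
from urllib.parse import quote, unquote, urlsplit


def _is_undefined_host(host):
    """True if host follows the undefined-FQDN rules."""
    host = host.rstrip(".").lower()
    return (
        host == "invalid"
        or host.endswith(".invalid")
        or host in ("example.org", "example.com")
        or host.endswith((".example.org", ".example.com"))
    )


def undefined_file_uri(comment=None):
    """Build a 'file' place-holder using the forward-slash path variant."""
    path = "/UNDEFINEDPATH"
    if comment is not None:
        path += "/" + comment
    return "file://" + quote(path)


def undefined_mailto_uri(address, body=None):
    """Build a 'mailto' place-holder; extra information goes in the body."""
    uri = "mailto:" + address
    if body is not None:
        uri += "?body=" + quote(body, safe="")
    return uri


def is_undefined_uri(uri):
    """Apply the path, email, or FQDN rule according to the URI's scheme name."""
    parts = urlsplit(uri)
    if parts.scheme == "file":
        path = unquote(parts.path)
        return path == "/UNDEFINEDPATH" or path.startswith("/UNDEFINEDPATH/")
    if parts.scheme == "mailto":
        return _is_undefined_host(parts.path.rpartition("@")[2])
    return _is_undefined_host(parts.hostname or "")
```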
10. X509 Distinguished Names: must include a RDN of CN=UNDEFINEDUSER
X509 uses a X500 namespace, represented as several Relative
Distinguished Names (RDNs) concatenated by forward-slashes. The final RDN
is usually a single common name (CN), although multiple CNs are
allowed.
Unknown DN values must have at least two entries: an initial O=Grid
followed immediately by CN=UNDEFINEDUSER.
Additional information can be encoded using extra CN entries. These
must come after CN=UNDEFINEDUSER.
Examples:
/O=Grid/CN=UNDEFINEDUSER
/O=Grid/CN=UNDEFINEDUSER/CN=Your Grid certificate DN here
/O=Grid/CN=UNDEFINEDUSER/CN=Cannot access SE
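A minimal sketch of building and recognising an unknown DN in the slash-separated form used above. The function names are hypothetical. The sketch refuses comments that contain a forward-slash, since that character separates RDNs in this representation.

```python
REQUIRED_PREFIX = ["O=Grid", "CN=UNDEFINEDUSER"]


def make_undefined_dn(*comments):
    """Build an unknown DN; extra information goes in trailing CN entries."""
    for comment in comments:
        if "/" in comment:
            raise ValueError("comment must not contain a forward-slash")
    rdns = REQUIRED_PREFIX + ["CN=" + comment for comment in comments]
    return "/" + "/".join(rdns)


def is_undefined_dn(dn):
    """True if the DN starts with O=Grid followed by CN=UNDEFINEDUSER."""
    if not dn.startswith("/"):
        return False
    rdns = dn.split("/")[1:]
    return rdns[:2] == REQUIRED_PREFIX
```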
Definition of words:
---
The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" in this document are used
deliberately and take their meaning from RFC 2119. A brief summary is
given here.
1. MUST (or "REQUIRED") means that no deviation is allowed from
conforming software.
2. MUST NOT means complete prohibition of this behaviour with
conforming software.
3. SHOULD (or "RECOMMENDED") means that there may be reasons why
conforming software does not adopt this behaviour, but all the
effects of an alternative behaviour must be understood and
considered before choosing a different course.
4. SHOULD NOT (or "NOT RECOMMENDED") means that there may be reasons
why conforming software adopts this behaviour, but all the
effects of an alternative behaviour must be understood and
considered before choosing a different course.
5. MAY (or "OPTIONAL") means an item is completely optional.