[glue-wg] Draft version of "undefined values" appendix

Paul Millar paul.millar at desy.de
Tue Jan 22 09:10:12 CST 2008


Hi all,

Here is the draft version of the appendix.  I've tried to incorporate the 
comments made at the last meeting, cut down the word-count slightly and 
changed several mentions of "illegal" to "undefined".

Cheers,

Paul.


Appendix D : place-holder values for unknown data.
----


Introduction
---

Whilst people endeavour to provide accurate information, there may be
situations where specific GLUE values may be assigned place-holder (or
dummy) values.  These place-holder values carry some additional
semantic meaning; specifically, that the correct value is currently
unknown and the presented value should be ignored.  This appendix
describes a recommended set of place-holder values to use.

GLUE makes no requirement on how much data is published and renderings
(of the GLUE Schema into a concrete type, such as LDAP) may allow publishing
of partial information.  However, there may be situations where a
value is provided that should not be considered valid.  Two likely
scenarios are discussed below, although others may exist.

To avoid confusion, these values have been chosen so they are unlikely
to occur under normal operation, whilst still being valid values.
Wherever possible, best-practice has been followed.  It is not a
requirement to use place-holder values in general or to use these
values in particular; however, if place-holder values are used, using
these values increases the likelihood that others will understand that
the values are to be ignored.


Use-cases:
---

Scenario 1. a static value has no good default value and has not been
configured for a particular site.

Some provisions for GLUE Schema provide templates.  These templates
may contain static values that have no good default value; for
example, a value might require some detailed knowledge of a site.
Whilst there may be an expectation that the value will be configured,
it is possible that this does not happen, thereby exposing the
application's default configuration.


Scenario 2. an information provider is unable to obtain a dynamic value.

A dynamic value is provided by an information provider by querying the
underlying grid resources.  This query will use a number of ancillary
resources (e.g., DNS, network hardware) that might fail; the grid
services might also fail.  If the system collecting the information
requires a value, a place-holder will be required.
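
As a minimal sketch of scenario 2, an information provider might wrap
its query and fall back to a place-holder when the query fails.  The
function names here are hypothetical and not part of GLUE; the
fall-back value is the dynamic FQDN place-holder proposed later in this
appendix.

```python
import socket

# Place-holder for a dynamic FQDN that could not be obtained.
UNDEFINED_FQDN = "unknown.invalid."

def publish_hostname(lookup):
    """Return the value from lookup(), or the place-holder if it fails."""
    try:
        return lookup()
    except OSError:
        # DNS, network or service failure: the true value is unknown.
        return UNDEFINED_FQDN

def failing_lookup():
    """Simulate an ancillary resource (here, DNS) being unavailable."""
    raise socket.gaierror("simulated DNS failure")
```

A consumer that sees the place-holder knows to ignore the value rather
than treat it as a real host.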


Place-holder values:
---

This section describes a number of values that can be represented
within a given address space (e.g., UTF-8, Integers, FQDNs, IPv4
address space).  Each of the different types is introduced along with
the proposed value and a brief discussion of the rationale and any
other considerations.

1. Simple strings (ASCII/UTF-8) should use "UNDEFINEDVALUE".

  Upper-case letters make it easier to spot and a single word avoids
  any white-space issues.

  This is a default option for strings that have no widely-known
  structure.  If a value is of a more restrictive sub-type
  (e.g. FQDNs, FQANs), then the rules for the more restrictive form
  should be used.


2. Fully qualified domain names: should use a hostname ending in either
	"example.org" or "example.com" for scenario 1, or in ".invalid"
	for scenario 2.

  RFC 2606 reserves the ".invalid" Top-Level-Domain (TLD) as always
  invalid and clearly so.  For dynamic information a value of
  "unknown.invalid." may be used.  If an alternative is used, it
  should use the TLD ".invalid".

  RFC 2606 also defines the ".example" TLD and reserves second-level
  domains including "example.org" and "example.com".  These domains
  have the advantage of ending with a familiar TLD, so are more
  immediately recognisable as DNS names.  Default configuration
  should use DNS names that end ".example.com" or ".example.org".
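
  A consumer could recognise these names with a check along the
  following lines.  This is only a sketch: the function name and the
  two return labels are invented for illustration.

```python
def classify_fqdn(name):
    """Classify a host name as a place-holder, or return None.

    Returns "static-default" for names under example.com/example.org
    (scenario 1), "dynamic-unknown" for names under .invalid
    (scenario 2), and None for anything else.
    """
    # Ignore case and any trailing root dot ("unknown.invalid.").
    labels = name.lower().rstrip(".").split(".")
    if labels[-1] == "invalid":
        return "dynamic-unknown"
    if labels[-2:] in (["example", "com"], ["example", "org"]):
        return "static-default"
    return None
```

  Comparing whole labels, rather than using a plain suffix match,
  avoids treating a name such as "notexample.com" as a place-holder.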


3. IPv4 addr: should use 192.0.2.250

  There are several portions of IPv4 addresses that should not appear
  on a network, but none that are reserved for a non-existent address.
  Using an arbitrary address leads to the risk of side-effects.

  The best option is an IP address from the 192.0.2.0/24 subnet.  This
  subnet is defined in RFC 3330 as "TEST-NET" for use in documentation
  and example code.  Although any IPv4 address from 192.0.2.0/24 may
  be used, the recommended address above should be used for consistency.
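
  Since any TEST-NET address may appear, a consumer may prefer to test
  for membership of the subnet rather than for the single recommended
  address.  A possible sketch using Python's standard ipaddress
  module (the function name is an assumption):

```python
import ipaddress

TEST_NET = ipaddress.ip_network("192.0.2.0/24")
UNDEFINED_IPV4 = ipaddress.ip_address("192.0.2.250")

def is_placeholder_ipv4(text):
    """True if the address lies anywhere in the TEST-NET subnet."""
    return ipaddress.ip_address(text) in TEST_NET
```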


5. IPv6 addr: should use 2001:DB8::FFFF

  There is no documented undefined IPv6 address.  RFC 3849 reserves the
  address prefix 2001:DB8::/32 for documentation.  For consistency,
  the address SHOULD BE the one noted above.
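
  IPv6 addresses have several equivalent textual forms, so a
  comparison is safer on parsed addresses than on strings.  The
  following sketch (function name assumed) again uses the standard
  ipaddress module:

```python
import ipaddress

DOC_PREFIX = ipaddress.ip_network("2001:db8::/32")
UNDEFINED_IPV6 = ipaddress.ip_address("2001:DB8::FFFF")

def is_undefined_ipv6(text):
    """True if text denotes the recommended address, in any spelling."""
    return ipaddress.ip_address(text) == UNDEFINED_IPV6
```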


6. Counting/Natural numbers: should use 0

  Counting- (also known as Natural-) numbers exclude the number zero,
  so an information provider may use this value to indicate an
  undefined value.

  Some counting-number metrics include 0 as a valid value.  If so, the
  metric should be considered an Integer and zero should not be used.


7. Integers: should use MAXINT (maximum value representable in the
             domain).
		For int32 this is 2,147,483,647
		 "  uint32 this is 4,294,967,295
		 "  int64 this is 9,223,372,036,854,775,807
		 "  uint64 this is 18,446,744,073,709,551,615

  For non-negative integers, all numbers expressible within the
  encoding (int32/uint32/etc...) are valid so there is no safe choice.
  Although any value may be chosen, perhaps the value least likely to
  be encountered in any given domain is that domain's maximum value.

  If a non-negative integer is encoded as a signed integer, it is
  possible to use negative numbers safely.  However, these numbers
  will be unrepresentable if the number is stored as an unsigned
  integer.

  For values that might be a positive, zero or a negative integer, all
  numbers expressible within the encoding (signed int) are valid so
  there is no safe choice.  For consistency, the MAXINT value should
  be used.

  Information providers should not attempt to convey further
  semantic distinction, for example by using more than one "undefined"
  number.
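
  The MAXINT values listed above follow directly from the width and
  signedness of the encoding.  A short sketch (the helper names are
  illustrative only):

```python
def maxint(bits, signed):
    """Largest value representable in an integer of the given width."""
    return 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1

def is_undefined_int(value, bits, signed):
    """True if value is the place-holder for this integer domain."""
    return value == maxint(bits, signed)
```

  For example, maxint(32, signed=True) gives 2147483647 and
  maxint(64, signed=False) gives 18446744073709551615.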


8. Filenames: should be "UNDEFINEDPATH".

  As with the simple string, a single upper-case word is recommended.


9. Uniform Resource Identifier (URI): scheme-specific

  RFC 3986 defines URIs as a "federated and extensible naming system."
  All URIs start with a scheme-name part and no scheme name has been
  reserved for undefined or example values.

  For any given URI scheme ("http", for example), it may be possible
  to define an undefined value within that name-space.  If a GLUE
  value has only one valid scheme, the undefined value should be taken
  from that scheme.  If several schemes are possible, one should be
  chosen from the available options, which should be the most commonly
  used.

  For schemes that include an FQDN (e.g., a reference to an Internet
  host), an undefined value should be derived by using an undefined
  FQDN in an otherwise valid URI.  See above for details on undefined
  FQDNs.  Examples of such schemes include: http, httpg and mailto.

  For other schemes, some element of the naming system should
  indicate that the value is undefined.  This is subject to further
  work.
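
  For URIs that carry a host name in their authority part (such as
  http and httpg), detection can reuse the FQDN rules.  The sketch
  below is an assumption about how a consumer might do this, using
  urlsplit from the Python standard library.  It does not cover
  mailto, where the domain is not in the authority part.

```python
from urllib.parse import urlsplit

# Suffixes of the undefined FQDNs described earlier in this appendix.
PLACEHOLDER_SUFFIXES = (".invalid", ".example.com", ".example.org")

def uri_has_undefined_host(uri):
    """True if the URI's host is one of the place-holder FQDNs."""
    host = urlsplit(uri).hostname  # lower-cased, port removed
    if host is None:
        return False
    # Prefix a dot so that only whole labels match a suffix.
    host = "." + host.rstrip(".")
    return host.endswith(PLACEHOLDER_SUFFIXES)
```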


10. X509 Distinguished Names: should include an RDN of CN=UNDEFINEDUSER

  Rationale:

  X509 uses an X500 namespace, represented as several RDNs joined
  by commas.  The final RDN is usually a single common name (CN).

  It is possible for more than one CN to be present, allowing
  inclusion of additional semantic meaning.  However, this is outwith
  the scope of the document.
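
  A simple check for the place-holder RDN might look like the
  following sketch.  It assumes the comma-separated form described
  above and does not handle escaped commas inside RDN values; the
  function name is hypothetical.

```python
def dn_is_undefined(dn):
    """True if any RDN in the DN is CN=UNDEFINEDUSER."""
    for rdn in dn.split(","):
        key, sep, value = rdn.strip().partition("=")
        # Attribute types are matched case-insensitively.
        if sep and key.strip().upper() == "CN" and value.strip() == "UNDEFINEDUSER":
            return True
    return False
```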



Definition of words:
---

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are used deliberately and take their meaning from RFC 2119.
A brief summary is given here.


1. MUST (or "REQUIRED" or "SHALL") means that conforming software is
   allowed no deviation from this behaviour.

2. MUST NOT (or "SHALL NOT") means complete prohibition of this
   behaviour in conforming software.

3. SHOULD (or "RECOMMENDED") means that there may be reasons why
   conforming software does not adopt this behaviour, but all the
   effects of an alternative behaviour must be understood and
   considered before choosing a different course.

4. SHOULD NOT (or "NOT RECOMMENDED") means that there may be reasons
   why conforming software adopts this behaviour, but all the
   effects of adopting it must be understood and considered before
   doing so.

5. MAY (or "OPTIONAL") means an item is completely optional.

