[glue-wg] Strings

Paul Millar paul.millar at desy.de
Mon Oct 26 06:34:15 CDT 2009


Hi Stephen, others,

On Friday 23 October 2009 21:54:39 stephen.burke at stfc.ac.uk wrote:
> A point just came up about the representation of strings. In the GLUE 2
> specification it seems we have no definition of the "string" type, which
> looks like an oversight (other than an implication in the placeholder
> section that strings may be UTF-8).

Agreed.  This does seem to be an oversight.

My humble suggestion:

	GLUE 2 strings are Unicode (see ISO/IEC 10646), but GLUE does not
	specify how a string is to be represented.  A GLUE binding MUST
	describe which encodings are available for strings.

	If the underlying storage has one or more encodings that allow a
	round-trip (decoding an encoded Unicode string) without any
	collisions (a collision occurs when two distinct Unicode strings
	are the same after the round-trip) then the binding MUST allow only
	these encodings.  If the underlying storage allows multiple
	collision-less round-trip string encodings then the GLUE
	binding MAY allow alternative encodings.

	If one or more of these encodings is a Unicode standard encoding
	then the GLUE binding SHOULD allow at least one of the available
	standard Unicode encodings.  If UTF-8 is an available encoding then
	the binding SHOULD allow UTF-8 encoded strings.

	If none of the string encodings available from the underlying storage
	support a collision-less Unicode round-trip then the binding SHOULD
	use the encoding that minimises the number of string collisions.  The
	GLUE binding MUST document which Unicode strings have a collision-less
	round-trip and SHOULD document the expected encoded value for the
	remaining Unicode strings.

This text is to be included in the GLUE 2.0 errata and in the next revision.
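
To make "round-trip" and "collision" concrete, here is a minimal Python
sketch.  It is only an illustration, not part of the proposed text: the
helper names (round_trips, find_collisions) are made up, and it assumes a
binding that substitutes "?" for characters the storage cannot encode,
which is just one possible lossy behaviour.

```python
from collections import defaultdict


def round_trips(text: str, encoding: str) -> bool:
    """Return True if text survives a strict encode/decode unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # The encoding cannot represent this string at all.
        return False


def find_collisions(strings, encoding):
    """Group distinct strings that map to the same stored bytes.

    Assumption: unencodable characters are replaced (here with "?"),
    mimicking a lossy storage layer.
    """
    groups = defaultdict(list)
    for s in dict.fromkeys(strings):  # drop duplicates, keep order
        groups[s.encode(encoding, errors="replace")].append(s)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    names = ["Straße", "Stra?e", "Strasse"]
    for enc in ("ascii", "utf-8"):
        print(enc, [round_trips(n, enc) for n in names],
              find_collisions(names, enc))
```

With 7-bit ASCII, "Straße" and "Stra?e" end up as the same stored value, so
they collide; with UTF-8 every sample string round-trips and there are no
collisions.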

> In the GLUE 2 LDAP schema as currently implemented for EGEE, strings seem
> to be typed as IA5String as they were in glue 1, which is
> basically 7-bit ascii so special characters are not allowed. Should we be
> allowing UTF-8 strings?

I've had one user complain that they couldn't include a German sharp-s
(double-s or ß) in a name attribute.  However, this doesn't really matter for
German names: all German "weird" letters have 7-bit ASCII encoded versions
("ß" --> "ss", "ä" --> "ae", etc).

I believe the same isn't true for all other languages, so I would support 
adopting UTF-8 encoded strings for the GLUE LDAP binding.  (IIRC, they are 
called "DirectoryString" in LDAP speak.)
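
As an illustration of the German workaround mentioned above, here is a small
Python sketch of the usual ASCII transliteration.  The table and the
function name to_ascii are my own choices for the example, not anything
defined by GLUE or LDAP.  Note that the mapping is one-way: two different
names can end up with the same ASCII form.

```python
# Conventional German transliterations into 7-bit ASCII.
GERMAN_TO_ASCII = str.maketrans({
    "ß": "ss",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def to_ascii(name: str) -> str:
    """Replace German special letters with their ASCII spellings."""
    return name.translate(GERMAN_TO_ASCII)


if __name__ == "__main__":
    for name in ("Müller", "Maße", "Masse"):
        converted = to_ascii(name)
        print(name, "->", converted, "(ASCII)" if converted.isascii() else "(not ASCII)")
```

Here "Maße" and "Masse" both come out as "Masse", so the original spelling
cannot be recovered from the stored value.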

Cheers,

Paul.


