[glue-wg] Strings
Paul Millar
paul.millar at desy.de
Mon Oct 26 06:34:15 CDT 2009
Hi Stephen, others,
On Friday 23 October 2009 21:54:39 stephen.burke at stfc.ac.uk wrote:
> A point just came up about the representation of strings. In the GLUE 2
> specification it seems we have no definition of the "string" type, which
> looks like an oversight (other than an implication in the placeholder
> section that strings may be UTF-8).
Agreed. This does seem to be an oversight.
My humble suggestion:
GLUE 2 strings are Unicode (see ISO/IEC 10646), but GLUE does not
specify how a string is to be represented. A GLUE binding MUST
describe which encodings are available for strings.
If the underlying storage has one or more encodings that allow a
round-trip (encoding a Unicode string and then decoding the result)
without any collisions (a collision is when two distinct Unicode
strings are the same after the round-trip) then the binding MUST
allow only these encodings. If the underlying storage allows multiple
collision-less round-trip string encodings then the GLUE
binding MAY allow alternative encodings.
If one or more of these encodings is a Unicode standard encoding
then the GLUE binding SHOULD allow at least one of the available
standard Unicode encodings. If UTF-8 is an available encoding then
the binding SHOULD allow UTF-8 encoded strings.
If none of the string encodings available from the underlying storage
support a collision-less Unicode round-trip then the binding SHOULD
use the encoding that minimises the number of string collisions. The
GLUE binding MUST document which Unicode strings have a collision-less
round-trip and SHOULD document the expected encoded value for the
remaining Unicode strings.
This text is to be included in the GLUE 2.0 errata and in the next revision.
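To make "round-trip" and "collision" in the suggested text concrete, here is a rough Python sketch. The helper names (round_trip, collides) are made up for illustration, and substituting "?" for unrepresentable characters is only an assumption about how a lossy storage encoding might behave; a real binding would have to document its own behaviour.

```python
def round_trip(s, encoding):
    """Encode a Unicode string and decode it again.

    Characters the encoding cannot represent are replaced with "?"
    (an assumed policy, chosen only for illustration).
    """
    return s.encode(encoding, errors="replace").decode(encoding)


def collides(a, b, encoding):
    """True if two distinct strings become identical after the round-trip."""
    return a != b and round_trip(a, encoding) == round_trip(b, encoding)


# Two distinct Unicode strings that differ only in a non-ASCII character.
first = "Straße"
second = "Straüe"

ascii_collision = collides(first, second, "ascii")  # both become "Stra?e"
utf8_collision = collides(first, second, "utf-8")   # UTF-8 is lossless
```

Under these assumptions the 7-bit ASCII round-trip produces a collision while the UTF-8 round-trip does not, which is the distinction the MUST/SHOULD wording above depends on.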
> In the GLUE 2 LDAP schema as currently implemented for EGEE, strings seem
> to be typed as IA5String as they were in glue 1, which is
> basically 7-bit ascii so special characters are not allowed. Should we be
> allowing UTF-8 strings?
I've had one user complain that they couldn't include a German sharp-s
(double-s or ß) in a name attribute. However, this doesn't really matter for
German names: all German "weird" letters have 7-bit ASCII transliterations
("ß" --> "ss", "ä" --> "ae", etc).
I believe the same isn't true for all other languages, so I would support
adopting UTF-8 encoded strings for the GLUE LDAP binding. (IIRC, they are
called "DirectoryString" in LDAP speak.)
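As a small illustration of the difference between a 7-bit ASCII string type and a UTF-8 one, the following Python sketch tries to encode a name containing a sharp-s both ways. It only uses Python's built-in codecs; it does not talk to an LDAP server, and the example name is invented.

```python
name = "Straße"  # hypothetical attribute value containing a sharp-s

# UTF-8 can represent the string; the sharp-s takes two bytes.
utf8_bytes = name.encode("utf-8")
utf8_round_trip_ok = utf8_bytes.decode("utf-8") == name

# Strict 7-bit ASCII cannot represent the sharp-s at all.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

With a 7-bit type the value has to be rejected or transliterated before storage, whereas a UTF-8 type can store it unchanged.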
Cheers,
Paul.