[SAGA-RG] URLs and wildcards (was: More confusion)

Thilo Kielmann kielmann at cs.vu.nl
Mon Dec 3 09:03:10 CST 2007


> 
> And to permissions_allow / permissions_deny.

Yep.

> And in list and find of course, but those take strings, not
> URLs, no problem here.  Here we can (and should) leave the
> full wildcards IMHO.

Yes, but that is an unrelated story.

> > 2. In ns_directory, method list has a parameter pattern, while method find
> >    has name_pattern. This should both be "pattern". It refers to the same kind
> >    of thing.
> 
> Right.  The parameter for find is called name_pattern to
> distinguish it from the additional attrib_pattern pattern in
> the overloaded find method in the replica package...  But
> yes, they are the same thing.  So, if you want to have the
> same parameter name, it should be name_pattern I guess?

OK.

> > Further thoughts about URLs and wildcards.

Another take on "why NOT having wildcards in URLs denoting files and 
directories":

1. the reason for having wildcards in the first place is to have something
   with the "look and feel" of POSIX shell wildcards in SAGA calls.

   ==> everything that contradicts this look-and-feel is to be ruled OUT

1.a. this means that all character sequences requiring octet-encoding of
   wildcard characters are OUT.

1.b. this further means that everything that cannot be used in a
   straightforward way is "OUT" (meaning: everything that is NOT simple to use)

2. when using URLs we MUST conform to RFC1738

Let's look into RFC1738: (http://www.ietf.org/rfc/rfc1738.txt)

2.2 URL Character Encoding Issues

Unsafe:

Other characters are unsafe ...
These characters are "{", "}", "|", "\", "^", "~",  "[", "]", and "`".

All unsafe characters must always be encoded within a URL. 


Reserved:

The characters ";",
   "/", "?", ":", "@", "=" and "&" are the characters which may be
   reserved for special meaning within a scheme.

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL.


Let's look into reserved characters per protocol:

FTP:
Within a name or CWD component, the characters "/" and ";" are
   reserved and must be encoded.

HTTP:
Within the <path> and <searchpart> components, "/", ";", "?" are
   reserved.

file:
(no reserved characters mentioned)


aside: the use of the '*' in the NEWS scheme is irrelevant here because
this only applies to NNTP news, NOT to files or directories



Summary: POSIX shell-like wildcards in URLs:
- some characters like [ ] must always be encoded
- depending on the protocol, further characters (e.g., '?' in HTTP) MUST be
  encoded as well

This means we can NOT provide wildcards in URLs in an intuitive,
obvious-to-use (i.e., protocol-independent) way without violating RFC1738.

We could, however, restrict ourselves to the '*' wildcard only, but this
is a very limited form of wildcards. Although frequently used, it is not
really worth being called "POSIX shell wildcards".
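
As an illustration of what remains of the shell look-and-feel, the following
Python sketch percent-encodes a pattern conservatively: every byte that is not
an alphanumeric or one of the special characters "$-_.+!*'()," gets encoded.
This is an assumption on the strict side (reserved characters are encoded too,
since a wildcard is not their reserved purpose); encode_pattern is a
hypothetical helper, not part of SAGA.

```python
import string

# Characters RFC 1738 allows unencoded regardless of the scheme.
ALWAYS_ALLOWED = set(string.ascii_letters + string.digits + "$-_.+!*'(),")


def encode_pattern(pattern):
    """Percent-encode every byte that is not unconditionally allowed."""
    encoded = []
    for byte in pattern.encode("utf-8"):
        ch = chr(byte)
        if ch in ALWAYS_ALLOWED:
            encoded.append(ch)
        else:
            encoded.append("%{:02X}".format(byte))
    return "".join(encoded)


if __name__ == "__main__":
    for sample in ("*.bin", "image.?pg", "data_[a-z].bin", "*.{c,h}"):
        print(sample, "->", encode_pattern(sample))
```

Of the four samples only "*.bin" survives unchanged; "data_[a-z].bin" turns
into "data_%5Ba-z%5D.bin", which nobody would recognize as a shell wildcard.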


> Hmm, a mail from me seems to have gone astray?  A while ago
> in this thread I wrote:
> 
> So, unless my interpretation is wrong, I'd say that '*' is
> explicitly allowed as wildcards.

Your interpretation IS wrong (see above, this is ONLY applicable to NNTP)


> | And here are two other options actually for dealing with
> | wildcards:
> |
> |  - allow only *, not the full blown shell wildcards

Too limited (see above).

> |
> |  - or use different characters for wildcards, e.g.
> |
> |    data_[a-z].bin -> data_((a-z)).bin
> |    image.?pg      -> image01.#pg

Not just slightly but strongly confusing, no no.

> B: why the limitation to relative path names?

Idea: keep URLs for absolute, global identifiers. Have strings with POSIX
shell wildcards as local names, relative to the directory the operation is
working on.
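
A rough Python sketch of this idea, under stated assumptions: the directory is
identified by a plain URL, and the pattern is an ordinary string that is only
matched against entry names of that directory. The function list_matching, the
entries list (a stand-in for a real directory listing), and the example URL
are all hypothetical; the wildcard check on the URL is deliberately simplistic.

```python
import fnmatch

WILDCARD_CHARS = "*?[]{}"


def list_matching(directory_url, entries, name_pattern):
    """Return the entry names of one directory that match name_pattern.

    directory_url: absolute, wildcard-free identifier of the directory
    entries:       names found in that directory (hypothetical listing)
    name_pattern:  shell-style pattern, relative to the directory
    """
    if any(ch in directory_url for ch in WILDCARD_CHARS):
        raise ValueError("directory URL must not contain wildcards")
    # fnmatchcase compares case-sensitively on every platform.
    return [name for name in entries
            if fnmatch.fnmatchcase(name, name_pattern)]


if __name__ == "__main__":
    names = ["data_a.bin", "data_b.bin", "data_1.bin", "image.jpg"]
    print(list_matching("ftp://example.org/data/", names, "data_[a-z].bin"))
    print(list_matching("ftp://example.org/data/", names, "image.?pg"))
```

The pattern never passes through URL encoding, so it keeps its shell syntax
no matter which protocol the directory URL uses.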


> For C speaks that '*' is, probably, the most commonly used
> wildcard - so using that in the standard URL calls would
> help a lot.  As for the other wildcards, a detour via expand
> does not sound too bad anymore...

It leaves us with the feeling of a "hack" while we could also have a clean
solution: URLs without wildcards, and relative strings with wildcards.

Thilo
-- 
Thilo Kielmann                                 http://www.cs.vu.nl/~kielmann/

