[SAGA-RG] URLs and wildcards (was: More confusion)
Thilo Kielmann
kielmann at cs.vu.nl
Mon Dec 3 09:03:10 CST 2007
>
> And to permissions_allow / permissions_deny.
Yep.
> And in list and find of course, but those take strings, not
> URLs, no problem here. Here we can (and should) leave the
> full wildcards IMHO.
Yes, but that is an unrelated story.
> > 2. In ns_directory, method list has a parameter pattern, while method find
> > has name_pattern. This should both be "pattern". It refers to the same kind
> > of thing.
>
> Right. The parameter for find is called name_pattern to
> distinguish it from the additional attrib_pattern pattern in
> the overloaded find method in the replica package... But
> yes, they are the same thing. So, if you want to have the
> same parameter name, it should be name_pattern I guess?
OK.
> > Further thoughts about URLs and wildcards.
Another take on "why NOT to have wildcards in URLs denoting files and
directories":
1. The reason for having wildcards in the first place is to have something
with the "look and feel" of POSIX shell wildcards in SAGA calls.
==> everything that contradicts this look-and-feel is to be ruled OUT
1.a. This means that all character sequences requiring octet-encoding of
wildcard characters are OUT.
1.b. This further means that everything that cannot be used in a
straightforward way is OUT (meaning: everything that is NOT simple to use).
2. when using URLs we MUST conform to RFC1738
Let's look into RFC1738: (http://www.ietf.org/rfc/rfc1738.txt)
2.2 URL Character Encoding Issues
Unsafe:
Other characters are unsafe ...
These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".
All unsafe characters must always be encoded within a URL.
Reserved:
The characters ";",
"/", "?", ":", "@", "=" and "&" are the characters which may be
reserved for special meaning within a scheme.
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
Let's look into reserved characters per protocol:
FTP:
Within a name or CWD component, the characters "/" and ";" are
reserved and must be encoded.
HTTP:
Within the <path> and <searchpart> components, "/", ";", "?" are
reserved.
file:
(no reserved characters mentioned)
aside: the use of the '*' in the NEWS scheme is irrelevant here because
this only applies to NNTP news, NOT to files or directories
Summary: POSIX shell-like wildcards in URLs:
- some characters, like [ ], must always be encoded
- depending on the protocol, other characters may or may not have to be encoded
This means we can NOT provide wildcards in URLs in an intuitive,
obvious-to-use (e.g., protocol-independent) way without violating RFC1738.
We could, however, restrict ourselves to the '*' wildcard only, but this
is a very limited form of wildcards; although frequently used, it is not
really worth being called "POSIX shell wildcards".
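
To make the character classes above concrete, here is a minimal sketch in
Python (not SAGA code) that sorts typical shell wildcard characters into the
RFC1738 categories quoted above and encodes a pattern accordingly. The names
classify() and encode_pattern() are made up for illustration, and the sketch
ignores the per-scheme rules for reserved characters.

```python
import string

# Character classes as quoted from RFC 1738, section 2.2.
RESERVED = set(";/?:@=&")
SAFE_SPECIALS = set("$-_.+!*'(),")
ALNUM = set(string.ascii_letters + string.digits)


def classify(ch):
    """Hypothetical helper: say how RFC 1738 treats a single character."""
    if ch in ALNUM or ch in SAFE_SPECIALS:
        return "unencoded"   # may appear as-is in a URL
    if ch in RESERVED:
        return "reserved"    # meaning depends on the scheme
    return "encode"          # unsafe (e.g. { } [ ]) or otherwise not allowed


def encode_pattern(pattern):
    """Hypothetical helper: octet-encode every character that must be encoded."""
    out = []
    for ch in pattern:
        if classify(ch) == "encode":
            # %XX for each byte of the character's UTF-8 representation
            out.extend("%%%02X" % b for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for ch in "*?[]{}":
        print(repr(ch), classify(ch))
    print(encode_pattern("data_[a-z].bin"))
```

With this, '*' comes out as "unencoded", '?' as "reserved", and the brackets
and braces as "encode", so data_[a-z].bin would have to be written as
data_%5Ba-z%5D.bin, which is exactly the loss of look-and-feel argued above.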
> Hmm, a mail from me seems to have gone astray? A while ago
> in this thread I wrote:
>
> So, unless my interpretation is wrong, I'd say that '*' is
> explicitly allowed as wildcards.
Your interpretation IS wrong (see above: the '*' is ONLY applicable to NNTP).
> | And here are two other options actually for dealing with
> | wildcards:
> |
> | - allow only *, not the full-blown shell wildcards
Too limited (see above).
> |
> | - or use different characters for wildcards, e.g.
> |
> | data_[a-z].bin -> data_((a-z)).bin
> | image.?pg -> image01.#pg
Not just slightly but strongly confusing, no no.
> B: why the limitation to relative path names?
Idea: keep URLs for absolute, global identifiers. Have strings with POSIX
shell wildcards as local names, relative to the directory the operation is
working on.
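
A minimal sketch of this idea in Python (not SAGA code): the directory is
named by a plain URL without wildcards, and the pattern is an ordinary string
matched against the entry names of that directory. The function
find_matching(), the example URL and the entry list are assumptions for
illustration only; Python's fnmatch covers *, ?, [seq] and [!seq], but not
brace expansion.

```python
from fnmatch import fnmatchcase
from urllib.parse import quote


def find_matching(dir_url, pattern, entries):
    """Hypothetical helper: match a relative shell pattern against entry names.

    dir_url -- wildcard-free URL of the directory the operation works on
    pattern -- POSIX-shell-like pattern given as a plain string, not a URL
    entries -- names of the entries in that directory (assumed already listed)
    """
    if not dir_url.endswith("/"):
        dir_url += "/"
    # Matching happens on plain names; only the resulting names are
    # octet-encoded when they are turned back into URLs.
    return [dir_url + quote(name)
            for name in entries
            if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    names = ["data_a.bin", "data_1.bin", "image.jpg"]
    for url in find_matching("ftp://example.org/pub", "data_[a-z].bin", names):
        print(url)
```

The pattern never has to pass through URL encoding, so the user can write it
exactly as in a shell, and the URLs that come back are ordinary, valid URLs.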
> For C speaks that '*' is, probably, the most commonly used
> wildcard - so using that in the standard URL calls would
> help a lot. As for the other wildcards, a detour via expand
> does not sound too bad anymore...
It leaves us with the feeling of a "hack", while we could also have a clean
solution: URLs without wildcards, and relative strings with wildcards.
Thilo
--
Thilo Kielmann http://www.cs.vu.nl/~kielmann/