[SAGA-RG] URLs and wildcards (was: More confusion)

Sun Dec 2 09:18:56 CST 2007

Hi Thilo, all, 

Quoting [Thilo Kielmann] (Nov 29 2007):
> 
> Ceriel and I have been chatting about this issue, producing a proposal for
> a solution.
> 
> Two observations (Thilo only) up front:
> 
> 1. wildcards are ONLY applicable to the methods copy, link, move, and remove
>    in class ns_directory, and to nothing else in the whole name space package.

And to permissions_allow / permissions_deny.

And to list and find of course, but those take strings, not
URLs, so no problem there.  Here we can (and should) keep the
full wildcards IMHO.

> 2. In ns_directory, method list has a parameter pattern, while method find
>    has name_pattern. These should both be "pattern". They refer to the same
>    kind of thing.

Right.  The parameter for find is called name_pattern to
distinguish it from the additional attrib_pattern parameter in
the overloaded find method in the replica package...  But
yes, they are the same thing.  So, if you want to have the
same parameter name, it should be name_pattern I guess?

> Further thoughts about URLs and wildcards.
> 
> 3. In ns_directory, list and find with their "pattern" parameter actually
>    refer to pathnames, relative to the current working directory (CWD).
>    We should say that explicitly in the spec.
> 
> 4. URLs, according to the RFC do NOT provide wildcards for files.

Hmm, a mail from me seems to have gone astray?  A while ago
in this thread I wrote:

| Quoting [Thilo Kielmann] (Nov 26 2007):
|
|| URLs, however, do not allow for wildcards, according to RFC1738.
| 
| Well, RFC1738 actually refers to wildcards explicitly, e.g. in
| Section 3.6. NEWS:
| 
|     If <newsgroup-name> is "*" (as in <URL:news:*>), it is
|     used to refer to "all available news groups".

So, unless my interpretation is wrong, I'd say that '*' is
explicitly allowed as a wildcard.
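
A quick way to check the syntax side of this is to classify the usual
shell wildcard characters by the character classes of RFC 1738
(section 2.2 and the BNF in section 5).  This is only a sketch: the
helper name classify() is made up for illustration, and it says nothing
about whether a given scheme attaches wildcard *meaning* to '*'.

```python
# Character classes as listed in RFC 1738 (section 2.2 / BNF in section 5).
SAFE = set("$-_.+")
EXTRA = set("!*'(),")
RESERVED = set(";/?:@=&")
UNSAFE = set('{}|\\^~[]`<>"#% ')


def classify(ch: str) -> str:
    """Return how RFC 1738 treats a single character inside a URL."""
    if ch.isascii() and ch.isalnum():
        return "unreserved"
    if ch in SAFE or ch in EXTRA:
        return "unreserved"   # may appear unencoded
    if ch in RESERVED:
        return "reserved"     # has a special meaning in some schemes
    if ch in UNSAFE:
        return "unsafe"       # must always be %-encoded
    return "other"


# Hypothetical check of the characters used by shell-style wildcards.
for ch in "*?[]{}":
    print(repr(ch), classify(ch))
```

Under this reading only '*' can be written into a URL as-is; '?' is
reserved (it starts the query part in http URLs), and the bracket and
brace characters are in the unsafe set.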

>    (Non-)options:
> 
>    a) add specific wildcards (like '*') to the URLs we use.
>       This would not be conformant to the RFC, so they would no longer be URLs.

See above.

>    b) "Use" the query mechanism for http to express wildcards for files.
>       While possible "in theory" this would be far from obvious, so this would
>       NOT be anything "simple" to use. (remember the "S" in SAGA)

Yep, I agree.

>    c) Wildcard characters could be brought into URLs by %-escape sequences.
>       Argument as with query: non-intuitive, not simple for the user.
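
To see why (c) is not attractive, here is roughly what a user would
have to write.  The sketch uses Python's urllib.parse purely for
illustration; the pattern "docs/*.doc" is a made-up example and nothing
here is SAGA API.

```python
from urllib.parse import quote, unquote

pattern = "docs/*.doc"

# quote() leaves '/' alone by default but %-encodes '*'.
escaped = quote(pattern)
print(escaped)             # docs/%2A.doc

# The receiving side would have to decode it again before matching.
print(unquote(escaped))    # docs/*.doc
```
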

I agree.  Another option (also from my previous mail) is
listed as C below.

>    Summary: we MUST NOT introduce file wildcards to URLs.

Hhmmmm... ;-)

> This leaves us (IOHO - Ceriel and me) with two possible options for wildcards
> for namespace entries (as expressed for operations on ns_directories):
> 
> A. Have an additional method expand that takes a string parameter describing
>    a pathname, relative to the CWD, (possibly) containing POSIX-style shell
>    wildcards.
>    expand() has an output parameter, an array of URLs, the expansion.
> 
>    In addition to expand(), we add versions of the methods 
>    copy, link, move, and remove from ns_directory that accept arrays of URLs
>    instead of single URLs. (If we do not add these versions, we force the
>    users to resort to bulk execution of tasks for a simple thing like
>    "remove *.doc")
> 
> B. Add versions of the methods copy, link, move, and remove from ns_directory
>    that accept a string parameter describing a pathname, relative to the CWD,
>    (possibly) containing POSIX-style shell wildcards.

C.   - allow * as wildcard in URLs (in the path element part)
     - allow normal wildcards for the string pattern in list and find
     - for all other wildcards ([a-z], ?, {one,two,three}) use
        expand(), and require user level loops over the result.

A and B both have the problem of bloat -- not too bad though (6 calls).

B: why the limitation to relative path names?
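
For comparison, the difference between A and B from the caller's side
can be sketched with a toy in-memory directory.  Everything here is an
assumption for illustration: the class NSDirectory, the method names
remove_urls() and remove_pattern(), and the example URL are invented,
and Python's fnmatch stands in for POSIX-style shell wildcards (it
handles *, ? and [..], but not {a,b}).

```python
from fnmatch import fnmatchcase


class NSDirectory:
    """Toy in-memory stand-in for an ns_directory (not the SAGA API)."""

    def __init__(self, url, names):
        self.url = url.rstrip("/")
        self.names = set(names)

    def expand(self, pattern):
        """Option A: turn a CWD-relative wildcard string into a list of URLs."""
        return sorted(f"{self.url}/{n}" for n in self.names
                      if fnmatchcase(n, pattern))

    def remove_urls(self, urls):
        """Option A: remove() variant that accepts an array of URLs."""
        prefix = self.url + "/"
        for u in urls:
            if not u.startswith(prefix):
                raise ValueError(f"{u} is not inside {self.url}")
            self.names.discard(u[len(prefix):])

    def remove_pattern(self, pattern):
        """Option B: remove() variant that takes the wildcard string directly."""
        self.remove_urls(self.expand(pattern))


names = ["a.doc", "b.doc", "notes.txt"]

# Option A: two steps, expand first, then pass the URL array on.
d = NSDirectory("file://localhost/tmp/demo", names)
d.remove_urls(d.expand("*.doc"))
print(sorted(d.names))    # ['notes.txt']

# Option B: one call with the pattern string.
d = NSDirectory("file://localhost/tmp/demo", names)
d.remove_pattern("*.doc")
print(sorted(d.names))    # ['notes.txt']
```

In this sketch B is just A with the two steps folded into one call, so
the bloat is the same either way: one extra variant per method for
copy, link, move, and remove.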

> Comparing both options, Ceriel and myself are in favour of B.
> It comes with fewer methods and a simpler and more obvious-to-use interface.

I vote for C *blush*.

>
> A is a very indirect solution where a user first has to build a list of URLs
> from a wildcard string, and then has to pass this list of URLs to, e.g., copy.

Agree.

> With B, the user can directly pass the wildcard string to, e.g., copy.
> The "trick" is that the string is restricted in its expressiveness, namely to
> pathnames relative to the CWD.

In favour of C: '*' is probably the most commonly used
wildcard, so allowing it in the standard URL calls would
help a lot.  As for the other wildcards, a detour via expand
does not sound too bad anymore...
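
A rough sketch of how C could look from the user's side, under stated
assumptions: the function names star_match(), copy_sources() and
expand() are hypothetical, the name list stands in for a real
directory, and fnmatch stands in for the full shell-style wildcards.

```python
import re
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def star_match(name, pattern):
    """Match name against pattern, where only '*' is special."""
    regex = "[^/]*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def copy_sources(url, names):
    """URL call: resolve '*' in the last path element of url."""
    pattern = urlsplit(url).path.rsplit("/", 1)[-1]
    return sorted(n for n in names if star_match(n, pattern))


def expand(pattern, names):
    """Detour: full shell-style wildcards on a plain string."""
    return sorted(n for n in names if fnmatchcase(n, pattern))


names = ["a.doc", "b.doc", "c1.txt", "c2.txt"]

# Common case: '*' goes straight into the URL.
print(copy_sources("file://localhost/tmp/demo/*.doc", names))
# ['a.doc', 'b.doc']

# Anything else: expand() first, then a user-level loop over the result.
for n in expand("c[12].txt", names):
    print("copy", n)
```

Note that in the URL call the brackets would be taken literally, so
"c[12].txt" there matches nothing in this example.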

Cheers, Andre.

> Any opinions on the proposal of implementing solution B ???
> 
> 
> Thilo
-- 
No trees were destroyed in the sending of this message, however,
a significant number of electrons were terribly inconvenienced.