[SAGA-RG] URLs and wildcards

Mon Dec 3 01:50:25 CST 2007

Hi all,

Andre Merzky wrote:
> Hi Thilo, all, 
> 
> Quoting [Thilo Kielmann] (Nov 29 2007):
>> Ceriel and I have been chatting about this issue, producing a proposal for
>> a solution.
>>
>> Two observations (Thilo only) up front:
>>
>> 1. wildcards are ONLY applicable to the methods copy, link, move, and remove
>>    in class ns_directory, and to nothing else in the whole name space package.
> 
> And to permissions_allow / permissions_deny.

Agreed.

> And in list and find of course, but those take strings, not
> URLs, no problem here.  Here we can (and should) leave the
> full wildcards IMHO.

Agreed.

> 
>> 2. In ns_directory, method list has a parameter pattern, while method find
>>    has name_pattern. Both should be "pattern", since they refer to the same
>>    kind of thing.
> 
> Right.  The parameter for find is called name_pattern to
> distinguish it from the additional attrib_pattern pattern in
> the overloaded find method in the replica package...  But
> yes, they are the same thing.  So, if you want to have the
> same parameter name, it should be name_pattern I guess?
> 
> 
>> Further thoughts about URLs and wildcards.
>>
>> 3. In ns_directory, list and find with their "pattern" parameter actually
>>    refer to pathnames, relative to the current working directory (CWD).
>>    We should say that explicitly in the spec.
>>
>> 4. URLs, according to the RFC do NOT provide wildcards for files.
> 
> Hmm, a mail from me seems to have gone astray?  A while ago
> in this thread I wrote:
> 
> | Quoting [Thilo Kielmann] (Nov 26 2007):
> |
> || URLs, however, do not allow for wildcards, according to RFC1738.
> | 
> | Well, RFC1738 actually refers to wildcards explicitly, e.g. in
> | Section 3.6. NEWS:
> | 
> |     If <newsgroup-name> is "*" (as in <URL:news:*>), it is
> |     used to refer to "all available news groups".
> 
> 
> So, unless my interpretation is wrong, I'd say that '*' is
> explicitly allowed as a wildcard.

True, I think I mentioned in my original mail that '*' was OK, but
the other wildcard characters are not.
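
To make the distinction concrete, here is a minimal sketch (Python, purely
illustrative, not part of the SAGA spec) that checks which shell wildcard
characters in a pattern fall outside the set RFC 1738 allows unencoded in a
URL. The function name wildcard_chars_not_url_safe and both constants are
made up for this example; the character sets are the "safe" and "extra"
groups from the RFC.

```python
import string

# Characters RFC 1738 allows unencoded ("unreserved"):
# letters, digits, safe "$-_.+" and extra "!*'(),".
RFC1738_UNRESERVED = set(string.ascii_letters + string.digits + "$-_.+" + "!*'(),")

# Shell wildcard metacharacters discussed in this thread (assumed set).
SHELL_WILDCARD_CHARS = set("*?[]{}")


def wildcard_chars_not_url_safe(pattern):
    """Return the wildcard characters in `pattern` that are not RFC 1738 unreserved."""
    return sorted(
        c for c in set(pattern)
        if c in SHELL_WILDCARD_CHARS and c not in RFC1738_UNRESERVED
    )


if __name__ == "__main__":
    for p in ["*.doc", "data_[a-z].bin", "image.?pg", "{one,two,three}"]:
        print(p, wildcard_chars_not_url_safe(p))
```

A pattern using only '*' yields an empty list, while '?', '[', ']', '{' and
'}' are all reported.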

>>    (Non-)options:
>>
>>    a) add specific wildcards (like '*') to the URLs we use.
>>       This would not be conformant to the RFC, so they would no longer be URLs.
> 
> See above.
> 
> 
>>    b) "Use" the query mechanism for http to express wildcards for files.
>>       While possible "in theory" this would be far from obvious, so this would
>>       NOT be anything "simple" to use. (remember the "S" in SAGA)
> 
> Yep, I agree.
> 
> 
>>    c) Wildcard characters could be brought into URLs by %-escape sequences.
>>       Argument as with query: non-intuitive, not simple for the user.
> 
> I agree.  Other options would be (also from my previous
> mail):
> 
> | And here are two other options actually for dealing with
> | wildcards:
> |
> |  - allow only *, not the full-blown shell wildcards
> |
> |  - or use different characters for wildcards, e.g.
> |
> |    data_[a-z].bin -> data_((a-z)).bin
> |    image.?pg      -> image01.#pg
> |
> | I would find the second one slightly confusing, but an
> | option it is.
> 
> 
>>    Summary: we MUST NOT introduce file wildcards to URLs.
> 
> Hhmmmm... ;-)
> 
> 
>> This leaves us (IOHO - Ceriel and me) with two possible options for wildcards
>> for namespace entries (as expressed for operations on ns_directories):
>>
>> A. Have an additional method expand that takes a string parameter describing
>>    a pathname, relative to the CWD, (possibly) containing POSIX-style shell
>>    wildcards.
>>    expand() has an output parameter, an array of URLs, the expansion.
>>
>>    In addition to expand(), we add versions of the methods 
>>    copy, link, move, and remove from ns_directory that accept arrays of URLs
>>    instead of single URLs. (If we do not add these versions, we force the
>>    users to resort to bulk execution of tasks for a simple thing like
>>    "remove *.doc")
>>
>> B. Add versions of the methods copy, link, move, and remove from ns_directory
>>    that accept a string parameter describing a pathname, relative to the CWD,
>>    (possibly) containing POSIX-style shell wildcards.
> 
> C.   - allow * as wildcard in URLs (in the path element part)
>      - allow normal wildcards for the string pattern in list and find
>      - for all other wildcards ([a-z], ?, {one,two,three}) use
>         expand(), and require user level loops over the result.
> 
> A and B both have the problem of bloat -- not too badly though (6 calls).
> 
> B: why the limitation to relative path names?

Not really needed, indeed, but conceptually, wildcard expansion operates
on a directory, and we are talking about methods on directories here.
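
As a rough sketch of what "expansion operates on a directory" means (Python,
illustrative only): the pattern is matched against the entry names of one
directory, and each match is turned into a URL below that directory. The
signature of expand() here, with the directory URL and its entry names passed
in explicitly, is an assumption for the example and not the proposed SAGA
signature. Python's fnmatch handles *, ? and [a-z], but not {one,two,three}.

```python
import fnmatch


def expand(cwd_url, entry_names, pattern):
    """Expand `pattern` against the entries of one directory.

    cwd_url     -- URL of the directory (hypothetical stand-in for the CWD)
    entry_names -- names of the entries in that directory
    pattern     -- shell-style pattern, relative to the directory
    """
    base = cwd_url.rstrip("/")
    # fnmatchcase matches case-sensitively on every platform.
    return [
        base + "/" + name
        for name in sorted(entry_names)
        if fnmatch.fnmatchcase(name, pattern)
    ]


if __name__ == "__main__":
    names = ["a.doc", "b.txt", "c.doc"]
    for url in expand("file://localhost/data/", names, "*.doc"):
        print(url)
```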

>> Comparing both options, Ceriel and myself are in favour of B.
>> It comes with fewer methods and a simpler and more obvious-to-use interface.
> 
> I vote for C *blush*.
> 
>> A is a very indirect solution where a user first has to build a list of URLs
>> from a wildcard string, and then has to pass this list of URLs to, e.g., copy.
> 
> Agree.
> 
> 
>> With B, the user can directly pass the wildcard string to, e.g., copy.
>> The "trick" is that the string is restricted in its expressiveness, namely to
>> pathnames relative to the CWD.
> 
> For C speaks that '*' is, probably, the most commonly used
> wildcard - so using that in the standard URL calls would
> help a lot.  As for the other wildcards, a detour via expand
> does not sound too bad anymore...

I can live with C :-) although it is a bit of an ad-hoc solution.
I like B a bit better, because it is more explicit about which methods
accept wildcards.
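
To show what "more explicit" means in practice, here is a toy sketch (Python,
not the SAGA API). ToyDirectory is an in-memory stand-in for ns_directory,
and remove_pattern is a hypothetical name: the proposal in B is an
additional version of remove taking a string, which is modelled here as a
separately named method because Python has no overloading.

```python
import fnmatch


class ToyDirectory:
    """In-memory stand-in for ns_directory, for illustration only."""

    def __init__(self, names):
        self.names = set(names)

    def remove(self, name):
        # Plain entry point: the argument is taken literally, never expanded.
        self.names.remove(name)

    def remove_pattern(self, pattern):
        # Option B style: wildcards are only interpreted by this method.
        matched = sorted(n for n in self.names if fnmatch.fnmatchcase(n, pattern))
        for n in matched:
            self.names.remove(n)
        return matched


if __name__ == "__main__":
    d = ToyDirectory(["a.doc", "b.doc", "c.txt"])
    print(d.remove_pattern("*.doc"))
    print(sorted(d.names))
```

With C, by contrast, the same call that takes a URL would also have to
recognise a '*' in the path element.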

Ceriel