[SAGA-RG] URLs and wildcards
Ceriel Jacobs
ceriel at cs.vu.nl
Mon Dec 3 01:50:25 CST 2007
Hi all,
Andre Merzky wrote:
> Hi Thilo, all,
>
> Quoting [Thilo Kielmann] (Nov 29 2007):
>> Ceriel and I have been chatting about this issue, producing a proposal for
>> a solution.
>>
>> Two observations (Thilo only) up front:
>>
>> 1. wildcards are ONLY applicable to the methods copy, link, move, and remove
>> in class ns_directory, and to nothing else in the whole name space package.
>
> And to permissions_allow / permissions_deny.
Agreed.
> And in list and find of course, but those take strings, not
> URLs, no problem here. Here we can (and should) leave the
> full wildcards IMHO.
Agreed.
>
>> 2. In ns_directory, method list has a parameter pattern, while method find
>> has name_pattern. Both should be "pattern". It refers to the same kind
>> of thing.
>
> Right. The parameter for find is called name_pattern to
> distinguish it from the additional attrib_pattern pattern in
> the overloaded find method in the replica package... But
> yes, they are the same thing. So, if you want to have the
> same parameter name, it should be name_pattern I guess?
>
>
>> Further thoughts about URLs and wildcards.
>>
>> 3. In ns_directory, list and find with their "pattern" parameter actually
>> refer to pathnames, relative to the current working directory (CWD).
>> We should say that explicitly in the spec.
>>
>> 4. URLs, according to the RFC do NOT provide wildcards for files.
>
> Hmm, a mail from me seems to have gone astray? A while ago
> in this thread I wrote:
>
> | Quoting [Thilo Kielmann] (Nov 26 2007):
> |
> || URLs, however, do not allow for wildcards, according to RFC1738.
> |
> | Well, RFC1738 actually refers to wildcards explicitly, e.g. in
> | Section 3.6. NEWS:
> |
> | If <newsgroup-name> is "*" (as in <URL:news:*>), it is
> | used to refer to "all available news groups".
>
>
> So, unless my interpretation is wrong, I'd say that '*' is
> explicitly allowed as a wildcard.
True, I think I mentioned in my original mail that '*' was OK, but
the other wildcard characters are not.
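
To make the distinction concrete, here is a minimal sketch that checks
which shell wildcard characters RFC 1738 (section 2.2) permits unencoded
in a URL. The character sets are copied from the RFC; the function names
wildcard_status and problem_chars are made up for illustration and are
not part of SAGA.

```python
# Character classes from RFC 1738, section 2.2.
SAFE = set("$-_.+")
EXTRA = set("!*'(),")          # '*' lives here: allowed unencoded
RESERVED = set(";/?:@=&")      # '?' lives here: reserved delimiter
UNSAFE = set("{}|\\^~[]`<>\"#% ")

# Characters used by POSIX-style shell patterns, including brace lists.
SHELL_WILDCARD_CHARS = "*?[]{}"


def wildcard_status(ch):
    """Classify a single wildcard character (hypothetical helper)."""
    if ch in SAFE or ch in EXTRA:
        return "allowed unencoded"
    if ch in RESERVED:
        return "reserved"
    if ch in UNSAFE:
        return "unsafe, must be encoded"
    raise ValueError("not a character this sketch classifies: %r" % ch)


def problem_chars(pattern):
    """Return the wildcard characters in pattern that may not appear
    unencoded (or without reserved meaning) in an RFC 1738 URL."""
    return sorted({
        c for c in pattern
        if c in SHELL_WILDCARD_CHARS
        and wildcard_status(c) != "allowed unencoded"
    })


if __name__ == "__main__":
    for p in ("data_*.bin", "data_[a-z].bin", "image.?pg", "{one,two}"):
        print(p, "->", problem_chars(p) or "fine as a URL path")
```

Only '*' passes: '?' is reserved, and '[', ']', '{', '}' are in the
RFC's unsafe set.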
>> (Non-)options:
>>
>> a) add specific wildcards (like '*') to the URLs we use.
>> This would not be conformant to the RFC, so they would no longer be URLs.
>
> See above.
>
>
>> b) "Use" the query mechanism for http to express wildcards for files.
>> While possible "in theory" this would be far from obvious, so this would
>> NOT be anything "simple" to use. (remember the "S" in SAGA)
>
> Yep, I agree.
>
>
>> c) Wildcard characters could be brought into URLs by %-escape sequences.
>> Argument as with query: non-intuitive, not simple for the user.
>
> I agree. Other options would be (also from my previous
> mail):
>
> | And here are two other options actually for dealing with
> | wildcards:
> |
> | - allow only *, not the full-blown shell wildcards
> |
> | - or use different characters for wildcards, e.g.
> |
> | data_[a-z].bin -> data_((a-z)).bin
> | image.?pg -> image01.#pg
> |
> | I would find the second one slightly confusing, but an
> | option it is.
>
>
>> Summary: we MUST NOT introduce file wildcards to URLs.
>
> Hhmmmm... ;-)
>
>
>> This leaves us (IOHO - Ceriel and me) with two possible options for wildcards
>> for namespace entries (as expressed for operations on ns_directories):
>>
>> A. Have an additional method expand that takes a string parameter describing
>> a pathname, relative to the CWD, (possibly) containing POSIX-style shell
>> wildcards.
>> expand() has an output parameter, an array of URLs, the expansion.
>>
>> In addition to expand(), we add versions of the methods
>> copy, link, move, and remove from ns_directory that accept arrays of URLs
>> instead of single URLs. (If we do not add these versions, we force the
>> users to resort to bulk execution of tasks for a simple thing like
>> "remove *.doc")
>>
>> B. Add versions of the methods copy, link, move, and remove from ns_directory
>> that accept a string parameter describing a pathname, relative to the CWD,
>> (possibly) containing POSIX-style shell wildcards.
>
> C. - allow * as wildcard in URLs (in the path element part)
> - allow normal wildcards for the string pattern in list and find
> - for all other wildcards ([a-z], ?, {one,two,three}) use
> expand(), and require user level loops over the result.
>
> A and B both have the problem of bloat -- not too badly though (6 calls).
>
> B: why the limitation to relative path names?
Not really needed, indeed, but conceptually, wildcard expansion operates
on a directory, and we are talking about methods on directories here.
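
As a rough sketch of what "expansion operates on a directory" could look
like for the proposed expand(): the pattern is matched against the entry
names of one directory, and the matches are turned into URLs under that
directory. The signature, the entry_names argument and the example host
are assumptions for illustration only; nothing here is taken from the
SAGA spec. Python's fnmatch handles '*', '?' and '[...]', but not brace
lists like {one,two,three}.

```python
import fnmatch


def expand(cwd_url, entry_names, pattern):
    """Hypothetical expand(): match a single-level shell pattern against
    the names in one directory and return the matching entries as URLs.

    cwd_url     -- URL of the directory the pattern is relative to
    entry_names -- plain names of the entries in that directory
    pattern     -- shell-style pattern without '/' (assumption)
    """
    base = cwd_url if cwd_url.endswith("/") else cwd_url + "/"
    # fnmatchcase avoids platform-dependent case folding.
    matches = [n for n in entry_names if fnmatch.fnmatchcase(n, pattern)]
    return [base + n for n in sorted(matches)]


if __name__ == "__main__":
    names = ["a.doc", "b.txt", "c.doc", "image1.jpg", "image2.png"]
    # Option A style: expand first, then loop over (or pass on) the URLs.
    for url in expand("ftp://host.example/data", names, "*.doc"):
        print("would remove", url)
```

With option B the same matching would happen inside copy, link, move and
remove themselves, so the user passes the pattern string directly.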
>> Comparing both options, Ceriel and myself are in favour of B.
>> It comes with fewer methods and a simpler and more obvious-to-use interface.
>
> I vote for C *blush*.
>
>> A is a very indirect solution where a user first has to build a list of URLs
>> from a wildcard string, and then has to pass this list of URLs to, e.g., copy.
>
> Agree.
>
>
>> With B, the user can directly pass the wildcard string to, e.g., copy.
>> The "trick" is that the string is restricted in its expressiveness, namely to
>> pathnames relative to the CWD.
>
> In favour of C: '*' is probably the most commonly used
> wildcard - so using that in the standard URL calls would
> help a lot. As for the other wildcards, a detour via expand
> does not sound too bad anymore...
I can live with C :-) although it is a bit of an ad-hoc solution.
I like B a bit better, because it is more explicit about which methods
accept wildcards.
Ceriel