[SAGA-RG] spec

Andre Merzky andre at merzky.net
Sun Dec 16 13:35:45 CST 2007


Good arguments.  As you know, I don't agree with some points,
but your line of argumentation makes sense.  So, let's finally
do that.

Big thanks, 

  Andre.


Quoting [Thilo Kielmann] (Dec 15 2007):
> From: Thilo Kielmann <kielmann at cs.vu.nl>
> To: Ceriel Jacobs <ceriel at cs.vu.nl>
> Cc: Andre Merzky <andre at merzky.net>, Thilo Kielmann <kielmann at cs.vu.nl>,
> 	Shantenu Jha <sjha at cct.lsu.edu>,
> 	Hartmut Kaiser <hartmut.kaiser at gmail.com>,
> 	SAGA RG <saga-rg at ogf.org>
> Subject: Re: spec
> 
> I've spent some more time studying RFCs and thinking about this
> thread of discussion.
> 
> I hope we can at least agree upon our design goals:
> whatever we specify into SAGA has to be "simple to use", and as such
> has to "do the obvious thing" in its respective context.
> 
> The aim of the exercise is to provide "POSIX shell wild cards" for files
> (well, actually for name-space entries).
> While trying to do so, we came across two sub topics:
> 
> a) wild cards possibly in URLs
> b) wild cards in strings
> 
> 
> About wild cards in URLs.
> 
> The valid RFC for URLs is
> RFC3986 "Uniform Resource Identifier (URI): Generic Syntax"
> 
> It says (Introduction, second paragraph):
> 
>    "This document obsoletes [RFC2396], which merged "Uniform Resource
>    Locators" [RFC1738] and "Relative Uniform Resource Locators"
>    [RFC1808] in order to define a single, generic syntax for all URIs.
>    It obsoletes [RFC2732], which introduced syntax for an IPv6 address.
>    It excludes portions of RFC 1738 that defined the specific syntax of
>    individual URI schemes; those portions will be updated as separate
>    documents. ..."
> 
> I have checked the IETF's RFC index and could not find any RFC that
> describes new definitions of the "file", "ftp", or "http" schemes.
> This means that RFC3986 describes the general URI syntax, while the URL
> schemes relevant for us (file, ftp, http) are still as described in RFC1738.
> 
> 
> Having said this, I made the following two observations:
> 
> 1. RFC3986 says (Introduction, first paragraph, first sentence):
> "A Uniform Resource Identifier (URI) provides a simple and extensible
> means for identifying a resource." I'd like to put the emphasis here on
> "a resource", rather than "a resource or a group of resources".
> Besides, RFC3986 contains neither the term "wild card" nor "wildcard",
> not even "pattern".
> 
> 2. In RFC1738, the character '*' does not need to be escaped (while the
> other special characters of POSIX shell wild cards, such as '?', '[' and
> ']', do). In a previous discussion we had already ruled out wild-card
> characters that need to be escaped as too complicated and non-obvious to use.
> However, the URL schemes for "file", "ftp", and "http" do not define any
> wild-card patterns. (Only the "news" scheme uses the '*' character as a
> simple wild card, but this is not relevant for us.)
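
A small C++ sketch of observation 2: it classifies single characters against the RFC 1738 (section 2.2) rule that only alphanumerics and the special characters "$-_.+!*'()," may appear unencoded in a URL. The helper names are hypothetical, and the sketch simplifies by treating reserved characters such as '?' as always needing escaping, ignoring their use for a reserved purpose.

```cpp
#include <string>

// RFC 1738, section 2.2: only alphanumerics and the "special" characters
// "$-_.+!*'()," may appear unencoded in a URL. (Reserved characters such as
// '?' may also appear unencoded when used for their reserved purpose; this
// sketch deliberately ignores that case.)
bool rfc1738_unencoded_ok(char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9'))
        return true;
    const std::string special = "$-_.+!*'(),";
    return special.find(c) != std::string::npos;
}

// Hypothetical helper: must this POSIX shell wild-card character be
// %-escaped when it appears in a URL path?
bool wildcard_needs_escaping(char c) {
    return !rfc1738_unencoded_ok(c);
}
```

Under this check, '*' is the only one of the usual shell wild-card characters ('*', '?', '[', ']') that may appear unescaped in a URL.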
> 
> From both observations I am drawing the conclusion that we MUST NOT use
> any wild cards in URLs, not even the '*' character. Adding wild-card
> semantics to these URLs would deviate from the definitions in both RFC3986
> and RFC1738, and also from the "common use" of URLs, namely
> "identifying a single resource."
> 
> 
> This leaves us with option b) "wild cards in strings".
> 
> We do have consensus about using wild cards for name-space entries in
> strings. More specifically: in path elements, expressed in strings.
> However, we do not yet fully agree on the proposal to limit
> these to path elements that are relative to the name space (read: directory)
> on which the wild-card enabled functions operate.
> Both camps argue with simplicity for the user.
> 
> The argument AGAINST restricting strings to relative paths is possible
> confusion: some syntactically valid paths (the absolute ones) would become
> invalid through the semantic restriction to relative paths.
> 
> The argument FOR restricting strings to relative paths is that absolute
> paths coincide with URLs, so allowing them would give a second (string)
> representation for URLs, this time with wild cards allowed (see the
> discussion above). Having two representations for (almost) the same thing
> is considered confusing for the user.
> 
> Argument by Andre:
> 
> > >>>  tmp/data.bin   <-- relative
> > >>>  /tmp/data.bin  <-- absolute
> 
> Well, I would say that this "absolute" path is still relative, namely to
> the base URI "file://localhost/".
> Absolute paths on the same machine form a corner case in grids. Truly
> "absolute" paths also identify the machine on which a file/directory resides.
> 
> As pointed out by Ceriel, URIs according to RFC 3986 always contain
> absolute paths, especially after "normalization" has been applied. This
> means that URIs cannot hold relative paths, at least not in the general case.
> (And we would be asking for problems if we required implementations to NEVER
> normalize a URI...)
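
To illustrate the point about base URIs, here is a simplified C++ sketch of resolving a path-only reference against a base URL, following the merge rule of RFC 3986, section 5.2.3. The BaseUrl type and resolve() function are hypothetical; the sketch assumes the base always has an authority and skips query, fragment and dot-segment removal (section 5.2.4).

```cpp
#include <string>

// Hypothetical representation of a base URL that has an authority,
// e.g. "file://localhost" + "/tmp/".
struct BaseUrl {
    std::string scheme_authority;  // e.g. "file://localhost"
    std::string path;              // e.g. "/tmp/" (may be empty)
};

// Resolve a path-only reference (no scheme, authority, query or fragment)
// against a base URL, following RFC 3986 sections 5.2.2 and 5.2.3.
// Simplification: dot segments ("." and "..") are not removed.
std::string resolve(const BaseUrl& base, const std::string& ref_path) {
    std::string path;
    if (ref_path.empty()) {
        path = base.path;                 // empty reference: keep base path
    } else if (ref_path[0] == '/') {
        path = ref_path;                  // absolute path replaces base path
    } else if (base.path.empty()) {
        path = "/" + ref_path;            // authority with empty path
    } else {
        // Merge: drop everything after the right-most '/' of the base path.
        const std::string::size_type slash = base.path.rfind('/');
        const std::string dir =
            (slash == std::string::npos) ? "" : base.path.substr(0, slash + 1);
        path = dir + ref_path;
    }
    return base.scheme_authority + path;
}
```

Both "/tmp/data.bin" and "tmp/data.bin" resolved against "file://localhost/" give "file://localhost/tmp/data.bin", so once resolved the path is always absolute. The trailing '/' of a directory base matters: against the base path "/home/user" (no trailing slash), "tmp/data.bin" resolves to "/home/tmp/data.bin".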
> 
> The argument goes like this:
> A string with an absolute path coincides with a URI, where wild cards are
> neither allowed nor desirable. A string with a relative path is
> "sufficiently different" from a URL that it is obvious to the user where
> wild cards are allowed (relative path strings) and where they are not (URLs).
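
The proposed rule is easy to check mechanically. The following C++ sketch (the function name is hypothetical) accepts a wild-card string only if it is a relative path, rejecting absolute paths and anything that starts with a URI scheme as defined in RFC 3986, section 3.1. Rejecting a "name:" prefix is consistent with RFC 3986, section 4.2, which does not allow a colon in the first segment of a relative-path reference. The sketch makes no attempt to handle ".." segments.

```cpp
#include <cctype>
#include <string>

// Hypothetical check for the proposed rule: wild cards are accepted only in
// strings holding a path relative to the directory the call operates on.
// Returns false for absolute paths ("/tmp/*.dat") and for anything starting
// with a URI scheme ("file://localhost/tmp/*.dat").
bool is_relative_wildcard_path(const std::string& s) {
    if (s.empty() || s[0] == '/')
        return false;
    // RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
    if (std::isalpha(static_cast<unsigned char>(s[0]))) {
        std::string::size_type i = 1;
        while (i < s.size() &&
               (std::isalnum(static_cast<unsigned char>(s[i])) ||
                s[i] == '+' || s[i] == '-' || s[i] == '.'))
            ++i;
        if (i < s.size() && s[i] == ':')
            return false;  // looks like "scheme:..."
    }
    return true;
}
```

With this check, "sub/*/bla[1-9].doc" and "*.dat" are accepted, while "/tmp/*.dat" and "file://localhost/tmp/*.dat" are rejected.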
> 
> 
> If we agree to restrict strings to relative paths, which use cases are we
> missing? What can NOT be expressed then? We can still do the following:
> 
> saga::directory dir(url);                    // pattern is relative to this directory
> dir.copy("sub/*/bla[1-9].doc", target_url);  // copy every matching entry
> 
> From the point of view of the running application, this can be a
> third-party copy that honors wild cards.
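
One way an implementation might expand such a relative pattern is sketched below in C++ with the POSIX fnmatch() function and its FNM_PATHNAME flag, so that wild cards never match across '/'. The expand() function is hypothetical, and the directory listing is passed in as a plain list of relative paths instead of being read from the directory object.

```cpp
#include <fnmatch.h>
#include <string>
#include <vector>

// Sketch only: select the entries of a directory listing that match a
// relative wild-card pattern. A real implementation would obtain the listing
// from the (possibly remote) directory the directory object refers to.
std::vector<std::string> expand(const std::string& pattern,
                                const std::vector<std::string>& entries) {
    std::vector<std::string> matches;
    for (const std::string& e : entries) {
        // FNM_PATHNAME: '*', '?' and bracket expressions never match '/',
        // so "sub/*/x" matches exactly one directory level, as in the shell.
        if (fnmatch(pattern.c_str(), e.c_str(), FNM_PATHNAME) == 0)
            matches.push_back(e);
    }
    return matches;
}
```

For entries such as "sub/a/bla1.doc", "sub/a/bla0.doc" and "sub/a/deep/bla2.doc", only the first matches "sub/*/bla[1-9].doc": '0' is outside [1-9], and '*' does not match "a/deep".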
> 
> I currently cannot think of any use case where it would be a problem
> to create the directory object first. (The alternative would be to do the
> same copy with two URLs directly, but then on which directory object???)
> 
> 
> To summarize:
> 
> I hereby propose to limit the use of wild cards to strings, and within
> those to relative paths, because this:
> - is sufficiently different from absolute URLs to avoid confusion
> - is sufficiently expressive
> 
> 
> 
> Regards,
> 
> 
> Thilo
-- 
No trees were destroyed in the sending of this message, however,
a significant number of electrons were terribly inconvenienced.

