[SAGA-RG] SAGA Resource Package Comments

Sat Dec 8 04:07:48 EST 2012

Hi Ole,

On Tue, Nov 13, 2012 at 4:54 PM, Ole Weidner <ole.weidner at rutgers.edu> wrote:
> Hi Andre,
>
> here's some quick comments w.r.t. the resource package. It's more for the records - we can talk about this in detail when we discuss the resource package on the 28th (journal club).

Sorry this took so long, I am trying to catch up with things now...

> I fundamentally disagree that 'Compute' inherits from 'saga.job.Service' and that 'Storage' inherits from 'saga.filesystem.Directory' because this:
>
> 1. breaks SAGA's horizontally independent package model: it would mean that I can't implement a resource-package only implementation of SAGA. I would have to implement the Job and the Filesystem package as well!

A compute resource is useless if you can't submit jobs, a storage
resource is useless if you can't store files.  That does not change if
you replace inheritance with get_job_service / get_filesystem -- in
both cases, you will need to implement the job package / file package,
too.
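
To make the two renderings concrete, here is a minimal sketch.  All
class names below are hypothetical stand-ins, not the actual
saga.job.Service or saga.resource.Compute; the point is only that both
variants end up needing a job service implementation.

```python
# Hypothetical stand-ins for illustration; not the real SAGA classes.
class JobService:
    """Minimal stand-in for a job service bound to one endpoint."""

    def __init__(self, url):
        self.url = url

    def create_job(self, description):
        return f"job({description}) on {self.url}"


# Option A: inheritance, the compute resource *is* a job service.
class ComputeInherit(JobService):
    pass


# Option B: delegation, the compute resource *hands out* a job service.
class ComputeDelegate:
    def __init__(self, url):
        self._url = url

    def get_job_service(self):
        return JobService(self._url)


a = ComputeInherit("ssh://host.example/")
b = ComputeDelegate("ssh://host.example/")

job_a = a.create_job("/bin/date")                    # one call
job_b = b.get_job_service().create_job("/bin/date")  # one extra hop
```

Either way, JobService has to exist before the compute resource is of
any use; the difference is only in how the user reaches it.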

While we indeed try to avoid too many cross dependencies between
functional API packages, we do have them in some places, most notably
for the namespace derivatives.

FWIW, another reason why compute resource inherits from job.service is
that we intended to fix some shortcomings of the job service, in
particular we wanted to add the ability to directly submit JSDL.
Inheritance provides a very simple means to do so.  I agree though
that this should not be the foremost concern for API design - but
anyway.

Another point I want to make, though: I don't like the idea of having a
job service, which is not stateful, depend on the state of a
compute resource (and same for filesystem / storage resource) -- at the
API level, there are no means to infer if the job service is valid for
job submission at any point in time (you can't get a resource handle
from a job service instance) - so it always boils down to trial and error.
We so far managed to avoid those implicit state dependencies, and I
would like to keep it this way.  [Yes, a decoupling maps better to the
Pilot API, but I would rather like to fix that in the Pilot API ;-)]
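
A rough sketch of the implicit state dependency I mean.  The classes,
the state names ("ACTIVE", "FINAL") and the submit() method are
assumptions made up for this illustration, not the SAGA API.

```python
# Hypothetical stand-ins; names and states are illustrative only.
class ComputeResource:
    def __init__(self):
        self.state = "ACTIVE"

    def destroy(self):
        self.state = "FINAL"


class DetachedJobService:
    """Decoupled variant: exposes no resource handle and no state."""

    def __init__(self, resource):
        self._resource = resource  # backend detail, not public API

    def submit(self, command):
        if self._resource.state != "ACTIVE":
            raise RuntimeError("backing resource is no longer active")
        return f"submitted {command}"


class Compute(ComputeResource):
    """Inheritance variant: the object taking jobs also carries the state."""

    def submit(self, command):
        if self.state != "ACTIVE":
            raise RuntimeError("resource is no longer active")
        return f"submitted {command}"


def try_submit(service, command):
    """Trial and error: the only way to learn whether submission works."""
    try:
        return service.submit(command)
    except RuntimeError:
        return None


backing = ComputeResource()
detached = DetachedJobService(backing)
backing.destroy()
detached_result = try_submit(detached, "/bin/date")  # fails at submit time

compute = Compute()
compute.destroy()
can_submit = compute.state == "ACTIVE"  # visible before submitting
```

In the decoupled variant the user holds an object that looks perfectly
fine until submission fails; in the inheritance variant the state can
be inspected on the same object that accepts the job.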

Don't get me wrong: I understand that inheritance is a pretty strong
coupling, and it does not necessarily reflect how DCIs are architected
internally -- from the end user perspective though, I find this
rendering simple, intuitive, and easy to use...

> 2. it mixes separate concerns: resource management and job submission!

I kind of agree, but think that this is offset by ease of use: get a
compute resource, submit jobs to it - bang.  This is, by far, the
dominant use case, so I would like to see this rendered exceedingly
simple.

> 3. I don't think that 'Directory' necessarily provides the right abstraction for all 'Storage' types. Certainly for most, but not for all. It's unnecessarily confining.

Yes, that is a limitation -- but unless we have a compelling use case
for other storage abstractions, and those use cases do not imply an
overly complicated approach to storage resources, that is the best
abstraction we have, right?  Even if the backend storage resources
have a limited / constrained namespace (think Amazon S3), the
filesystem abstraction still holds up nicely IMHO.  Also, I am not
concerned about provisioning of databases etc. -- we don't have decent
(or any) abstractions for those in SAGA, nor do we have use cases that
I know of -- so that would be out of scope for now.

> Furthermore,
>
> - class manager -> Manager

fixed, thanks.

> - what does manager.describe_resource() do? why can't it be manager.resources[x].get_description()

Hmm, probably right - but while that works nicely in Python, in other languages you would have

  manager.get_resource (id).get_description ()

and chaining is something we do not promote in the API so far.  Thus,
I would like to keep the method in the API, but I agree that your
version is (in Python) the more intuitive one.
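
For comparison, the three spellings side by side.  This is only a
sketch: Manager and Resource below are hypothetical stand-ins, and the
dictionary used as a description is an assumption for illustration.

```python
# Hypothetical stand-ins for illustration; not the actual resource API.
class Resource:
    def __init__(self, rid, description):
        self.id = rid
        self._description = description

    def get_description(self):
        return self._description


class Manager:
    def __init__(self, resources):
        # Map resource ID -> resource handle.
        self.resources = {r.id: r for r in resources}

    def get_resource(self, rid):
        return self.resources[rid]

    def describe_resource(self, rid):
        # Convenience call on the manager, so callers need no chaining.
        return self.get_resource(rid).get_description()


manager = Manager([Resource("vm-1", {"cores": 4})])

via_property = manager.resources["vm-1"].get_description()
via_chaining = manager.get_resource("vm-1").get_description()
via_manager = manager.describe_resource("vm-1")
```

All three return the same description; describe_resource() merely
saves the intermediate handle.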

> - speaking of resources[x] - there's no 'non-property' version, i.e., get_resource()

thanks, I'll fix that.

> - I would prefer explicit list/get_compute(), list/get_storage(), list/get_network() and so on, so that we don't have to do type checking all over the place.

The list / get calls have a 'type' parameter, so you can filter for
specific resource types:

  compute_resources = manager.list_resources (saga.resource.Compute)
  storage_resources = manager.list_resources (saga.resource.Storage)

The default is 'Any' though, which obviously gives you all types.
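
A minimal sketch of how such a type filter could behave.  The classes
are hypothetical stand-ins; in particular, modelling 'Any' as the
common base class and returning resource IDs are my assumptions here,
not a statement about the actual implementation.

```python
# Hypothetical stand-ins; the base class plays the role of 'Any'.
class Resource:
    def __init__(self, rid):
        self.id = rid


class Compute(Resource):
    pass


class Storage(Resource):
    pass


class Manager:
    def __init__(self, resources):
        self._resources = list(resources)

    def list_resources(self, rtype=Resource):
        # Filter by type; the default matches every resource.
        return [r.id for r in self._resources if isinstance(r, rtype)]


manager = Manager([Compute("c1"), Storage("s1"), Compute("c2")])

compute_resources = manager.list_resources(Compute)
storage_resources = manager.list_resources(Storage)
all_resources = manager.list_resources()
```

With the filter on the call, the caller never needs to inspect the type
of what comes back.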

> - why are there two Pool.add() methods? Why do we want to be able to add resources as strings?

Alas, we have that in a few places in the API.  There was a very long
discussion, a long time ago, where people argued that only using
handles would have too much of a performance impact (you'd always need
to create handles, which is *at least* one round-trip), and that only
using IDs would be too unwieldy to handle in many cases.  While I
agree with the first point, I do not think that the second one is very
valid.  That is one item I would like to clean up across SAGA in an
eventual API revision (if that ever happens).  So, for now that is in
the resource API as well, for consistency, but I personally do not
care much about it.  If we limit that, then I would be in favor of the
ID version.
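
To illustrate the handle / ID duality: a sketch in which a single
add() accepts either form.  Pool, Resource and the round-trip counter
are hypothetical stand-ins; the counter only simulates the cost of
creating a handle.

```python
# Hypothetical stand-ins for illustration; not the actual resource API.
class Resource:
    handles_created = 0  # counts simulated backend round-trips

    def __init__(self, rid):
        # Creating a handle is assumed to cost at least one round-trip.
        Resource.handles_created += 1
        self.id = rid


class Pool:
    def __init__(self):
        self._ids = []

    def add(self, resource):
        # Accept either a handle or a plain ID string; store the ID.
        if isinstance(resource, Resource):
            rid = resource.id
        else:
            rid = str(resource)
        self._ids.append(rid)

    def list(self):
        return list(self._ids)


pool = Pool()
pool.add(Resource("vm-1"))  # handle: one simulated round-trip
pool.add("vm-2")            # ID only: no handle is created
```

The ID variant avoids the handle creation entirely, which is the
performance argument from that old discussion.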

Cheers, Andre.

> Cheers!
> Ole

-- 
Nothing is really difficult...