[saga-rg] Jobs and SandBoxing

Thu Aug 11 17:27:42 CDT 2005

On 11/8/05 13:31, "Andre Merzky" <andre at merzky.net> wrote:

> Quoting [Christopher Smith] (Aug 04 2005):
>> 
>> On 29/7/05 10:47, "Andre Merzky" <andre at merzky.net> wrote:
>> 
>>> Another comment I got about jobs in SAGA is: how is
>>> sandboxing supported?  Can I at least determine if my
>>> job runs in a sandbox?  Or at least what its 'cwd' is?
>> 
>> You can specify the job's CWD at submission time, but
>> there is no attribute to retrieve to fill in this
>> information. Perhaps it should be added to the JobInfo
>> class? 
> 
> Yes, perhaps.  I think the basic use case is:
> 
>  - you run a job
>  - job writes data file ./out.dat
>  - job finishes
>  - you want to retrieve the data file
> 
> w/o knowing the cwd, you have trouble finding the file.
> Setting it beforehand does not help if the scheduler creates
> a sandbox.
> 
> So adding that info to the jobinfo seems to make sense.
> 
Ok ... I'll add this to the JobInfo. It's on a list of updates I need to
make to the docs ... that I haven't got to yet. *blush*
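
A minimal sketch of how a client could use such an attribute once it
exists. The reported cwd value and the resolve_output helper are
assumptions for illustration; neither is in the current SAGA drafts:

```python
import posixpath


def resolve_output(job_cwd: str, relative_path: str) -> str:
    """Locate a file the job wrote, given the cwd reported for the job.

    job_cwd is assumed to come from a (hypothetical) cwd attribute on
    JobInfo; relative_path is what the job itself used, e.g. "./out.dat".
    """
    if posixpath.isabs(relative_path):
        # The job wrote to an absolute path, so the cwd is irrelevant.
        return posixpath.normpath(relative_path)
    return posixpath.normpath(posixpath.join(job_cwd, relative_path))


# Example: the scheduler created a sandbox directory for the job.
print(resolve_output("/scratch/sandbox-42", "./out.dat"))
```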

> 
>> As for supporting sandboxes, what does that actually mean?
>> In a chroot jail?  With a restricted user id (whatever
>> that means)? Why should I care? What's the use case?
> 
> I guess you are right: sandbox is by definition transparent
> to the end user, isn't it?  So while it might be useful to
> know where your job runs (see above), it may not make sense
> to enforce sandboxing (either it's used or it isn't - what
> can SAGA do about this? nothing).
> 
Right. In a sandboxed environment, you basically need to a) stage files in
and out in-line with the job using relative paths, or b) use some kind of
"third party" storage service that you can then retrieve files from
(basically you use fully qualified paths and service endpoints).

I'm not sure there is much in between.
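
As a rough sketch of that split, a client could sort file references
into the two cases like this. The staging_mode name and the two labels
are made up for illustration, and the gsiftp URL is only an example
endpoint:

```python
from urllib.parse import urlparse


def staging_mode(path: str) -> str:
    """Classify a file reference for a job that may run in a sandbox.

    Returns "third-party" for fully qualified URLs (case b) and
    "in-line" for relative paths staged with the job (case a).
    """
    if urlparse(path).scheme:
        return "third-party"
    if path.startswith("/"):
        # An absolute local path cannot be relied on inside a sandbox.
        raise ValueError("absolute local path is not sandbox-safe: " + path)
    return "in-line"


for ref in ("./out.dat", "gsiftp://storage.example.org/data/out.dat"):
    print(ref, "->", staging_mode(ref))
```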

> 
>>> Does a job have a unique job ID I can use to identify
>>> it?  (That question is related to the session
>>> persistency discussed in another thread I think).
>> 
>> There is a getJobId method on the Job interface for this
>> purpose. It's up to the backend to provide the ID, so
>> uniqueness is not something SAGA can guarantee.
> 
> I semi-agree.  For finding your job again, you need more
> than the backend job-id - you need also the contact point
> for the backend.  Your SAGA implementation might know about
> that, so it may be able to create a 'better' job id.
> 
> In GAT, we did that, and had the distinction between a
> Native-JobID (the backend's), and GAT-JobID (globally
> unique).  That might be overkill to mandate for SAGA at this
> point, unless we have a clear use case wanting so I guess.
> 
> So, bottom line, I guess you are right, backend-id should be
> sufficient unless we run into problems with that.
> 
I think a SAGA-JobID that is some kind of composite of the backend ID and
some "SAGA decoration" is a good idea ... especially if a
SAGA session is used to access multiple back ends. Generating global IDs
within one implementation is easy enough, but do we want to take a stab at
defining a format that all implementations should support? How hard do you
think it would be? The idea is that two SAGA implementations (running
concurrently) would have globally unique job id spaces.
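
To make the question concrete, here is one possible shape for such a
composite ID. This is only a sketch: the "[backend]-[native id]" layout,
the SagaJobId class, and the gram:// contact string are all assumptions
for discussion, not an agreed format:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SagaJobId:
    """Hypothetical composite job ID: backend contact point + native ID."""

    backend_url: str  # contact point of the backend that owns the job
    native_id: str    # ID as issued by that backend

    def __str__(self) -> str:
        return "[" + self.backend_url + "]-[" + self.native_id + "]"

    @classmethod
    def parse(cls, text: str) -> "SagaJobId":
        """Split a composite ID back into its two parts."""
        if not (text.startswith("[") and text.endswith("]")):
            raise ValueError("not a composite job id: " + text)
        # Assumes the backend URL itself never contains "]-[".
        backend_url, sep, native_id = text[1:-1].partition("]-[")
        if not sep:
            raise ValueError("not a composite job id: " + text)
        return cls(backend_url, native_id)


jid = SagaJobId("gram://host.example.org:2119", "12345")
print(jid)
print(SagaJobId.parse(str(jid)) == jid)
```

Two implementations talking to the same backend would then produce the
same string for the same job, and IDs from different backends could not
collide as long as the contact points differ.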

-- Chris