[saga-core-wg] SAGA Job scheduling features

Andre Merzky andre at merzky.net
Wed Aug 6 02:16:26 CDT 2008


Hi Jeremie, 

I just realized that our posts went to the saga-core-wg
mailing list.  I am afraid that this list is not very
active: most spec related discussions are on the
saga-rg at ogf.org mailing list, and implementation based
threads go to the saga-devel at cct.lsu.edu mailing list.

Anyway, I also Cc'ed Hartmut on this thread, as he may have
more insight into the status of bulk operations.  

Hartmut, is that work still alive? 

Cheers, Andre.



Quoting [Jérémie Chevalier] (Aug 06 2008):
> 
>    Dear Andre,
>    Thank you very much for your answers.
>    Indeed that is what I was looking for. Your answers are very helpful to
>    me.
>    Thanks again !
>    Regards,
>    Jeremie.
>    Andre Merzky wrote:
> 
> Hi,
> 
> Quoting [Jérémie Chevalier] (Aug 05 2008):
> 
> 
> Hello,
> 
> I'm Jeremie, R&D Engineer at the CETIC research center (Belgium).
> In the future, it is likely that I will develop SAGA adaptors (Job
> management) for some Grid middlewares.
> 
> I have a couple of questions regarding the Job Management
> implementation of SAGA:
>   * Does it support the launch of several jobs in one single operation,
>     like the drmaa_run_bulk_jobs() of the DRMAA API ?
> 
> 
> Yes, but the mechanism for that needs some explaining, so
> please bear with me if the answer is not short...
> 
> First, there is no direct (i.e. explicit) API call for bulk
> job submission.  However, SAGA has a notion of bulk job
> submission, and in fact of bulk operations in general, which
> is expressed as follows:
> 
>   saga::job::service     js;
>   saga::job::description jd; // needs to be filled
>   saga::task_container   tc;
> 
>   // create 100 jobs, and add them to the task container.
>   // Note that the jobs are not running, but in New state
>   for ( int i = 0; i < 100; i ++ )
>   {
>     saga::job::job j = js.create_job (jd);
>     tc.add_task (j);
>   }
> 
>   // run all tasks and jobs in the task container
>   // this is the point where the adaptor can perform bulk
>   // optimization.
>   tc.run ();
> 
> The same mechanism is also available for any other
> asynchronous operation, e.g. for file copies:
> 
>   // url and target[] (the 100 target URLs) are assumed to be defined
>   saga::filesystem::file f (url);
>   saga::task_container tc;
> 
>   // create 100 copy tasks, and add them to the task container.
>   // Note that the tasks are not running, but in New state
>   for ( int i = 0; i < 100; i ++ )
>   {
>     saga::task t = f.copy <saga::task::Async> (target[i]);
>     tc.add_task (t);
>   }
> 
>   // run all tasks in the task container;
>   // this is the point where the adaptor can perform bulk
>   // optimization.
>   tc.run ();
> 
> 
> When run() is called on a task container, the SAGA engine
> inspects all tasks in the container (remember that a job
> inherits from the task class, and thus is a task, too).  If
> it finds multiple tasks which can be handled by the same
> adaptor, it invokes a bulk method in that adaptor, which can
> perform all of them at once.
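> To make the dispatch step above a bit more concrete, here is a
> minimal, self-contained sketch of the grouping the engine
> performs.  It does not use the SAGA API: Task,
> group_by_adaptor() and the adaptor names "pbs" and "gridftp"
> are hypothetical stand-ins, assuming that each task already
> knows which adaptor is responsible for it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a SAGA task: it only records which adaptor
// is responsible for it and which operation it represents.
struct Task {
  std::string adaptor;    // adaptor able to handle this task
  std::string operation;  // e.g. "job.run" or "file.copy"
};

// Group the tasks of a container by adaptor, preserving their order
// within each group, so that each adaptor can be handed all of its
// tasks in one bulk call instead of one call per task.
std::map<std::string, std::vector<Task>>
group_by_adaptor(const std::vector<Task>& container) {
  std::map<std::string, std::vector<Task>> groups;
  for (const Task& t : container) {
    assert(!t.adaptor.empty() && "every task needs an adaptor");
    groups[t.adaptor].push_back(t);
  }
  return groups;
}
```

> In the real engine, each group would then be passed to the
> bulk method of its adaptor; a container holding 100 job tasks
> for one adaptor would thus end up as a single bulk call.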
> 
> This mechanism was implemented quite a while ago, and has
> been shown to work, but I am not sure about its status at
> the moment - Hartmut may be more up to date.  Anyway, we
> should be able to revive it, if that is what you need.
> 
> At the adaptor level, this would just require implementing
> another set of operations, which receive a set of
> instructions to perform instead of a single instruction.
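> A sketch of what such an adaptor-side interface could look
> like.  JobAdaptor, Instruction, run_one() and run_bulk() are
> illustrative names assumed for this example, not the actual
> SAGA adaptor API.  The default bulk call simply loops over the
> single-instruction call, so adaptors without native bulk
> support keep working; an adaptor for a middleware with a
> native bulk call overrides it and saves round trips.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical unit of work handed to an adaptor.
struct Instruction {
  std::string executable;
};

// Hypothetical adaptor interface (not the SAGA adaptor API).
class JobAdaptor {
 public:
  virtual ~JobAdaptor() = default;

  // Submit a single instruction, return a job id.
  virtual std::string run_one(const Instruction& in) = 0;

  // Submit a whole set.  The default implementation does no
  // optimization and simply submits the instructions one by one.
  virtual std::vector<std::string> run_bulk(const std::vector<Instruction>& ins) {
    std::vector<std::string> ids;
    ids.reserve(ins.size());
    for (const Instruction& in : ins) ids.push_back(run_one(in));
    assert(ids.size() == ins.size());  // one id per instruction
    return ids;
  }

  int round_trips = 0;  // simulated requests sent to the middleware
};

// Adaptor without native bulk support: inherits the looping default,
// so every instruction costs one round trip.
class SimpleAdaptor : public JobAdaptor {
 public:
  std::string run_one(const Instruction& in) override {
    ++round_trips;
    return "job-" + std::to_string(round_trips) + ":" + in.executable;
  }
};

// Adaptor for a (hypothetical) middleware with native bulk submission,
// in the spirit of drmaa_run_bulk_jobs(): the whole set costs one round trip.
class BulkAdaptor : public SimpleAdaptor {
 public:
  std::vector<std::string> run_bulk(const std::vector<Instruction>& ins) override {
    ++round_trips;
    std::vector<std::string> ids;
    ids.reserve(ins.size());
    for (std::size_t i = 0; i < ins.size(); ++i)
      ids.push_back("bulk-" + std::to_string(i) + ":" + ins[i].executable);
    return ids;
  }
};
```

> Submitting 100 instructions costs 100 simulated round trips
> with SimpleAdaptor, but only one with BulkAdaptor.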
> 
> 
> 
> 
>   * Is there any class in the SAGA specification that permits the
>     retrieval of the return code of a launched job?
> 
> 
> Yes, that works as follows:
> 
> 
>   saga::job::service     js;
>   saga::job::description jd; // needs to be filled
>   saga::job::job         j = js.create_job (jd);
> 
>   j.run ();
>   j.wait (); // job is in final state now
> 
>   saga::job::state state = j.get_state ();
> 
>   if ( saga::job::Failed == state )
>   {
>     // the exit code is reported as a string attribute
>     std::string exitcode = j.get_attribute (saga::job::attributes::exitcode);
> 
>     std::cout << "Job failed with exitcode: "
>               << exitcode
>               << std::endl;
> 
>     std::exit (std::atoi (exitcode.c_str ()));
>   }
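> On the adaptor side, a local fork/exec based job adaptor would
> typically obtain that exit code from the wait(2) family of
> system calls.  The POSIX-only sketch below illustrates this;
> run_and_get_exit_code() is a hypothetical helper, not part of
> SAGA, and returns -1 when there is no exit code (the process
> could not be started, or it was terminated by a signal).

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `command` via /bin/sh and return its exit
// code, or -1 if the process could not be started or did not exit
// normally (e.g. it was killed by a signal).
int run_and_get_exit_code(const std::string& command) {
  assert(!command.empty());
  pid_t pid = fork();
  if (pid < 0) return -1;  // fork failed
  if (pid == 0) {
    // Child: replace the process image with the shell.
    execl("/bin/sh", "sh", "-c", command.c_str(), static_cast<char*>(nullptr));
    _exit(127);  // only reached if execl failed
  }
  int status = 0;
  // Parent: wait for the child, retrying if interrupted by a signal.
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) return -1;
  }
  if (WIFEXITED(status)) return WEXITSTATUS(status);
  return -1;  // terminated by a signal: no exit code
}
```

> An adaptor could then store the returned value, converted to
> a string, as the job's exitcode attribute once the job has
> reached a final state.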
> 
> 
> Hope that is what you were looking for.
> 
> 
> 
> 
> I have to admit that I haven't read the whole SAGA API specification
> yet, but I wanted to get an idea about the two points mentioned above.
> 
> 
> It is a long and tedious read, we know.  But I am afraid
> that you need to read most of it if you want to implement an
> adaptor, i.e. Sections 1 to 3, and the job part of Section 4.
> 
> Cheers, Andre.
> 
> 
> 
> Thanks for your help.
> Best regards,
> Jeremie.
-- 
Nothing is ever easy.

