[saga-core-wg] SAGA Job scheduling features
Andre Merzky
andre at merzky.net
Wed Aug 6 02:16:26 CDT 2008
Hi Jeremie,
I just realized that our posts went to the saga-core-wg
mailing list. I am afraid that this list is not very
active: most spec related discussions are on the
saga-rg at ogf.org mailing list, and implementation-related
threads go to the saga-devel at cct.lsu.edu mailing list.
Anyway, I also Cc'ed Hartmut on this thread, as he may have
more insight into the status of bulk operations.
Hartmut, is that work still alive?
Cheers, Andre.
Quoting [Jérémie Chevalier] (Aug 06 2008):
>
> Dear Andre,
> Thank you very much for your answers.
> Indeed that is what I was looking for. Your answers are very helpful to
> me.
> Thanks again!
> Regards,
> Jeremie.
> Andre Merzky wrote:
>
> Hi,
>
> Quoting [Jérémie Chevalier] (Aug 05 2008):
>
>
> Hello,
>
> I'm Jeremie, R&D Engineer at the CETIC research center (Belgium).
> In the future, it is likely that I will develop SAGA adaptors (Job
> management) for some Grid middlewares.
>
> I have a couple of questions regarding the Job Management
> implementation of SAGA:
> * Does it support launching several jobs in one single operation,
> like drmaa_run_bulk_jobs() in the DRMAA API?
>
>
> Yes, but the mechanism for that needs some explaining, so
> please bear with me if the answer is not short...
>
> First, there is no direct (i.e. explicit) API call for bulk
> job submission. However, SAGA has a notion of bulk job
> submission, and in fact of bulk operations in general,
> which is expressed as follows:
>
>   saga::job::service     js;
>   saga::job::description jd; // needs to be filled
>   saga::task_container   tc;
>
>   // create 100 jobs, and add them to the task container.
>   // Note that the jobs are not running, but in New state
>   for ( int i = 0; i < 100; i++ )
>   {
>     saga::job::job j = js.create_job (jd);
>     tc.add_task (j);
>   }
>
>   // run all tasks and jobs in the task container
>   // this is the point where the adaptor can perform bulk
>   // optimization.
>   tc.run ();
>
> The same mechanism is also available for any other async
> operation, for example file copies:
>
>   saga::filesystem::file f (url);
>   saga::task_container   tc;
>
>   // create 100 copy tasks, and add them to the task container.
>   // Note that the tasks are not running, but in New state
>   // (Task flavor; an Async task would already be Running).
>   // target[] is assumed to hold the 100 target URLs.
>   for ( int i = 0; i < 100; i++ )
>   {
>     saga::task t = f.copy <saga::task::Task> (target[i]);
>     tc.add_task (t);
>   }
>
>   // run all tasks and jobs in the task container
>   // this is the point where the adaptor can perform bulk
>   // optimization.
>   tc.run ();
>
>
> When run() is called on a task container, the SAGA engine
> inspects all tasks in the container (remember that job
> inherits from the task class, and thus is a task, too). If
> it finds multiple tasks which can be handled by the same
> adaptor, it invokes a bulk method in that adaptor, which
> can then perform all of them at once.
>
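The grouping step described above can be sketched without the SAGA headers. This is only an illustration under stated assumptions, not the engine's actual code: `Task`, `group_by_adaptor`, `run_bulk`, `run_container`, and the adaptor labels are all hypothetical names, and each task is assumed to know which single adaptor can handle it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a task; NOT the real saga::task class.
struct Task {
    std::string adaptor;  // label of the adaptor assumed able to handle the task
    std::string target;   // e.g. a job description or a copy target
};

typedef std::map<std::string, std::vector<Task> > Groups;

// Collect the tasks per adaptor, keeping insertion order inside each group.
Groups group_by_adaptor(const std::vector<Task>& tasks) {
    Groups groups;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        groups[tasks[i].adaptor].push_back(tasks[i]);
    }
    return groups;
}

// Hypothetical bulk entry point of an adaptor: receives the whole batch
// in one call and reports how many tasks it was handed.
std::size_t run_bulk(const std::string& adaptor, const std::vector<Task>& batch) {
    (void)adaptor;           // a real adaptor would be selected by this label
    assert(!batch.empty());  // a bulk call is only made for non-empty groups
    return batch.size();
}

// What a container-level run() might do: one bulk call per adaptor
// instead of one call per task. Returns the number of bulk calls made.
std::size_t run_container(const std::vector<Task>& tasks) {
    const Groups groups = group_by_adaptor(tasks);
    std::size_t calls = 0;
    for (Groups::const_iterator it = groups.begin(); it != groups.end(); ++it) {
        run_bulk(it->first, it->second);
        ++calls;
    }
    return calls;
}
```

With three tasks, two of which share an adaptor label, `run_container` makes two bulk calls rather than three individual ones. That saving is the whole point of the bulk path.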
> This mechanism was implemented quite a while ago, and has
> been shown to work, but I am not sure about its status at
> the moment - Hartmut may be more up to date. Anyway, we
> should be able to revive it, if that is what you need.
>
> At the adaptor level, it would just require implementing
> another set of operations, which receive a set of
> instructions to perform instead of a single instruction.
>
>
>
>
> * Is there any class in the SAGA specification that permits
> retrieving the return code of the launched job?
>
>
> Yes, that works as follows:
>
>
>   saga::job::service     js;
>   saga::job::description jd; // needs to be filled
>   saga::job::job         j = js.create_job (jd);
>
>   j.run  ();
>   j.wait (); // job is in final state now
>
>   saga::job::state state = j.get_state ();
>
>   if ( saga::job::Failed == state )
>   {
>     // attributes are returned as strings
>     std::string exitcode = j.get_attribute (saga::job::attributes::exitcode);
>
>     std::cout << "Job failed with exitcode: "
>               << exitcode
>               << std::endl;
>
>     exit (atoi (exitcode.c_str ()));
>   }
>
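Since the exit code arrives as a string attribute, the conversion to an integer is worth guarding. Here is a minimal sketch, assuming the attribute holds a plain decimal number as the atoi() call above implies. `parse_exit_code` and its `fallback` parameter are hypothetical helpers, not part of the SAGA API.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Convert an exit-code attribute string to int.
// Returns `fallback` if the string is empty, is not a complete decimal
// number, or does not fit into an int. Unlike atoi(), which silently
// yields 0 on bad input, this makes the failure case explicit.
int parse_exit_code(const std::string& text, int fallback) {
    if (text.empty()) {
        return fallback;
    }

    char* end = 0;
    errno = 0;
    const long value = std::strtol(text.c_str(), &end, 10);
    assert(end != 0);  // strtol always sets the end pointer

    const bool consumed_all = (*end == '\0');  // no trailing garbage
    const bool in_range = (errno == 0 && value >= INT_MIN && value <= INT_MAX);
    if (!consumed_all || !in_range) {
        return fallback;
    }
    return static_cast<int>(value);
}
```

In the snippet above, `exit (parse_exit_code (exitcode, 1))` would then exit with a non-zero status even if the adaptor left the attribute empty or malformed. Using 1 as the fallback is just one possible choice.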
>
> Hope that is what you were looking for.
>
>
>
>
> I have to admit that I haven't read the whole SAGA API specification
> yet, but I wanted to get an idea about the two points mentioned above.
>
>
> It is a long and tedious read, we know. But I am afraid
> that you need to read most of it if you want to implement an
> adaptor, in particular Sections 1 to 3 and the job part of
> Section 4.
>
> Cheers, Andre.
>
>
>
> Thanks for your help.
> Best regards,
> Jeremie.
--
Nothing is ever easy.