[saga-core-wg] SAGA Job scheduling features

Andre Merzky andre at merzky.net
Tue Aug 5 13:47:17 CDT 2008


Hi, 

Quoting [Jérémie Chevalier] (Aug 05 2008):
> 
> Hello,
>
> I'm Jeremie, R&D Engineer at the CETIC research center (Belgium).
> In the future, it is likely that I will develop SAGA adaptors (Job
> management) for some Grid middlewares.
>
> I have a couple of questions regarding the Job Management
> implementation of SAGA:
>   * Does it support the launch of several jobs in one single operation,
>     like the drmaa_run_bulk_jobs() of the DRMAA API ?

Yes, but the mechanism for that needs some explaining, so
please bear with me if the answer is not short...

First, there is no direct (i.e. explicit) API call for bulk
job submission.  However, SAGA has some notion of bulk job
submission, and in fact of bulk operations in general, which
is expressed as follows:

  saga::job::service     js;
  saga::job::description jd; // needs to be filled
  saga::task_container   tc;

  // create 100 jobs, and add them to the task container.
  // Note that the jobs are not running, but in New state
  for ( int i = 0; i < 100; i ++ )
  {
    saga::job::job j = js.create_job (jd);
    tc.add_task (j);
  }

  // run all tasks and jobs in the task container
  // this is the point where the adaptor can perform bulk
  // optimization.
  tc.run ();

The same mechanism is also available for any other async
operation, for example for file copies:

  saga::filesystem::file f (url);
  saga::task_container tc;

  // create 100 copy tasks, and add them to the task container.
  // Note that the tasks are not running, but in New state
  for ( int i = 0; i < 100; i ++ )
  {
    saga::task t = f.copy <saga::task::Task> (target[i]);
    tc.add_task (t);
  }

  // run all tasks and jobs in the task container
  // this is the point where the adaptor can perform bulk
  // optimization.
  tc.run ();


When run() is called on a task container, the SAGA engine
inspects all tasks in the container (remember that job
inherits from the task class, and thus is a task, too).  If
it finds multiple tasks which can be handled by the same
adaptor, it invokes a bulk method in that adaptor, which can
then perform all of them at once.
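
To make the grouping step a bit more concrete, here is a
minimal, self-contained sketch.  It is not the actual engine
code: mock_task, dispatch and run_container are made-up names
for illustration, and the adaptor is reduced to a plain
string.  It only shows the idea of collecting the tasks of a
container per adaptor, and issuing one call per adaptor.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a SAGA task: an id, plus the name of
// the adaptor which would serve it.  This is NOT the real saga::task.
struct mock_task
{
  int         id;
  std::string adaptor;
};

// One entry per adaptor invocation issued by the mock engine.
struct dispatch
{
  std::string      adaptor;
  std::vector<int> task_ids;
};

// Group the tasks of a container by adaptor, then emit a single
// call per adaptor which carries all tasks that adaptor can handle.
std::vector<dispatch> run_container (const std::vector<mock_task> & tasks)
{
  std::map<std::string, std::vector<int> > groups;

  for ( std::size_t i = 0; i < tasks.size (); ++i )
  {
    groups[tasks[i].adaptor].push_back (tasks[i].id);
  }

  std::vector<dispatch> calls;
  std::size_t           dispatched = 0;

  for ( const auto & group : groups )
  {
    dispatch d;
    d.adaptor   = group.first;
    d.task_ids  = group.second;
    dispatched += d.task_ids.size ();
    calls.push_back (d);
  }

  // every task must end up in exactly one adaptor call
  assert (dispatched == tasks.size ());

  return calls;
}
```

For a container with three tasks, two of which belong to the
same adaptor, run_container returns two calls instead of
three; that single call per adaptor is where a bulk
optimization would hook in.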

This mechanism was implemented quite a while ago, and has
been shown to work, but I am not sure about its status at
the moment - Hartmut may be more up to date.  Anyway, we
should be able to revive it, if that is what you need.

At the adaptor level, that would just require the
implementation of another set of operations, which receive a
set of instructions to perform instead of a single instruction.
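
As a rough illustration of what such an additional operation
could look like, consider the sketch below.  Please note that
this is not the real SAGA adaptor interface: job_adaptor,
bulk_adaptor, submit, submit_bulk and round_trips are all
hypothetical names, and the backend is mocked by a counter.
The point is only that the bulk variant receives the whole
set, and that an adaptor without native bulk support can fall
back to the single operation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical adaptor base class, for illustration only.
class job_adaptor
{
  public:
    job_adaptor () : round_trips (0), next_id (0) {}
    virtual ~job_adaptor () {}

    // single operation: one backend request per job
    virtual std::string submit (const std::string & executable)
    {
      (void) executable;  // the mock backend ignores the job itself
      ++round_trips;
      return "job-" + std::to_string (next_id++);
    }

    // bulk operation: the default falls back to one submit () per job
    virtual std::vector<std::string>
    submit_bulk (const std::vector<std::string> & executables)
    {
      std::vector<std::string> ids;

      for ( std::size_t i = 0; i < executables.size (); ++i )
      {
        ids.push_back (submit (executables[i]));
      }

      return ids;
    }

    int round_trips;  // number of (mocked) backend requests so far

  protected:
    int next_id;
};

// Adaptor for a backend which can accept a whole set at once.
class bulk_adaptor : public job_adaptor
{
  public:
    virtual std::vector<std::string>
    submit_bulk (const std::vector<std::string> & executables)
    {
      ++round_trips;  // one backend request for the whole set

      std::vector<std::string> ids;

      for ( std::size_t i = 0; i < executables.size (); ++i )
      {
        ids.push_back ("job-" + std::to_string (next_id++));
      }

      // one id per requested job
      assert (ids.size () == executables.size ());

      return ids;
    }
};
```

Submitting three jobs through job_adaptor costs three mocked
requests, while bulk_adaptor needs one.  The fallback in the
base class means that adaptors would not be forced to
implement the bulk variant at all.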


>   * Is there any class in the SAGA specification that permits the
>     retrieval of the return code of the job launched ?

Yes, that works as follows:

  
  saga::job::service     js;
  saga::job::description jd; // needs to be filled
  saga::job::job         j = js.create_job (jd);

  j.run ();
  j.wait (); // job is in final state now

  saga::job::state state = j.get_state ();

  if ( saga::job::Failed == state )
  {
    std::string exitcode = j.get_attribute (saga::job::attributes::exitcode);

    std::cout << "Job failed with exitcode: "
              << exitcode
              << std::endl;

    // the attribute is a string, so convert it for exit ()
    exit (atoi (exitcode.c_str ()));
  }
  } 
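
Since the exit code arrives as a string attribute, it may be
worth converting it a bit more carefully than atoi does, as
atoi cannot report errors.  The helper below is a small sketch
using only the standard library; parse_exitcode is a made-up
name and not part of SAGA, and I am assuming here that the
attribute holds a plain decimal number.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Convert an exit code given as string into an int.
// Returns true and sets 'code' only if the complete string is a
// decimal integer which fits into an int; otherwise returns false
// and leaves 'code' untouched.  Leading whitespace and a sign are
// accepted, because strtol skips or parses them.
bool parse_exitcode (const std::string & attr, int & code)
{
  if ( attr.empty () )
  {
    return false;
  }

  char * end = 0;
  errno      = 0;

  long value = std::strtol (attr.c_str (), &end, 10);

  assert (end != 0);  // strtol always sets the end pointer

  // reject overflow, and trailing garbage such as "12abc"
  if ( errno == ERANGE || *end != '\0' )
  {
    return false;
  }

  if ( value < INT_MIN || value > INT_MAX )
  {
    return false;
  }

  code = static_cast <int> (value);
  return true;
}
```

In the snippet above, one could then call exit () with the
parsed value only if parse_exitcode succeeds, and fall back
to some generic error code otherwise.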


Hope that is what you were looking for.


> I have to admit that I haven't read the whole SAGA API specifications
> yet, but I wanted to get an idea about the two points mentioned above.

It is a long and tedious read, we know.  But I am afraid
that you need to read most of it if you want to implement an
adaptor, e.g. Sections 1 to 3, and the job part of Section 4.

Cheers, Andre.

> Thanks for your help.
> Best regards,
> Jeremie.
-- 
Nothing is ever easy.

