[saga-rg] Task model...

Fri Oct 28 23:06:01 CDT 2005

Hi Group, 

as you have seen in the notes from last GGF, there was some discussion
about the SAGA task model.  This mail summarizes this discussion, and
proposes some changes to the task model to accommodate the suggestions
made in Boston.  Sorry that this mail comes late, I put that off for
far too long...

As a reminder, the current task model in action looks like this:

---------------------------------------------------------------------
Example 1:
  saga::directory d ("/tmp/");
  saga::directory::task_factory dtf = d.get_task_factory ();
  saga::task t = dtf.mkdir ("test/");

  t.run  ();
  t.wait ();
---------------------------------------------------------------------

So, the minimal number of code lines needed for running a task is 4
(not counting the construction of the directory object).  Note that
the asynchronous mkdir method has the same signature as the
synchronous one, but has a different return value.

Compare that to the mechanism used in MPI or used in GridRPC:

---------------------------------------------------------------------
Example 2:
  saga::directory d ("/tmp/");
  saga::task t  = d.mkdir_async ("test/");

  t.wait ();
---------------------------------------------------------------------

The number of lines is 2, and there is no explicit additional object
(the task factory).  Please note that the created task is
automatically run at creation time.

The discussion in Boston showed that people favor a mechanism similar
to example 2.  Also, SAGA does not have a single use case which
requires (or even motivates) the task_factories as explicit objects in
the SAGA API.

But we do have use cases which motivate that tasks are NOT
automatically run.  In particular, the following scenario of bulk
operations shows that:

---------------------------------------------------------------------
Example 3:
  saga::file f ("/tmp/t.dat");
  saga::file::task_factory ftf = f.get_task_factory ();
  saga::task_container tc;

  for ( int i = 0; i < 1000;  i++ ) 
  {
    saga::task t = ftf.read (i, 1, buffer[i]);
    tc.add_task (t);
  }

  tc.run  ();
  tc.wait ();
---------------------------------------------------------------------

As tc.run () starts all tasks at the same time, the implementation
has the chance to optimize that, and to perform one remote operation
instead of 1000 independent ones.  With the model shown in example 2,
this optimization would be impossible, and the bulk related use cases
in SAGA would be difficult to implement (efficiently).
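
To make that argument a bit more concrete, here is a small mock-up of
why the explicit run () helps.  None of this is SAGA API: the
namespace bulk_sketch, the remote_file type with its round_trips
counter, and read_bulk () are all made up for illustration, and the
"remote" file is just a vector in memory.  It is a sketch under those
assumptions, not a proposal for an implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; not the SAGA API.
namespace bulk_sketch {

enum class state { created, done };

// One deferred read request: offset, length, destination buffer.
struct task {
  task (std::size_t o, std::size_t l, char* t)
    : offset (o), length (l), target (t), st (state::created) {}

  std::size_t offset;
  std::size_t length;
  char*       target;
  state       st;
};

// Mock remote file which counts the round trips it has to serve.
struct remote_file {
  remote_file () : round_trips (0) {}

  // Serve a whole batch of reads in one single round trip.
  void read_bulk (std::vector<task>& batch) {
    ++round_trips;
    for (task& t : batch) {
      for (std::size_t k = 0; k < t.length; ++k)
        t.target[k] = data[t.offset + k];
      t.st = state::done;
    }
  }

  std::vector<char> data;
  int               round_trips;
};

// Container which holds its tasks until run () is called explicitly.
class task_container {
public:
  explicit task_container (remote_file& f) : file_ (f) {}

  void add_task (const task& t) { tasks_.push_back (t); }

  // Nothing has been started yet, so all requests can be merged.
  void run () { file_.read_bulk (tasks_); }

  // The mock completes inside run (), so wait () only checks states.
  bool wait () const {
    for (const task& t : tasks_)
      if (t.st != state::done) return false;
    return true;
  }

private:
  remote_file&      file_;
  std::vector<task> tasks_;
};

// Replays the scenario of example 3, returns the number of round trips.
inline int bulk_read_demo () {
  remote_file f;
  f.data.assign (1000, 'x');

  std::vector<char> buffer (1000, 0);
  task_container    tc (f);

  for (std::size_t i = 0; i < 1000; ++i)
    tc.add_task (task (i, 1, &buffer[i]));

  tc.run ();
  assert (tc.wait ());
  assert (buffer[999] == 'x');
  return f.round_trips;
}

} // namespace bulk_sketch
```

In this mock-up the 1000 reads end up in a single call to read_bulk ().
If each task were started on creation, the container would only ever
see tasks which are already under way, and would have nothing left to
merge.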

In order to accommodate the explicit run() method on tasks, there are
two possibilities (well, there are more, but two seem particularly
obvious):

---------------------------------------------------------------------
Example 4a: distinguish task from async call
  saga::directory d ("/tmp/");
  saga::task t_1 = d.mkdir_async ("test/");
  saga::task t_2 = d.mkdir_task  ("test/");

  t_2.run ();

  t_1.wait ();
  t_2.wait ();

Example 4b: flag to start task
  saga::directory d ("/tmp/");
  saga::task t_1 = d.mkdir_async ("test/", TRUE); // TRUE optional
  saga::task t_2 = d.mkdir_async ("test/", FALSE);

  t_2.run ();

  t_1.wait ();
  t_2.wait ();
---------------------------------------------------------------------

Although version 4b seems somewhat simpler, most people
(including me) seem to oppose a seemingly artificial flag on all async
calls to control whether the task is run.  Also, this would change the
signature compared to the synchronous version of the method.

Version 4a OTOH increases the number of methods.  Then again, that might
not be sooooo bad.  The current task model doubles the number of
methods as well.  But as we describe the model in text, and do not
actually add all signatures twice in the spec, the spec stays terse
(for a very specific definition of terse, to be sure).  So, version 4a
would add only one paragraph to the spec :-)
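
For those who like to see it spelled out, a rough mock-up of 4a in
plain C++ follows.  The namespace flavor_sketch, the state enum,
get_state () and exists () are invented for this sketch and are not
part of the Strawman; the "directory" is just a set of names in
memory, and the mock performs the actual work inside wait () to keep
things deterministic.  What wait () should do on a task which was
never run is left open here (the mock simply returns).

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins for illustration only; not the SAGA API.
namespace flavor_sketch {

enum class state { created, running, done };

// Task handle for one pending mkdir on an in-memory directory.
class task {
public:
  task (std::set<std::string>& entries, const std::string& name)
    : entries_ (&entries), name_ (name), state_ (state::created) {}

  // Explicit start; a no-op if the task is already running or done.
  void run () {
    if (state_ == state::created) state_ = state::running;
  }

  // The mock performs the work here; a task never run is left alone.
  void wait () {
    if (state_ == state::running) {
      entries_->insert (name_);
      state_ = state::done;
    }
  }

  state get_state () const { return state_; }

private:
  std::set<std::string>* entries_;
  std::string            name_;
  state                  state_;
};

// The directory must outlive the tasks it hands out.
class directory {
public:
  // Synchronous flavor: done when the call returns.
  void mkdir (const std::string& name) { entries_.insert (name); }

  // Task flavor: created, but NOT run.
  task mkdir_task (const std::string& name) {
    return task (entries_, name);
  }

  // Async flavor: the same task, already run on creation.
  task mkdir_async (const std::string& name) {
    task t = mkdir_task (name);
    t.run ();
    return t;
  }

  bool exists (const std::string& name) const {
    return entries_.count (name) > 0;
  }

private:
  std::set<std::string> entries_;
};

// Replays example 4a against the mock.
inline bool flavor_demo () {
  directory d;
  task t_1 = d.mkdir_async ("test_1/");  // already running
  task t_2 = d.mkdir_task  ("test_2/");  // not started yet

  assert (t_1.get_state () == state::running);
  assert (t_2.get_state () == state::created);

  t_2.run ();

  t_1.wait ();
  t_2.wait ();
  return d.exists ("test_1/") && d.exists ("test_2/");
}

} // namespace flavor_sketch
```

Note that in this mock-up mkdir_async () is nothing more than
mkdir_task () followed by run (), and that all three flavors take the
same arguments.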

Also, 4a is strictly speaking just a marker on the sync methods, which
makes them asynchronous.  In different language bindings, this could
look very different, and be quite convenient:

---------------------------------------------------------------------
Example 4a: more versions in C++

4a1:               d.mkdir         ("test/");
                   d.mkdir_sync    ("test/"); // same
  saga::task t_1 = d.mkdir_async   ("test/");
  saga::task t_2 = d.mkdir_task    ("test/");

4a2:               d.mkdir         ("test/");
                   d.sync ::mkdir  ("test/"); // same
  saga::task t_1 = d.async::mkdir  ("test/");
  saga::task t_2 = d.task ::mkdir  ("test/");

4a3:               d.mkdir         ("test/");
                   d.mkdir <sync>  ("test/"); // same
  saga::task t_1 = d.mkdir <async> ("test/");
  saga::task t_2 = d.mkdir <task>  ("test/");
---------------------------------------------------------------------

All these versions are equivalent.  I personally like 4a2 best, Thilo
likes 4a1 (close to MPI), Hartmut favors 4a3 (nice to implement in
C++).
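
To give an idea of how 4a3 might be done in C++, here is one possible
sketch using tag types and overloading.  This is only my guess at a
workable approach, not necessarily what Hartmut has in mind: the
namespace tag_sketch and the helpers tag <>, result <> and
mkdir_impl () are made up for this example, the task type doubles as
the tag for the non-running flavor, and the task handle only tracks
its state (it does not create anything).

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins for illustration only; not the SAGA API.
namespace tag_sketch {

enum class state { created, running, done };

// Minimal task handle; it only tracks its state in this sketch.
struct task {
  task () : st (state::created) {}
  void run  () { if (st == state::created) st = state::running; }
  void wait () { if (st == state::running) st = state::done;    }
  state st;
};

// Flavor markers; 'task' itself marks the non-running flavor.
struct sync  {};
struct async {};

// Empty wrapper, so that overloads can be selected by flavor.
template <typename Flavor> struct tag {};

// Return type per flavor: void for sync, a task handle otherwise.
template <typename Flavor> struct result        { typedef task type; };
template <>                struct result <sync> { typedef void type; };

class directory {
public:
  // One public name; the template argument selects the flavor.
  template <typename Flavor = sync>
  typename result <Flavor>::type mkdir (const std::string& name) {
    return mkdir_impl (name, tag <Flavor> ());
  }

  bool exists (const std::string& name) const {
    return entries_.count (name) > 0;
  }

private:
  // Synchronous flavor: done when the call returns.
  void mkdir_impl (const std::string& name, tag <sync>) {
    entries_.insert (name);
  }

  // Task flavor: created, but not run.
  task mkdir_impl (const std::string& /* name */, tag <task>) {
    return task ();
  }

  // Async flavor: the task flavor, run on creation.
  task mkdir_impl (const std::string& name, tag <async>) {
    task t = mkdir_impl (name, tag <task> ());
    t.run ();
    return t;
  }

  std::set<std::string> entries_;
};

// The four calls of 4a3, against the mock.
inline bool tag_demo () {
  directory d;
             d.mkdir         ("a/");
             d.mkdir <sync>  ("b/");  // same flavor as above
  task t_1 = d.mkdir <async> ("c/");
  task t_2 = d.mkdir <task>  ("d/");

  assert (t_1.st == state::running);
  assert (t_2.st == state::created);
  return d.exists ("a/") && d.exists ("b/");
}

} // namespace tag_sketch
```

The default template argument makes the plain d.mkdir () call the
synchronous one, so existing synchronous code would not need to
change in such a binding.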

Well, here we are really: the proposal is as follows:

  P1) get rid of the task factory (there seems no need for it)
  P2) allow tasks which do     run() on creation
  P3) allow tasks which do NOT run() on creation

Both 4a and 4b deliver that, with only small changes in the API
spec (but obviously larger changes in the language bindings).  For 4a,
the versions 4a1 to 4a3 show different possible C++ language
bindings.

Questions:

   Q1) Any comments on the Boston discussion?  I hope I reflected that
       correctly...
   Q2) Are there any drawbacks to introducing 4a or 4b, apart from the
       additional number of calls?  
   Q3) Are the advantages (simplicity for most usages, power for 
       bulk operations and others) good enough to introduce 4a or 4b?
   Q4) Do you like 4a or 4b better?
   Q5) Any comments on 4a1, 4a2 or 4a3? (not part of the Strawman!)

Please answer before November 10th: firstly, we would like to finish
the task model soon, in order to get the Strawman stabilized before
GGF16; secondly, we would like to come to a decision during the F2F
meeting at SuperComputing in Seattle (November 12-18).

Cheers, Andre.

-- 
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky at cs.vu.nl       |
| De Boelelaan 1083a                | www:  http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+