[jsdl-wg] SPMD woes...

Andre Merzky andre at merzky.net
Fri Mar 5 05:33:14 CST 2010


Hia guys, 

I'd like to share some thoughts and discussion points from the SAGA
group, with respect to the JSDL SPMD spec.  Sorry for the long post...

At the moment, SAGA has the following job description attributes to
specify job multiplicity, which are derived from the JSDL core and
JSDL SPMD extension:

  - SPMDVariation
    - MPI type etc.

  - TotalCPUCount
    - total number of CPUs required by the job

  - NumberOfProcesses
    - number of instances of the Executable that the consuming
      system MUST start

  - ProcessesPerHost
    - number of instances of the Executable that the consuming
      system MUST start per host.

  - ThreadsPerProcess
    - number of threads per process (i.e., per instance of the
      Executable)


There is not much discussion about SPMDVariation (anymore), but the
others are apparently somewhat inconsistent:

  - processes can be started by the backend, threads can't, so why
    specify ThreadsPerProcess?  How is it to be used by the backend?

  - hosts can have multiple CPUs - this is not reflected

  - CPUs can have multiple cores - this is not reflected

  - cores can have multiple hardware threads - this is not
    reflected, unless this is what is meant by 'ThreadsPerProcess'?


In general, the attributes seem somewhat inconsistent, and not easy
to use.  Examples:


  limit the number of hosts to 2, no matter the number of CPUs
    NumberOfProcesses = 10
    ProcessesPerHost  =  5   // specify proc per resource

  limit the number of CPUs to 2, no matter the number of CPUs per host
    NumberOfProcesses = 10
    TotalCPUCount     =  2   // specify total resource num
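
As a rough sketch of the arithmetic behind these two examples (the
helper names are made up for illustration, they are not SAGA or JSDL
API, and I assume the backend packs processes as densely as allowed):

```python
def min_hosts(number_of_processes: int, processes_per_host: int) -> int:
    """Smallest number of hosts that can hold all processes."""
    if number_of_processes < 0 or processes_per_host < 1:
        raise ValueError("need number_of_processes >= 0 and processes_per_host >= 1")
    # Integer ceiling division: a partially filled host still counts.
    return -(-number_of_processes // processes_per_host)


def max_processes_per_cpu(number_of_processes: int, total_cpu_count: int) -> int:
    """Largest number of processes that must share one CPU."""
    if number_of_processes < 0 or total_cpu_count < 1:
        raise ValueError("need number_of_processes >= 0 and total_cpu_count >= 1")
    return -(-number_of_processes // total_cpu_count)


print(min_hosts(10, 5))              # 2
print(max_processes_per_cpu(10, 2))  # 5
```

Note that neither attribute pair says anything about how many CPUs
sit in each host, which is the gap discussed here.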

  
Also, there is no way to ensure that an application instance obtains
exactly one CPU, unless one limits ProcessesPerHost to 1, which will
waste significant resources on multi-CPU systems.
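
To put a number on the waste: a small sketch, assuming every host
has the same number of CPUs.  The parameter cpus_per_host is
hypothetical, it is exactly the piece of information the current
attributes cannot express:

```python
def idle_cpus_with_one_process_per_host(number_of_processes: int,
                                        cpus_per_host: int) -> int:
    """CPUs left unused when ProcessesPerHost is forced to 1."""
    if number_of_processes < 0 or cpus_per_host < 1:
        raise ValueError("need number_of_processes >= 0 and cpus_per_host >= 1")
    # With ProcessesPerHost = 1 every process occupies a whole host,
    # so all but one CPU on each of those hosts stay idle.
    return number_of_processes * (cpus_per_host - 1)


print(idle_cpus_with_one_process_per_host(10, 8))  # 70
```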



I guess the problem really is that JSDL tries to stay out of the
resource description business, and that's probably a wise decision.
That is what Glue&Co are meant to deal with.

Nevertheless, the current specs are cumbersome from an *application*
perspective.

We are currently considering changing our attributes to

    SPMDVariation
    NumberOfProcesses
    TotalNodeCount
    TotalSocketCount
    TotalCoreCount
    TotalHWThreadCount
    ProcessesPerNode
    ProcessesPerSocket (*)
    ProcessesPerCore
    ProcessesPerHWThread

The 'ProcessesPerXYZ' attributes are interpreted here as upper
limits: the backend MUST NOT start more processes per XYZ, but MAY
start fewer.  Those attributes are not fully translatable into the
JSDL attributes, but so far they map pretty well to other job
description languages.
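
A minimal sketch of the upper-limit semantics, assuming a placement
is simply a list naming the unit (node, socket, core or HW thread)
each started process landed on.  The function name and the placement
format are hypothetical, not part of any spec:

```python
from collections import Counter


def respects_limit(placement: list[str], processes_per_unit: int) -> bool:
    """True if no unit hosts more processes than the given upper limit."""
    if processes_per_unit < 1:
        raise ValueError("processes_per_unit must be >= 1")
    counts = Counter(placement)  # unit id -> number of processes on it
    return all(n <= processes_per_unit for n in counts.values())


# Three processes: two on node0, one on node1.
placement = ["node0", "node0", "node1"]
print(respects_limit(placement, 2))  # True: fewer than the limit is allowed
print(respects_limit(placement, 1))  # False: node0 exceeds the limit
```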

The typical use case for us would then boil down to

  SPMDVariation        = MPI
  NumberOfProcesses    = 32
  ProcessesPerHWThread = 1

which seems to be what most users want.  


Other use cases we would be interested in, for example to start one
additional IO process per node, would require more attributes and
semantics than we are willing to introduce right now, like

  additional attribs:
    HWThreadsPerCore
    HWThreadsPerSocket
    HWThreadsPerNode
    CoresPerSocket
    CoresPerNode
    SocketsPerNode

  specification:
    SPMDVariation     = MPI
    NumberOfProcesses = 32 + TotalNodeCount
    ProcessesPerNode  = HWThreadsPerNode + 1


So, here is the biggie: Is the JSDL group considering revising the
SPMD spec at some point?  If so, can we expect something along the
lines above, or is that, ahem, 'out of scope'?  If not, how would you
propose to align the SAGA use cases with JSDL/Glue/..., so that we
still end up with an implementable standards stack?

We don't really want to invent our own schemas, as they would most
likely not map to JSDL then - but we need to cater to our use
cases one way or the other...

Best, Andre.




(*) Socket basically stands for CPU, but is meant to make clear that
we are not talking about cores.





-- 
Nothing is ever easy.

