[SAGA-RG] quick poll: SPMD attributes

Ole Weidner oweidner at cct.lsu.edu
Thu Dec 10 16:16:05 CST 2009


Hi,

On Dec 10, 2009, at 3:34 PM, Andre Merzky wrote:

> Hi all, 
> 
> we had a very productive meeting with the DRMAA guys today, and will
> continue tomorrow.  We'll send around notes next week I think, but
> for now I'd love to have some quick feedback on a very specific
> topic.
> 
> At the moment, SAGA has the following job description attributes to
> specify job multiplicity, which are derived from the JSDL core and
> JSDL SPMD extension:
> 
>  - SPMDVariation
>    - MPI type etc.
> 
>  - TotalCPUCount
>    - total number of CPUs required by the job
> 
>  - NumberOfProcesses
>    - number of instances of the Executable that the consuming
>      system MUST start
> 
>  - ProcessesPerHost
>    - number of instances of the Executable that the consuming
>      system MUST start per host.
> 
>  - ThreadsPerProcess
>    - number of threads per process (i.e., per instance of the
>      Executable)
> 
> There is not much discussion about SPMDVariation (anymore), but the
> others are apparently somewhat inconsistent:
> 
>  - processes can be started by the backend, threads can't, so why
>    specify ThreadsPerProcess?
>  - hosts can have multiple CPUs - this is not reflected
>  - CPUs can have multiple cores - this is not reflected
>  - in general, the attributes seem somewhat inconsistent, and it's
>    hard to specify the values for specific use cases.

I agree. The current set of attributes is somewhat clumsy and not always intuitive to use. The notion of a thread should not be reflected in the job description; it should be handled on the application side.

> 
> Some of the concerns may get addressed by resource discovery and
> reservation, but not all.  So, we would like to propose the
> following replacement list for these attributes:
> 
>  SPMDVariation
>  NumberOfProcesses
>  ProcessesPerMachine
>  ProcessesPerSlot
>  ProcessesPerCore
> 
> (Slot and Machine are the DRMAA terms for CPU and Host).

Let's go with CPU and host then. Slot especially is IMHO a rather weird term; CPU is so much easier to understand.

> 
> The NumberOfProcesses is to be interpreted as exact number by the
> backend.  The ProcessesPerXYZ are to be interpreted as upper bounds
> by the backend.  
> 
> So, for example, one could specify a 16 process MPICH job as
> 
>  SPMDVariation     = MPICH
>  NumberOfProcesses = 16
>  ProcessesPerCore  = 2

OK, but what if the backend doesn't understand these things? RSL, for example, only understands number of nodes and number of processes. In this case the adaptor itself would have to (a) figure out the number of cores per CPU, (b) figure out the number of CPUs per node, and (c) make a reservation for (16/2/#cores_per_cpu/#cpus_per_node) nodes. The Globus GRAM adaptor already does something similar to this due to the shortcomings of RSL. Not pretty.
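
To make the arithmetic in (c) concrete, here is a minimal sketch of the node count such an adaptor would have to derive. The function name and the cores_per_cpu / cpus_per_node parameters are made up for illustration; they are not part of SAGA, RSL, or the GRAM adaptor, and the sketch assumes the adaptor has already discovered both values somehow.

```python
import math

def nodes_needed(number_of_processes, processes_per_core,
                 cores_per_cpu, cpus_per_node):
    """Smallest node count that can hold the job (hypothetical helper)."""
    # ProcessesPerCore is an upper bound, so this is the most one node can take.
    per_node = processes_per_core * cores_per_cpu * cpus_per_node
    # Round up: a partially filled node still has to be reserved.
    return math.ceil(number_of_processes / per_node)

# 16 processes, at most 2 per core, on 2-way quad-core nodes
print(nodes_needed(16, 2, cores_per_cpu=4, cpus_per_node=2))  # 1
# the same job on 4-way single-core nodes
print(nodes_needed(16, 2, cores_per_cpu=1, cpus_per_node=4))  # 2
```

Rounding up matters as soon as the process count is not a multiple of the per-node capacity, which the plain division above glosses over.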

And the more possible attributes you have, the more options you give the user to define the same thing. E.g



> 
> That would allow one to run on a 2-way QuadCore host, placing two
> processes on each of the 8 cores.
> 
> It would also allow one to run on two 4-way SingleCore hosts, placing
> one process on each core.
> 
> Well, try to specify that with the current attributes - very
> difficult!
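
The proposed semantics (NumberOfProcesses is exact, ProcessesPerCore is an upper bound) can be sketched as a simple check on a placement. This is only an illustration: the function name and the nested-list representation of hosts and cores are assumptions, not anything defined by SAGA or DRMAA.

```python
def placement_ok(placement, number_of_processes, processes_per_core):
    """Check a placement against the proposed attributes (hypothetical helper).

    placement is a list of hosts; each host is a list holding the number
    of processes placed on each of its cores.
    """
    counts = [n for host in placement for n in host]
    # The total must match exactly; the per-core count is only capped.
    exact_total = sum(counts) == number_of_processes
    within_bound = max(counts, default=0) <= processes_per_core
    return exact_total and within_bound

# One 2-way quad-core host: 8 cores, two processes on each.
one_host = [[2] * 8]
print(placement_ok(one_host, 16, 2))  # True
```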
> 
> So, the questions we have are
> 
>  - does the above proposal indeed capture the SAGA MPI use cases?
>  - if not, which use cases are not covered?  How can they be
>    addressed?
> 
> To be clear: this is not intended to be an erratum to the current
> SAGA spec, but rather a consideration for the next SAGA Version...
> 
> Thanks, Andre.

Cheers,
Ole

> 
> 
> -- 
> Nothing is ever easy.
> --
>  saga-rg mailing list
>  saga-rg at ogf.org
>  http://www.ogf.org/mailman/listinfo/saga-rg


