[jsdl-wg] SPMD woes...

Andreas Savva andreas.savva at jp.fujitsu.com
Sun Mar 7 23:02:55 CST 2010


Hi Andre,

Thank you for the detailed email about SPMD. If you don't mind, I'd like to
answer the last points first and then comment on the rest.

> So, here is the biggie: Is JSDL at some point considering to revise
> the SPMD spec?

Yes, it's definitely possible given new use cases and sufficient
enthusiasm.

> If so, can we expect something along the lines
> above, or is that, aehem, 'out of scope'?  If not, how would you
> propose to align the SAGA use cases with JSDL/Glue/..., so that we
> still end up with an implementable standards stack?

We don't have an automatic 'out of scope' filter, contrary to popular
belief. :-) But for obvious reasons I cannot guarantee you a specific
outcome ahead of time. The end result depends on what use cases people
think are important. From what I have heard in the past, a number of people
seem interested in adding 'core' support to SPMD. If cores are adequately
represented in the Glue schema, which we said we are going to use for
future JSDL resource requirements, I would imagine it would not be a big
deal once the other pieces are in place.

> We don't really want to invent our own schemas, as it's most likely
> that it will not map to JSDL then - but we need to cater our use
> cases one way or the other...

I hope you don't feel that you have to produce your own schemas. Let's
talk about your use cases at OGF. I can try to arrange some time for this
topic in the jsdl general session if you like.

Rest inline:

On Fri, 05 Mar 2010 20:33:14 +0900, Andre Merzky <andre at merzky.net> wrote:

>
> Hia guys,
>
> I'd like to share some thoughts and discussion points from the SAGA
> group, in respect to the JSDL SPMD spec.  Sorry for the long post...
>
> At the moment, SAGA has the following job description attributes to
> specify job multiplicity, which are derived from the JSDL core and
> JSDL SPMD extension:
>
>   - SPMDVariation
>     - MPI type etc.
>
>   - TotalCPUCount
>     - total number of CPUs required by the job
>
>   - NumberOfProcesses
>     - number of instances of the Executable that the consuming
>       system MUST start
>
>   - ProcessesPerHost
>     - number of instances of the Executable that the consuming
>       system MUST start per host.
>
>   - ThreadsPerProcess
>     - number of threads per process (i.e., per instance of the
>       Executable)
>
>
> There is not much discussion about SPMDVariation (anymore), but the
> others are apparently somewhat inconsistent:
>
>   - processes can be started by the backend, threads can't, so why
>     specify ThreadsPerProcess?  How is it to be used by the backend?

Looking back at the SPMD tracker we introduced ThreadsPerProcess for the
OpenMP use case, as an indicator of computational weight. One expected
usage pattern is described on page 9, Sec. 5.4.4, Attributes:

     actualIndividualCPUCount—An optional attribute.
       If true, the value of the individual number of CPUs allocated
       to the job on each host is used as the value of the
       ThreadsPerProcess element.

Nowadays I would imagine we would have specified this as cores instead of
CPUs. In any case, for a straight MPI application you would not need to use
this element.

>
>   - hosts can have multiple CPUs - this is not reflected

I think this can be expressed in the JSDL resource requirements.

>
>   - CPUs can have multiple cores - this is not reflected
>
>   - cores can have multiple hardware threads - this is not
>     reflected, unless this is what is meant by 'ThreadsPerProcess'?

The JSDL specs were done before cores were widely available. So I think it
is to be expected that support for multi-cores is not present. (Unless you
want to think of fractional CPU values as a way of expressing cores; let's
not.)


> In general, the attributes seem somewhat inconsistent, and not easy
> to use.  Examples:

I won't argue the 'not easy to use', but in the examples below you should
not be using the SPMD elements for resource allocation. The spec (p4 & p8)
is clear that SPMD elements are intended to describe the application, not
its resources.


I'll use jsdl schema examples below:

>   limit the number of hosts to 2, no matter the number of CPUs
>     NumberOfProcesses = 10
>     ProcessesPerHost  =  5   // specify proc per resource

The above does not guarantee only two hosts. The following does:

<jsdl:TotalResourceCount>
     <jsdl:Exact>2.0</jsdl:Exact>
</jsdl:TotalResourceCount>
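
For illustration only, here is a minimal sketch of how the two parts could
sit together in one job description: the SPMD elements describe how the
application is started, and the Resources element carries the host limit.
The namespace URIs, the element placement and the executable name
(my_mpi_app) are assumptions and should be checked against the JSDL 1.0
and JSDL SPMD specs before use.

```xml
<!-- Sketch only: namespace URIs and the executable name are assumptions. -->
<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
    xmlns:jsdl-spmd="http://schemas.ogf.org/jsdl/2007/02/jsdl-spmd">
  <jsdl:JobDescription>
    <jsdl:Application>
      <!-- Application description: how the executable is to be started -->
      <jsdl-spmd:SPMDApplication>
        <jsdl-posix:Executable>my_mpi_app</jsdl-posix:Executable>
        <jsdl-spmd:NumberOfProcesses>10</jsdl-spmd:NumberOfProcesses>
        <jsdl-spmd:ProcessesPerHost>5</jsdl-spmd:ProcessesPerHost>
        <jsdl-spmd:SPMDVariation>http://www.ogf.org/jsdl/2007/02/jsdl-spmd/MPI</jsdl-spmd:SPMDVariation>
      </jsdl-spmd:SPMDApplication>
    </jsdl:Application>
    <jsdl:Resources>
      <!-- Resource requirement: this is the part that limits the job to two hosts -->
      <jsdl:TotalResourceCount>
        <jsdl:Exact>2.0</jsdl:Exact>
      </jsdl:TotalResourceCount>
    </jsdl:Resources>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
```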

>
>   limit the number of CPUs to 2, no matter the number of CPUs per host
>     NumberOfProcesses = 10
>     TotalCPUCount     =  2   // specify total resource num

Only this part is effective:
<jsdl:TotalCPUCount>
     <jsdl:Exact>2.0</jsdl:Exact>
</jsdl:TotalCPUCount>


> Also, there is no way to ensure that an application instance obtains
> exactly one CPU, unless one limits ProcessesPerHost to 1, which will
> waste significant resources on multi-CPU systems.

No, exactly one CPU does not depend on ProcessesPerHost. It is simply

<jsdl:TotalCPUCount>
     <jsdl:Exact>1.0</jsdl:Exact>
</jsdl:TotalCPUCount>

Unless you use <ExclusiveExecution>, the default semantics are to share
resources, so you shouldn't be wasting resources on multi-CPU systems.
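
As a hedged sketch, a Resources element that asks for one CPU and spells
out non-exclusive execution might look like the following. The explicit
false value is there only to make the sharing visible; the jsdl prefix is
assumed to be bound to the JSDL 1.0 namespace as elsewhere in this mail.

```xml
<jsdl:Resources>
  <!-- Shared (non-exclusive) execution, written out explicitly for clarity -->
  <jsdl:ExclusiveExecution>false</jsdl:ExclusiveExecution>
  <!-- Exactly one CPU for the job as a whole -->
  <jsdl:TotalCPUCount>
    <jsdl:Exact>1.0</jsdl:Exact>
  </jsdl:TotalCPUCount>
</jsdl:Resources>
```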

>
> I guess the problem really is that JSDL tries to stay out of the
> resource description business, and that's probably a wise decision.
> That is what Glue&Co are about to deal with.
>
> Nevertheless, the current specs are cumbersome from an *application*
> perspective.
>
> We are currently considering to change our attributes, to
>
>     SPMDVariation
>     NumberOfProcesses
>     TotalNodeCount
>     TotalSocketCount
>     TotalCoreCount
>     TotalHWThreadCount
>     ProcessesPerNode
>     ProcessesPerSocket (*)
>     ProcessesPerCore
>     ProcessesPerHWThread
>
> The 'ProcessesPerXYZ' attributes are here interpreted as upper
> limits: the backend MUST NOT start more processes per XYZ, but MAY
> start less.  Those attributes are not fully translatable into the
> JSDL attributes, but map pretty well to other job descriptions so
> far.
>
> The typical use case for us would then boil down to
>
>   SPMDVariation        = MPI
>   NumberOfProcesses    = 32
>   ProcessesPerHWThread = 1
>
> which seems to be what most users want.
>
>
> Other use cases we would be interested in, for example to start one
> additional IO process per node, would require more attributes and
> semantics than we are willing to introduce right now, like
>
>   additional attribs:
>     HWThreadsPerCore
>     HWThreadsPerSocket
>     HWThreadsPerNode
>     CoresPerSocket
>     CoresPerNode
>     SocketsPerNode
>
>   specification:
>     SPMDVariation     = MPI
>     NumberOfProcesses = 32 + TotalNodeCount
>     ProcessesPerNode  = HWThreadsPerNode + 1
>
> So, here is the biggie: Is JSDL at some point considering to revise
> the SPMD spec?  If so, can we expect something along the lines
> above, or is that, aehem, 'out of scope'?  If not, how would you
> propose to align the SAGA use cases with JSDL/Glue/..., so that we
> still end up with an implementable standards stack?
>
> We don't really want to invent our own schemas, as it's most likely
> that it will not map to JSDL then - but we need to cater our use
> cases one way or the other...
>
> Best, Andre.
>
>
>
>
> (*) Socket basically stands for CPU, but is supposed to clarify we
> are not talking about cores.


Take care,
-- 
Andreas Savva
Fujitsu Laboratories Ltd

