[jsdl-wg] Process Topology

Christopher Smith csmith at platform.com
Tue Apr 19 11:40:34 CDT 2005


Ok ... here are my thoughts on process topology and what's currently
expressible in JSDL.

First, I'll list some use cases (they're all parallel jobs):

1. Simple MPI job. Wants 32 processors with 1 processor per resource (in
JSDL, a host is a "resource").

2. OpenMPI job. Wants 32 processors with 8 processors per resource.

3. An OpenMP job. Wants 32 processors. Shared mem of course, so one
resource.

4. A "homegrown" master/slave parallel job (say a ligand docking job). Wants
32 processors. No tiling constraints at all.

* Note that I'm specifically leaving out the Naregi "coupled simulation" use
case (sorry guys), since we determined at the last GGF that it was a case
which could be decomposed into multiple JSDL documents.

Second ... what is process topology? It provides the user a way to express
how resources should be _allocated_ given the characteristics of the job
(usually in terms of IO patterns ... e.g. network communication, disk IO
channel contention, etc). Thus, it's used when the resource manager is
_allocating_ the resources, not when the job is being started/launched.
Therefore, none of the elements used to express process topology should be
in the POSIXApplication section.

What we have in JSDL now:

ResourceCount (how many "resources" i.e. hosts I want)
CPUCount (how many processors _per resource_)
TileSize (how many processors to allocate per resource as a unit)
ProcessCount (total number of _processes_ that the job will use when it
executes)

I will argue that ProcessCount is useless for the purposes of process
topology, since a) it isn't about allocation, and b) there isn't enough
information to tell me how to start/launch a parallel job. It isn't about
allocation since it is irrelevant to the scheduler whether I'll be computing
using threads or processes. It isn't useful for launching because it doesn't
tell me how to spread the ProcessCount processes given a particular
allocated topology.

So that leaves the rest of them.

TileSize and CPUCount are pretty much the same thing, at least for 80% (or
more) of the uses I've seen. The only thing that might cause them to differ
is that I could possibly allocate more than one tile on a host. Given that
CPUCount is a range, and provided we can express step values in the range
(we can express step values in the range, right?), we don't need TileSize
any more.

Note: I'm making an assumption here that CPUCount is the number of cpus that
I want from the resource, rather than an expression of how many cpus the
host needs to have configured. If it is the latter, then we do need
TileSize, and replace CPUCount in my examples below with TileSize.
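
To make the "stepped range instead of TileSize" idea concrete, here is a
minimal sketch. It assumes a CPUCount range can carry a step value (which is
the open question above); the function names are hypothetical and are not
JSDL schema elements.

```python
def stepped_range(lower, upper, step):
    """Values lower, lower+step, ... up to and including upper."""
    return list(range(lower, upper + 1, step))


def tile_multiples(tile_size, max_cpus_per_host):
    """Per-host CPU counts you can reach by allocating whole tiles."""
    tiles = max_cpus_per_host // tile_size
    return [tile_size * k for k in range(1, tiles + 1)]


# Assumption: a tile of 8 cpus on hosts with at most 32 cpus.
cpu_count_values = stepped_range(8, 32, 8)
tiled_values = tile_multiples(8, 32)

print(cpu_count_values)
print(cpu_count_values == tiled_values)
```

If the two lists always agree, a stepped CPUCount range says everything
TileSize would say, including "more than one tile on a host".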

So let's see how these map to the use cases.

1. ResourceCount == 32, CPUCount == 1
  -> LSF : "-n 32 -R span[ptile=1]"
  -> PBS : "-l nodes=32:ppn=1"     (ppn=1 might be the default)

2. ResourceCount == 4,  CPUCount == 8
  -> LSF : "-n 32 -R span[ptile=8]"
  -> PBS : "-l nodes=4:ppn=8"

3. ResourceCount == 1, CPUCount == 32
  -> LSF : "-n 32 -R span[hosts=1]"  (hosts=1 equivalent to ptile=<-n val>)
  -> PBS : "-l nodes=1:ppn=32"

4. ResourceCount == 32, CPUCount == 1
  -> oops ... it doesn't care about tiling
   ResourceCount == 1, CPUCount == 32
  -> hmm ... artificial constraint ... would suck on a blade cluster
   ResourceCount == 1-32, CPUCount == 1-32
  -> oops again ... I might get a total allocation of 32*32 cpus

  * there seems to be a gap!
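
The gap in case 4 can be checked with a few lines of arithmetic. This is
only a sketch: it assumes both ResourceCount and CPUCount are given as the
range 1 to 32, and that the same CPUCount applies to every resource. The
function name is hypothetical.

```python
from itertools import product


def possible_totals(resource_counts, cpu_counts):
    """All total cpu counts reachable when each range is picked independently."""
    return sorted({r * c for r, c in product(resource_counts, cpu_counts)})


hosts = range(1, 33)      # ResourceCount == 1-32
per_host = range(1, 33)   # CPUCount == 1-32

totals = possible_totals(hosts, per_host)
print(min(totals), max(totals))

# The (hosts, cpus per host) pairs that actually total 32 cpus.
exact = [(r, c) for r, c in product(hosts, per_host) if r * c == 32]
print(exact)
```

Nothing in the two independent ranges ties the product to 32, so the
scheduler is free to hand back anything up to 32*32 cpus.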

If we had a term called "TotalCPUCount" for the entire job, I could do:

4. TotalCPUCount == 32
  -> LSF : "-n 32"
  -> PBS : "not sure how to express"

It means: grab 32 cpus, regardless of how they are spread. I just need
cpus. This is used a whole hell of a lot within our customer base.
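
Pulling the mappings together, here is a rough sketch of a translator for
the four use cases. The option strings are the ones listed above; the
function names and the total_cpu_count parameter (standing in for the
proposed TotalCPUCount) are hypothetical. There is no PBS branch for
TotalCPUCount, since I don't know how to express it there.

```python
def lsf_options(resource_count=None, cpu_count=None, total_cpu_count=None):
    """Build LSF options from the JSDL-style terms used in this mail."""
    if total_cpu_count is not None:
        # Use case 4: just grab cpus, no tiling constraint.
        return f"-n {total_cpu_count}"
    total = resource_count * cpu_count
    if resource_count == 1:
        # Use case 3: everything on a single host.
        return f"-n {total} -R span[hosts=1]"
    # Use cases 1 and 2: fixed number of cpus per host.
    return f"-n {total} -R span[ptile={cpu_count}]"


def pbs_options(resource_count, cpu_count):
    """Build the PBS resource list for a fixed cpus-per-host layout."""
    return f"-l nodes={resource_count}:ppn={cpu_count}"


print(lsf_options(resource_count=32, cpu_count=1))
print(lsf_options(resource_count=4, cpu_count=8))
print(lsf_options(resource_count=1, cpu_count=32))
print(lsf_options(total_cpu_count=32))
print(pbs_options(4, 8))
```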

So ... in summary ... I propose:

CPUCount (as is if it's allocated cpus per resource)
TileSize (iff CPUCount is an expression of configured cpus in a host)
ResourceCount (as is ... hmmm ... maybe the default value needs to change)
TotalCPUCount (how many cpus this job needs to run in total)

-- Chris




