[jsdl-wg] Process Topology

Christopher Smith csmith at platform.com
Tue Apr 19 23:04:39 CDT 2005


If we need to specify different mechanisms for starting up tasks of a
parallel job a la the RSL jobType, then I'd like that to be separate from
the description of the resource allocation required.

For what it's worth, queuing systems like LSF/PBS/SGE don't handle this
startup phase (it's up to the job), so I'd like to see some example terms
describing job process topology (basically the single|multi|mpi use cases),
since I'm not too sure what they would look like, or what semantics would be
required.

Allocate "as a unit" just means that if I'm going to allocate any cpus from
a resource, I have to allocate "tileSize" cpus.
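
As a rough illustration of that rule, here is a minimal Python sketch. It
assumes the literal reading above: each resource contributes either zero CPUs
or exactly "tileSize" CPUs. The function name, the host names, and the
free-CPU map are hypothetical and do not come from any scheduler's API.
Whether one resource may contribute more than one tile is not settled here;
this sketch assumes it may not.

```python
def allocate_tiles(free_cpus, total_cpus, tile_size):
    """Hypothetical sketch: pick hosts so each contributes 0 or tile_size CPUs.

    free_cpus  -- dict mapping host name to its number of free CPUs
    total_cpus -- total CPUs the job asks for
    tile_size  -- CPUs that must be taken together from any host used

    Returns a dict host -> CPUs allocated, or None if the request cannot
    be satisfied under the one-tile-per-host assumption.
    """
    if tile_size <= 0 or total_cpus <= 0:
        raise ValueError("tile_size and total_cpus must be positive")
    if total_cpus % tile_size != 0:
        raise ValueError("total_cpus must be a multiple of tile_size")

    allocation = {}
    remaining = total_cpus
    for host, free in free_cpus.items():
        if remaining == 0:
            break
        # A host is usable only if it can supply a whole tile.
        if free >= tile_size:
            allocation[host] = tile_size
            remaining -= tile_size

    return allocation if remaining == 0 else None


if __name__ == "__main__":
    hosts = {"hostA": 4, "hostB": 1, "hostC": 2}
    print(allocate_tiles(hosts, total_cpus=4, tile_size=2))
    print(allocate_tiles(hosts, total_cpus=6, tile_size=2))
```

With these made-up hosts, hostB is skipped because it cannot supply a whole
tile, and the second request fails because only two hosts can.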

-- Chris

On 19/4/05 19:04, "Karl Czajkowski" <karlcz at univa.com> wrote:

> Chris:
> 
> One concern I have is that in typical use with GRAM our users expect a
> certain number of job processes to be launched across all
> resources. In other words, their typical job is a fixed-layout
> SPMD-style executable or a master/slave executable and they do not do
> any explicit process startup in the program.
> 
> I am nearly certain that we pass the GRAM "count" attribute as the
> allocation parameter (-n in your example) to the local scheduler but
> depending on the value of our jobType=single|multi|mpi attribute we
> also construct a job script which does the launching of all processes.
> We also have a "hostCount" which we use to sort out SMP allocation
> issues in a simple round-robin fashion.  So, we view a single set of
> declarative parameters as having a projection into both allocation and
> job-control functions.
> 
> We do not handle fancier topology options except through some
> site-specific extensions that people have tried. I am not sure what
> your note means when it says to allocate "as a unit", so I'd like
> clarification on that before we move on.
> 
> I think it is a little too extreme to say JSDL shouldn't express job
> process topology.  That seems just as important to me as resource
> topology, and the "isomorphism" or mapping between the two is also
> important.  What you rightly point out is that the operational
> behavior to _get_ the mapping is dependent on some combination of the
> job and local scheduler runtime environments, e.g. does the manager or
> the job take care of task launch.
> 
> I never much liked our jobType attribute, but it is there because this
> problem had to be addressed somehow. I would accept an alternate
> rendering where separate logical managers accepted a homogeneous job
> type and had a fixed behavior, so this would be implicit context. This
> would also better handle the esoteric "which mpirun to use" problem we
> can have with "mpi" type jobs on systems with native MPI and MPICH-G,
> etc.
> 
> So to reiterate: shouldn't JSDL express job process topology or
> mapping into resources even if it is underspecified "how" the
> processes will get that way?  Is it necessary to express two
> topologies, or just to expect that they have identical expression in
> one place in JSDL, and it requires context and/or extended content to
> define how the processes get that way?
> 
> 
> karl




