[SAGA-RG] quick poll: SPMD attributes

Andre Merzky andre at merzky.net
Fri Dec 11 04:26:27 CST 2009


hi Andre, 

Quoting [Andre Luckow] (Dec 11 2009):
> 
> Hi,
> > Quoting [Ole Christian Weidner] (Dec 10 2009):
> >> 
> >>> Some of the concerns may get addressed by resource discovery and
> >>> reservation, but not all.  So, we would like to propose the
> >>> following replacement list for these attributes:
> >>> 
> >>> SPMDVariation
> >>> NumberOfProcesses
> >>> ProcessesPerMachine
> >>> ProcessesPerSlot
> >>> ProcessesPerCore
> >>> 
> >>> (Slot and Machine are the DRMAA terms for CPU and Host).
> >> 
> >> Let's go with CPU and host then. Especially slot is IMHO a rather weird term - cpu is so much easier to understand. 
> > 
> > Peter corrected me: it's socket, not slot.  Anyway, I am by no means
> > sold on those terms, but rather want to nail down concept and
> > hierarchy.  Machine and CPU would be fine with me, as would be
> > 
> >  node
> >  host
> >  machine
> > 
> > and
> > 
> >  cpu
> >  socket
> >  processor
> 
> I agree that the above attributes are better than the existing SAGA ones. However, I
> believe the way these attributes can be used can become quite complicated. It requires detailed advance knowledge
> of the Grid resources that will be used. Further, the user can make contradictory specifications, e.g.
> if the ProcessesPerMachine attribute does not fit the ProcessesPerSockets attribute.

Correct.  On the required knowledge: that system would only make sense (or at
least make more sense) if combined with a resource discovery and
description.

Contradiction: well, that is often the case anyway, like you specify
an interactive job AND IO/redirection into files, or specify 4
processes but no SPMDVariation.  Not sure what to do about that...
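
To make the two examples above concrete, here is a minimal sketch of
what a consistency check could look like.  The attribute names are the
SAGA job description ones; the plain dict representation and the
function name find_contradictions are only assumptions for
illustration, not part of any SAGA API.

```python
def find_contradictions(jd):
    """Return a list of problems found in a job description dict.

    'jd' is a hypothetical plain-dict stand-in for a job description.
    """
    problems = []

    # Interactive jobs talk to the caller directly, so redirecting
    # stdio into files at the same time is contradictory.
    redirected = [k for k in ("Input", "Output", "Error") if jd.get(k)]
    if jd.get("Interactive") and redirected:
        problems.append("Interactive job also redirects: " + ", ".join(redirected))

    # More than one process without an SPMDVariation leaves it open
    # how the processes are supposed to be started.
    if jd.get("NumberOfProcesses", 1) > 1 and not jd.get("SPMDVariation"):
        problems.append("NumberOfProcesses > 1 but no SPMDVariation given")

    return problems
```

Whether an implementation should reject such a description or just
warn is exactly the open question.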


> On a space sharing system a user usually cannot influence the number of cpus/cores he gets 
> per node. E.g. on LONI you must use 8 cores per node on QB and 4 cores per node on all other machines.
> Thus, the most common usage mode will probably be:
> 
> NumberOfProcesses = x
> ProcessesPerMachine = number_of_cores_per_socket * number_of_sockets
> ProcessesPerSockets = number_of_cores_per_socket
> ProcessesPerCore = 1
> 
> The question is: How can this 80% case be efficiently supported without losing the flexibility 
> in the other cases? Does it make sense to declare default values for the ProcessesXXX attributes 
> and to declare them optional?

Agree, they absolutely should be optional, and default to

  ProcessesPerMachine = unspecified
  ProcessesPerSockets = unspecified
  ProcessesPerCore    = 1

And right, you can't enforce the number of cpus/cores, but you can set
upper bounds, either to make sure you do not land on a core where
xxx processes are already running or, at the other end, to allow
the DRM to use a core for multiple process instances.
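
As a rough sketch of the defaults and the upper-bound reading (the
dict representation, the helper names with_defaults and
placement_allowed, and the use of None for "unspecified" are all
assumptions made up for this example):

```python
# Defaults as proposed: only ProcessesPerCore has a concrete value,
# None stands for "unspecified" (no limit requested).
DEFAULTS = {
    "ProcessesPerMachine": None,
    "ProcessesPerSockets": None,
    "ProcessesPerCore": 1,
}


def with_defaults(jd):
    """Fill in the optional ProcessesPerXYZ attributes of a job description dict."""
    merged = dict(DEFAULTS)
    merged.update(jd)
    return merged


def placement_allowed(jd, on_machine, on_socket, on_core):
    """True if one more process may be placed, given how many of this
    job's processes already run on the machine, socket and core."""
    limits = with_defaults(jd)
    used = {
        "ProcessesPerMachine": on_machine,
        "ProcessesPerSockets": on_socket,
        "ProcessesPerCore": on_core,
    }
    for key, count in used.items():
        limit = limits[key]
        # Each attribute is an upper bound; unspecified means no bound.
        if limit is not None and count + 1 > limit:
            return False
    return True
```

With the defaults, a core that already runs one process is off
limits; with ProcessesPerCore = 2 the DRM may put a second one there.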

Makes sense?


> >>> The NumberOfProcesses is to be interpreted as exact number by the
> >>> backend.  The ProcessesPerXYZ are to be interpreted as upper bounds
> >>> by the backend.  
> >>> 
> >>> So, for example, one could specify a 16 process MPICH job as
> >>> 
> >>> SPMDVariation     = MPICH
> >>> NumberOfProcesses = 16
> >>> ProcessesPerCore  = 2
> >> 
> >> Ok, but what if the backend doesn't understand these things, (RSL
> >> for example only understands number of nodes and number of
> >> processes). In this case the adaptor itself would have to (a)
> >> figure out the number of cores per cpu (b) the number of cpus per
> >> node and (c) make a reservation for
> >> (16/2/#cores_per_cpu/#cpus_per_node) nodes. The globus GRAM
> >> adaptor already does something similar to this due to the
> >> shortcomings of RSL. Not pretty. 
> > 
> > Right - but it is even worse if the user has to do it in application
> > space.  So, I guess it is like with all other JD attributes: if it
> > is specified, it must be honored - which may effectively limit the
> > number of available backends...
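
For illustration, the node-count arithmetic an adaptor would have to
do for a backend that only understands nodes and processes could be
sketched like this.  The function name nodes_needed is hypothetical,
and the hardware numbers are assumed to come from somewhere else
(e.g. resource discovery); this is not what the GRAM adaptor
actually does.

```python
def nodes_needed(num_processes, procs_per_core, cores_per_cpu, cpus_per_node):
    """Number of nodes to request so that num_processes fit when at
    most procs_per_core processes share a core."""
    for value in (num_processes, procs_per_core, cores_per_cpu, cpus_per_node):
        if value < 1:
            raise ValueError("all arguments must be >= 1")

    # How many processes one node can take under the per-core bound.
    procs_per_node = procs_per_core * cores_per_cpu * cpus_per_node

    # Round up: a partially filled node still has to be requested.
    return -(-num_processes // procs_per_node)
```

For the 16 process example with ProcessesPerCore = 2, a machine with
4 cores per cpu and 2 cpus per node would need
nodes_needed(16, 2, 4, 2) == 1 node.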
>
> Even worse, different Globus installations interpret the "count"
> RSL attribute differently. Just try to use Abe, Ranger and QB with
> the same RSL file and you will see different behavior. It's just too
> easy to "hack" the JobManager Perl script of Globus.

:-(  Can't do nothin about that I guess... 

Thanks!  Andre.


> Regards, Andre


-- 
Nothing is ever easy.

