[DRMAA-WG] maxSlots attribute

Peter Tröger peter at troeger.eu
Tue Mar 23 15:24:36 CDT 2010


Hi,

> As long as the MonitoringSession::drmsQueueNames is nothing more  
> than an opaque set of strings that are the valid values for  
> JobTemplate::queueName, I can live with it.  I can see where that  
> would be useful for a portal.  I thought, however, that we had come  
> to the conclusion previously that portals and user interfaces were  
> not really our target applications.  (Anyone remember what feature  
> spawned that conclusion?)  I thought that DRMAA was specifically  
> focused on applications integrating with clusters.  If so, a list of  
> opaque strings is useless.

We dropped the portal example, that's true. The most convincing DRMAA
applications at the moment are high-level APIs and meta-schedulers built
on top of DRMAA or offering DRMAA support.

I did some field research to get the picture right. LSF, PBS, SGE,
LoadLeveler, SAGA, Globus and GridWay can submit jobs to particular
queues. In LoadLeveler, queues are called "classes". Condor, JSDL and
OGSA-BES seem to have no queue concept; correct me if I am wrong.
Retrieving the list of queue names is supported only in:

LSF: bqueues (http://www.vub.ac.be/BFUCC/LSF/man/bqueues.1.html)
PBS: qstat -q (http://linux.die.net/man/1/qstat)
SGE: qstat -f
LoadLeveler: llclass -l (http://www.ccs.ornl.gov/Cheetah/LL.html#Classes)

So if we add the monitoring facility, an empty return value must still
be valid, both for systems without a queue concept and for systems that
cannot list their queues.
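
To make the intended contract concrete, here is a minimal Python sketch.
All names in it (MonitoringSession, drms_queue_names, JobTemplate,
NATIVE_QUEUE_LISTING, queue_name_status) are hypothetical, only loosely
modeled on MonitoringSession::drmsQueueNames and JobTemplate::queueName,
and are not taken from any DRMAA draft. It assumes queue names are opaque
strings, that an empty set is a valid answer, and the relaxed rule from the
earlier message quoted below: advertised names are guaranteed to be valid,
while other names are passed through unchecked.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Native queue-listing commands, per the field study above (documentation
# only; this sketch never runs them). Systems not listed here, e.g. Condor,
# are treated as having no queue concept.
NATIVE_QUEUE_LISTING = {
    "LSF": "bqueues",
    "PBS": "qstat -q",
    "SGE": "qstat -f",
    "LoadLeveler": "llclass -l",  # LoadLeveler calls queues "classes"
}


class MonitoringSession:
    """Hypothetical monitoring session exposing opaque queue names."""

    def __init__(self, drms: str, discovered: FrozenSet[str] = frozenset()):
        self.drms = drms
        # A real library would parse the native command output; here the
        # discovered names are passed in directly. Unsupported systems
        # always report an empty set.
        if drms in NATIVE_QUEUE_LISTING:
            self._queues = frozenset(discovered)
        else:
            self._queues = frozenset()

    def drms_queue_names(self) -> FrozenSet[str]:
        # Opaque strings only; an empty set is a legal answer.
        return self._queues


@dataclass
class JobTemplate:
    queue_name: Optional[str] = None


def queue_name_status(template: JobTemplate, session: MonitoringSession) -> str:
    """Classify a template's queue name under the one-way guarantee."""
    if template.queue_name is None:
        return "unset"        # no queue requested
    if template.queue_name in session.drms_queue_names():
        return "guaranteed"   # advertised names are always valid input
    return "unchecked"        # relaxed rule: the DRMS decides at submission
```

The guarantee deliberately runs in one direction only: a name seen in the
monitoring result is always accepted, but the absence of a name (or an
empty result, as for Condor) says nothing about whether submission fails.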

> By the way, you'll also have to give a little thought to reconciling  
> the 1:1 queue/host model with the 1:n and n:m models, as far as  
> identifying them in a list goes.

This is the real counterargument. If DRMAA monitoring gives no
additional hints here, a job template could contain an invalid
combination of individually valid machine and queue names.
Let's wait and see whether anyone stands up to defend queue list
monitoring. Otherwise, I propose to keep only the queue name attribute
in the job template.

/Peter.


> Daniel
>
> On 3/23/10 10:27 AM, Peter Tröger wrote:
>>> As I said in the email I just wrote, I'm willing to be convinced of the
>>> value of adding queues to the job submission side of things.  I am,
>>> however, fundamentally opposed to adding queues to the monitoring side.
>>
>> I will strongly insist on queue support in DRMAAv2. This is a
>> long-demanded feature, which also came up again in the survey.
>>
>>> The various concepts of queues are too different for that to make any
>>> sense.  There is absolutely no way we will be able to model both LSF and
>>> SGE queues in a way that is abstract enough to be consistent and still
>>> specific enough to be meaningful and accurate.  We'll talk on the next
>>> call. :)
>>
>> The intention of the current model is that JobTemplate::queueName
>> and MonitoringSession::drmsQueueNames act as counterparts. DRMAA
>> would promise that all strings that show up in
>> MonitoringSession::drmsQueueNames are valid input for
>> JobTemplate::queueName. Nothing more.
>> The use cases are DRMAA-based portals and command-line applications.
>> The interpretation of what a queue is can be provided by the
>> library implementation; in the end, the user has to reason about
>> the meaning of queue names anyway.
>>
>> We could relax the conditions so that other values are also allowed
>> in JobTemplate::queueName. This would allow
>> MonitoringSession::drmsQueueNames to return nothing in SGE. This must
>> be possible anyway, since Condor has no queue concept at all.
>>
>> I could also agree to remove MonitoringSession::queueMaxWallclockTime
>> and MonitoringSession::queueMaxSlotsAllowed, since these two attributes
>> are the ones that demand a particular understanding of what a queue is.
>>
>> Best,
>> Peter.
>


