[DRMAA-WG] maxSlots attribute

Daniel Templeton daniel.templeton at oracle.com
Wed Mar 24 12:53:19 CDT 2010


What do you mean by "retrieve them programmatically"?  What tool are
they using to do it?  What is the interface?

Daniel

On 03/24/10 06:03, Andre Merzky wrote:
> Hi,
>
> Quoting [Peter Tröger] (Mar 23 2010):
>> Hi,
>>
>>> As long as the MonitoringSession::drmsQueueNames is nothing more
>>> than an opaque set of strings that are the valid values for
>>> JobTemplate::queueName, I can live with it.  I can see where that
>>> would be useful for a portal.  I thought, however, that we had come
>>> to the conclusion previously that portals and user interfaces were
>>> not really our target applications.  (Anyone remember what feature
>>> spawned that conclusion?)  I thought that DRMAA was specifically
>>> focused on applications integrating with clusters.  If so, a list of
>>> opaque strings is useless.
>>
>> We dropped the portal example, that's true. The most convincing DRMAA
>> applications at the moment are high-level APIs and meta-schedulers on
>> top of / with DRMAA support.
>>
>> I did some field study to get the picture right. LSF, PBS, SGE,
>> LoadLeveler, SAGA, Globus and GridWay can submit jobs to particular
>> queues. In LoadLeveler, queues are called "classes". Condor, JSDL and
>> OGSA-BES seem to have no queue concept - correct me if I am wrong. The
>> retrieval of the list of queue names is only supported in:
>>
>> LSF: bqueues (http://www.vub.ac.be/BFUCC/LSF/man/bqueues.1.html)
>> PBS: qstat -q (http://linux.die.net/man/1/qstat)
>> SGE: qstat -f
>> LoadLeveler: llclass -l (http://www.ccs.ornl.gov/Cheetah/LL.html#Classes)
>>
>> So if we add the monitoring facility, an empty return value must
>> still be valid.
>>
>>> By the way, you'll also have to give a little thought to reconciling
>>> the 1:1 queue/host model with the 1:n and n:m models, as far as
>>> identifying them in a list goes.
>>
>> This is the true counter argument. If DRMAA monitoring gives no
>> additional hints here, invalid combinations of valid machine / queue
>> names in the job template could occur.
>> Let's wait and see whether any defender of queue list monitoring stands up.
>> Otherwise, I propose to keep only the queue name attribute in the job
>> template.
>
> It's not a hard requirement from our side, but I think it's utterly
> useful.  In general, our end users complain more often than not
> when they have to manually retrieve resource details like service
> contact URLs or queue names from some obscure web page, instead of
> being able to retrieve that information programmatically.  The
> manual way is simply too error-prone, tedious and static.
>
> Cheers, Andre.
>
>
>>
>> /Peter.
>>
>>
>>> Daniel
>>>
>>> On 3/23/10 10:27 AM, Peter Tröger wrote:
>>>>> As I said in the email I just wrote, I'm willing to be convinced of the
>>>>> value of adding queues to the job submission side of things.  I am,
>>>>> however, fundamentally opposed to adding queues to the monitoring side.
>>>>
>>>> I will heavily insist on queue support in DRMAAv2. This is a
>>>> long-demanded feature, which also popped up again in the survey.
>>>>
>>>>> The various concepts of queues are too different for that to make any
>>>>> sense.  There is absolutely no way we will be able to model both LSF and
>>>>> SGE queues in a way that is abstract enough to be consistent and still
>>>>> specific enough to be meaningful and accurate.  We'll talk on the next
>>>>> call. :)
>>>>
>>>> The intention of the current model is that JobTemplate::queueName
>>>> and MonitoringSession::drmsQueueNames act as counterparts. DRMAA
>>>> would promise that all strings that show up in
>>>> MonitoringSession::drmsQueueNames are valid input for
>>>> JobTemplate::queueName. Nothing more.
>>>> The use cases are DRMAA-based portals and command-line applications.
>>>> The interpretation of what a queue is can be provided by the
>>>> library implementation. In the end, the user has to reason about
>>>> the meaning of queue names anyway.
>>>>
>>>> We could relax the conditions so that other values are also allowed
>>>> in JobTemplate::queueName. This would allow
>>>> MonitoringSession::drmsQueueNames to return nothing in SGE. This
>>>> must be possible anyway - Condor has no queue concept at all.
>>>>
>>>> I could also agree to remove
>>>> MonitoringSession::queueMaxWallclockTime and
>>>> MonitoringSession::queueMaxSlotsAllowed, since these two attributes
>>>> are the ones that demand a particular understanding of what a queue is.
>>>>
>>>> Best,
>>>> Peter.
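
To make the proposed contract between MonitoringSession::drmsQueueNames
and JobTemplate::queueName concrete, here is a minimal sketch in Python.
It is not taken from any DRMAA binding: the classes MonitoringSession and
JobTemplate below are hypothetical stand-ins, and pick_queue is an
illustrative helper. The sketch assumes the strict reading of the
contract (every advertised string is valid input for the job template)
and treats an empty list, as a system without a queue concept would
return, as "leave the queue unset".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobTemplate:
    """Hypothetical stand-in for a DRMAA job template."""
    remote_command: str
    queue_name: Optional[str] = None  # None: let the DRM system decide


class MonitoringSession:
    """Hypothetical stand-in; a real library would query the DRM system."""

    def __init__(self, queue_names: List[str]):
        self._queue_names = list(queue_names)

    @property
    def drms_queue_names(self) -> List[str]:
        # Opaque strings only; the list may legitimately be empty.
        return list(self._queue_names)


def pick_queue(session: MonitoringSession, preferred: str) -> Optional[str]:
    """Return `preferred` only if the session advertises it, else None."""
    if preferred in session.drms_queue_names:
        return preferred
    return None


if __name__ == "__main__":
    session = MonitoringSession(["short", "long"])
    template = JobTemplate("/bin/sleep", queue_name=pick_queue(session, "long"))
    print(template)
```

Under the relaxed variant discussed above, a caller could pass a queue
name through even when it is not in the advertised list; the sketch does
not attempt that, nor does it address which queue/host combinations are
valid.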


