[DRMAA-WG] DRMAA2 comments

Daniel Gruber dgruber at univa.com
Wed May 4 14:03:51 CDT 2011


On 04.05.2011 at 20:58, Mariusz Mamoński wrote:

> On 29 April 2011 13:10, Nadav Brandes <nadavbrandes at gmail.com> wrote:
>> Hi guys,
>> 
>> My team and I have finished going over the latest draft of DRMAA2, and we
>> have some comments, suggestions and questions about it.
>> We want to hear your opinion about these issues.
>> 
>> Given a jobId, you can easily get its Job object using the method
>> JobSession::getJobs(in JobInfo filter), if you pass as a filter a
>> JobInfo with the wanted jobId (maybe it would be an easier shorthand if
>> DRMAA had a method JobSession::getJob(string jobId), but this is a different
>> issue). But, given a jobArrayId, there is no way to get its JobArray object,
>> which is a major limitation of DRMAA that doesn't really let users use the
>> JobArray feature in DRMAA as it is used in most batch systems. I think that
>> a similar method JobSession::getJobArrays(in JobArrayInfo filter) should be
>> added, or at least a method JobSession::getJobArray(string
>> jobArrayId).
>> A very important feature that many batch systems support is the ability to
>> limit the number of jobs in a job array that may run simultaneously (in LSF
>> it's called "Slot Limit" and you can read about it at
>> http://www-cecpv.u-strasbg.fr/Documentations/lsf/html/lsf6.1_admin/G_jobarrays.html#26618).
>> I think that DRMAA can also support this feature by:
>> 
>> Change the method JobSession::runBulkJobs so it will also accept an optional
>> argument in long slotLimit (if it's UNSET then no slot limit will be
>> assigned to the new job array).
> 
> Torque also supports this feature. What about Grid Engine?

Grid Engine supports limiting the maximum number of *tasks* running
at the same time. That's somewhat different.
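
To make the idea of a concurrency limit on array tasks concrete, here is a
minimal Python sketch. It is only a toy simulation, not DRMAA or Grid Engine
code: the names schedule_array, peak_concurrency and slot_limit are
hypothetical, and it assumes tasks are started in submission order as soon as
a slot is free.

```python
import heapq


def schedule_array(durations, slot_limit=None):
    """Simulate start/end times for the tasks of a hypothetical job array.

    durations  -- run time of each task, in submission order
    slot_limit -- maximum number of tasks running at once;
                  None models "UNSET" (no limit)
    """
    if slot_limit is not None and slot_limit < 1:
        raise ValueError("slot_limit must be at least 1")
    running = []   # min-heap holding the end times of running tasks
    schedule = []  # (start, end) per task, in submission order
    for duration in durations:
        if slot_limit is not None and len(running) >= slot_limit:
            # All slots are busy: start when the earliest task finishes.
            start = heapq.heappop(running)
        else:
            start = 0
        end = start + duration
        heapq.heappush(running, end)
        schedule.append((start, end))
    return schedule


def peak_concurrency(schedule):
    """Return the largest number of tasks that overlap in time."""
    # At equal timestamps, ends (-1) sort before starts (+1), so a task
    # that starts exactly when another ends does not count as overlapping.
    events = sorted([(s, 1) for s, _ in schedule] +
                    [(e, -1) for _, e in schedule])
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    limited = schedule_array([3, 1, 2, 1], slot_limit=2)
    print(limited, peak_concurrency(limited))
    unlimited = schedule_array([3, 1, 2, 1])
    print(unlimited, peak_concurrency(unlimited))
```

With slot_limit left at None, every task starts immediately; with a limit set,
later tasks wait for a free slot. Whether the limit counts whole jobs or
individual array tasks is exactly the distinction raised here, and the sketch
models only the per-task variant.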

Daniel


> 
>> Add a new method JobArray::changeSlotLimit(in long slotLimit)
>> 
>> There are some parameters that most batch systems allow changing for already
>> submitted jobs, but DRMAA doesn't support changing them. For example, DRMAA
>> doesn't let you change the priority or queue of an already submitted job. I
>> think that methods Job::changePriority(in long priority) and
>> Job::changeQueue(in string queueName) should be added.
>> Many batch systems allow rerunning existing jobs. Although DRMAA has a field
>> called rerunnable in the JobTemplate struct, it doesn't allow users to
>> actually rerun jobs. Maybe a method Job::rerun() could be added to DRMAA.
>> I have a question. Does DRMAA support Generic Resources? (for example, if I
>> have a cluster where some of its nodes have GPU cards, and I want to submit
>> jobs that require a certain amount of GPUs, so I would like the batch system
>> to manage it for me, as many batch systems know how to manage).
>> 
>> Thank you for reading all of this. I would very much like to hear what you think
>> about each of the bullets above.
>> 
>> Regards,
>> Nadav
>> 
>> --
>>  drmaa-wg mailing list
>>  drmaa-wg at ogf.org
>>  http://www.ogf.org/mailman/listinfo/drmaa-wg
>> 
> 
> 
> 
> -- 
> Mariusz


