[DRMAA-WG] DRMAA2 comments

Mariusz Mamoński mamonski at man.poznan.pl
Wed May 4 13:58:34 CDT 2011


On 29 April 2011 13:10, Nadav Brandes <nadavbrandes at gmail.com> wrote:
> Hi guys,
>
> My team and I have finished going over the latest draft of DRMAA2, and we
> have some comments, suggestions and questions about it.
> We want to hear your opinion about these issues.
>
> * Given a jobId, you can easily get its Job object using the method
>   JobSession::getJobs(in JobInfo filter), if you pass as the filter a
>   JobInfo with the wanted jobId (a method JobSession::getJob(string jobId)
>   might be an easier shorthand, but that is a different issue). But given
>   a jobArrayId, there is no way to get its JobArray object. This is a
>   significant limitation that keeps users from using the JobArray feature
>   in DRMAA the way it is used in most batch systems. I think a similar
>   method JobSession::getJobArrays(in JobArrayInfo filter) should be added,
>   or at least a method JobSession::getJobArray(string jobArrayId).
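
A minimal Python sketch of how the two proposed lookups could behave. It is not a real DRMAA binding: the classes, the snake_case method names, the id format and the in-memory storage are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JobArray:
    """Toy stand-in for a DRMAA2 JobArray (hypothetical, not a real binding)."""
    job_array_id: str
    job_ids: List[str] = field(default_factory=list)


@dataclass
class JobArrayInfo:
    """Hypothetical filter struct; a field left as None matches anything."""
    job_array_id: Optional[str] = None


class JobSession:
    """Toy session that remembers the job arrays it has created."""

    def __init__(self) -> None:
        self._arrays: Dict[str, JobArray] = {}

    def run_bulk_jobs(self, begin: int, end: int, step: int = 1) -> JobArray:
        # The id format is invented here; a real DRM system assigns its own ids.
        array_id = f"array-{len(self._arrays) + 1}"
        job_ids = [f"{array_id}[{i}]" for i in range(begin, end + 1, step)]
        array = JobArray(array_id, job_ids)
        self._arrays[array_id] = array
        return array

    def get_job_arrays(self, flt: JobArrayInfo) -> List[JobArray]:
        # Mirrors getJobs(filter): return every array that matches the filter.
        return [a for a in self._arrays.values()
                if flt.job_array_id is None or a.job_array_id == flt.job_array_id]

    def get_job_array(self, job_array_id: str) -> JobArray:
        # Shorthand built on top of the filter-based lookup.
        matches = self.get_job_arrays(JobArrayInfo(job_array_id=job_array_id))
        if not matches:
            raise KeyError(job_array_id)
        return matches[0]
```

In this sketch getJobArray is only a convenience wrapper around the filter-based call, so either variant would cover the use case.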
> * A very important feature that many batch systems support is the ability
>   to limit the number of jobs in a job array that may run simultaneously
>   (in LSF it's called "Slot Limit" and you can read about it at
>   http://www-cecpv.u-strasbg.fr/Documentations/lsf/html/lsf6.1_admin/G_jobarrays.html#26618).
>   I think that DRMAA can also support this feature by:
>
>   - Change the method JobSession::runBulkJobs so that it also accepts an
>     optional argument "in long slotLimit" (if it's UNSET, no slot limit is
>     assigned to the new job array).

Torque also supports this feature. What about Grid Engine?
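
To make the intended semantics concrete, here is a small Python sketch of the slot-limit rule as described above: at most slotLimit tasks of one job array run at the same time, and UNSET means no cap. The function name and the use of None for UNSET are assumptions, and the sketch ignores everything else a scheduler considers (free slots, priorities, and so on).

```python
from typing import List, Optional

UNSET = None  # stands in for DRMAA's UNSET value in this sketch


def tasks_to_start(pending: List[int], running: List[int],
                   slot_limit: Optional[int] = UNSET) -> List[int]:
    """Return the pending task indices of one job array that may start now."""
    if slot_limit is UNSET:
        # No slot limit: the array itself imposes no cap.
        return list(pending)
    # Never go negative if the limit was lowered below the running count.
    free = max(slot_limit - len(running), 0)
    return pending[:free]
```

Lowering the limit on a live array (the changeSlotLimit case) would then only delay further starts; in this sketch it does not touch tasks that are already running.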

>   - Add a new method JobArray::changeSlotLimit(in long slotLimit).
>
> * There are some parameters that most batch systems allow changing for
>   already submitted jobs, but DRMAA doesn't support changing them. For
>   example, DRMAA doesn't let you change the priority or queue of an already
>   submitted job. I think that methods Job::changePriority(in long priority)
>   and Job::changeQueue(in string queueName) should be added.
> * Many batch systems allow rerunning existing jobs. Although DRMAA has a
>   field called rerunnable in the JobTemplate struct, it doesn't allow users
>   to actually rerun jobs. Maybe a method Job::rerun() could be added to DRMAA.
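
A rough Python sketch of what the three proposed Job operations might look like. Everything here is hypothetical: the class, the state names, and especially the rules about when a change is allowed are assumptions, since the proposal does not say how each DRM system would restrict them. The rerunnable flag is copied onto the toy Job only to keep the example short; in DRMAA it is a JobTemplate field.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"


@dataclass
class Job:
    """Toy job object (hypothetical, not a real DRMAA binding)."""
    job_id: str
    queue: str
    priority: int = 0
    rerunnable: bool = False
    state: JobState = JobState.QUEUED

    def change_priority(self, priority: int) -> None:
        # Assumption: priority may change until the job has finished.
        if self.state is JobState.DONE:
            raise RuntimeError("job already finished")
        self.priority = priority

    def change_queue(self, queue_name: str) -> None:
        # Assumption: only jobs that are still waiting can be moved.
        if self.state is not JobState.QUEUED:
            raise RuntimeError("only queued jobs can change queue")
        self.queue = queue_name

    def rerun(self) -> None:
        # Assumption: rerun puts a rerunnable job back into the queue.
        if not self.rerunnable:
            raise RuntimeError("job was not submitted as rerunnable")
        self.state = JobState.QUEUED
```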
> * I have a question: does DRMAA support generic resources? For example, if
>   I have a cluster where some of the nodes have GPU cards, and I want to
>   submit jobs that require a certain number of GPUs, I would like the batch
>   system to manage that for me, as many batch systems can.
>
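
For the generic-resources question, the behaviour being asked for can be sketched in Python as a simple matching step: a job names the resources it needs, and only nodes with enough of each are eligible. The function, the resource names and the dictionary layout are assumptions for illustration, not part of DRMAA or of any particular batch system.

```python
from typing import Dict, List


def eligible_nodes(nodes: Dict[str, Dict[str, int]],
                   request: Dict[str, int]) -> List[str]:
    """Return the names of nodes whose free resources cover the request."""
    return sorted(
        name for name, free in nodes.items()
        # A resource a node does not list counts as zero available.
        if all(free.get(res, 0) >= amount for res, amount in request.items())
    )
```

For example, with nodes `{"n1": {"gpu": 2}, "n2": {}, "n3": {"gpu": 4}}`, a request of `{"gpu": 3}` leaves only `n3` eligible.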
> Thank you for reading all of this. I would very much like to hear what you
> think about each of the bullets above.
>
> Regards,
> Nadav



-- 
Mariusz

