[DRMAA-WG] DRMAA2

Nadav Brandes nadavbrandes at gmail.com
Wed Jan 12 10:03:12 CST 2011


Hello everyone,

I went over your API description with my team (as described in
http://www.drmaa.org/drmaav2_draft5.pdf).



If it's not too late, we have a few questions/suggestions:

·         Can one get a 'Job' object representing an already-submitted job,
given only the job index (as an integer)?

·         It seems that the 'JobInfo' interface is missing a few parameters
that are given in the 'JobTemplate' interface. For example, can one get the
'remoteCommand' of a job that was already submitted, if one only has a 'Job'
object in hand, and not the 'JobTemplate'?

·         Does DRMAA support a job-array feature (meaning submitting a group
of tasks as one job that has a single ID)? Most schedulers support this
feature (including LSF, Moab and SGE). You do have 'runBulkJobs', which
submits a sequence of jobs together, but it returns a sequence of 'Job'
objects, and not a single job with a single ID.

·         Does DRMAA support the notion of queues (a feature that is
supported by all of the schedulers I know)? We believe that it could be very
useful if one could specify a queue in 'JobTemplate', change the queue of
an existing job, and also get a list of all the queues in the cluster.

·         Many batch systems have a feature that allows giving a 'project
name' to submitted jobs. We believe that it could also be very useful if
'JobTemplate' had such a field.

·         Sometimes, especially when dealing with large clusters containing
a large number of compute nodes (some of which might be out of order),
jobs might fail randomly, for no apparent reason. We think it could be
useful if DRMAA supported a feature that allows rerunning failed jobs (as
many schedulers, such as LSF, allow). Such a 'rerun()' method could be added
to the 'Job' interface.

·         Modern schedulers (like Moab and LSF) support advanced features for
memory management, core management, and general resource management
(such as GPUs). In general, this means giving a list of required resources
to each submitted job (for example, submitting a job that requires 5 cores,
12GB RAM, and 2 GPUs). The scheduler can then schedule the jobs so that
each running job has all the resources it needs. It could also be very
useful if 'JobTemplate' had a resources dictionary field.
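
A rough sketch of how some of the suggestions above might look, written in
Python only for illustration. Everything here is hypothetical and not taken
from the draft: the class 'ToySession', the fields 'queue_name',
'project_name' and 'resources', and the methods 'get_job()', 'get_queues()'
and 'rerun()' are assumed names, and the session is an in-memory stand-in
that never contacts a real scheduler. Job arrays and changing the queue of an
existing job are not covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobTemplate:
    remote_command: str
    queue_name: str = ""      # hypothetical: target queue
    project_name: str = ""    # hypothetical: project label for the job
    # hypothetical: required resources, e.g. {"cores": 5, "gpus": 2}
    resources: Dict[str, int] = field(default_factory=dict)


@dataclass
class Job:
    job_id: int
    template: JobTemplate     # kept so 'remote_command' stays reachable
    state: str = "QUEUED"
    attempts: int = 1

    def rerun(self) -> None:
        """Hypothetical rerun(): put a failed job back in the queue."""
        if self.state != "FAILED":
            raise RuntimeError("only failed jobs can be rerun")
        self.state = "QUEUED"
        self.attempts += 1


class ToySession:
    """In-memory stand-in for a session; no real scheduler is involved."""

    def __init__(self, queues: List[str]) -> None:
        self._queues = list(queues)
        self._jobs: Dict[int, Job] = {}
        self._next_id = 1

    def get_queues(self) -> List[str]:
        """Return the names of all queues known to this toy cluster."""
        return list(self._queues)

    def run_job(self, template: JobTemplate) -> Job:
        """Record a job for the template and hand back its Job object."""
        if template.queue_name and template.queue_name not in self._queues:
            raise ValueError(f"unknown queue: {template.queue_name}")
        job = Job(self._next_id, template)
        self._jobs[job.job_id] = job
        self._next_id += 1
        return job

    def get_job(self, job_id: int) -> Job:
        """Look up an already-submitted job by its integer ID."""
        return self._jobs[job_id]


session = ToySession(["short", "long"])
template = JobTemplate(
    remote_command="/bin/simulate",
    queue_name="long",
    project_name="demo",
    resources={"cores": 5, "mem_gb": 12, "gpus": 2},
)
submitted = session.run_job(template)

# Later, with only the integer ID in hand:
job = session.get_job(submitted.job_id)
print(job.template.remote_command, job.template.resources)

job.state = "FAILED"   # simulate a random node failure
job.rerun()
print(job.state, job.attempts)
```

Keeping a reference to the template inside 'Job' is just one way to make
'remoteCommand' retrievable later; exposing the same fields through 'JobInfo'
would serve the same purpose.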



That is it for now; thank you for reading.

I would like to hear what you think.



Best Regards,

Nadav


2010/12/21 Peter Tröger <peter at troeger.eu>

> Hi Nadav,
>
> Now I saw the documentation of the planned interface for DRMAA2, and I find
> it to be a great improvement, and very useful for my organization. I am
> truly anxious to try it, and have some more questions about its release:
>
>    - Do you know which distributed resource manager will be the first to
>    implement DRMAA2? (SGE maybe?) Also, do you have any estimation on when
>    it'll happen, and when will I be able to download a trial version of it?
>
> Since we have the Oracle Grid Engine Product Manager as one of the
> co-chairs, I leave the implementation estimation to you ;-) .... We also
> have very capable people in Poznan, who might take care of non-OGE
> systems. We expect to put out the spec in January, and from there, the group
> can only hope. From experience, I would expect nothing useful before Summer
> 2011.
>
>
>    - Is it still possible to suggest ideas that we have about the
>    interface of DRMAA2? If so, how is it done? Is it customary to share ideas
>    in this forum, or do you prefer it to be done through Wiki?
>
> The best thing is to start a discussion on the list. The Wiki is good as
> reference. Comments on the Wiki pages might get lost ...
>
> Best regards,
> Peter.
>
>