[DRMAA-WG] Work distribution

Mariusz Mamoński mamonski at man.poznan.pl
Mon Feb 15 15:01:55 CST 2010


Hi Peter, all,

2010/2/15 Peter Tröger <peter at troeger.eu>:
> Dear core members,
>
> in order to have any chance for some kind of finalized API until OGF28, we need to distribute the work now. I need volunteers to independently take care of the following topics. We mainly talk about man page crawling and text snippet writing:
>
> --- snip ----
>
> New job template attributes.
>
> This includes the continuation of DRM system comparison we already started in the Google spreadsheet. It must also include the hunt for new job template placeholders, since this was one of the major wishes from the survey.
>
> File staging.
>
> We need the exhaustive description of how the new fileTransfers attribute should be used. Order of activities and security might be topics here. Ideally, somebody with a GridFTP background can also contribute. The initial idea was to copy from SAGA / LSF, so if you already know at least one of these systems, please volunteer.
>
I did some "research" on the file staging capabilities of LSF,
Torque and PBS Pro (you can find a short summary in the Google
Spreadsheet, on the "File Staging" tab).

Some thoughts:
- Most batch systems were designed to work with a shared network file
system. However, one can imagine a situation where the home file
system is for some reason not shared among the worker nodes (in my
opinion this is probably most relevant for Condor, as it focuses on
harvesting idle CPU cycles of workstations rather than managing a
dedicated cluster). In this case most of the systems offer some
simple file staging capabilities (mostly based on rcp/scp). I guess
that in Grid Engine this could also be implemented (if really needed)
with proper prolog/epilog scripts (assuming that the list of files to
be staged is provided as an environment variable, in a similar
fashion to how stdin/out/err staging could be realized in DRMAA 1.0).
- I would prefer to keep this interface as simple as possible and thus
handle only the case that cannot be handled without interaction with
the DRMS: staging files from the submission host to the execution
host, as the execution host is usually not known before a job starts.
- Staging files using other protocols (e.g. ftp, webdav) would require
passing credentials explicitly (except for gridftp), which is "out of
scope of the DRMAA spec". If needed, the user can first stage all
files to the submission host using some other tools, and then, using
DRMAA, to the execution host.
- In order to keep the interface really simple, I would assume that
file names (not necessarily the absolute paths) are the same on both
the execution host and the submission host. (Again, if not, the user
can easily work around this by copying/moving the file on the
submission host.)
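
To make the prolog/epilog idea from the first point more concrete, here
is a minimal sketch of what a stage-in prolog could compute. It is only
an illustration under assumptions: the environment variable name
DRMAA_STAGE_IN_FILES, the ":" separator and the helper name
build_stage_in_commands are all hypothetical and not part of any DRM
system or of the DRMAA spec. It only builds the scp command lines and
does not execute them.

```python
import posixpath


def build_stage_in_commands(env, submit_host, exec_dir):
    """Build one scp command per file listed in a (hypothetical)
    DRMAA_STAGE_IN_FILES variable, a ':'-separated list of paths
    on the submission host."""
    raw = env.get("DRMAA_STAGE_IN_FILES", "")
    # Ignore empty entries, e.g. from a trailing ':'.
    files = [f for f in raw.split(":") if f]
    commands = []
    for path in files:
        # Assumption from the list above: the file name is the same
        # on the submission host and on the execution host.
        target = posixpath.join(exec_dir, posixpath.basename(path))
        commands.append(["scp", "-q", f"{submit_host}:{path}", target])
    return commands
```

A real prolog would pass os.environ as env and run each command (for
example with subprocess.run); how the submission host name and the job
directory reach the prolog depends on the DRM system.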

So my proposal for the DRMAA staging interface is as follows: split
the "fileTransfers" attribute into two attributes (both also of the
OrderedStringList type):
- stageInFiles
- stageOutFiles
which are simple lists of files to be staged in/staged out (no URLs,
only paths). The paths can be relative (to the current working
directory on the submission host, and to the job working directory on
the execution host).
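
A minimal sketch of how the two proposed attributes could be
interpreted is given here. It is not a DRMAA binding: the class
JobTemplateSketch, its field names and the function plan_transfers are
hypothetical names chosen for illustration. It also assumes that an
absolute path is used unchanged on both hosts, which the proposal does
not state explicitly.

```python
import posixpath
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class JobTemplateSketch:
    """Hypothetical stand-in for a job template with the two lists."""
    working_directory: str  # job working directory on the execution host
    stage_in_files: List[str] = field(default_factory=list)
    stage_out_files: List[str] = field(default_factory=list)


def resolve(path: str, base: str) -> str:
    """Resolve a relative path against base; keep absolute paths as-is."""
    if posixpath.isabs(path):
        return path
    return posixpath.normpath(posixpath.join(base, path))


def plan_transfers(
    jt: JobTemplateSketch, submit_cwd: str
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (stage_in, stage_out) lists of (source, destination) pairs.

    Stage-in copies from the submission host to the execution host,
    stage-out copies in the opposite direction."""
    stage_in = [
        (resolve(p, submit_cwd), resolve(p, jt.working_directory))
        for p in jt.stage_in_files
    ]
    stage_out = [
        (resolve(p, jt.working_directory), resolve(p, submit_cwd))
        for p in jt.stage_out_files
    ]
    return stage_in, stage_out
```

For example, with a submission directory /home/alice/run and a job
working directory /scratch/job42, the entry "input/data.csv" in
stage_in_files would be copied from /home/alice/run/input/data.csv to
/scratch/job42/input/data.csv.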

I don't want to be blinded by batch system use cases (there is one
grid implementation of DRMAA 1.0), so if at least one person
complains, I have nothing against staying with the fileTransfers
attribute, which operates on full URLs.

> Advanced reservation.
>
> The API is in the IDL part, but we still need the detailed functions descriptions. Mariusz, could you take care of this ?
>
OK, I will handle the advance reservation part of the DRMAA spec.

> Job states.
>
> There is a set of wishes regarding more job states and transitions. We also have a pending mapping to other peoples job stage models.
>
> Thread safety.
>
> Somebody with a strong DRMAA implementation background needs to scan his implementation for critical (and non-critical) code parts with respect to thread safety. All experiences should be persisted in the new spec.
>
> C binding.
>
> We spent some time in the OO world, somebody needs to try a C language binding. GFD.143 shows how such a document can look like. Roger, maybe something for you ?
>
> --- snip ----
>
> All information is the Wiki. Start on this page, follow the links, and look for yellow boxes:
>
> http://wikis.sun.com/display/DRMAAv2/DRMAAv2+API
>
> I will continue to work as the integration / coordination point. Maybe I can also take care of one or the other specific problem, but definitely not for all of them. Our deadline is March 15th, then we will present to the other groups.
>
> Thanks,
> Peter.


Cheers,
-- 
Mariusz

