[drmaa-wg] DRMAA TEST SUITE

Hrabri Rajic hrabri at sbcglobal.net
Fri Mar 24 00:20:45 CST 2006


Hi Ruben, Peter,

It might be a good idea for the two of you to check the drmaa_wif* functions
for correctness from your standpoint. Tracker 1125,
https://forge.gridforum.org/tracker/?aid=1125, could explain the reasons for
the many changes those routines went through.
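Here is a minimal sketch of how a client or test might read job results under the interpretation agreed in the thread below. Success or failure comes only from the job state reported by drmaa_job_ps(). The drmaa_wif*() flags only describe which extra detail is available. The sketch is self-contained C and does not link against drmaa.h. The job_state, job_report and extra_info types, the JOB_* and INFO_* values, and the helper functions are hypothetical stand-ins, not DRMAA API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DRMAA_PS_* job states; the real
   constants and their values live in drmaa.h. */
typedef enum { JOB_QUEUED, JOB_RUNNING, JOB_DONE, JOB_FAILED } job_state;

/* Hypothetical record of what one finished job reported: the state
   from drmaa_job_ps() and the flags from the drmaa_wif*() calls. */
typedef struct {
    job_state state; /* final state: the only source of success/failure */
    bool exited;     /* drmaa_wifexited(): an exit status is available  */
    bool signaled;   /* drmaa_wifsignaled(): a signal is available       */
    bool aborted;    /* drmaa_wifaborted(): job never entered running    */
} job_report;

/* Which extra detail, if any, the caller can rely on. */
typedef enum { INFO_EXIT_STATUS, INFO_SIGNAL, INFO_NEVER_RAN, INFO_NONE } extra_info;

/* Failure is decided by the job state alone, never by the wif flags. */
bool job_failed(const job_report *r)
{
    assert(r != NULL);
    return r->state == JOB_FAILED;
}

/* A false exited flag only means no exit status is available;
   it does not by itself say the job failed. */
extra_info available_info(const job_report *r)
{
    assert(r != NULL);
    if (r->exited)
        return INFO_EXIT_STATUS; /* drmaa_wexitstatus() is meaningful */
    if (r->signaled)
        return INFO_SIGNAL;      /* drmaa_wtermsig() is meaningful */
    if (r->aborted)
        return INFO_NEVER_RAN;
    return INFO_NONE;
}

/* Relaxed check for ST_INPUT_FILE_FAILURE, ST_OUTPUT_FILE_FAILURE and
   ST_ERROR_FILE_FAILURE: only a failed job state is required, whether
   the DRM rejected the file before start or the job ran and failed.
   The aborted flag is deliberately not consulted. */
bool file_failure_test_passes(const job_report *r)
{
    return job_failed(r);
}
```

Consider two jobs. In the first, the DRM caught the unusable input file before the job started, so the aborted flag is set. In the second, the job was queued, ran and then failed, and no flags are set. Both would pass file_failure_test_passes(), because both end in the failed state.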

Attached is the up-to-date DRMAA spec.

Thx

	Hrabri


> -----Original Message-----
> From: owner-drmaa-wg at ggf.org [mailto:owner-drmaa-wg at ggf.org] On Behalf Of
> Ruben Santiago Montero
> Sent: Thursday, March 23, 2006 4:55 AM
> To: Peter Tröger
> Cc: DRMAA Working Group
> Subject: Re: [drmaa-wg] DRMAA TEST SUITE
> 
> Hi Peter,
> 
> On Tuesday 21 March 2006 21:43, you wrote:
> > > Sorry, I do not agree. In the DRMS context, the job life cycle comprises
> > > all the job execution stages from the moment the job enters the DRM
> > > system. In this sense, whenever a job is submitted there should be a
> > > termination (whether or not it actually ran). To give you an example, if
> > > you submit a job (qsub) and then kill it (qdel), the job obviously
> > > terminated abnormally (it has been killed), although it never entered
> > > the running state.
> >
> > This is one possible interpretation, I agree. The DRMAA spec is aligned
> > with POSIX semantics here: only something that was running (== executed)
> > before can have terminated.
> 
> OK!!
> >
> > > There is no relation between whether the job terminated normally and
> > > whether there is further information from the DRM. In the previous
> > > example (a job that has been killed), there may or may not be more
> > > information from the DRMS. But in any case, it is clear that the job
> > > terminated abnormally.
> > >
> > > The drmaa_wifexited() description should concentrate on one aspect,
> > > since there is no obvious (or general) relation between job termination
> > > and getting further information from the DRM.
> >
> > You are right. The main intention of drmaa_wifexited() is to tell you
> > whether additional information about the end of the job execution is
> > available. The final status of the job is provided by drmaa_job_ps(), and
> > nothing else.
> 
> OK, we will fix drmaa_wifexited() in GridWay DRMAA accordingly.
> 
> >
> > The confusion might eventually be resolved by slightly reformulating the
> > first sentences of the drmaa_wif...() descriptions, in order to avoid the
> > word "termination". This would not change the semantics.
> >
> > I have no good proposal. DRMAA group?
> >
> > >> (Note: The testsuite assumes here that unusable input files are
> > >> detected by the DRM before the job starts. This seems to be realistic,
> > >> since file staging operations are usually not part of the job
> > >> execution.)
> > >
> > > I do not think so. Usually job preparation stages are part of the job
> > > execution, for example:
> >
> > ...
> >
> > > Therefore I suggest removing ST_INPUT_FILE_FAILURE,
> > > ST_OUTPUT_FILE_FAILURE and ST_ERROR_FILE_FAILURE from the official test
> > > suite. In the previous DRMs at least, you can submit a job with output
> > > file /etc/passwd or an unusable input file; the job is queued, runs and
> > > fails.
> >
> > During the last phone call, the group went through the code. We agree
> > with your impression that the 3 tests are currently not sufficient. The
> > descriptions of the "input / output / error stream" job template
> > parameters say that an invalid value should result in the job state
> > DRMAA_PS_FAILED, and nothing more. There is no description of what that
> > means for drmaa_wif...() calls, but the testsuite expects a particular
> > behavior. As DRMAA section 2.6 clearly shows, DRMAA_PS_FAILED is
> > possible both for queued and running jobs.
> >
> > Our proposal is to remove the drmaa_wifaborted() call from
> > ST_INPUT_FILE_FAILURE / ST_ERROR_FILE_FAILURE / ST_OUTPUT_FILE_FAILURE.
> > The drmaa_wait() call does not hurt (since all submitted jobs must be
> > waitable), but the crucial part is the check of the drmaa_synchronize()
> > result. After this change, I would expect the test cases to succeed on
> > your system as well. In case of malicious input / output / error files,
> > the DRMAA implementation would only be expected to report a job failure.
> > This should work for all GridWay-supported systems, right? Could you
> > accept this proposal?
> >
> Sure. It makes sense to me too.
> 
> There is also a validator in the state diagram (Section 2.6). I am just
> wondering whether a DRMAA implementation could simply reject the jobs in
> these tests at submission with DRMAA_ERRNO_DENIED_BY_DRM.
> 
> > BTW: Condor is one example of a system where the existence of input
> > files is checked before the job is started. But at least your GRAM
> > example convinced me that the opposite is also true ;-) ...
> >
> > > Sure. The problem is that the code is not clear either. From the DRMAA
> > > 1.0 C bindings example:
> >
> > ...
> >
> > > From this code it seems that a signaled job should end with a zero
> > > exited value from wifexited (as if it did not terminate normally), as
> > > opposed to your comments in the previous mails and the code in the
> > > DRMAA test suite.
> >
> > You are right, as already said above. drmaa_wifexited() mainly indicates
> > the availability of additional information.
> 
> OK
> >
> > Regards,
> > Peter.
> 
> Best Regards,
> Rubén
> --
> +-----------------------------------------------------------+
>  Dr. Ruben Santiago Montero
>  Assistant Professor
>  Dpto. Arquitectura de Computadores y Automatica
>  Facultad de Informatica
>  Universidad Complutense      phone  : +34 91 394 75 38
>  28040 Madrid                 fax    : +34 91 394 75 27
>  Spain                        email  : rubensm at dacya.ucm.es
>  http://asds.dacya.ucm.es/
> +-----------------------------------------------------------+
> 
> GridWay, The Way to Grid! http://www.gridway.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ggf-rec-drmaa-1_0-corrected_1125r3.pdf
Type: application/pdf
Size: 346270 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/drmaa-wg/attachments/20060324/0910e525/attachment.pdf 

