[DRMAA-WG] Monitoring JobTemplate attributes for running jobs

Peter Tröger peter at troeger.eu
Mon Aug 23 04:50:32 CDT 2010


We already have some understanding of persistence, so the implementation effort is manageable. I am more concerned about a clear separation of live monitoring information and original submission data. For the latter, I have seen no use case so far ...
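
To make that separation concrete, here is a minimal sketch. All type and member names below (JobTemplate, JobInfo, JobRecord, differsFromSubmission) are hypothetical illustrations, not the DRMAA or SAGA API. The assumption is that an implementation persists the submitted template separately from whatever the DRM system currently reports.

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the DRMAA or SAGA API.

// What the user asked for at submission time. Never updated afterwards.
struct JobTemplate {
    std::string remoteCommand;
    std::vector<std::string> args;
    std::string workingDirectory;
};

// What the DRM system reports for the job right now.
struct JobInfo {
    std::string jobId;
    std::string executable;        // may differ if a job wrapper rewrote it
    std::string workingDirectory;
    std::vector<std::string> allocatedMachines;
};

// Record kept by the implementation. The template is only available
// when the job was submitted through this session and was persisted.
struct JobRecord {
    bool hasSubmitted = false;
    JobTemplate submitted;
    JobInfo live;
};

// True when the live data no longer matches what the user requested,
// e.g. because an admin-configured wrapper silently changed the job.
bool differsFromSubmission(const JobRecord& r) {
    if (!r.hasSubmitted) return false;  // nothing to compare against
    return r.submitted.remoteCommand != r.live.executable
        || r.submitted.workingDirectory != r.live.workingDirectory;
}
```

Keeping the two structures apart means a caller never reads live data from something called "template", and a mismatch between request and reality can be detected explicitly.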

Best,
Peter.

On 29.07.2010 at 11:02, Andre Merzky wrote:

> Our use case for having access to the original complete job template
> is that the user can easily resubmit the same job - just changing
> for example some command line parameter, but leaving the remainder
> fixed.   In SAGA this would look like:
> 
>  saga::job::service     js ("drmaa://torque.remote.net/");
>  saga::job::job         j1 = js.get_job (jobid);   // std::string
>  saga::job::description jd = j1.get_description ();
> 
>  jd.set_vector_attribute ("Arguments", new_args);  // std::vector <std::string>
> 
>  saga::job::job j2 = js.create_job (jd);
> 
> 
> I understand that the backend may not be able to keep the original
> job template - in that case, a 'DoesNotExist' exception on
> 'get_description()' would be appropriate, IMHO.  If the DRMAA
> implementation can cache that description somewhere, fine :-)
> 
> My $0.02, Andre.
> 
> 
> PS: saga::job::description == drmaa::job::template
> 
> 
> 
> 
> 
> Quoting [Peter Tröger] (Jul 29 2010):
>> From: Peter Tröger <peter at troeger.eu>
>> Date: Thu, 29 Jul 2010 10:07:23 +0200
>> To: Mariusz Mamoński <mamonski at man.poznan.pl>,
>> 	drmaa-wg at ogf.org
>> Subject: Re: [DRMAA-WG] Monitoring JobTemplate attributes for running jobs
>> 
>> 
>> On 28.07.2010 at 23:42, Mariusz Mamoński wrote:
>> 
>>> Hi,
>>> 
>>> 2010/7/28 Peter Tröger <peter at troeger.eu>:
>>>> Hi,
>>>> Agenda item #8 was not discussed in the call today, but it is the burning
>>>> issue for me at the moment. Please have a look at the "Attributes in
>>>> JobInfo" tab:
>>>> http://spreadsheets.google.com/ccc?key=0AqyvnBscJNqxcnJBSUs5dXRrU29EUVhGOGthc1lDTFE
>>>> Currently, we allow access to the original JobTemplate from a JobInfo
>>>> object. The idea was to get, besides the job monitoring information, also the
>>>> information about what was demanded at submission time.
>>>> While doing the Condor mapping, I figured out that most of the JobTemplate
>>>> attributes are also monitorable for a running job. This includes things such
>>>> as executable name and working directory. Normally they should be the same
>>>> as in the JobTemplate, but Condor and SGE (at least) have this magic job
>>>> wrapper stuff, where the admin can automatically and silently reconfigure /
>>>> reinterpret everything in a JobTemplate. This might lead to the situation
>>>> where the user asks for A, and silently gets B.
>>>> The question: Should we drop the support for getting the JobTemplate as part
>>>> of JobInfo, because the information is useless? Instead, we could add some
>>>> (or maybe most) of the JobTemplate attributes as true dynamic monitoring
>>>> information to JobInfo.
>>> In my opinion, repeating almost all attributes in this case brings
>>> additional redundancy into the DRMAA API (another reason may be
>>> performance - the JobTemplate attributes are more likely immutable).
>>> Why not simply require the expected behavior in the spec? E.g.:
>>> a) the JobTemplate being part of the JobInfo struct is a reference to
>>> the JobTemplate used for submission (for jobs submitted outside the
>>> session it MUST be NULL)
>>> b) the JobTemplate reflects actual attributes of a job (without
>>> obligation that all attributes must be available - e.g. in Torque the
>>> actually executed command is hidden in a script)
>> 
>> The interesting thing is that we already started to do this replication, for example: JobTemplate::candidateMachines vs. JobInfo::allocatedMachines. I still vote for finishing this replication, and removing the JT reference from JobInfo as compensation. I also have a problem with fetching live data from a structure called "template".
>> 
>> Your example from Torque underlines my argument - we should choose a monitorable subset of JobTemplate and add it to the JobInfo structure, instead of linking the JobTemplate directly.
>> 
>> Any other opinions?
>> 
>> Peter.
> -- 
> Nothing is ever easy.
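
The resubmission use case quoted above, including the fallback when the backend kept no template, could be sketched roughly as follows. This is a self-contained illustration that does not use the SAGA or DRMAA libraries: JobDescription, DescriptionCache, with_new_arguments and the DoesNotExist type defined here are hypothetical stand-ins, and the cache is assumed to be filled by the implementation at submission time.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for saga::job::description / a DRMAA job template.
struct JobDescription {
    std::string executable;
    std::vector<std::string> arguments;
};

// Hypothetical exception, named after the one suggested in the thread.
struct DoesNotExist : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Assumed implementation-side cache of descriptions, keyed by job id.
class DescriptionCache {
public:
    void remember(const std::string& jobId, const JobDescription& jd) {
        cache_[jobId] = jd;
    }

    // Throws DoesNotExist when no description was kept for this job,
    // e.g. because it was submitted outside the session.
    JobDescription get_description(const std::string& jobId) const {
        auto it = cache_.find(jobId);
        if (it == cache_.end())
            throw DoesNotExist("no cached description for job " + jobId);
        return it->second;
    }

private:
    std::map<std::string, JobDescription> cache_;
};

// Copy the original description and change only the arguments,
// leaving every other field as it was at submission time.
JobDescription with_new_arguments(const DescriptionCache& cache,
                                  const std::string& jobId,
                                  const std::vector<std::string>& newArgs) {
    JobDescription jd = cache.get_description(jobId);
    jd.arguments = newArgs;
    return jd;
}
```

A caller would catch DoesNotExist and fall back to building the description from scratch; the returned description is a copy, so the cached original stays untouched.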


