[DRMAA-WG] Monitoring JobTemplate attributes for running jobs
Peter Tröger
peter at troeger.eu
Mon Aug 23 04:50:32 CDT 2010
We already have some understanding of persistence, so the implementation effort is manageable. I am more concerned about a clear separation between live monitoring information and the original submission data. For the latter, I have not seen a use case so far ...
Best,
Peter.
On 29.07.2010 at 11:02, Andre Merzky wrote:
> Our use case for having access to the original complete job template
> is that the user can easily resubmit the same job - just changing
> for example some command line parameter, but leaving the remainder
> fixed. In SAGA this would look like:
>
> saga::job::service js ("drmaa://torque.remote.net/");
> saga::job::job j1 = js.get_job (jobid); // std::string
> saga::job::description jd = j1.get_description ();
>
> jd.set_vector_attribute ("Arguments", new_args); // std::vector <std::string>
>
> saga::job::job j2 = js.create_job (jd);
>
>
> I understand that the backend may not be able to keep the original
> job template - in that case, a 'DoesNotExist' exception on
> 'get_description()' would be appropriate, IMHO. If the DRMAA
> implementation can cache that description somewhere, fine :-)
>
> My $0.02, Andre.
>
>
> PS: saga::job::description == drmaa::job::template
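Andre's resubmission workflow can be sketched without SAGA, assuming a DRMAA implementation that is able to cache the submitted template per job id. `JobTemplate`, `TemplateCache`, `DoesNotExist` and `withNewArgs` are hypothetical names, not part of DRMAA or SAGA; the exception stands in for the 'DoesNotExist' error Andre suggests for backends that cannot keep the original template.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified job template (not the DRMAA or SAGA type).
struct JobTemplate {
    std::string remoteCommand;
    std::vector<std::string> args;
    std::string workingDirectory;
};

// Hypothetical exception, modeled on SAGA's DoesNotExist.
struct DoesNotExist : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical per-session cache, filled at submission time.
class TemplateCache {
public:
    void remember(const std::string& jobId, const JobTemplate& jt) {
        cache_[jobId] = jt;
    }

    // Returns a copy of the original template; throws if none was kept.
    JobTemplate getDescription(const std::string& jobId) const {
        auto it = cache_.find(jobId);
        if (it == cache_.end()) {
            throw DoesNotExist("no submission template for job " + jobId);
        }
        return it->second;
    }

private:
    std::map<std::string, JobTemplate> cache_;
};

// Builds a template for resubmission: same job, new command line arguments.
JobTemplate withNewArgs(const TemplateCache& cache, const std::string& jobId,
                        std::vector<std::string> newArgs) {
    JobTemplate jt = cache.getDescription(jobId);
    jt.args = std::move(newArgs);
    return jt;
}
```

Because `getDescription` returns a copy, changing the arguments for resubmission cannot alter what was recorded for the first job.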
>
> Quoting [Peter Tröger] (Jul 29 2010):
>> From: Peter Tröger <peter at troeger.eu>
>> Date: Thu, 29 Jul 2010 10:07:23 +0200
>> To: Mariusz Mamoński <mamonski at man.poznan.pl>,
>> drmaa-wg at ogf.org
>> Subject: Re: [DRMAA-WG] Monitoring JobTemplate attributes for running jobs
>>
>>
>> On 28.07.2010 at 23:42, Mariusz Mamoński wrote:
>>
>>> Hi,
>>>
>>> 2010/7/28 Peter Tröger <peter at troeger.eu>:
>>>> Hi,
>>>> Agenda item #8 was not discussed in the call today, but it is the burning
>>>> issue for me at the moment. Please have a look in the "Attributes in
>>>> JobInfo" tab:
>>>> http://spreadsheets.google.com/ccc?key=0AqyvnBscJNqxcnJBSUs5dXRrU29EUVhGOGthc1lDTFE
>>>> Currently, we allow access to the original JobTemplate from a JobInfo
>>>> object. The idea was to provide, besides the job monitoring information,
>>>> the information about what was requested at submission time.
>>>> While doing the Condor mapping, I found that most of the JobTemplate
>>>> attributes can also be monitored for a running job. This includes things such
>>>> as the executable name and the working directory. Normally they should be the
>>>> same as in the JobTemplate, but Condor and SGE (at least) have this magic job
>>>> wrapper mechanism, where the admin can automatically and silently reconfigure /
>>>> reinterpret everything in a JobTemplate. This might lead to a situation
>>>> where the user asks for A and silently gets B.
>>>> The question: should we drop support for getting the JobTemplate as part
>>>> of JobInfo, because the information is useless? Instead, we could add some
>>>> (or maybe most) of the JobTemplate attributes to JobInfo as true dynamic
>>>> monitoring information.
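The "user asks for A, silently gets B" case can be illustrated with a small comparison, assuming an implementation could report both the submitted value and the value observed for the running job. `Requested`, `Observed` and `divergingAttributes` are hypothetical names for illustration, not proposed DRMAA API.

```cpp
#include <string>
#include <vector>

// Hypothetical: what the user put into the JobTemplate at submission time.
struct Requested {
    std::string remoteCommand;
    std::string workingDirectory;
};

// Hypothetical: what the DRM reports for the running job, possibly after
// a site job wrapper has rewritten the submission.
struct Observed {
    std::string executable;
    std::string workingDirectory;
};

// Returns the names of attributes whose live value differs from the request.
std::vector<std::string> divergingAttributes(const Requested& req, const Observed& obs) {
    std::vector<std::string> diffs;
    if (req.remoteCommand != obs.executable) {
        diffs.push_back("executable");
    }
    if (req.workingDirectory != obs.workingDirectory) {
        diffs.push_back("workingDirectory");
    }
    return diffs;
}
```

A non-empty result is exactly the situation in which a JobTemplate attached to JobInfo would show the user what was asked for rather than what is actually running.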
>>> In my opinion, repeating almost all attributes in this case brings
>>> additional redundancy into the DRMAA API (another reason may be
>>> performance - the JobTemplate attributes are more likely immutable).
>>> Why not simply require the expected behavior in the spec? E.g.:
>>> a) the JobTemplate that is part of the JobInfo struct is a reference to
>>> the JobTemplate used for submission (for jobs submitted outside the
>>> session it MUST be NULL)
>>> b) the JobTemplate reflects the actual attributes of a job (without the
>>> obligation that all attributes are available - e.g. in Torque the
>>> actually executed command is hidden in the job script)
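Option a) could look roughly like this in C++, assuming a JobInfo that holds a shared, read-only pointer to the submission template and leaves it null for jobs submitted outside the session. The struct layouts and `submittedCommand` are hypothetical, chosen only to show the NULL rule.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified job template.
struct JobTemplate {
    std::string remoteCommand;
    std::vector<std::string> candidateMachines;
};

// Hypothetical JobInfo following option a): live monitoring data plus a
// read-only reference to the template used for submission.
struct JobInfo {
    std::string jobId;
    std::string state;                                     // live monitoring data
    std::shared_ptr<const JobTemplate> submittedTemplate;  // null for jobs from outside the session
};

// Returns the command requested at submission, if the template is known.
std::optional<std::string> submittedCommand(const JobInfo& info) {
    if (!info.submittedTemplate) {
        return std::nullopt;
    }
    return info.submittedTemplate->remoteCommand;
}
```

Under option a) the pointer only describes what was requested; it says nothing about whether a job wrapper changed the command afterwards, which is Peter's concern in this thread.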
>>
>> The interesting thing is that we have already started this replication, for example JobTemplate::candidateMachines vs. JobInfo::allocatedMachines. I still vote for completing this replication and, in return, removing the JT reference from JobInfo. I also have a problem with fetching live data from a structure called "template".
>>
>> Your example from Torque supports my argument - we should choose a monitorable subset of the JobTemplate attributes and add it to the JobInfo structure, instead of linking the JobTemplate directly.
>>
>> Any other opinions?
>>
>> Peter.
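Peter's alternative, replicating a monitorable subset of JobTemplate attributes into JobInfo, might be sketched as follows. The field names echo the candidateMachines / allocatedMachines pair mentioned above, but the structs and `allocationWithinCandidates` are hypothetical; treating an empty candidate list as "no restriction" is also an assumption.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical JobTemplate: what was requested at submission time.
struct JobTemplate {
    std::string remoteCommand;
    std::string workingDirectory;
    std::vector<std::string> candidateMachines;  // where the job MAY run
};

// Hypothetical JobInfo without a JobTemplate reference: it carries only a
// monitorable subset of the template attributes, as reported by the DRM.
struct JobInfo {
    std::string jobId;
    std::string executable;                      // may differ after a job wrapper
    std::string workingDirectory;
    std::vector<std::string> allocatedMachines;  // where the job DOES run
};

// True if every allocated machine was among the requested candidates.
// Assumption: an empty candidate list means "no restriction".
bool allocationWithinCandidates(const JobTemplate& jt, const JobInfo& info) {
    if (jt.candidateMachines.empty()) {
        return true;
    }
    for (const auto& machine : info.allocatedMachines) {
        auto it = std::find(jt.candidateMachines.begin(),
                            jt.candidateMachines.end(), machine);
        if (it == jt.candidateMachines.end()) {
            return false;
        }
    }
    return true;
}
```

With this split, the template stays a pure submission-time description and all live values come from JobInfo, which is the separation argued for in this thread.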
> --
> Nothing is ever easy.