[DRMAA-WG] Monitoring JobTemplate attributes for running jobs

Peter Tröger peter at troeger.eu
Thu Jul 29 03:07:23 CDT 2010


On 28.07.2010 at 23:42, Mariusz Mamoński wrote:

> Hi,
> 
> 2010/7/28 Peter Tröger <peter at troeger.eu>:
>> Hi,
>> Agenda item #8 was not discussed in the call today, but it is the burning
>> issue for me at the moment. Please have a look in the "Attributes in
>> JobInfo" tab:
>> http://spreadsheets.google.com/ccc?key=0AqyvnBscJNqxcnJBSUs5dXRrU29EUVhGOGthc1lDTFE
>> Currently, we allow access to the original JobTemplate from a JobInfo
>> object. The idea was to get, besides the job monitoring information, also
>> the information about what was demanded at submission time.
>> While doing the Condor mapping, I figured out that most of the JobTemplate
>> attributes are also monitorable for a running job. This includes things such
>> as executable name and working directory. Normally they should be the same
>> as in the JobTemplate, but Condor and SGE (at least) have this magic job
>> wrapper stuff, where the admin can automatically and silently reconfigure /
>> reinterpret everything in a JobTemplate. This might lead to the situation
>> where the user asks for A, and silently gets B.
>> The question: Should we drop the support for getting the JobTemplate as part
>> of JobInfo, because the information is useless? Instead, we could add some
>> (or maybe most) of the JobTemplate attributes as true dynamic monitoring
>> information to JobInfo.
> in my opinion repeating almost all attributes in this case brings
> additional redundancy into the DRMAA API (another reason may be
> performance - the JobTemplate attributes are more likely immutable).
> Why not simply require the expected behavior in the spec? e.g.:
> a) the JobTemplate being part of the JobInfo struct is a reference to
> the JobTemplate used for submission (for jobs submitted outside the
> session it MUST be NULL)
> b) the JobTemplate reflects actual attributes of a job (without
> obligation that all attributes must be available - e.g. in Torque the
> actually executed command is hidden in a script)

The interesting thing is that we have already started this replication, for example JobTemplate::candidateMachines vs. JobInfo::allocatedMachines. I still vote for finishing this replication and, in return, removing the JT reference from JobInfo. I also have a problem with fetching live data from a structure called "template".

Your example from Torque supports my argument: we should choose a monitorable subset of the JobTemplate attributes and add it to the JobInfo structure, instead of linking the JobTemplate directly.
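
To make the proposal concrete, here is a rough sketch in Python of what I have in mind. It is not spec text: the class layout, the snake_case names (remote_command, working_directory) and the helper silently_changed are only assumptions for illustration. Only candidateMachines / allocatedMachines come from the current draft. JobInfo carries its own monitored copies of selected attributes, and a value of None stands for "the DRM system does not expose this", as in the Torque case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class JobTemplate:
    """What the user asked for at submission time (never changes afterwards)."""
    remote_command: str
    working_directory: str
    candidate_machines: List[str] = field(default_factory=list)


@dataclass
class JobInfo:
    """What the DRM system reports for the job right now.

    There is no reference back to the JobTemplate. Monitored attributes
    are Optional: None means the DRM system does not expose the value.
    """
    job_id: str
    remote_command: Optional[str] = None
    working_directory: Optional[str] = None
    allocated_machines: List[str] = field(default_factory=list)


def silently_changed(template: JobTemplate, info: JobInfo) -> List[str]:
    """Return names of attributes whose monitored value differs from the request.

    Attributes that the DRM system does not report (None) are skipped,
    because nothing can be said about them.
    """
    changed = []
    for name in ("remote_command", "working_directory"):
        monitored = getattr(info, name)
        if monitored is not None and monitored != getattr(template, name):
            changed.append(name)
    return changed


# The user asks for A, a job wrapper silently turns it into B.
requested = JobTemplate(remote_command="/home/user/sim",
                        working_directory="/home/user")
observed = JobInfo(job_id="42",
                   remote_command="/opt/wrapper/run.sh",
                   working_directory="/home/user")
print(silently_changed(requested, observed))  # ['remote_command']
```

With this layout the application keeps its own JobTemplate if it cares about what was requested, and the "template" type never has to carry live data.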

Any other opinions?

Peter.


