[DRMAA-WG] Monitoring JobTemplate attributes for running jobs

Andre Merzky andre at merzky.net
Thu Jul 29 04:02:41 CDT 2010


Our use case for having access to the original complete job template
is that the user can easily resubmit the same job - just changing
for example some command line parameter, but leaving the remainder
fixed.   In SAGA this would look like:

  saga::job::service     js ("drmaa://torque.remote.net/");
  saga::job::job         j1 = js.get_job (jobid);   // std::string
  saga::job::description jd = j1.get_description ();

  jd.set_vector_attribute ("Arguments", new_args);  // std::vector <std::string>

  saga::job::job j2 = js.create_job (jd);
  j2.run ();  // create_job() only creates the job; run() submits it


I understand that the backend may not be able to keep the original
job template - in that case, a 'DoesNotExist' exception on
'get_description()' would be appropriate, IMHO.  If the DRMAA
implementation can cache that description somewhere, fine :-)
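
A minimal sketch of how a caller might cope with both cases. It uses
stand-in types rather than the real SAGA or DRMAA headers: 'service',
'description', 'does_not_exist' and 'resubmit' below are hypothetical
names chosen for illustration only.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in types for illustration only; NOT the SAGA or DRMAA API.
struct does_not_exist : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct description {
    std::string executable;
    std::vector<std::string> arguments;
};

// Hypothetical backend that keeps the description of some jobs only.
class service {
public:
    std::string submit(const description& jd, bool keep_description = true) {
        assert(!jd.executable.empty());  // a job needs something to run
        std::string id = "job." + std::to_string(next_id_++);
        if (keep_description) cache_[id] = jd;
        return id;
    }

    // Throws does_not_exist when the description was not kept.
    description get_description(const std::string& id) const {
        auto it = cache_.find(id);
        if (it == cache_.end())
            throw does_not_exist("no description cached for " + id);
        return it->second;
    }

private:
    int next_id_ = 1;
    std::map<std::string, description> cache_;
};

// Resubmit job `id` with new arguments. Returns the new job id, or
// std::nullopt when the backend could not provide the description.
std::optional<std::string> resubmit(service& js, const std::string& id,
                                    const std::vector<std::string>& new_args) {
    try {
        description jd = js.get_description(id);
        jd.arguments = new_args;  // change one thing, keep the remainder
        return js.submit(jd);
    } catch (const does_not_exist&) {
        return std::nullopt;      // caller has to rebuild the description
    }
}
```

The point of the sketch is only that the exception gives the caller a
clean way to fall back to building the description from scratch.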

My $0.02, Andre.


PS: saga::job::description == drmaa::job::template





Quoting [Peter Tröger] (Jul 29 2010):
> From: Peter Tröger <peter at troeger.eu>
> Date: Thu, 29 Jul 2010 10:07:23 +0200
> To: Mariusz Mamoński <mamonski at man.poznan.pl>,
> 	drmaa-wg at ogf.org
> Subject: Re: [DRMAA-WG] Monitoring JobTemplate attributes for running jobs
> 
> 
> Am 28.07.2010 um 23:42 schrieb Mariusz Mamoński:
> 
> > Hi,
> > 
> > 2010/7/28 Peter Tröger <peter at troeger.eu>:
> >> Hi,
> >> Agenda item #8 was not discussed in the call today, but it is the burning
> >> issue for me at the moment. Please have a look at the "Attributes in
> >> JobInfo" tab:
> >> http://spreadsheets.google.com/ccc?key=0AqyvnBscJNqxcnJBSUs5dXRrU29EUVhGOGthc1lDTFE
> >> Currently, we allow access to the original JobTemplate from a JobInfo
> >> object. The idea was to get, besides the job monitoring information, also the
> >> information about what was demanded at submission time.
> >> While doing the Condor mapping, I figured out that most of the JobTemplate
> >> attributes are also monitorable for a running job. This includes things such
> >> as executable name and working directory. Normally they should be the same
> >> as in the JobTemplate, but Condor and SGE (at least) have this magic job
> >> wrapper stuff, where the admin can automatically and silently reconfigure /
> >> reinterpret everything in a JobTemplate. This might lead to the situation
> >> where the user asks for A, and silently gets B.
> >> The question: Should we drop the support for getting the JobTemplate as part
> >> of JobInfo, because the information is useless? Instead, we could add some
> >> (or maybe most) of the JobTemplate attributes as true dynamic monitoring
> >> information to JobInfo.
> > In my opinion, repeating almost all attributes in this case brings
> > additional redundancy into the DRMAA API (another reason may be
> > performance - the JobTemplate attributes are more likely immutable).
> > Why not simply require the expected behavior in the spec? E.g.:
> > a) the JobTemplate being part of the JobInfo struct is a reference to
> > the JobTemplate used for submission (for jobs submitted outside the
> > session it MUST be NULL)
> > b) the JobTemplate reflects the actual attributes of a job (without
> > obligation that all attributes must be available - e.g. in Torque the
> > actually executed command is hidden in the script)
> 
> The interesting thing is that we already started to do this replication, for example: JobTemplate::candidateMachines vs. JobInfo::allocatedMachines. I still vote for finishing this replication, and removing the JT reference from JobInfo as compensation. I also have a problem with fetching live data from a structure called "template".
> 
> Your example from Torque underlines my argument - we should choose a monitorable subset of JobTemplate and add it to the JobInfo structure, instead of linking the JobTemplate directly.
> 
> Any other opinions?
> 
> Peter.
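
To make the two structures in Peter's proposal concrete, here is a rough
sketch of a JobInfo that carries a monitorable subset of the JobTemplate
attributes. Only 'candidateMachines' and 'allocatedMachines' are taken
from the thread; 'remoteCommand', 'workingDirectory', 'jobId' and the
helper 'silentChanges' are assumptions made for illustration, not part
of any agreed DRMAA binding.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical types; names follow the thread, not a published binding.

// What the user asked for at submission time.
struct JobTemplate {
    std::string remoteCommand;
    std::string workingDirectory;
    std::vector<std::string> candidateMachines;
};

// What the DRM system reports for the job. Attributes the backend
// cannot monitor (e.g. a command hidden inside a script) stay empty.
struct JobInfo {
    std::string jobId;
    std::optional<std::string> remoteCommand;
    std::optional<std::string> workingDirectory;
    std::vector<std::string> allocatedMachines;
};

// Names of the attributes whose monitored value differs from the
// requested one, i.e. where the user asked for A and got B.
// Attributes that are not monitorable are skipped.
std::vector<std::string> silentChanges(const JobTemplate& jt, const JobInfo& ji) {
    std::vector<std::string> changed;
    if (ji.remoteCommand && *ji.remoteCommand != jt.remoteCommand)
        changed.push_back("remoteCommand");
    if (ji.workingDirectory && *ji.workingDirectory != jt.workingDirectory)
        changed.push_back("workingDirectory");
    return changed;
}
```

With the two kept apart like this, the template stays a record of the
request, and the live values never have to be read from a structure
called "template".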
-- 
Nothing is ever easy.

