[SAGA-RG] Simple Job Management

Pascal Kleijer k-pasukaru at ap.jp.nec.com
Sun Dec 17 18:40:53 CST 2006


Hello Andre,

OK, I'll let you digest the first part. If you need it, I can bundle the 
code for you, in case you feel like some Java snack :p

Also some in-line comments:


> below some answers/comments to the second part of your mail
> - for the first part I need more time for digestion... :-)
> Only so much: great that you attempt a SAGA implementation!!!
> 
> Quoting [Pascal Kleijer] (Dec 14 2006):
>> General comments and questions. Might be some meat for the public 
>> comments as well:
>>
>> We have now stumbled upon the state machine of the API. The "Unknown" 
>> and "New" states are unclear to us. In our opinion, when you create a 
>> job either with the factory or directly with the constructor of the 
>> specific incarnation, we enter the "New" state. The "Unknown" state is 
>> then reserved for the very short time during which the object is being 
>> instantiated, and we switch directly to "New" once the constructor has 
>> finished. The principle in OO programming is that an object is stable 
>> once construction has finished and before any method is called; if the 
>> constructor is not enough to produce a stable object, you need a 
>> factory. So when you get an object back it should be in a stable state, 
>> thus the "Unknown" state is superfluous in our opinion.
> 
> The new state is indeed supposed to reflect a successful
> job creation: the job was created by calling
> job_service.create_job () synchronously.  The job instance
> then has a job description, and methods like run() can be
> called.
> 
>   saga::job j = job_service.create_job (job_description);
>   // job is 'New' here, and can be run.
> 
> However, when the same method has been called
> _asynchronously_, the job instance remains uninitialized
> until the async call returns:
> 
>   saga::job j;  
>   // job state is 'Unknown' - cannot be run, yet
> 
>   saga::task task = job_service.create_job <async> (&j, job_description);
>   // job state is still 'Unknown' - cannot be run, yet
> 
>   task.wait ();
>   // only now the job instance is initialized, and the state
>   // changes to 'New' - run can be called now.
> 
> The state diagram contains the method calls which lead to
> the different states - maybe that is not obvious enough...

OK, I didn't think about the Task constructor. I come from a school of 
open mission-critical software design, where you need to prove 
mathematically that your system is stable at each stage of your 
application. From that point of view the Unknown state sounds strange.

 >   saga::task task = job_service.create_job <async> (&j, job_description);
 >   // job state is still 'Unknown' - cannot be run, yet

If I were to implement this in Java, the job would not be passed in the 
argument list. Instead, the task would return it once it has finished 
setting it up. The job is obtained through another method call on the 
task, and this method returns a value *ONLY* when the setup has finished. 
In that case the Unknown state is always skipped from the user's point of 
view. Internally it might exist for a brief time.
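
A minimal sketch of what I mean, just to make the idea concrete. All 
names here (JobTask, getJob, createJobAsync, JobState and so on) are 
hypothetical and made up for illustration; this is not the SAGA Java 
binding, and the asynchronous part is simply borrowed from 
java.util.concurrent.CompletableFuture:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical names throughout; not the actual SAGA Java binding.
enum JobState { UNKNOWN, NEW, RUNNING, DONE }

final class JobDescription {
    final String executable;
    JobDescription(String executable) { this.executable = executable; }
}

final class Job {
    // UNKNOWN exists only while the constructor is still running.
    private JobState state = JobState.UNKNOWN;
    private final JobDescription description;

    Job(JobDescription description) {
        this.description = description;
        this.state = JobState.NEW; // stable before the constructor returns
    }

    JobState getState() { return state; }
    JobDescription getDescription() { return description; }
}

final class JobTask {
    private final CompletableFuture<Job> result;

    JobTask(CompletableFuture<Job> result) { this.result = result; }

    // Blocks until the job has been fully set up, so the caller
    // never gets hold of a job in the UNKNOWN state.
    Job getJob() { return result.join(); }
}

final class JobService {
    // The job is not an argument; it only comes back through the task.
    JobTask createJobAsync(JobDescription jd) {
        return new JobTask(CompletableFuture.supplyAsync(() -> new Job(jd)));
    }
}

final class AsyncCreateSketch {
    public static void main(String[] args) {
        JobTask task = new JobService().createJobAsync(new JobDescription("/bin/date"));
        Job job = task.getJob();            // returns only after setup
        System.out.println(job.getState()); // prints NEW
    }
}
```

With this shape there is no uninitialized job variable on the caller's 
side at all, which is why the Unknown state never becomes visible.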


>> Some metrics or attributes of the Job are useless since they come 
>> directly from the descriptor, for example "ExecutionHosts", 
>> "WorkingDirectory" or "CPUTimeLimit". Unless you consider that these 
>> values might be different from the job description, or that, if the job 
>> description doesn't mention them, the job can have these values 
>> assigned by the back-end. In either case the API documentation should 
>> clarify this.
> 
> Well, these metrics are not useless, but potentially
> redundant ;-)  as you say, the job_description might not
> contain the respective values (only 'Executable' is required
> in jd).  Also, you may not have started the job with a
> job_description, but may have created the instance via
>   
>   list<string> ids = job_service.list_jobs ();
>   saga::job    job = job_service.get_job (ids[0]);  
> 
> Also, things like CPUTimeLimit, pwd, or queue may get
> changed by the backend.  Lastly, the metrics are
> monitorable, so you can get notifications on such changes.
> That does not work with the job_description.
> 
> Does that make sense?

Yes, it does, but it would be better to make it clear in the 
documentation that these metrics might be redundant, or that their 
values might differ from what the JD actually has.


>> The run_job method of the service will not follow the API contract if 
>> implemented: only one value can be returned in Java. Also, the 
>> streams are available through the Job pattern.
> 
> Well, you can return a parameter array in Java - that is,
> AFAIK, the Java way to return multiple parameters?  
> 
> But, yes, streams are available anyway.  run_job() is a
> convenience method anyway...  I am not sure how that should
> be rendered in Java...  It'll probably be the single most
> discussed call in the Java bindings ;-)

Yes I know. Actually returning an array is a possibility but not very 
clean because you then need to do additional type casts. Since the 
streams are available in the job, the Java binding will not return them.
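
To illustrate the difference, here is a rough sketch of both variants. 
The class and method names (StreamJob, StreamJobService, runJobAsArray, 
runJob and the getters) are hypothetical and only serve the example, and 
the streams are in-memory stand-ins rather than anything connected to a 
real job:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical names throughout; not the actual SAGA Java binding.
final class StreamJob {
    // In-memory stand-ins for the job's standard streams.
    private final OutputStream stdin = new ByteArrayOutputStream();
    private final InputStream stdout = new ByteArrayInputStream(new byte[0]);
    private final InputStream stderr = new ByteArrayInputStream(new byte[0]);

    OutputStream getStdin() { return stdin; }
    InputStream getStdout() { return stdout; }
    InputStream getStderr() { return stderr; }
}

final class StreamJobService {
    // Variant A: everything packed into an array, so the caller has to
    // know the positions and cast every element.
    Object[] runJobAsArray(String commandLine) {
        StreamJob job = new StreamJob();
        return new Object[] { job, job.getStdin(), job.getStdout(), job.getStderr() };
    }

    // Variant B: only the job is returned; the streams are reached
    // through the job itself, with no casts.
    StreamJob runJob(String commandLine) {
        return new StreamJob();
    }
}

final class RunJobSketch {
    public static void main(String[] args) {
        StreamJobService service = new StreamJobService();

        Object[] result = service.runJobAsArray("/bin/date");
        StreamJob jobA = (StreamJob) result[0];      // cast needed
        InputStream outA = (InputStream) result[2];  // cast and magic index

        StreamJob jobB = service.runJob("/bin/date");
        InputStream outB = jobB.getStdout();         // type-safe

        System.out.println(jobA != null && outA != null && outB != null);
    }
}
```

Variant B is what I have in mind for the binding: the caller only ever 
deals with the job, and a wrong index or a wrong cast cannot happen.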


> 
> 
>> In section 3.8.8 (Examples) of the document, the example at lines 16 
>> and 17 is wrong (or the method is overloaded). There should be no 
>> string argument. The host should be set in the descriptor.

I have found some more, see the public track page.


-- 

Best regards,
Pascal Kleijer

----------------------------------------------------------------
   HPC Marketing Promotion Division, NEC Corporation
   1-10, Nisshin-cho, Fuchu, Tokyo, 183-8501, Japan.
   Tel: +81-(0)42/333.6389       Fax: +81-(0)42/333.6382