[SAGA-RG] Simple Job Management
Pascal Kleijer
k-pasukaru at ap.jp.nec.com
Wed Dec 13 20:35:32 CST 2006
Dear SAGA members,
Based on the API document currently in public comment phase, we have
implemented a very simple version of the Job Management API. Basically
we stripped the underlying SAGA model and when directly to Job
Management API. See the attachment UML graph for details. The reason was
that we did not have enough time to implement the full SAGA core to
support the API and we focus on the NAREGI Super Scheduler (SS).
Consideration and simplifications:
- We do not support the suspended state at this time (see below for
details).
- The wait method is reserved in java for signal synchronization and is
not used in the same way; we did not implement a real wait method since
we are not interested in synchronization of jobs at the moment. This
might be equivalent of the Thread.join() method.
- Metrics handling has not been added. This might come with future
incarnation of the package.
- Job_self is not supported, also we could pretend it is the same as job
in java.
- We do not include all the methods of the job_service so far in the
factory, might come in later incarnations and rename the factory in service.
- Checkpoint and migrate are not supported for now. In two models it has
no sense and the SS seems not to support it yet.
- Signal is not supported, internally some implementations have it. But
again the SS does not.
- Many attributes are not supported at the moment.
- Session and security model ignored (the SS has his own model) others
don't care.
- A job description can take strings, collections and a string arrays as
arguments. Other formats are allowed if the caller knows how to
manipulate them. Properties that are known to use string arrays have
direct assessors to facilitate their access. All properties can be
stored as a single string to allow serialization.
Since we are working in Java we decided to go pure pattern oriented. So
an application has access to the factory, job pattern and a Job
description. The concrete Job stubs implementations are not supposed to
be exposed (but are accessible since the class is public).
Now the design is made so that we can later on hock the SAGA core
classes below the current API without breaking (to much) the code
(assuming we hold on our design pattern approach).
We have three concrete implementations of Jobs: Local, SSH and Super
Scheduler.
- The local job uses the java process object and handles a job on the
same machine as the JVM. This job type does not support suspended mode
at all. This is a fully synchronous job since all actions are taken on
the spot, unless you submit the job on a queuing system.
- The SSH is a remote job incarnation, the job can run on any machine
that has the SSH daemon running, this can be a synchronous job, unless
you submit the job on a queuing system. This job type does not support
suspended mode for the moment and only POSIX systems can be used to
launch the job.
- The Super Scheduler is NAREGI specific and uses NAREGI’s middleware.
This is an asynchronous job. Suspended mode cannot be directly handled
even if the state exists in the SS, so this is still pending. This job
produces internally WSDL documents; the necessary methods are private
however.
General comments and questions. Might be some meat for the public
comments as well:
Now we stumbled upon the state machine of the API. The "Unknown" and
"New" state are unclear to us. In our opinion when you create a job
either with the factory or directly with the constructor of the specific
incarnation, we enter the "New" state. The "Unknown" state is now
reserved for the very short time the object is instantiated but we
directly switch to "New" once the constructor is finished. The principle
in OO programming is to have a stable object once you finish
constructing it and calling method; if the constructor is not enough to
have a stable object you need a factory. So when you get an object back
it should be in a stable state, thus the "Unknown" state is superficial
in our opinion.
Some metrics or attributes or the Job are useless since they come
directly from the descriptor, Example: "ExecutionHosts",
"WorkingDirectory" or "CPUTimeLimit". Unless you consider that these
values might be different from the job description. Or if the job
description don't mention them the job can have this values assigned by
the back-end. Either case the API documentation should clarify this.
The run_job from the service will not follow the API contract if
implemented. Only one parameter can be returned in java. Also the
streams are available thought the Job pattern.
In the document section 3.8.8 Examples the example at line 16 and 17 is
wrong (or the method is overwritten). There should be no string
argument. The host should be set in the descriptor.
--
Best regards,
Pascal Kleijer
----------------------------------------------------------------
HPC Marketing Promotion Division, NEC Corporation
1-10, Nisshin-cho, Fuchu, Tokyo, 183-8501, Japan.
Tel: +81-(0)42/333.6389 Fax: +81-(0)42/333.6382
-------------- next part --------------
A non-text attachment was scrubbed...
Name: JSF.png
Type: image/png
Size: 33884 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/saga-rg/attachments/20061214/b5a9ecea/attachment-0003.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 4385 bytes
Desc: S/MIME Cryptographic Signature
Url : http://www.ogf.org/pipermail/saga-rg/attachments/20061214/b5a9ecea/attachment-0003.bin
More information about the saga-rg
mailing list