[ogsa-wg] Perhaps useful input to BES discussion

Ian Foster foster at mcs.anl.gov
Mon May 23 09:19:51 CDT 2005


Hiro:

The "BES is for container not job manager" argument doesn't make sense to 
me. The question of where you are permitted to direct operations for 
purposes of monitoring and control--to container, job, or both--is 
orthogonal to the question of what operations need to be supported.
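To make the orthogonality concrete, here is a minimal sketch in which the same two operations, check status and terminate, can be reached in two ways: by addressing the container with a job identifier, or by addressing the job directly. All names (`Container`, `Job`, `create_job`) are hypothetical and are not taken from the BES draft or from WSRF. Which addressing style a specification permits is a separate decision from which operations it must define.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Job:
    """Job-addressed style: operations are invoked on the job itself."""
    job_id: str
    status: str = "Pending"

    def check_status(self) -> str:
        return self.status

    def terminate(self) -> None:
        self.status = "Terminated"


@dataclass
class Container:
    """Container-addressed style: the same operations, with the job named by id."""
    jobs: Dict[str, Job] = field(default_factory=dict)

    def create_job(self, job_id: str) -> Job:
        job = Job(job_id)
        self.jobs[job_id] = job
        return job

    def check_status(self, job_id: str) -> str:
        # Delegates to the job: the operation set is identical, only the target differs.
        return self.jobs[job_id].check_status()

    def terminate(self, job_id: str) -> None:
        self.jobs[job_id].terminate()
```

For example, `container.terminate("job-1")` and `job.terminate()` have the same effect; a specification could allow either or both without changing the list of operations it defines.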

The draft BES document defines "check status" and "terminate" operations, 
which are certainly required. However, more are needed, e.g.:

* soft-state lifetime management, to avoid orphan jobs.

* subscribe-on-status-change operations, to avoid repeated polling.
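A minimal sketch of the two behaviors listed above follows. It assumes a lease-style model: each job carries an absolute termination time that the client must keep extending, and the service reclaims any job whose time has passed, so a client that disappears cannot leave an orphan behind. Status changes are pushed to subscribed callbacks instead of being discovered by polling. The model is loosely inspired by WS-ResourceLifetime's SetTerminationTime and WS-BaseNotification's Subscribe, but every name here (`JobService`, `set_termination_time`, `subscribe`, `reap_expired`) is hypothetical and this is not either wire protocol.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Listener = Callable[[str, str], None]  # called as listener(job_id, new_state)


@dataclass
class LeasedJob:
    job_id: str
    termination_time: float  # absolute time after which the job may be reclaimed
    state: str = "Active"
    listeners: List[Listener] = field(default_factory=list)

    def set_state(self, new_state: str) -> None:
        if new_state == self.state:
            return
        self.state = new_state
        # Push the change to every subscriber instead of making clients poll.
        for notify in list(self.listeners):
            notify(self.job_id, new_state)


class JobService:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so the lease logic can be tested
        self.jobs: Dict[str, LeasedJob] = {}

    def create(self, job_id: str, lifetime: float) -> LeasedJob:
        job = LeasedJob(job_id, self._clock() + lifetime)
        self.jobs[job_id] = job
        return job

    def set_termination_time(self, job_id: str, lifetime: float) -> None:
        # Client-driven lease renewal: move the deadline forward from "now".
        self.jobs[job_id].termination_time = self._clock() + lifetime

    def subscribe(self, job_id: str, listener: Listener) -> None:
        self.jobs[job_id].listeners.append(listener)

    def reap_expired(self) -> List[str]:
        """Destroy every job whose lease has run out; return the reclaimed ids."""
        now = self._clock()
        expired = [j for j in self.jobs.values() if j.termination_time <= now]
        for job in expired:
            job.set_state("Destroyed")  # subscribers learn of the reclamation
            del self.jobs[job.job_id]
        return [job.job_id for job in expired]
```

In a real service `reap_expired` would run periodically in the container; a client that stops calling `set_termination_time` simply lets its job expire.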

Simply saying "we're not going to consider those because they are defined 
in WSRF" makes no sense to me. WSRF also defines "check status" and 
"terminate" operations, but you're not ignoring those.

Another generic issue that is not addressed in the BES document is how you 
model the state associated with the factory and an individual job. 
Regardless of how you choose to provide access to that state, via 
standardized WSRF operations or some custom operations, a schema needs to 
be defined implicitly or explicitly, and this must surely encompass more
than just "job status." For example, see below for the job state elements
defined in GT4 GRAM.

With respect to your questions below:

#1: Yes, in my view.

#2: I certainly think you need to consider and address these issues together.

Ian.



Job modeling, from 
http://www.globus.org/toolkit/docs/4.0/execution/wsgram/WS_GRAM_Public_Interfaces.html#id2844424


2.3.2. Managed Job Port Type

    * serviceLevelAgreement: A wrapper around fields containing the 
single-job and multi-job descriptions or RSLs. Only one of these sub-fields 
shall have a non-null value.
    * state: The current state of the job.
    * fault: The fault (if generated) indicating the reason for failure of 
the job to complete.
    * localUserId: The job owner's local user account name.
    * userSubject: The GSI certificate DN of the job owner.
    * holding: Indicates whether a hold has been placed on this job.


2.3.3. Managed Executable Job Port Type

    * stdoutURL: A GridFTP URL to the file generated by the job which 
contains the stdout.
    * stderrURL: A GridFTP URL to the file generated by the job which 
contains the stderr.
    * credentialPath: The path (relative to the job process) to the file 
containing the user proxy used by the job to authenticate out to other 
services.
    * exitCode: The exit code generated by the job process.


2.3.4. Managed Multi-Job Port Type

    * subJobEndpoint: A set of endpoint references to the sub-jobs created 
by this multi-job.
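The three port types above can be read as a small state schema. The following Python sketch renders them as dataclasses, assuming (as the section numbering suggests, though this excerpt does not state it) that the executable and multi-job port types extend the base managed job port type. Field names are snake_case renderings of the GRAM names, plain strings stand in for RSL documents, endpoint references, and faults, and the "exactly one description" check is our reading of 2.3.2. It is illustrative only, not the GRAM WSDL.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceLevelAgreement:
    # Assumed reading of 2.3.2: exactly one of the two descriptions is non-null.
    job_description: Optional[str] = None
    multi_job_description: Optional[str] = None

    def __post_init__(self) -> None:
        if (self.job_description is None) == (self.multi_job_description is None):
            raise ValueError("exactly one of the two job descriptions must be set")


@dataclass
class ManagedJobState:
    service_level_agreement: ServiceLevelAgreement
    state: str                   # current state of the job
    local_user_id: str           # owner's local account name
    user_subject: str            # GSI certificate DN of the owner
    holding: bool = False        # whether a hold has been placed on the job
    fault: Optional[str] = None  # set only if the job failed to complete


@dataclass
class ManagedExecutableJobState(ManagedJobState):
    stdout_url: Optional[str] = None       # GridFTP URL of the stdout file
    stderr_url: Optional[str] = None       # GridFTP URL of the stderr file
    credential_path: Optional[str] = None  # proxy path, relative to the job process
    exit_code: Optional[int] = None


@dataclass
class ManagedMultiJobState(ManagedJobState):
    sub_job_endpoints: List[str] = field(default_factory=list)
```

Even this small rendering shows that job state covers ownership, delegation, output locations, and hold status, not just "job status."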


2.3.5. Faults

    * FaultType: This is the base fault for runtime errors that occur while 
managing a job. It extends the OGSI FaultType.
    * CredentialSerializationFaultType: This fault indicates that the 
managed job service was unable to serialize or deserialize a delegated 
credential.
    * InsufficientCredentialsFaultType: This fault indicates that the 
managed job service was unable to perform some action on behalf of the 
owner of the job submission because the owner has delegated insufficient 
credentials.
    * InternalFaultType: This fault indicates that an internal operation 
failed.
    * InvalidCredentialsFaultType: This fault indicates that the managed 
job service was unable to use a delegated credential.
    * ServiceLevelAgreementFaultType: Fault for runtime errors which are 
directly related to a particular part of the ServiceLevelAgreement document 
passed to the createService method. This fault type contains the fragment 
of the ServiceLevelAgreement related to the fault as one of its elements.
    * ExecutionFailedFaultType: This fault indicates that the Managed Job 
service was unable to begin the execution of the job.
    * FilePermissionsFaultType: This fault indicates that the ManagedJob 
service does not have permissions to access a file referenced in the 
ServiceLevelAgreement.
    * InvalidPathFaultType: This fault indicates that a file or directory 
path referenced in the ServiceLevelAgreement contains an invalid path.
    * StagingFaultType: This fault indicates that part of the file staging 
requirements of the ServiceLevelAgreement could not be completed.
    * UnsupportedFeatureFaultType: This fault indicates that an error 
occurred because the RSL depended on a feature not implemented by a 
particular GRAM scheduler.
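One way to see why a fault vocabulary belongs in the state model is to render it as an exception hierarchy a client can catch selectively. The sketch below is hypothetical Python, not generated from the GRAM schema. In particular, placing the file-permission, path, staging, and unsupported-feature faults under `ServiceLevelAgreementFault` is an assumption inferred from their descriptions (each refers to part of the ServiceLevelAgreement), not something this excerpt states.

```python
from typing import Optional


class FaultType(Exception):
    """Base fault for runtime errors that occur while managing a job."""


class CredentialSerializationFault(FaultType):
    """A delegated credential could not be serialized or deserialized."""


class InsufficientCredentialsFault(FaultType):
    """The owner delegated too few credentials for the requested action."""


class InternalFault(FaultType):
    """An internal operation failed."""


class InvalidCredentialsFault(FaultType):
    """A delegated credential could not be used."""


class ExecutionFailedFault(FaultType):
    """Execution of the job could not be started."""


class ServiceLevelAgreementFault(FaultType):
    """A fault tied to one fragment of the submitted ServiceLevelAgreement."""

    def __init__(self, message: str, sla_fragment: str) -> None:
        super().__init__(message)
        self.sla_fragment = sla_fragment  # the offending part of the SLA


# Assumed grouping (inferred from the descriptions, not from the GRAM schema).
class FilePermissionsFault(ServiceLevelAgreementFault):
    """A file referenced in the SLA is not accessible to the service."""


class InvalidPathFault(ServiceLevelAgreementFault):
    """A path referenced in the SLA is invalid."""


class StagingFault(ServiceLevelAgreementFault):
    """Part of the SLA's file staging requirements could not be completed."""


class UnsupportedFeatureFault(ServiceLevelAgreementFault):
    """The RSL relies on a feature the local scheduler does not implement."""


def offending_fragment(fault: FaultType) -> Optional[str]:
    """Return the SLA fragment the client should fix, if the fault names one."""
    if isinstance(fault, ServiceLevelAgreementFault):
        return fault.sla_fragment
    return None
```

A client can then catch `ServiceLevelAgreementFault` to report "fix your job description" and treat the remaining `FaultType` subclasses as credential or service-side problems.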



At 04:53 PM 5/23/2005 +0900, Hiro Kishimoto wrote:
>Hi Ian,
>
>Thank you for your excellent and thoughtful document!
>
>Yes, we had a closely related discussion at the meeting yesterday.
>We discussed that BES defines a subset of your 8 operations (1, 2, 7,
>and 8). Please remember that BES is for the Container, not the Job Manager.
>
>The sense of the meeting was "container (factory) interface only, no
>job interface." The reason is that operations 2 and 7 are already
>specified in WSRF.
>
>However, I am still wondering about the following two issues:
>
>(1) Even though the interface is already defined in WSRF, don't we need
>to define domain-specific semantics and behavior (e.g., that job destroy
>means a soft kill)?
>
>(2) Given that the Job Manager defines the Job interface described in Ian's
>document, does combining the Job Manager and the Container introduce
>unexpected complexity into the EMS architecture? (The Job itself has its
>own interface in the context of the Job Manager but no interface in the
>context of the Container.)
>
>Your thoughts?
>----
>Hiro Kishimoto
>
>-----Original Message-----
>From: owner-ogsa-wg at ggf.org [mailto:owner-ogsa-wg at ggf.org] On Behalf Of Ian Foster
>Sent: Sunday, May 22, 2005 7:37 AM
>To: ogsa-wg; OGSA-BES-bof at ggf.org
>Subject: [ogsa-wg] Perhaps useful input to BES discussion
>
>Dear All:
>
>I am sending this draft document in case it is relevant to the OGSA-WG and/or
>BES discussions.
>
>In this document, I use a simple example (a skeleton execution service) to
>compare and contrast four approaches to representing state, namely WSRF,
>WS-Transfer, REST, and "state id."
>
>I haven't sent this earlier because I'd hoped to integrate numerous comments
>that I've received from Savas and others. I hope to do so in the next week or
>two, but perhaps this draft is still of interest.
>
>Regards -- Ian.
>
>
>_______________________________________________________________
>Ian Foster                    www.mcs.anl.gov/~foster
>Math & Computer Science Div.  Dept of Computer Science
>Argonne National Laboratory   The University of Chicago
>Argonne, IL 60439, U.S.A.     Chicago, IL 60637, U.S.A.
>Tel: 630 252 4619             Fax: 630 252 1997
>         Globus Alliance, www.globus.org
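Point (1) in the quoted message, that job destroy should mean a soft kill, can be made concrete with a minimal sketch. It assumes the job runs as a local operating-system process managed through Python's standard `subprocess` module; the function name `soft_destroy` and its default grace period are hypothetical. A real execution service would apply the same two-step policy through its local scheduler rather than to a child process, but the point stands: a generic destroy operation still needs domain-specific semantics spelled out.

```python
import subprocess
import sys


def soft_destroy(proc: subprocess.Popen, grace_period: float = 5.0) -> int:
    """Hypothetical 'destroy' semantics: ask politely first, then force.

    Sends terminate() (SIGTERM on POSIX), waits up to grace_period seconds,
    and falls back to kill() (SIGKILL on POSIX) if the process is still alive.
    Returns the process's return code.
    """
    if proc.poll() is not None:
        return proc.returncode  # already finished; nothing to destroy
    proc.terminate()
    try:
        return proc.wait(timeout=grace_period)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    job = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit status:", soft_destroy(job, grace_period=2.0))
```

The grace period is the domain-specific part: it gives a job the chance to checkpoint or clean up before the service forcibly reclaims it.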



More information about the ogsa-bes-bof mailing list