[ogsa-bes-wg] Additional Input into BES w.r.t ESI document

Peter G Lane lane at mcs.anl.gov
Thu May 11 12:17:41 CDT 2006


On Thu, 2006-05-11 at 08:51 +0100, Steven Newhouse wrote:
> All,
> 
>  From the OMII-Europe project (which is engaging with GGF in 
> implementing grid standards) here are some comments from one of the 
> teams that will be looking to implement the BES specification.
> 
> Please respond to the following comments. The attached document has 
> comments on the proposed ESI state model and information/resource model.
> 
> Steven
> 
> 
> About the Job Factory Interface:
> * The list of the defined operations does not cover administrative
> operations like “rejectJobSubmissions(Policy)” and “allowJobSubmission
> (Policy)” useful for disabling/enabling new job submissions based on the
> policy defined by the CE administrator (e.g. if the CE has to be shutdown
> for maintenance; disable new submissions if the number of active jobs is >
> 3000; etc). Do you plan to provide it?

That sounds like, at the very least, a separate management interface. I
think the job factory interface should focus on very basic job creation
semantics. This also gives an implementer the flexibility to determine
how they want to regulate job creation: do they want a public interface,
or do they want some hidden, non-web-service controls?

> 
> *    The current proposed JobFactory interface allows users to create new
> jobs AND (optionally) subscribe for notifications. We believe that the job
> management service should be better decoupled from the notification service,
> as they provide different functionalities. We suggest that Figure 1 be
> extended with a new box (“SubscriptionFactory”) which exposes an interface
> for creating, modifying and removing notification requests. In this way,
> notification management can be decoupled from job management, allowing a
> greater degree of flexibility. For example, it would be possible for users
> to subscribe to notifications after a job has been created (with the current
> proposed interface, this would not be possible). Moreover, it would allow
> users to submit to the notification service requests for “cumulative”
> subscriptions (i.e., in order to receive notifications related to all jobs
> submitted by the same user, or by members of the same Virtual Organization).

In principle I agree that this would be cleaner, but in practice it's
nice to reduce message round trips to speed up job submission. And
there's nothing at all preventing someone from subscribing later just
because this one shortcut is allowed. In practice you also have to be
careful about separating the two, since subscribing after submission
creates a race between the notifications being sent and the client being
set up to receive them. You end up missing notifications frequently,
which is especially bad if the client depends on seeing all of them to
know what to do. Furthermore, if you don't have the subscribe-on-create
feature, you end up needing a two-phase commit model as well, which adds
yet another message round trip to the job submission.
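
To make the race concrete, here is a minimal Python sketch. It is only
an illustration, not anything from the ESI draft: the ToyJob class, its
methods and the three state names are all hypothetical, and the "race"
is simulated deterministically by letting the job advance once before
the late subscription arrives.

```python
# Hypothetical state names, chosen for illustration only.
STATES = ["Pending", "Running", "Done"]


class ToyJob:
    """Toy job that notifies its current subscribers on each state change."""

    def __init__(self, subscriber=None):
        self._next = 0
        self.subscribers = []
        if subscriber is not None:
            # Subscribe-on-create: attached before any transition can fire.
            self.subscribers.append(subscriber)

    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)

    def advance(self):
        """Move to the next state and notify whoever is subscribed right now."""
        state = STATES[self._next]
        self._next += 1
        for notify in self.subscribers:
            notify(state)


def run_subscribe_on_create():
    seen = []
    job = ToyJob(subscriber=seen.append)
    for _ in STATES:
        job.advance()
    return seen


def run_subscribe_later():
    seen = []
    job = ToyJob()
    job.advance()  # the job progresses before the second round trip lands
    job.subscribe(seen.append)
    job.advance()
    job.advance()
    return seen


if __name__ == "__main__":
    print(run_subscribe_on_create())
    print(run_subscribe_later())
```

In the second case the client never hears about the first transition,
which is the kind of gap a two-phase commit would otherwise have to
close.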

> 
> *    We propose to include a “JobAssess” operation on the JobFactory
> Interface. This operation provides the user with an estimate of the start
> running time, e.g. taking into account the current state of the Computing
> Element, the number of running/queued jobs and other parameters.

This sounds like a reservation or agreement interface that should be
separate from the basic factory interface.

> 
> *    Related with the previous issue, it would be interesting to include
> an additional operation for estimating the Quality of Service (QoS) for the
> service instance. The exact meaning of QoS in general depends on user
> requirements (users may assume different weights for different parameters).

Same answer as above.

> 
> *    This is probably a “cosmetic” adjustment, but we believe that the
> name of the “Release” operation makes sense only for reactivating jobs
> from “Held” states; the transition from “Start Pending” and “Staging In”
> could probably be called something like “Activate”.

I don't think the ESI document says that the Release operation does
this. It's only for releasing holds. Perhaps you are assuming an implied
hold during "Start Pending" that isn't actually there? Is there a
specific quote from the document that you think says this?

> 
> *    The proposed interface does not provide mechanisms for handling
> capabilities. With this we refer to the possibility for a user to authorize
> other user(s) to perform certain operations on his/her jobs. For example, a
> user may want to allow another user to monitor his/her jobs, or to interrupt
> and abort jobs and so on. Perhaps this functionality is not strictly related
> to job management, but is rather a security issue (which, according to the
> draft, is still to be discussed). We may keep it for future discussions.

Very interesting point, though I think this could be addressed with JSDL
extensions unless you want to be able to adjust permissions after the
job has been submitted. In that case I think this is yet another
separate interface that we could propose later and implementers could
decide for themselves whether to allow this on their specific service.

> 
> 
> About the staging of files:
> *    While it is clear that users might explicitly “push” files from
> their storage space to the Grid while a job is in “Start Pending” state, it
> is unclear how users might explicitly “pull” by hand resulting files from
> the Grid after a job completed, and before everything gets cleaned up.

We've run into this issue with WS-GRAM. One idea I proposed elsewhere is
to have an extension to JSDL that specifies files you are interested in
monitoring, and have RPs in the job interface that list URLs for those
files so that you can, say, use GridFTP to pull them down. I'm on the
fence about whether this is appropriate for the ESI job interface or
whether it should be a separate file monitoring interface. If you're wondering
why I suggest RPs if you already know the file you want, this is because
at least in WS-GRAM we allow for mapping of files to a GridFTP server
that may not necessarily have the same file system view. In other words,
the URL path part may not agree entirely with the path specified in
JSDL.
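
As a rough sketch of why the published URL can differ from the JSDL
path, the Python below maps a job-side path onto a GridFTP URL using a
prefix table. This is an assumption-laden illustration and not how
WS-GRAM implements it: the function name, the mapping table, the host
name and the paths are all hypothetical.

```python
def to_staging_url(local_path, mappings):
    """Translate a job-side path into a URL on the file server.

    mappings is a list of (local_prefix, url_prefix) pairs; the first
    matching prefix wins. Hypothetical helper, for illustration only.
    """
    for local_prefix, url_prefix in mappings:
        prefix = local_prefix.rstrip("/")
        # Match whole path components only, so /home/al does not match /home/alice.
        if local_path == prefix or local_path.startswith(prefix + "/"):
            rest = local_path[len(prefix):]
            return url_prefix.rstrip("/") + rest
    raise LookupError("no mapping covers " + local_path)


if __name__ == "__main__":
    # Hypothetical mapping: the GridFTP server sees the same files
    # under a different directory than the compute node does.
    mappings = [
        ("/home/alice", "gsiftp://gridftp.example.org/export/users/alice"),
    ]
    print(to_staging_url("/home/alice/run1/out.dat", mappings))
```

A client that only knew the JSDL path could not have guessed the
/export/users part, which is why the service would have to publish the
resulting URL itself.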

> 
> 
> About the Job Interface:
> *    The description of the states of Fig. 2 should be expanded with more
> details, including details on state transitions (basically, we suggest to
> put a complete description of the states in section 3.2.1).

I'll leave this up to Ian to address.

> 
> *    Section 5.1: The meaning of the “Log” property on Table 3 is not
> clear: what does it mean?

I asked the same question. I believe it will be cleared up in a later
version of the spec.

> 
> *    From Table 3 we see a JobState property which represents the current
> state of the job. We think that it would be useful to provide the user with
> the history of all job status changes with the associated timestamp. We also
> support the need for exitCode and failureReason attributes (see Section 8.2,
> issue 9) to describe the job return code and job failure reason 
> respectively.

A state history is an interesting idea. That could clear up some of the
issues I raised with not having subscribe-on-create. Also, good point
about the exit code and failure reason. I don't know if the authors
intended for this to be encompassed in the StateType (JobState RP), but
if not I agree that these are definitely needed.
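
For illustration, a state history with timestamps plus the two extra
attributes might look roughly like the Python sketch below. None of the
names come from the ESI draft: JobRecord, its fields and the state
strings are hypothetical. The point is that a client which subscribes
late can ask for everything after the last entry it saw.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class JobRecord:
    """Hypothetical job record: timestamped state history plus exit details."""

    history: List[Tuple[str, datetime]] = field(default_factory=list)
    exit_code: Optional[int] = None
    failure_reason: Optional[str] = None

    def transition(self, state: str, when: Optional[datetime] = None) -> None:
        # Record the new state with a UTC timestamp unless one is supplied.
        self.history.append((state, when or datetime.now(timezone.utc)))

    @property
    def state(self) -> Optional[str]:
        # The current state is simply the most recent history entry.
        return self.history[-1][0] if self.history else None

    def states_since(self, seen: int) -> List[str]:
        # Lets a late subscriber catch up on transitions it missed.
        return [name for name, _ in self.history[seen:]]


if __name__ == "__main__":
    job = JobRecord()
    for name in ("Pending", "Running", "Failed"):
        job.transition(name)
    job.exit_code = 1
    job.failure_reason = "executable not found"
    print(job.state, job.states_since(0), job.exit_code, job.failure_reason)
```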

> 
> *    It may be useful to provide an additional property (we may call
> it “CommandList”) representing the list of all commands issued for a given
> job.

What do you mean by "command"? Typically there is only one executable,
so if that's what you mean I don't quite follow you.

Peter

> 
> 
> About the Application Interface:
> *    The status of this interface is unclear: is this section going to be
> discussed? Are you going to consider the problem of user interaction with
> running jobs?
> 
> 