[Pgi-wg] OGF PGI : Draft PGI Specification

Morris Riedel m.riedel at fz-juelich.de
Thu Jan 21 08:04:20 CST 2010


Hi Bernd, All,

Thanks for your valuable comments.

I have put them in the PGI working slides so that they can be discussed and tracked more efficiently.

If there is enough time, these points will be discussed in today's telcon.

Take care,
Morris
------------------------------------------------------------
Morris Riedel
Jülich Supercomputing Centre (JSC)
Info: http://www.fz-juelich.de/jsc/JSCPeople/riedel

"We work to better ourselves, and the rest of humanity"

Registered office: Jülich
Registered in the commercial register of the Düren District Court, No. HR B 3498
Chair of the Supervisory Board: MinDirig'in Bärbel Brumme-Bothe
Board of Directors: Prof. Dr. Achim Bachem (Chairman),
Dr. Ulrich Krafft (Deputy Chairman)


>------Original Message-----
>-From: pgi-wg-bounces at ogf.org [mailto:pgi-wg-bounces at ogf.org] On Behalf Of Bernd Schuller
>-Sent: Thursday, January 14, 2010 10:59 AM
>-To: pgi-wg at ogf.org
>-Cc: UNICORE devel
>-Subject: Re: [Pgi-wg] OGF PGI : Draft PGI Specification
>-
>-Hi PGI folks,
>-
>-after reading through the current draft 0.38 from
>-http://forge.gridforum.org/sf/go/doc15839?nav=1 and listening to a
>-presentation by Morris, I want to make a few comments.
>-I'll try to focus on compute functionality, to keep this mail reasonably
>-short.
>-
>-Overall I think the PGI looks very promising and I really appreciate
>-your hard work! Having been present in the UMD/EMI project preparation I
>-know exactly how hard it can be ;)
>-
>-So here goes.
>-
>-0) the requirements doc mentioned in the introduction is not accessible
>-for lesser mortals, on https://forge.gridforum.org/sf/go/doc15590
>-I get "permission denied". Maybe you could copy it to the pgi-wg area?
>-
>-1) CreateActivity
>-Since the validation steps can take some time, it is impractical to wait
>-for these steps to finish before assigning activity IDs and returning
>-the response. Clients or intermediaries will run into timeouts. The
>-system should create the activities immediately, and assign them a state
>-like "new" or "validating". IMO every remote operation that can take
>-more than a couple of seconds to generate the response should be made
>-asynchronous. Just think of held locks and shared resources like DB
>-connections together with concurrent access by many clients... we've
>-been there with UNICORE and have been forced to keep web service
>-processing times as low as possible.
>-
>-2) Change activity state
>-I don't really see a reason for all this generic stuff. In reality you
>-want to start, abort, hold, resume etc the processing of an activity, so
>-why not make this more explicit. A compromise might be to do
>-something like requestActivityStateChange("Hold"), etc, and define the
>-mandatory list of "target states" supported by this operation.
>-
>-3) Cancel activity
>-Isn't this a special case of "Change activity state" ?
>-
>-4) Wipe
>-ditto
>-
>-5) Delegation port type
>-Nice idea. However, you should also support SAML assertions here (proxy
>-certs are so 1995!)
>-
>-6) in Section 5.1.2
>-What does "automatic resubmission" mean? Resubmission to the batch
>-system? Or do you possibly see the PGI execution service as something
>-"above" a normal execution service (e.g. a gLite WMS)? Section
>-5.1.7 seems to support this view.
>-IMO resubmitting a failed job to the batch system makes no sense, it
>-will probably just fail again ;) So what is the idea?
>-
>-7) "Delegated" state (Section 5.1.4)
>-Allowing delegation to an off-site execution service (like a different
>-Grid middleware) adds complexity and messes up a lot of things, like
>-credential delegation, state, working directory access, etc.
>-Should "PGI execute" not focus on a simple, practical service for job
>-execution? This forwarding business seems to be quite out of scope...
>-How shall manual data staging be done if the session directory is
>-off-site?
>-The intro lists "request routing" as a requirement, but I'd
>-reconsider that.
>-
>-
>-8) "Output sandbox"
>-I'd try to avoid gLite-specific terms :) Maybe use "directory containing
>-the output files produced by the job" instead. At least define the term
>-"output sandbox" somewhere.
>-
>-9) I fully support Steven's statements regarding the reuse of JSDL. In
>-some places you duplicate parts that already exist in JSDL and
>-JSDL-POSIX, sometimes with less functionality. Some examples:
>-
>- - 7.2 executable name, path, arguments. This can be done by a
>-JSDL-Posix  element, which covers even more, such as environment,
>-stdout/err/in.
>-
>- - 7.3.1.4 UserTag can be replaced by JSDL JobAnnotation
>-
>- - 7.3.6.2 Input, output, error, environment -> JSDL-Posix
>-
>-IMO JSDL-Posix (possibly with extensions) can be used in all places
>-where you need to directly specify the execution of a process. Similarly
>-the normal Application (ApplicationName, ApplicationVersion) (again
>-possibly with extensions) can be used to define execution of a
>-pre-installed software.
>-
>-10) other JSDL related comments
>-  - 7.3.2.9 LogDir: in the interest of interoperability, I'd assume that
>-the internals of how a middleware stores its "grid-specific diagnostics"
>-are irrelevant to the job description. E.g. UNICORE would store this in a
>-database, not in a directory on the execution system.
>-
>-11) In general it is not clear to me which of these elements MUST be
>-supported by a PGI implementation.
>-
>-12) 7.3.2.14 Start time. This is reservation functionality which opens a
>-new can of worms :-) What happens if the RMS does not support this, or
>-the request cannot be granted? If you want to support reservation, you
>-need to reflect this in the state model and in the possible errors a
>-user might get. Also reservation is not listed as a requirement in the
>-Introduction.
>-
>-13) 7.3.2.15 Notifications. This should not be "custom format" but a
>-"comma-separated list of e-mail addresses".
>-
>-
>-Summarizing: I like the port types and the basic data and execution
>-model; data staging and credential delegation also look good. You
>-should re-consider the job description part and clearly identify the
>-minimal set that has to be supported by every compliant implementation.
>-Also I'd try to keep all implementation-specific behaviour out of the
>-spec, like where logs are stored and what is purged by a "purge"
>-operation. What is important is the behaviour and session directory
>-access that a user can expect of any PGI service in each activity state
>-(maybe a table would be helpful).
>-
>-Best regards,
>-Bernd.
>-
>-
>---
>-Dr. Bernd Schuller
>-Distributed Systems and Grid Computing
>-Juelich Supercomputing Centre, http://www.fz-juelich.de/jsc
>-Phone: +49 246161-8736 (fax -8556)
>-Personal blog: www.jroller.com/page/gridhaus
>-
>-
>-------------------------------------------------------------------------------------------------
>-------------------------------------------------------------------------------------------------
>-Forschungszentrum Juelich GmbH
>-52425 Juelich
>-Registered office: Juelich
>-Registered in the commercial register of the Dueren District Court, No. HR B 3498
>-Chair of the Supervisory Board: MinDir'in Baerbel Brumme-Bothe
>-Management Board: Prof. Dr. Achim Bachem (Chairman),
>-Dr. Ulrich Krafft (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
>-Prof. Dr. Sebastian M. Schmidt
>-------------------------------------------------------------------------------------------------
>-------------------------------------------------------------------------------------------------
>-_______________________________________________
>-Pgi-wg mailing list
>-Pgi-wg at ogf.org
>-http://www.ogf.org/mailman/listinfo/pgi-wg
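
Points 1) and 2) of the quoted mail (create activities immediately in a state like "validating", and change state through one explicit operation with a fixed list of target states) could be sketched roughly as below. This is only an illustration under assumptions, not anything taken from the PGI draft: the class ExecutionService, its method names, the "accepted" and "failed" states, and the contents of MANDATORY_TARGET_STATES are all hypothetical.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical list: the mail asks for a mandatory list of target states
# but does not say what it should contain.
MANDATORY_TARGET_STATES = {"Start", "Hold", "Resume", "Abort"}


class ExecutionService:
    """Toy in-memory stand-in for an execution service (not a PGI API)."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._lock = threading.Lock()
        self._state = {}    # activity ID -> current state
        self._pending = {}  # activity ID -> Future of the validation step

    def create_activity(self, description):
        """Assign an ID and return at once; validation runs in the background."""
        activity_id = str(uuid.uuid4())
        with self._lock:
            self._state[activity_id] = "validating"
        self._pending[activity_id] = self._pool.submit(
            self._validate, activity_id, description
        )
        return activity_id

    def _validate(self, activity_id, description):
        # Placeholder check; a real validation step could take much longer.
        ok = bool(description.get("executable"))
        with self._lock:
            self._state[activity_id] = "accepted" if ok else "failed"

    def wait_for_validation(self, activity_id):
        """Block until the background validation of this activity is done."""
        self._pending[activity_id].result()

    def get_state(self, activity_id):
        with self._lock:
            return self._state[activity_id]

    def request_activity_state_change(self, activity_id, target):
        """Single explicit operation; rejects targets outside the fixed list."""
        if target not in MANDATORY_TARGET_STATES:
            raise ValueError(f"unsupported target state: {target!r}")
        with self._lock:
            if activity_id not in self._state:
                raise KeyError(activity_id)
            # A real service would check the transition against its state model.
            self._state[activity_id] = target
        return target
```

A client would call create_activity, get the ID back without waiting for validation, and poll get_state afterwards; "Cancel" and "Wipe" could then be further entries in the target list rather than separate operations, as points 3) and 4) suggest.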