[Pgi-wg] OGF OGSA-BES - Requirements for an improved Basic Execution Service

Andre Merzky andre at merzky.net
Fri Sep 16 08:36:26 CDT 2011


+1 on all points.

Cheers, Andre.


2011/9/16 Bernd Schuller <b.schuller at fz-juelich.de>:
> hi Etienne,
>
> On Fri, 2011-09-16 at 13:32 +0200, Etienne URBAH wrote:
> [...]
>>
>> RFC-3820-compliant X509 Proxies
>> -------------------------------
>> The RFC-3820-compliant X509 proxies are fully supported by the jLite
>> library written in Java by Oleg SUKHOROSLOV and available at
>> http://code.google.com/p/jlite/
>>
>
> You must be joking. I was talking about open source software like Apache
> httpd and Java JDK. jLite just wraps the cog-globus libraries and adds
> some gLite access APIs. No, thanks.
>
>>
>> Dependency of BES on other grid software components / operational issues
>> ------------------------------------------------------------------------
>> It is very good that we now agree on GLUE 2.0 as base for the
>> Information System.  Otherwise, we could NOT agree on the way to express
>> references to grid entities in the Job Description document.
>>
>> Your comments about chapter 6.5 confirm that the specification of the
>> BES Client interface DEPENDS on whether BES supports X509 proxies for
>> delegation of Security credentials or NOT.
>>
>
> At least this was the EMI-ES v1.0 conclusion, which need not be the
> final word. The only dependency (in EMI-ES) is the specification which
> delegated credential is to be used for which data staging item.
>
> Delegation can be performed FULLY TRANSPARENTLY to the BES (for example at
> the message level as in UNICORE), and the BES interface specification is
> not dependent on it at all.
>
>> Since delegation of Security credentials is a MUST for BES, my
>> conclusion is that we MUST agree on SECURITY issues (even if some are
>> operational issues) BEFORE trying to write down BES requirements.
>>
>> In the same way, we know that Clients need to perform complex queries on
>> Jobs. The BES Client interface DEPENDS on whether such queries are
>> accepted by BES itself, or by a separate Logging and Bookkeeping
>> service.  So, I think that we have to agree on the existence or absence
>> of a separate Logging and Bookkeeping service BEFORE trying to write
>> down BES requirements about queries.
>
>
> The BES client interface does not necessarily need to support complex
> queries; these should be part of a separate interface.
>
>
>> As a summary :
>> -  Some BES requirements are quite independent of other grid
>> components, and we can discuss them immediately.
>> -  But some other BES requirements are DEPENDENT on foundational grid
>> components or operational issues, in particular Information System,
>> Security, Logging and Bookkeeping, ...
>> -  Therefore, we have to agree on these other grid components or
>> operational issues FIRST.
>>
>> This is a critical issue, and I propose that we discuss it at OGF33.
>
> Unfortunately I won't be in Lyon, only via phone.
>
> Summarising, my main points for the BES interface specification :
>
>  * specifications must be narrowly scoped and composable (which is the
> JSDL/BES model anyway).
>
>  * do not try to specify specific authentication methods. Do not try to
> specify specific delegation methods. Security is a cross cutting concern
> and should be dealt with separately.
>
>  * do not assume a special environment where a BES instance will run.
> Interactions with other services (except for data access) are optional
> and should be treated as such.
>
>  * leave operational aspects to operations. Recommendations for BES
> implementors may be given, of course.
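The first two points can be pictured with a minimal sketch, assuming a toy execution service whose interface covers job management only, with authentication applied by a separate wrapper at the message level. All names here (`ExecutionService`, `Message`, `SecuredEndpoint`) are hypothetical and are not taken from BES, JSDL or UNICORE.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class ExecutionService(Protocol):
    """Hypothetical narrow interface: job management only, no security."""

    def submit(self, job_description: str) -> str: ...

    def status(self, job_id: str) -> str: ...


class InMemoryExecutionService:
    """Toy implementation that knows nothing about authentication."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}

    def submit(self, job_description: str) -> str:
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = "QUEUED"
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]


@dataclass
class Message:
    """Hypothetical envelope: caller identity travels beside the payload."""

    caller: str
    payload: str


class SecuredEndpoint:
    """Cross-cutting layer: checks the message, then calls the service."""

    def __init__(self, service: ExecutionService,
                 authenticate: Callable[[Message], bool]) -> None:
        self._service = service
        self._authenticate = authenticate

    def submit(self, message: Message) -> str:
        if not self._authenticate(message):
            raise PermissionError(f"caller {message.caller!r} rejected")
        return self._service.submit(message.payload)


# The authentication method is swapped without touching the service.
endpoint = SecuredEndpoint(InMemoryExecutionService(),
                           lambda m: m.caller == "alice")
```

Replacing the `authenticate` callable changes the security method while the execution service interface stays the same, which is the sense of "composable" assumed in this sketch.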
>
>
> Best regards,
> Bernd.
>
>>
>> I will answer your other comments later.
>>
>> Best regards.
>>
>> -----------------------------------------------------
>> Etienne URBAH         LAL, Univ Paris-Sud, IN2P3/CNRS
>>                        Bat 200   91898 ORSAY    France
>> Tel: +33 1 64 46 84 87      Skype: etienne.urbah
>> Mob: +33 6 22 30 53 27      mailto:urbah at lal.in2p3.fr
>> -----------------------------------------------------
>>
>>
>> On Thu, 15/09/2011 22:44, Bernd Schuller wrote:
>> > Hi Etienne,
>> >
>> > thanks for the clarifications. So indeed your document is aimed at both:
>> >
>> > 1) providing requirements for the actual BES specification ("client
>> > interface" in your terminology)
>> > 2) the operation and deployment issues that have to be solved for
>> > interoperability on an infrastructure level (say EGI and EDGI).
>> >
>> > It would be very beneficial for further progress if these two distinct
>> > concerns could be separated, at least CLEARLY marked in your document.
>> >
>> > I have added some more comments inline.
>> >
>> > On Thu, 2011-09-15 at 19:27 +0200, Etienne URBAH wrote:
>> >
>> >> Concerning the document named 'Requirements for an improved Basic
>> >> Execution Service (BES)' and available at
>> >> http://forge.gridforum.org/sf/go/doc16306 :
>> >>
>> >> THANK YOU VERY MUCH for having taken the time to read this document, and
>> >> for having taken the time to provide comments :
>> >>
>> >> Such comments are very useful for the improvement of documents, permit
>> >> convergence and prepare later agreement.
>> >> [...]
>> >> On Thu, 15/09/2011 13:00, Bernd Schuller wrote:
>> >>>
>> >>> [...]
>> >>
>> >>> 1.4 Methodology
>> >>>
>> >>>    * ref 4: glite user guide ->   out of scope, gLite is too complex and
>> >>> too specific in its architecture (if there is such a thing) to be useful
>> >>> as a base for BES
>> >>     Yes, this is a very large user guide for specific usage of gLite by
>> >> users, and it does NOT provide a clear description of the architecture
>> >> and of the functionalities.
>> >>     But it is NOT so complex.  As soon as I finished reading this guide,
>> >> it was easy for me to perform reverse engineering and extract the
>> >> effective architecture (SOA with internal interfaces) and the implied
>> >> functionalities (which are to be improved).
>> >
>> > Indeed it appears that you try to impose gLite specifics (like a logging
>> > &  bookkeeping service or proxy certificates on the transport level) as
>> > requirements. This would severely limit the BES specification effort, and
>> > will not be accepted (I hope) by other stakeholders.
>> >
>> >>     In the text, I have prepended a few words to explain that.
>> >
>> > Basically you imply that the "architecture and functionalities" of the
>> > gLite execution system (together with the PGI work) is somehow the
>> > guideline to be followed, which I fully disagree with.
>> >
>> >>>
>> >>> 2.4 Collaboration with other services
>> >>>
>> >>> While this is important for interoperability, it is unimportant
>> >>> for the specification of a BES. The BES spec should NOT try to specify
>> >>> all the interaction with the rest of the world. This is the task of a
>> >>> "grid architecture specification" like OGSA.
>> >>     My document is NOT targeted only to the specification of the BES
>> >> Client interface, but to the clear and consistent description of BES
>> >> context and functional + operational requirements which are really
>> >> necessary for interoperability.
>> >>     As far as I know, OGSA does NOT take into account GLUE 2.0 yet.
>> >> Therefore, an up to date 'grid architecture specification' is absolutely
>> >> necessary.
>> >
>> > Glue2 is just an information model, not necessarily a perfect one nor
>> > the only one. However, I agree an information model has to be adopted
>> > for BES and any associated information systems.
>> >
>> >>     If OGSA members consider chapters 2.3 and 2.4 of my document as a
>> >> 'grid architecture specification' which updates and improves OGSA, I
>> >> thank them.  If they consider that this 'grid architecture
>> >> specification' does NOT comply with OGSA and competes with it, then I
>> >> assert that it obsoletes OGSA.
>> >
>> > I can't really say, the OGSA group stopped its work a long time ago and
>> > it has been a long time since I looked at the documents.
>> >
>> >>>
>> >>> Specifically, the interactions with security, monitoring, accounting and
>> >>> logging framework are OPERATIONAL concerns that MUST NOT be a mandatory
>> >>> part of a BES specification.
>> >>     FAILURE of practical operations is often caused by LACK of early care
>> >> about operational concerns during the specification phase.
>> >
>> > Agreed.
>> >
>> >>   As GIN-GC has
>> >> proven and documented, this is even more true for interoperability in
>> >> the field (as opposed to theoretical interoperability at the WSDL level).
>> >>     I confirm that care about operational concerns is REQUIRED for real
>> >> operations and for practical interoperability in the field.  Although
>> >> operational concerns are NOT part of the BES Client interface, they are
>> >> REQUIRED for the overall specifications of BES in its context.
>> >>     In the text, I have stressed that the document DOES take into account
>> >> operational concerns.
>> >>
>> >>>
>> >>> 4. BES non-functional requirements
>> >>>
>> >>> 4.1.2 Traceability - should be SHOULD not MUST
>> >>     This is an operational concern :  Would you really take the risk that
>> >> the whole EGI becomes a spambot or a scambot ?
>> >
>> > Isn't it already, powered by gLite and used by the wlcg botnet (just
>> > kidding of course) :-)
>> >
>> >>     No traceability -->  No post mortem analysis after attack -->  Large
>> >> infection -->  Panic -->  Abrupt and very long shutdown of all services.
>> >>     I fully confirm that traceability is a MUST.
>> >
>> > It is an internal detail which any good implementation will provide.
>> > If BES-A is much easier and more secure to operate than BES-B, admins
>> > can choose which to install.
>> >
>> >>>
>> >>> 4.1.3 Security
>> >>>    - how can a specification "implement" a policy? You probably
>> >>>      meant "BES implementations SHOULD ..."
>> >>     The text now is 'BES specifications MUST fully take into account the
>> >> Security Policies ...'
>> >
>> > Still no understanding here... let's take traceability.
>> > The relevant EGI policy
>> > <https://documents.egi.eu/public/ShowDocument?docid=81>  says
>> >
>> > "[...] software deployed in the Grid MUST include the
>> > ability to produce sufficient and relevant logging [...]
>> > The level of the logging MUST be configured by all service providers,
>> > including but not limited to the Sites, to produce the required
>> > information which MUST be retained for a minimum of 90 days."
>> >
>> > For example all UNICORE services can be made to comply with this
>> > by configuration of the logging library we use (Apache Log4j), and by
>> > not deleting log files for 90 days.
>> > So this is a feature of the implementation and the administrator in
>> > charge, not the specification. Thus, your sentence should read "BES
>> > implementations SHOULD ..." (It is MUST of course only if they want to
>> > be deployed in EGI)
>> > One does not try to specify implementation details, at least not in any
>> > specification I've ever seen (e.g. does the HTTP specification say
>> > anything about server logging or accepted CAs?).
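To illustrate that the 90-day retention is a matter of configuration and not of the interface, here is a minimal sketch. It uses Python's standard `logging` module in place of Apache Log4j, so it is an analogue and not the UNICORE configuration; the logger name `bes` and the file name `bes.log` are assumptions.

```python
import logging
import logging.handlers
import os
import tempfile

RETENTION_DAYS = 90  # minimum retention quoted from the EGI policy


def build_retained_logger(log_dir: str, name: str = "bes") -> logging.Logger:
    """Rotate the log file daily and keep the last 90 rotated files."""
    handler = logging.handlers.TimedRotatingFileHandler(
        os.path.join(log_dir, "bes.log"),
        when="midnight",             # start a new file once per day
        backupCount=RETENTION_DAYS,  # older rotated files are deleted
        delay=True,                  # open the file only on first write
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)  # level is chosen by the administrator
    logger.addHandler(handler)
    return logger


# Demo in a temporary directory (hypothetical deployment path).
logger = build_retained_logger(tempfile.mkdtemp())
```

Nothing in this configuration touches the service interface, so in this sketch compliance is decided by whoever deploys the software.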
>> >
>> >
>> >
>> >> [...]
>> >>>
>> >>> 4.2
>> >>>    all of this is out of scope. For example the UNICORE service
>> >>> container hosts a number of services including an execution service.
>> >>> Probably you mean that the execution service SPECIFICATION should be
>> >>> limited to the execution service and MUST NOT specify accounting,
>> >>> security etc.
>> >>     'Well defined and narrow scope' is a general engineering requirement.
>> >>    It is a fundamental concern crossing requirements, design,
>> >> specifications, fabrication, tests, operations, user experience and
>> >> product maintenance for all types of products, even outside software
>> >> engineering.
>> >
>> > exactly.
>> >
>> >>     I confirm that 'Well defined and narrow scope' is absolutely REQUIRED
>> >> for sound software design, implementation, deployment and maintainability.
>> >>     From your comment, I assume that the Execution Service of UNICORE
>> >> does have a 'Well defined and narrow scope', does have precise
>> >> interfaces with other UNICORE services, and minimizes overlaps with them.
>> >
>> > of course. And UNICORE does not include a L&B service :-)
>> >
>> >
>> >> [...]
>> >>> 6.1
>> >>>
>> >>>    * "SSL certificates MUST be signed by a CA..." this is an operational
>> >>> decision, and has nothing to do with the BES spec.
>> >>> For example, a site may run an inhouse deployment of BES using an
>> >>> in-house CA. This requirement should be deleted.
>> >>     This operational concern is REQUIRED for practical interoperability
>> >> in the field.  I have prepended :
>> >>     * Authentication of Servers :  The Execution Service SHOULD permit
>> >> Clients to authenticate it.  If an Execution Service authenticates
>> >> itself to clients, it MUST permit Clients to really perform this
>> >> authentication.
>> >
>> > This sentence makes no sense to me, sorry.
>> > Maybe "Server and client SHOULD communicate via a secure channel
>> > (SSL/TLS)". Even this may not be true in the future, though it is for
>> > all(?) the Grid systems currently.
>> >
>> >>>
>> >>> 6.3
>> >>>
>> >>> * "For Client authentication, the Execution Service MUST accept all
>> >>> following authentication methods:  Full X509,  RFC-3820-compliant X509
>> >>> Proxy"
>> >>>
>> >>> This requirement is invalid. I agree that it would be nice to be able to
>> >>> specify authentication methods, but it is impossible. For example
>> >>> Shibboleth, Username/password, OpenID, OAuth (e.g. for a REST interface
>> >>> over plain HTTP), or even NOTHING (e.g. in an inhouse grid) all can be
>> >>> valid authentication methods in some circumstances.
>> >>     There are 2 separate requirements :
>> >>     - 1 'MUST' for Full X509 and RFC-3820-compliant X509 Proxy
>> >>     - 1 'MAY'  for all other ones.
>> >>>
>> >>> Furthermore, making proxies a MUST implies that nonstandard
>> >>> authentication libraries instead of TLS/SSL must be used, making the BES
>> >>> implementation insecure. For some implementors (like UNICORE) proxies on
>> >>> the transport level are very much a no-go.
>> >>     I had clearly specified RFC-3820-compliant X509 Proxies, which ARE
>> >> standard.
>> >>     Your criticisms are valid for GSI proxies, which I have taken care NOT
>> >> to mention.
>> >
>> > by "standard" I did not mean that it is an RFC, but software support.
>> > As opposed to standard SSL/TLS, proxies are almost not supported by
>> > industry standard tools, for example Apache httpd or the Java JDK.
>> > One has to rely on custom code, which is notoriously buggy and error
>> > prone.
>> >
>> > Since one important non-functional requirement (for me at least) is to
>> > be able to use standard (off-the-shelf) open source software, having to
>> > support proxies is a big limitation.
>> >
>> >>> 6.4.
>> >>>
>> >>> "This authorization mechanism MUST be consistent across all instances of
>> >>> the Execution Service"
>> >>>
>> >>> This violates the autonomy of a site. Site administrators often wish to
>> >>> stay in control of their resources, and do not accept external
>> >>> authorisation decision points. And anyway, who cares? Since the AuthZ
>> >>> mechanism is internal to the BES, it cannot be specified in the
>> >>> BES spec as such.
>> >>     Interoperability requires a federation of independent administrative
>> >> domains to agree on common functionalities, interfaces and operations.
>> >>     This DOES sometimes violate the autonomy of each individual site.
>> >>     The requirement is NOT that the AUTHZ decision point is external to
>> >> any site, but that all participating sites MUST agree to install inside
>> >> their site an instance of a commonly validated software implementing the
>> >> decision point.
>> >
>> > No. Each site may choose their own authz decision point, IMO.
>> >
>> >>     The AUTHZ mechanism MUST NOT be internal to the BES :
>> >
>> > Maybe I was not clear. The authz mechanism is invisible for outside
>> > parties (like clients). It can be an external component, an internal
>> > component, whatever, it is up to the BES implementor and the site admin.
>> >
>> >> For example,
>> >> in UNICORE atomic services, the 'de.fzj.unicore.uas.security' package
>> >> is described as 'The security subsystem of UNICORE/X', and is NOT
>> >> internal to 'de.fzj.unicore.uas.impl.job'.
>> >
>> > In UNICORE site admins can choose what attribute sources and XACML
>> > decision points they want to use, but the clients (including other
>> > services) do not need to know this. That is what I meant by "internal".
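The meaning of "internal" here can be sketched as follows, assuming a decision point is simply a function from (user, action) to allow or deny. The names `DecisionPoint`, `acl_decision_point` and `Site` are hypothetical; this is not the UNICORE or XACML API.

```python
from typing import Callable, Dict, Set

# Hypothetical signature: a decision point answers allow/deny.
DecisionPoint = Callable[[str, str], bool]


def acl_decision_point(acl: Dict[str, Set[str]]) -> DecisionPoint:
    """Site-local access control list, one possible choice of a site admin."""
    def decide(user: str, action: str) -> bool:
        return action in acl.get(user, set())
    return decide


def allow_all(user: str, action: str) -> bool:
    """Another possible choice, e.g. for an in-house grid."""
    return True


class Site:
    """The client sees only the outcome, never which decision point ran."""

    def __init__(self, decision_point: DecisionPoint) -> None:
        self._decide = decision_point

    def handle(self, user: str, action: str) -> str:
        return "OK" if self._decide(user, action) else "DENIED"


site_a = Site(acl_decision_point({"alice": {"submit"}}))
site_b = Site(allow_all)
```

Both sites expose the same `handle` call, so a client cannot tell, and does not need to know, how each site reaches its decision.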
>> >
>> >>>
>> >>> 6.5
>> >>>
>> >>> These are requirements on the security layer (or framework) and should
>> >>> not be used as requirements on BES.
>> >>     These security requirements DO have impacts on the BES Client
>> >> interface and on the Job Description document.
>> >>     In the text, I have made it clear.
>> >
>> > indeed while preparing the EMI-ES specification, we came to the
>> > following conclusions
>> > 1) when using proxies for delegation, it is necessary to map each data
>> > staging item to a delegated credential (you can check the EMI-ES job
>> > description for details)
>> > 2) the delegation operations are separate from the job management
>> > operations, so they do not necessarily have to be part of the BES client
>> > interface.
>> >
>> > Also, there are existing implementations (UNICORE and Genesis come to my
>> > mind) that do not need this at all, because they do delegation without
>> > proxies.
>> >
>> > So I disagree, 6.5 mostly describes features of the particular security
>> > framework that is used.
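Conclusion 1) can be pictured with a minimal sketch, assuming each data staging item carries a reference to one delegated credential. The field names and URLs are hypothetical and do not reproduce the EMI-ES job description schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StagingItem:
    name: str
    source_url: str
    credential_id: str  # hypothetical reference to a delegated credential


def unresolved_credentials(items: List[StagingItem],
                           delegated: Dict[str, str]) -> List[str]:
    """Return the names of staging items whose credential was never delegated."""
    return [item.name for item in items
            if item.credential_id not in delegated]


# Credentials obtained through a separate delegation operation (placeholders).
delegated = {"cred-1": "<proxy for storage A>"}

items = [
    StagingItem("input.dat", "gsiftp://se-a.example.org/input.dat", "cred-1"),
    StagingItem("params.cfg", "gsiftp://se-b.example.org/params.cfg", "cred-2"),
]
```

In this sketch the delegation step fills `delegated` independently of job submission, which matches conclusion 2): the job description only refers to credentials by identifier.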
>> >
>> >>>
>> >>> 8 BES requirements related to "Application Repositories"
>> >>>
>> >>> While I agree that BES should understand the notion of an "Application"
>> >>> (see e.g. JSDL ApplicationName), I don't agree that the BES should
>> >>> use these for Scheduling. Rather, this is the job of a broker.
>> >>     The text is now :
>> >>     * Resource selection :  The Execution Service MUST use, among others,
>> >> these references to 'Installed Applications' in order to select the most
>> >> adequate computing resource for the Job.
>> >>>
>> >>> 9 BES requirements applying to Accounting
>> >>>
>> >>> As a "MUST", these are out of scope, and should be made "SHOULD".
>> >>     No Accounting -->  No precise reporting to funding agencies -->  No
>> >> funding -->  Abrupt and very long shutdown of all services.
>> >>     I fully confirm that Accounting is a MUST.
>> >>
>> >
>> > ... operational
>> >
>> >>>
>> >>> 10 Logging/Bookkeeping
>> >>>
>> >>> Same as 9.
>> >>     Same as 'Traceability' :
>> >>     This is an operational concern :  Would you really take the risk that
>> >> the whole EGI becomes a spambot or a scambot ?
>> >>     No Logging and Bookkeeping -->  No post mortem analysis after attack
>> >> -->  Large infection -->  Panic -->  Abrupt and very long shutdown of all
>> >> services.
>> >>     I fully confirm that Logging and Bookkeeping is a MUST.
>> >>
>> >
>> > an operational MUST maybe for some infrastructures, not all.
>> >
>> > E.g. a typical HPC site has its own accounting, its own logging systems
>> > independent of the (Grid) software used to submit jobs to it.
>> >
>> >>>
>> >>> 12 Jobs
>> >>>
>> >>> 12.1 Types of job
>> >>>
>> >>> Support for parallel jobs: it should be "MUST" :-)
>> >>     The text is now :
>> >>     - The concept of 'Single Job' includes Jobs running
>> >> massively-parallel processes using MPI on one large-scale HPC System.
>> >> The Execution Service MUST understand instructions for usage of MPI
>> >> inside the Job Description document.  The Execution Service SHOULD
>> >> transmit these instructions to the Batch System, or return an explicit
>> >> error message if not supported.
>> >
>> > OK
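A minimal sketch of the agreed behaviour, assuming a job description reduced to a dictionary and a batch system described by a set of feature names. The keys `executable`, `mpi` and `processes` and the error type are hypothetical, not JSDL elements.

```python
from typing import Any, Dict, Set


class UnsupportedFeatureError(Exception):
    """Explicit error returned when the batch system cannot run MPI jobs."""


def to_batch_directives(job: Dict[str, Any],
                        batch_features: Set[str]) -> Dict[str, Any]:
    """Understand the MPI instructions, then forward them or fail explicitly."""
    directives: Dict[str, Any] = {"executable": job["executable"]}
    mpi = job.get("mpi")
    if mpi is None:
        return directives  # plain single-process job
    if "mpi" not in batch_features:
        raise UnsupportedFeatureError(
            "MPI requested but not supported by the batch system"
        )
    directives["mpi_processes"] = mpi["processes"]
    return directives


mpi_job = {"executable": "/bin/solver", "mpi": {"processes": 64}}
```

The service always parses the `mpi` entry; only the forwarding step depends on what the batch system supports.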
>> >
>> >
>> >
>> > Best regards,
>> > Bernd.
>> >
>> >
>> > --
>> > Dr. Bernd Schuller
>> > Federated Systems and Data
>> > Juelich Supercomputing Centre, http://www.fz-juelich.de/jsc
>> > Phone: +49 246161-8736 (fax -8556)
>> >
>>
>



-- 
Nothing is ever easy...

