[Pgi-wg] OGF OGSA-BES - Requirements for an improved Basic Execution Service

Fri Sep 16 08:26:30 CDT 2011

Hi Etienne,

On Fri, 2011-09-16 at 13:32 +0200, Etienne URBAH wrote:
[...] 
> 
> RFC-3820-compliant X509 Proxies
> -------------------------------
> The RFC-3820-compliant X509 proxies are fully supported by the jLite 
> library written in Java by Oleg SUKHOROSLOV and available at 
> http://code.google.com/p/jlite/
> 

You must be joking. I was talking about open source software like Apache
httpd and Java JDK. jLite just wraps the cog-globus libraries and adds
some gLite access APIs. No, thanks.

> 
> Dependency of BES on other grid software components / operational issues
> ------------------------------------------------------------------------
> It is very good that we now agree on GLUE 2.0 as base for the 
> Information System.  Otherwise, we could NOT agree on the way to express 
> references to grid entities in the Job Description document.
> 
> Your comments about chapter 6.5 confirm that the specification of the 
> BES Client interface DEPENDS on whether BES supports X509 proxies for 
> delegation of Security credentials or NOT.
> 

At least this was the EMI-ES v1.0 conclusion, which need not be the
final word. The only dependency (in EMI-ES) is the specification of which
delegated credential is to be used for which data staging item.

Delegation can be performed FULLY TRANSPARENTLY to the BES (for example
at the message level, as in UNICORE), and the BES interface specification
is not dependent on it at all.
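
As a rough sketch of that layering (Python used only for brevity;
ExecutionService, DelegationLayer and the 'delegation' header are
hypothetical names, not UNICORE or EMI-ES APIs), the job interface can
stay free of any credential parameter while a message-level handler
deals with delegation:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Message:
    """A client request: a payload plus out-of-band headers (hypothetical)."""
    job_description: str
    headers: Dict[str, str] = field(default_factory=dict)


class ExecutionService:
    """BES-like interface: knows about jobs only, nothing about credentials."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}

    def create_activity(self, job_description: str) -> str:
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = job_description
        return job_id


class DelegationLayer:
    """Message-level handler wrapped around the service.

    It reads the (hypothetical) 'delegation' header and records the
    credential per job, so the service signature never mentions security.
    """

    def __init__(self, service: ExecutionService) -> None:
        self._service = service
        self.credentials: Dict[str, Optional[str]] = {}

    def handle(self, message: Message) -> str:
        credential = message.headers.get("delegation")
        job_id = self._service.create_activity(message.job_description)
        self.credentials[job_id] = credential
        return job_id


endpoint = DelegationLayer(ExecutionService())
job_id = endpoint.handle(
    Message("run /bin/date", {"delegation": "token-for-alice"})
)
```

Swapping the handler for a different delegation scheme would leave
create_activity() untouched.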

> Since delegation of Security credentials is a MUST for BES, my 
> conclusion is that we MUST agree on SECURITY issues (even if some are 
> operational issues) BEFORE trying to write down BES requirements.
> 
> In the same way, we know that Clients need to perform complex queries on 
> Jobs. The BES Client interface DEPENDS on whether such queries are 
> accepted by BES itself, or by a separate Logging and Bookkeeping 
> service.  So, I think that we have to agree on the existence or absence 
> of a separate Logging and Bookkeeping service BEFORE trying to write 
> down BES requirements about queries.

The BES client interface does not necessarily need to support complex
queries; these should be part of a separate interface.
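
A minimal sketch of such a split, in Python with purely hypothetical
names (JobManagement, JobQuery, SimpleBES, JobIndex); it assumes the
query service can read the job records by some means that is not part
of the BES interface:

```python
from typing import Callable, Dict, List, Protocol

Record = Dict[str, str]


class JobManagement(Protocol):
    """Narrow execution interface: submit a job and check its status."""
    def create_activity(self, job_description: str) -> str: ...
    def get_status(self, job_id: str) -> str: ...


class JobQuery(Protocol):
    """Separate, optional interface for complex queries on jobs."""
    def find(self, predicate: Callable[[Record], bool]) -> List[str]: ...


class SimpleBES:
    """Implements JobManagement only; it has no query operation."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def create_activity(self, job_description: str) -> str:
        job_id = f"job-{len(self.records) + 1}"
        self.records[job_id] = {"description": job_description,
                                "status": "QUEUED"}
        return job_id

    def get_status(self, job_id: str) -> str:
        return self.records[job_id]["status"]


class JobIndex:
    """Implements JobQuery as a separate service over the job records."""

    def __init__(self, records: Dict[str, Record]) -> None:
        self._records = records

    def find(self, predicate: Callable[[Record], bool]) -> List[str]:
        return [job_id for job_id, record in self._records.items()
                if predicate(record)]


bes = SimpleBES()
bes.create_activity("blast")
bes.create_activity("mpi-run")
index = JobIndex(bes.records)
queued = index.find(lambda record: record["status"] == "QUEUED")
```

A deployment that does not need queries simply omits JobIndex; the
execution interface is unaffected.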

> As a summary :
> -  Some BES requirements are quite independent from other grid 
> components, and we can discuss them immediately.
> -  But some other BES requirements are DEPENDENT on foundational grid 
> components or operational issues, in particular Information System, 
> Security, Logging and Bookkeeping, ...
> -  Therefore, we have to agree on these other grid components or 
> operational issues FIRST.
> 
> This is a critical issue, and I propose that we discuss it at OGF33.

Unfortunately I won't be in Lyon; I can only join by phone.

Summarising, my main points for the BES interface specification :

 * specifications must be narrowly scoped and composable (which is the
JSDL/BES model anyway).

 * do not try to specify specific authentication methods. Do not try to
specify specific delegation methods. Security is a cross-cutting concern
and should be dealt with separately.

 * do not assume a special environment where a BES instance will run.
Interactions with other services (except for data access) are optional
and should be treated as such.

 * leave operational aspects to operations. Recommendations for BES
implementors may be given, of course.

Best regards,
Bernd.

> 
> I will answer your other comments later.
> 
> Best regards.
> 
> -----------------------------------------------------
> Etienne URBAH         LAL, Univ Paris-Sud, IN2P3/CNRS
>                        Bat 200   91898 ORSAY    France
> Tel: +33 1 64 46 84 87      Skype: etienne.urbah
> Mob: +33 6 22 30 53 27      mailto:urbah at lal.in2p3.fr
> -----------------------------------------------------
> 
> 
> On Thu, 15/09/2011 22:44, Bernd Schuller wrote:
> > Hi Etienne,
> >
> > thanks for the clarifications. So indeed your document is aimed at both:
> >
> > 1) providing requirements for the actual BES specification ("client
> > interface" in your terminology)
> > 2) the operation and deployment issues that have to be solved for
> > interoperability on an infrastructure level (say EGI and EDGI).
> >
> > It would be very beneficial for further progress if these two distinct
> > concerns could be separated, or at least CLEARLY marked in your document.
> >
> > I have added some more comments inline.
> >
> > On Thu, 2011-09-15 at 19:27 +0200, Etienne URBAH wrote:
> >
> >> Concerning the document named 'Requirements for an improved Basic
> >> Execution Service (BES)' and available at
> >> http://forge.gridforum.org/sf/go/doc16306 :
> >>
> >> THANK YOU VERY MUCH for having taken the time to read this document, and
> >> for having taken the time to provide comments :
> >>
> >> Such comments are very useful for the improvement of documents, permit
> >> convergence and prepare later agreement.
> >> [...]
> >> On Thu, 15/09/2011 13:00, Bernd Schuller wrote:
> >>>
> >>> [...]
> >>
> >>> 1.4 Methodology
> >>>
> >>>    * ref 4: glite user guide ->   out of scope, gLite is too complex and
> >>> too specific in its architecture (if there is such a thing) to be useful
> >>> as a base for BES
> >>     Yes, this is a very large user guide for specific usage of gLite by
> >> users, and it does NOT provide a clear description of the architecture
> >> and of the functionalities.
> >>     But it is NOT so complex.  As soon as I finished reading this guide,
> >> it was easy for me to perform reverse engineering and extract the
> >> effective architecture (SOA with internal interfaces) and the implied
> >> functionalities (which are to be improved).
> >
> > Indeed it appears that you try to impose gLite specifics (like a logging
> > &  bookkeeping service or proxy certificates on the transport level) as
> > requirements. This would severely limit the BES specification effort, and
> > will not be accepted (I hope) by other stakeholders.
> >
> >>     In the text, I have prepended a few words to explain that.
> >
> > Basically you imply that the "architecture and functionalities" of the
> > gLite execution system (together with the PGI work) is somehow the
> > guideline to be followed, which I fully disagree with.
> >
> >>>
> >>> 2.4 Collaboration with other services
> >>>
> >>> While this is important for interoperability, it is unimportant
> >>> for the specification of a BES. The BES spec should NOT try to specify
> >>> all the interaction with the rest of the world. This is the task of a
> >>> "grid architecture specification" like OGSA.
> >>     My document is NOT targeted only to the specification of the BES
> >> Client interface, but to the clear and consistent description of BES
> >> context and functional + operational requirements which are really
> >> necessary for interoperability.
> >>     As far as I know, OGSA does NOT take into account GLUE 2.0 yet.
> >> Therefore, an up to date 'grid architecture specification' is absolutely
> >> necessary.
> >
> > Glue2 is just an information model, not necessarily a perfect one nor
> > the only one. However, I agree an information model has to be adopted
> > for BES and any associated information systems.
> >
> >>     If OGSA members consider chapters 2.3 and 2.4 of my document as a
> >> 'grid architecture specification' which updates and improves OGSA, I
> >> thank them.  If they consider that this 'grid architecture
> >> specification' does NOT comply with OGSA and competes with it, then I
> >> assert that it obsoletes OGSA.
> >
> > I can't really say; the OGSA group stopped its work a long time ago and
> > it's been a long time since I looked at the documents.
> >
> >>>
> >>> Specifically, the interactions with security, monitoring, accounting and
> >>> logging framework are OPERATIONAL concerns that MUST NOT be a mandatory
> >>> part of a BES specification.
> >>     FAILURE of practical operations is often caused by LACK of early care
> >> about operational concerns during specification phase.
> >
> > Agreed.
> >
> >>   As GIN-GC has
> >> proven and documented, this is even more true for interoperability on
> >> the field (as opposed to theoretical interoperability at the WSDL level).
> >>     I confirm that care about operational concerns is REQUIRED for real
> >> operations and for practical interoperability on the field.  Although
> >> operational concerns are NOT part of the BES Client interface, they are
> >> REQUIRED for the overall specifications of BES in its context.
> >>     In the text, I have stressed that the document DOES take into account
> >> operational concerns.
> >>
> >>>
> >>> 4. BES non-functional requirements
> >>>
> >>> 4.1.2 Traceability - should be SHOULD not MUST
> >>     This is an operational concern :  Would you really take the risk that
> >> the whole EGI becomes a spambot or a scambot ?
> >
> > Isn't it already, powered by gLite and used by the wlcg botnet (just
> > kidding of course) :-)
> >
> >>     No traceability -->  No post mortem analysis after attack -->  Large
> >> infection -->  Panic -->  Abrupt and very long shutdown of all services.
> >>     I fully confirm that traceability is a MUST.
> >
> > It is an internal detail which any good implementation will provide.
> > If BES-A is much easier and more secure to operate than BES-B, admins
> > can choose which to install.
> >
> >>>
> >>> 4.1.3 Security
> >>>    - how can a specification "implement" a policy? You probably
> >>>      meant "BES implementations SHOULD ..."
> >>     The text now is 'BES specifications MUST fully take into account the
> >> Security Policies ...'
> >
> > Still no understanding here... Let's take traceability as an example.
> > The relevant EGI policy
> > <https://documents.egi.eu/public/ShowDocument?docid=81>  says
> >
> > "[...] software deployed in the Grid MUST include the
> > ability to produce sufficient and relevant logging [...]
> > The level of the logging MUST be configured by all service providers,
> > including but not limited to the Sites, to produce the required
> > information which MUST be retained for a minimum of 90 days."
> >
> > For example all UNICORE services can be made to comply with this
> > by configuration of the logging library we use (Apache Log4j), and by
> > not deleting log files for 90 days.
> > So this is a feature of the implementation and the administrator in
> > charge, not the specification. Thus, your sentence should read "BES
> > implementations SHOULD ..." (It is MUST of course only if they want to
> > be deployed in EGI)
> > One does not try to specify implementation details, at least not in any
> > specification I've ever seen (e.g. does the HTTP specification say
> > anything about server logging or accepted CAs?).
> >
> >
> >
> >> [...]
> >>>
> >>> 4.2
> >>>    all of this is out of scope. For example the UNICORE service
> >>> container hosts a number of services including an execution service.
> >>> Probably you mean that the execution service SPECIFICATION should be
> >>> limited to the execution service and MUST NOT specify accounting,
> >>> security etc.
> >>     'Well defined and narrow scope' is a general engineering requirement.
> >>    It is a fundamental concern crossing requirements, design,
> >> specifications, fabrication, tests, operations, user experience and
> >> product maintenance for all types of products, even outside software
> >> engineering.
> >
> > exactly.
> >
> >>     I confirm that 'Well defined and narrow scope' is absolutely REQUIRED
> >> for sound software design, implementation, deployment and maintainability.
> >>     From your comment, I assume that the Execution Service of UNICORE
> >> does have a 'Well defined and narrow scope', does have precise
> >> interfaces with other UNICORE services, and minimizes overlaps with them.
> >
> > of course. And UNICORE does not include a L&B service :-)
> >
> >
> >> [...]
> >>> 6.1
> >>>
> >>>    * "SSL certificates MUST be signed by a CA..." this is an operational
> >>> decision, and has nothing to do with the BES spec.
> >>> For example, a site may run an inhouse deployment of BES using an
> >>> in-house CA. This requirement should be deleted.
> >>     This operational concern is REQUIRED for practical interoperability
> >> on the field.  I have prepended :
> >>     * Authentication of Servers :  The Execution Service SHOULD permit
> >> Clients to authenticate it.  If an Execution Service authenticates
> >> itself to clients, it MUST permit Clients to really perform this
> >> authentication.
> >
> > This sentence makes no sense to me, sorry.
> > Maybe "Server and client SHOULD communicate via a secure channel
> > (SSL/TLS)". Even this may not be true in the future, though it is for
> > all(?) the Grid systems currently.
> >
> >>>
> >>> 6.3
> >>>
> >>> * "For Client authentication, the Execution Service MUST accept all
> >>> following authentication methods:  Full X509,  RFC-3820-compliant X509
> >>> Proxy"
> >>>
> >>> This requirement is invalid. I agree that it would be nice to be able to
> >>> specify authentication methods, but it is impossible. For example
> >>> Shibboleth, Username/password, OpenID, OAuth (e.g. for a REST interface
> >>> over plain HTTP), or even NOTHING (e.g. in an inhouse grid) all can be
> >>> valid authentication methods in some circumstances.
> >>     There are 2 separate requirements :
> >>     - 1 'MUST' for Full X509 and RFC-3820-compliant X509 Proxy
> >>     - 1 'MAY'  for all other ones.
> >>>
> >>> Furthermore, making proxies a MUST implies that nonstandard
> >>> authentication libraries instead of TLS/SSL must be used, making the BES
> >>> implementation insecure. For some implementors (like UNICORE) proxies on
> >>> the transport level are very much a no-go.
> >>     I had clearly specified RFC-3820-compliant X509 Proxy, which ARE
> >> standard.
> >>     Your criticisms are valid for GSI proxies, which I have taken care NOT
> >> to mention.
> >
> > By "standard" I meant software support, not the existence of an RFC.
> > Unlike standard SSL/TLS, proxies are barely supported by
> > industry-standard tools such as Apache httpd or the Java JDK.
> > One has to rely on custom code, which is notoriously buggy and
> > error-prone.
> >
> > Since one important non-functional requirement (for me at least) is to
> > be able to use standard (off-the-shelf) open source software, having to
> > support proxies is a big limitation.
> >
> >>> 6.4.
> >>>
> >>> "This authorization mechanism MUST be consistent across all instances of
> >>> the Execution Service"
> >>>
> >>> This violates the autonomy of a site. Site administrators often wish to
> >>> stay in control of their resources, and do not accept external
> >>> authorisation decision points. And anyway, who cares? Since the AuthZ
> >>> mechanism is internal to the BES, it cannot be specified in the
> >>> BES spec as such.
> >>     Interoperability requires a federation of independent administrative
> >> domains to agree on common functionalities, interfaces and operations.
> >>     This DOES sometimes violate the autonomy of each individual site.
> >>     The requirement is NOT that the AUTHZ decision point is external to
> >> any site, but that all participating sites MUST accept to install inside
> >> their site an instance of a commonly validated software implementing the
> >> decision point.
> >
> > No. Each site may choose their own authz decision point, IMO.
> >
> >>     The AUTHZ mechanism MUST NOT be internal to the BES :
> >
> > Maybe I was not clear. The authz mechanism is invisible for outside
> > parties (like clients). It can be an external component, an internal
> > component, whatever, it is up to the BES implementor and the site admin.
> >
> >> For example,
> >> in UNICORE atomic services, the 'de.fzj.unicore.uas.security' package
> >> is described as 'The security subsystem of UNICORE/X', and is NOT
> >> internal to 'de.fzj.unicore.uas.impl.job'.
> >
> > In UNICORE site admins can choose what attribute sources and XACML
> > decision points they want to use, but the clients (including other
> > services) do not need to know this. That is what I meant by "internal".
> >
> >>>
> >>> 6.5
> >>>
> >>> These are requirements on the security layer (or framework) and should
> >>> not be used as requirements on BES.
> >>     These security requirements DO have impacts on the BES Client
> >> interface and on the Job Description document.
> >>     In the text, I have made it clear.
> >
> > Indeed, while preparing the EMI-ES specification, we came to the
> > following conclusions:
> > 1) when using proxies for delegation, it is necessary to map each data
> > staging item to a delegated credential (you can check the EMI-ES job
> > description for details)
> > 2) the delegation operations are separate from the job management
> > operations, so they do not necessarily have to be part of the BES client
> > interface.
> >
> > Also, there are existing implementations (UNICORE and Genesis come to my
> > mind) that do not need this at all, because they do delegation without
> > proxies.
> >
> > So I disagree, 6.5 mostly describes features of the particular security
> > framework that is used.
> >
> >>>
> >>> 8 BES requirements related to "Application Repositories"
> >>>
> >>> While I agree that BES should understand the notion of an "Application"
> >>> (see e.g. JSDL ApplicationName), I don't agree that the BES should
> >>> use these for Scheduling. Rather, this is the job of a broker.
> >>     The text is now :
> >>     * Resource selection :  The Execution Service MUST use, among others,
> >> these references to 'Installed Applications' in order to select the most
> >> adequate computing resource for the Job.
> >>>
> >>> 9 BES requirements applying to Accounting
> >>>
> >>> As a "MUST", these are out of scope, and should be made "SHOULD".
> >>     No Accounting -->  No precise reporting to funding agencies -->  No
> >> funding -->  Abrupt and very long shutdown of all services.
> >>     I fully confirm that Accounting is a MUST.
> >>
> >
> > ... operational
> >
> >>>
> >>> 10 Logging/Bookkeeping
> >>>
> >>> Same as 9.
> >>     Same as 'Traceability' :
> >>     This is an operational concern :  Would you really take the risk that
> >> the whole EGI becomes a spambot or a scambot ?
> >>     No Logging and Bookkeeping -->  No post mortem analysis after attack
> >> -->  Large infection -->  Panic -->  Abrupt and very long shutdown of all
> >> services.
> >>     I fully confirm that Logging and Bookkeeping is a MUST.
> >>
> >
> > an operational MUST maybe for some infrastructures, not all.
> >
> > E.g. a typical HPC site has its own accounting, its own logging systems
> > independent of the (Grid) software used to submit jobs to it.
> >
> >>>
> >>> 12 Jobs
> >>>
> >>> 12.1 Types of job
> >>>
> >>> Support for parallel jobs: it should be "MUST" :-)
> >>     The text is now :
> >>     - The concept of 'Single Job' includes Jobs running
> >> massively-parallel processes using MPI on one large-scale HPC System.
> >> The Execution Service MUST understand instructions for usage of MPI
> >> inside the Job Description document.  The Execution Service SHOULD
> >> transmit these instructions to the Batch System, or return an explicit
> >> error message if not supported.
> >
> > OK
> >
> >
> >
> > Best regards,
> > Bernd.
> >
> >
> > --
> > Dr. Bernd Schuller
> > Federated Systems and Data
> > Juelich Supercomputing Centre, http://www.fz-juelich.de/jsc
> > Phone: +49 246161-8736 (fax -8556)
> >
> 

-- 
Dr. Bernd Schuller
Federated Systems and Data
Juelich Supercomputing Centre, http://www.fz-juelich.de/jsc
Phone: +49 246161-8736 (fax -8556)