[Pgi-wg] OGF OGSA-BES - Requirements for an improved Basic Execution Service

Thu Sep 15 15:44:35 CDT 2011

Hi Etienne,

thanks for the clarifications. So indeed your document is aimed at both:

1) providing requirements for the actual BES specification ("client
interface" in your terminology)
2) the operation and deployment issues that have to be solved for
interoperability on an infrastructure level (say EGI and EDGI).

It would be very beneficial for further progress if these two distinct
concerns could be separated, or at least CLEARLY marked, in your document.

I have added some more comments inline.

On Thu, 2011-09-15 at 19:27 +0200, Etienne URBAH wrote:

> Concerning the document named 'Requirements for an improved Basic
> Execution Service (BES)' and available at
> http://forge.gridforum.org/sf/go/doc16306 :
>
> THANK YOU VERY MUCH for having taken the time to read this document, and
> for having taken the time to provide comments :
>
> Such comments are very useful for the improvement of documents, permit
> convergence and prepare later agreement.
> [...]
> On Thu, 15/09/2011 13:00, Bernd Schuller wrote:
> >
> >[...]
>
> > 1.4 Methodology
> >
> >   * ref 4: glite user guide ->  out of scope, gLite is too complex and
> > too specific in its architecture (if there is such a thing) to be useful
> > as a base for BES
>    Yes, this is a very large user guide for specific usage of gLite by
> users, and it does NOT provide a clear description of the architecture
> and of the functionalities.
>    But it is NOT so complex.  As soon as I finished reading this guide,
> it was easy for me to perform reverse engineering and extract the
> effective architecture (SOA with internal interfaces) and the implied
> functionalities (which are to be improved).

Indeed it appears that you are trying to impose gLite specifics (like a logging
& bookkeeping service or proxy certificates on the transport level) as
requirements. This would severely limit the BES specification effort, and
would not be accepted (I hope) by other stakeholders.

>    In the text, I have prepended a few words to explain that.

Basically you imply that the "architecture and functionalities" of the
gLite execution system (together with the PGI work) is somehow the
guideline to be followed, which I fully disagree with.

> >
> > 2.4 Collaboration with other services
> >
> > While this is important for interoperability, it is unimportant
> > for the specification of a BES. The BES spec should NOT try to specify
> > all the interaction with the rest of the world. This is the task of a
> > "grid architecture specification" like OGSA.
>    My document is NOT targeted only to the specification of the BES
> Client interface, but to the clear and consistent description of BES
> context and functional + operational requirements which are really
> necessary for interoperability.
>    As far as I know, OGSA does NOT take into account GLUE 2.0 yet.
> Therefore, an up to date 'grid architecture specification' is absolutely
> necessary.

Glue2 is just an information model, not necessarily a perfect one nor
the only one. However, I agree an information model has to be adopted
for BES and any associated information systems.

>    If OGSA members consider chapters 2.3 and 2.4 of my document as a
> 'grid architecture specification' which updates and improves OGSA, I
> thank them.  If they consider that this 'grid architecture
> specification' does NOT comply with OGSA and competes with it, then I
> assert that it obsoletes OGSA.

I can't really say; the OGSA group stopped its work a long time ago, and
it has been a long time since I looked at the documents.

> >
> > Specifically, the interactions with security, monitoring, accounting and
> > logging framework are OPERATIONAL concerns that MUST NOT be a mandatory
> > part of a BES specification.
>    FAILURE of practical operations is often caused by LACK of early care
> about operational concerns during the specification phase.

Agreed.

>  As GIN-GC has
> proven and documented, this is even more true for interoperability on
> the field (as opposed to theoretical interoperability at the WSDL level).
>    I confirm that care about operational concerns is REQUIRED for real
> operations and for practical interoperability on the field.  Although
> operational concerns are NOT part of the BES Client interface, they are
> REQUIRED for the overall specifications of BES in its context.
>    In the text, I have stressed that the document DOES take into account
> operational concerns.
>
> >
> > 4. BES non-functional requirements
> >
> > 4.1.2 Traceability - should be SHOULD not MUST
>    This is an operational concern :  Would you really take the risk that
> the whole EGI becomes a spambot or a scambot ?

Isn't it already, powered by gLite and used by the wlcg botnet (just
kidding of course) :-)

>    No traceability --> No post mortem analysis after attack --> Large
> infection --> Panic --> Abrupt and very long shutdown of all services.
>    I fully confirm that traceability is a MUST.

It is an internal detail which any good implementation will provide.
If BES-A is much easier and more secure to operate than BES-B, admins
can choose which to install.

> >
> > 4.1.3 Security
> >   - how can a specification "implement" a policy? You probably
> >     meant "BES implementations SHOULD ..."
>    The text now is 'BES specifications MUST fully take into account the
> Security Policies ...'

Still no understanding here... let's take traceability as an example.
The relevant EGI policy
<https://documents.egi.eu/public/ShowDocument?docid=81> says:

"[...] software deployed in the Grid MUST include the
ability to produce sufficient and relevant logging [...]
The level of the logging MUST be configured by all service providers,
including but not limited to the Sites, to produce the required
information which MUST be retained for a minimum of 90 days."

For example, all UNICORE services can be made to comply with this
by configuring the logging library we use (Apache Log4j), and by
not deleting log files for 90 days.
So this is a feature of the implementation and of the administrator in
charge, not of the specification. Thus, your sentence should read "BES
implementations SHOULD ..." (it is a MUST, of course, but only if they want
to be deployed in EGI).
One does not try to specify implementation details, at least not in any
specification I've ever seen (e.g. does the HTTP specification say
anything about server logging or accepted CAs?).
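To illustrate that the 90-day requirement is met by configuration rather than by the interface, here is a minimal sketch. It uses Python's standard logging module purely as a stand-in for Log4j (it is not UNICORE code), and the logger name "bes.audit" is hypothetical. Daily rotation with 90 retained backups keeps roughly 90 days of log files.

```python
import logging
import logging.handlers
import os
import tempfile

# Illustration only: UNICORE uses Apache Log4j; Python's logging is a stand-in.
RETENTION_DAYS = 90  # minimum retention from the EGI policy quoted above


def make_retaining_logger(log_path: str) -> logging.Logger:
    """Logger that starts a new file at midnight and keeps RETENTION_DAYS
    rotated files; older files are deleted at rollover."""
    handler = logging.handlers.TimedRotatingFileHandler(
        log_path,
        when="midnight",             # one file per day
        backupCount=RETENTION_DAYS,  # number of old files kept
        delay=True,                  # open the file only on first write
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("bes.audit")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "bes-audit.log")
    log = make_retaining_logger(path)
    log.info("job submitted by %s", "CN=Example User")  # hypothetical DN
```

Nothing in the client-facing interface changes; only the deployment configuration does.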

>[...]
> >
> > 4.2
> >   all of this is out of scope. For example the UNICORE service
> > container hosts a number of services including an execution service.
> > Probably you mean that the execution service SPECIFICATION should be
> > limited to the execution service and MUST NOT specify accounting,
> > security etc.
>    'Well defined and narrow scope' is a general engineering requirement.
>   It is a fundamental concern spanning requirements, design,
> specifications, fabrication, tests, operations, user experience and
> product maintenance for all types of products, even outside software
> engineering.

exactly.

>    I confirm that 'Well defined and narrow scope' is absolutely REQUIRED
> for sound software design, implementation, deployment and maintainability.
>    From your comment, I assume that the Execution Service of UNICORE
> does have a 'Well defined and narrow scope', does have precise
> interfaces with other UNICORE services, and minimizes overlaps with them.

of course. And UNICORE does not include a L&B service :-)

> [...]
> > 6.1
> >
> >   * "SSL certificates MUST be signed by a CA..." this is an operational
> > decision, and has nothing to do with the BES spec.
> > For example, a site may run an in-house deployment of BES using an
> > in-house CA. This requirement should be deleted.
>    This operational concern is REQUIRED for practical interoperability
> on the field.  I have prepended :
>    * Authentication of Servers :  The Execution Service SHOULD permit
> Clients to authenticate it.  If an Execution Service authenticates
> itself to clients, it MUST permit Clients to really perform this
> authentication.

This sentence makes no sense to me, sorry.
Maybe "Server and client SHOULD communicate via a secure channel
(SSL/TLS)". Even this may not remain true in the future, though it holds
for all(?) current Grid systems.
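As a rough sketch of what "the client authenticates the server over TLS" means in practice, assuming Python's standard ssl module and a hypothetical helper name: the client verifies the server certificate and host name, and trusting an in-house CA (as in the 6.1 example) is just a different `cafile`, not a change to the BES interface.

```python
import ssl
from typing import Optional


def make_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context that authenticates the server.

    cafile: optional PEM bundle, e.g. a site's in-house CA. If None,
    the system's default trust store is used.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    # create_default_context already sets these two; stated for clarity.
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

A client would then call `ctx.wrap_socket(sock, server_hostname=host)`; which CAs are trusted remains an operational decision.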

> >
> > 6.3
> >
> > * "For Client authentication, the Execution Service MUST accept all
> > following authentication methods:  Full X509,  RFC-3820-compliant X509
> > Proxy"
> >
> > This requirement is invalid. I agree that it would be nice to be able to
> > specify authentication methods, but it is impossible. For example
> > Shibboleth, Username/password, OpenID, OAuth (e.g. for a REST interface
> > over plain HTTP), or even NOTHING (e.g. in an in-house grid) all can be
> > valid authentication methods in some circumstances.
>    There are 2 separate requirements :
>    - 1 'MUST' for Full X509 and RFC-3820-compliant X509 Proxy
>    - 1 'MAY'  for all other ones.
> >
> > Furthermore, making proxies a MUST implies that nonstandard
> > authentication libraries instead of TLS/SSL must be used, making the BES
> > implementation insecure. For some implementors (like UNICORE) proxies on
> > the transport level are very much a no-go.
>    I had clearly specified RFC-3820-compliant X509 Proxies, which ARE
> standard.
>    Your criticisms are valid for GSI proxies, which I have taken care NOT
> to mention.

By "standard" I did not mean being an RFC, but being supported by software.
As opposed to standard SSL/TLS, proxies are hardly supported by
industry-standard tools, for example Apache httpd or the Java JDK.
One has to rely on custom code, which is notoriously buggy and error
prone.

Since one important non-functional requirement (for me at least) is to
be able to use standard (off-the-shelf) open source software, having to
support proxies is a big limitation.

> > 6.4.
> >
> > "This authorization mechanism MUST be consistent across all instances of
> > the Execution Service"
> >
> > This violates the autonomy of a site. Site administrators often wish to
> > stay in control of their resources, and do not accept external
> > authorisation decision points. And anyway, who cares? Since the AuthZ
> > mechanism is internal to the BES, it cannot be specified in the
> > BES spec as such.
>    Interoperability requires a federation of independent administrative
> domains to agree on common functionalities, interfaces and operations.
>    This DOES sometimes violate the autonomy of each individual site.
>    The requirement is NOT that the AUTHZ decision point is external to
> any site, but that all participating sites MUST agree to install, inside
> their site, an instance of commonly validated software implementing the
> decision point.

No. Each site may choose their own authz decision point, IMO.

>    The AUTHZ mechanism MUST NOT be internal to the BES :

Maybe I was not clear. The authz mechanism is invisible for outside
parties (like clients). It can be an external component, an internal
component, whatever, it is up to the BES implementor and the site admin.

> For example,
> in UNICORE atomic services, the 'de.fzj.unicore.uas.security' package
> is described as 'The security subsystem of UNICORE/X', and is NOT
> internal to 'de.fzj.unicore.uas.impl.job'.

In UNICORE site admins can choose what attribute sources and XACML
decision points they want to use, but the clients (including other
services) do not need to know this. That is what I meant by "internal".
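A minimal sketch of what "internal" means here, with all class and method names hypothetical (this is not the UNICORE API, which uses XACML decision points): the site plugs in whatever decision point it likes, and the client-facing operation stays the same.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Request:
    subject: str   # e.g. the client's certificate DN
    action: str    # e.g. "CreateActivity"
    resource: str  # e.g. the BES endpoint


class DecisionPoint(Protocol):
    def permits(self, req: Request) -> bool: ...


class AllowListPDP:
    """Hypothetical site-local decision point: a fixed set of subjects."""

    def __init__(self, allowed: Iterable[str]):
        self.allowed = frozenset(allowed)

    def permits(self, req: Request) -> bool:
        return req.subject in self.allowed


class ExecutionService:
    """The client sees only create_activity(); the PDP is a site choice."""

    def __init__(self, pdp: DecisionPoint):
        self._pdp = pdp

    def create_activity(self, subject: str, job_description: str) -> str:
        if not self._pdp.permits(Request(subject, "CreateActivity", "bes")):
            raise PermissionError(f"{subject} is not authorised")
        return "activity-1"  # placeholder activity identifier
```

Swapping `AllowListPDP` for, say, an XACML-backed implementation would not change anything a client can observe except the authorisation outcome.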

> >
> > 6.5
> >
> > These are requirements on the security layer (or framework) and should
> > not be used as requirements on BES.
>    These security requirements DO have impacts on the BES Client
> interface and on the Job Description document.
>    In the text, I have made it clear.

Indeed, while preparing the EMI-ES specification, we came to the
following conclusions:
1) when using proxies for delegation, it is necessary to map each data
staging item to a delegated credential (you can check the EMI-ES job
description for details)
2) the delegation operations are separate from the job management
operations, so they do not necessarily have to be part of the BES client
interface.

Also, there are existing implementations (UNICORE and Genesis come to my
mind) that do not need this at all, because they do delegation without
proxies.

So I disagree; 6.5 mostly describes features of the particular security
framework that is used.
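To make points 1) and 2) concrete, here is a rough sketch under stated assumptions: the field and function names are hypothetical and only loosely mirror the idea, not the actual EMI-ES schema. Each staging item merely references a delegation ID; the credentials themselves come from a separate delegation operation (modelled as a plain dict).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StagingItem:
    source: str                          # remote URL to fetch
    target: str                          # file name in the job directory
    delegation_id: Optional[str] = None  # hypothetical: which credential


@dataclass
class JobDescription:
    executable: str
    stage_in: List[StagingItem] = field(default_factory=list)


def credentials_for(job: JobDescription,
                    store: Dict[str, bytes]) -> Dict[str, bytes]:
    """Map each staging source to the delegated credential it references.

    `store` is filled by a separate delegation interface; a KeyError means
    the job references a delegation that was never performed.
    """
    resolved: Dict[str, bytes] = {}
    for item in job.stage_in:
        if item.delegation_id is not None:
            resolved[item.source] = store[item.delegation_id]
    return resolved
```

A proxy-free implementation would simply never set `delegation_id`, which is why this need not be a BES requirement.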

> >
> > 8 BES requirements related to "Application Repositories"
> >
> > While I agree that BES should understand the notion of an "Application"
> > (see e.g. JSDL ApplicationName), I don't agree that the BES should
> > use these for Scheduling. Rather, this is the job of a broker.
>    The text is now :
>    * Resource selection :  The Execution Service MUST use, among others,
> these references to 'Installed Applications' in order to select the most
> adequate computing resource for the Job.
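Whether this matching belongs in the BES or in a broker is exactly the point in dispute; the logic itself is the same in either place. A minimal sketch, with hypothetical names and an assumed tie-break (most free slots wins):

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ComputingResource:
    name: str
    free_slots: int
    installed_apps: FrozenSet[str]  # e.g. application names and versions


def select_resource(app: str, slots: int,
                    resources: List[ComputingResource]
                    ) -> Optional[ComputingResource]:
    """Pick a resource that has `app` installed and enough free slots.

    Returns None if no resource qualifies; ties are broken by the number
    of free slots (an assumption, not a requirement from the document).
    """
    candidates = [r for r in resources
                  if app in r.installed_apps and r.free_slots >= slots]
    return max(candidates, key=lambda r: r.free_slots, default=None)
```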
> >
> > 9 BES requirements applying to Accounting
> >
> > As a "MUST", these are out of scope, and should be made "SHOULD".
>    No Accounting --> No precise reporting to funding agencies --> No
> funding --> Abrupt and very long shutdown of all services.
>    I fully confirm that Accounting is a MUST.
>

... operational

> >
> > 10 Logging/Bookkeeping
> >
> > Same as 9.
>    Same as 'Traceability' :
>    This is an operational concern :  Would you really take the risk that
> the whole EGI becomes a spambot or a scambot ?
>    No Logging and Bookkeeping --> No post mortem analysis after attack
> --> Large infection --> Panic --> Abrupt and very long shutdown of all
> services.
>    I fully confirm that Logging and Bookkeeping is a MUST.
>

An operational MUST, maybe, for some infrastructures, but not all.

E.g. a typical HPC site has its own accounting, its own logging systems
independent of the (Grid) software used to submit jobs to it.

> >
> > 12 Jobs
> >
> > 12.1 Types of job
> >
> > Support for parallel jobs: it should be "MUST" :-)
>    The text is now :
>    - The concept of 'Single Job' includes Jobs running
> massively-parallel processes using MPI on one large-scale HPC System.
> The Execution Service MUST understand instructions for usage of MPI
> inside the Job Description document.  The Execution Service SHOULD
> transmit these instructions to the Batch System, or return an explicit
> error message if not supported.

OK
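The agreed MPI wording (understand the instructions, pass them to the batch system, or return an explicit error) can be sketched as follows. All names are hypothetical, and the Slurm `--ntasks` directive is used only as an example target batch system.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


class UnsupportedFeatureError(Exception):
    """Explicit error returned to the client instead of silently ignoring."""


@dataclass
class ParallelSpec:
    flavour: str    # e.g. "OpenMPI"
    processes: int  # number of MPI processes requested


def batch_directives(spec: Optional[ParallelSpec],
                     supported: FrozenSet[str]) -> List[str]:
    """Translate MPI instructions for the batch system, or fail explicitly."""
    if spec is None:
        return []  # serial job: nothing to pass on
    if spec.flavour not in supported:
        raise UnsupportedFeatureError(
            f"MPI flavour {spec.flavour!r} is not supported by this service")
    # Slurm-style directive, purely as an example batch system.
    return [f"#SBATCH --ntasks={spec.processes}"]
```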

Best regards,
Bernd.

--
Dr. Bernd Schuller
Federated Systems and Data
Juelich Supercomputing Centre, http://www.fz-juelich.de/jsc
Phone: +49 246161-8736 (fax -8556)
