[caops-wg] OCSP requirements - final(?) version uploaded

Fri May 27 15:41:03 CDT 2005

I received 2 preliminary comments about our experimental OCSP
service.  I have a suggestion for another section and some other
comments by extension.  We (in DOEGrids) haven't had a customer
or community interest in this service for a while, so it is 
gratifying to finally have someone take an interest.

1) Suspension of certificates
This is really a CA issue.   Security workers may want/need to
suspend a certificate or class of certificates during the
early phases of  incident handling.  Suspension acts like
revocation, but doesn't have to be permanent (CRLv2 is needed to
support this).

The relevance to OCSP is that in practice the OCSP servers can maintain more 
up to date information about revocations and provide a better
approximation of real-time revocation information than distributed CRL 
files.  This makes capabilities like certificate suspension more practical,
and can better address the needs of security workers for rapid revocation
information.  [I think text like this belongs in the intro, where OCSP
is being sold.  Also maybe in section 6.]

2) Proxy certs
Customer expressed interest in dealing with proxy certs.
The situation seems to be that in practice a chain of certs is
available, from the EE to the last proxy cert.  Our experimental
responder didn't deal with that well, apparently peeling off the
first one and skipping any others (first one being the last proxy
cert).  

The standard reads (to me) as ambiguous about this, which surprised
me, because I expected the above behavior:
(RFC 2560 page 2)

The response for each of the certificates in a request consists of

   -- target certificate identifier
   -- certificate status value
   -- response validity interval
   -- optional extensions

Is this a bug in the spec, or the implementation?  What is the 
OCSP server "art" on this since this spec came out?   There is a 
remark about this somewhere in section 3 - it seems to be 
sort of out of place.

I think the document, good as it is, is missing some requirements
and recommendations language that is needed for whoever might want
to develop the  client side for OCSP in Grids, which, afaik,
is still pre-natal.  The first thing that comes to mind is
that the client section is still a little unclear.

*Nonce
There are a couple of remarks about nonces that I think the
sophisticated security worker - especially some of the ones I was hoping
to interest in this service - would not agree to.  I have no problem
with the language in 4.5 but the client recommendation somewhere in 
section 7 just says flat out don't do it -- seems contradictory.  There are circumstances
where real time is needed.  We need a nuanced nonce instead.

In 7.3, say 
OCSP clients are not recommended to include nonces except  ... - or -
OCSP clients should only include nonces  ... in requests to local Trusted responders
or other OCSP responders by prior agreement and consultation.  (See section 4.5.)

In 4.5 say
Some services may not support nonce requests, and in other cases it may
produce intolerable burden on the OCSP responder and delay for the client
application.  Nonces should only be used in situations where the most
up to date information is required, particularly to meet security requirements.

[Drop the "overkill" sentence - not useful.]
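
To make the proposed 7.3/4.5 wording concrete, here is a rough sketch of the
client-side decision.  The Responder fields and the function name are made up
for illustration; nothing like this exists in the draft or in any OCSP library
I know of.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Responder:
    # All fields are hypothetical client configuration, not part of any spec.
    url: str
    trusted_local: bool = False  # a local Trusted responder
    nonce_agreed: bool = False   # nonce use arranged by prior agreement


def should_send_nonce(responder: Responder, needs_fresh_status: bool) -> bool:
    """Include a nonce only when up-to-date status is really required and
    the responder is either local/trusted or has agreed to nonce requests."""
    if not needs_fresh_status:
        return False
    return responder.trusted_local or responder.nonce_agreed
```

The point is that the default is "no nonce", and the exception is explicit
client configuration rather than something hard-wired.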

* How should CRL and OCSP be used together?
4.2 talks about CRL's, as does 7.3, but most of the rest of the
doc seems to assume only OCSP will exist.  For example, 4.7 suggests
that 
In case the resulting status after an exhausted search is still
an error or status Unknown, the client SHOULD interpret that as Revoked 
with revocationReason certificateHold (that is, a non-definite revocation 
state), unless otherwise configured.

Ok, that works in some circumstances, and not in others.  

Preference for CRL and OCSP - no one right answer
Preferences depend on the circumstances and 
operating characteristics of each server or user application.

Experience with Grid / openssl use of CRLs and Netscape's 
OCSP client suggest to me that network failure and OCSP responder
timeout should be considered as "unknown - tryLater" 
(we can agree to that  - similar to 4.7).

What about this:
Search revocation information in preference order
   clients should be able to choose OCSP, CRL, and order searched
   see also section 7.3
First "revoked" answer ends the search.

If no revoked status is returned, and all sources are exhausted,
and OCSP status was ambiguous, client response should be configurable.
Testing proxy certificates introduces a much higher likelihood of
status unknown, and unpredictable network and server issues can
produce timeouts or "tryLater" responses.  Therefore the response 
must be tuned to local security requirements and expectations.
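
The search described above could look roughly like this.  It is only a sketch:
the Status enum, the (name, lookup) source pairs, and the on_ambiguous
parameter are my own illustrative names, and treating an "unknown" from any
source (not just OCSP) as ambiguous is an assumption.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"  # also used for timeouts and tryLater


# A source is a (name, lookup) pair; lookup takes a serial, returns a Status.
Source = Tuple[str, Callable[[str], Status]]


def check_revocation(serial: str,
                     sources: List[Source],
                     on_ambiguous: Status,
                     log: Callable[[str], None] = print) -> Status:
    """Walk the sources in preference order and return the effective status."""
    ambiguous = not sources  # assumption: no configured source is ambiguous
    for name, lookup in sources:
        try:
            status = lookup(serial)
        except OSError:
            # Network failure or timeout: treat like "unknown - tryLater".
            status = Status.UNKNOWN
        if status is Status.REVOKED:
            return Status.REVOKED  # first "revoked" answer ends the search
        if status is Status.UNKNOWN:
            ambiguous = True
    if ambiguous:
        # Log this case distinctly so resulting failures are easier to diagnose.
        log(f"revocation status ambiguous for {serial}; "
            f"applying local policy: {on_ambiguous.value}")
        return on_ambiguous
    return Status.GOOD
```

A site following the 4.7 language would pass on_ambiguous=Status.REVOKED; a
site more worried about self-inflicted DoS might pass Status.GOOD and rely on
the log.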

[Based on exp above I would have to differ strongly with the 4.7 and
security workers on this; we should avoid DoS wherever we can
since understanding and diagnosing these problems is very difficult,
as is explaining them to frustrated users and system administrators.
Perhaps, we can keep the recommendation, but disclose this risk.
In addition we might recommend that this particular case be 
distinctly logged so that resulting problems can be more quickly
diagnosed.]

*Cert suspension complicates the interaction with CRL's.  Suppose
a CRL indicates a certificate is suspended/revoked - should the
application consult the OCSP responder to see if the certificate
is still suspended?  I don't think we have the experience to 
answer this, but hopefully I'm wrong; but if that's the case then
we should note this potential DoS in a client section or #8.

*Bundles of certs/proxies
To deal with the proxy cert issue on the OCSP client side, developers
should be prepared to parse collections of certs into a single (or
multiple) request(s).  This removes most of the proxy support problem from the 
client; if the responder knows about proxy certs, it can answer,
and "unknown" the others.  Perhaps, in order to deal with
limitations of some OCSP servers, the certs should be ordered 
appropriately to make sure the well-known EE cert is tested.
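
As an illustration of the ordering idea only: assume the client has already
parsed the bundle and knows which certs are proxies (for example by looking
for the RFC 3820 ProxyCertInfo extension).  The Cert tuple and the function
name here are hypothetical.

```python
from typing import List, NamedTuple


class Cert(NamedTuple):
    # Hypothetical parsed view of a certificate; a real client would carry
    # the full certificate object as well.
    subject: str
    is_proxy: bool


def order_for_request(chain: List[Cert]) -> List[Cert]:
    """Put the EE cert(s) ahead of any proxy certs, otherwise keeping the
    order in which the certs were presented."""
    # sorted() is stable and False sorts before True, so non-proxy certs
    # move to the front without reshuffling the proxies among themselves.
    return sorted(chain, key=lambda cert: cert.is_proxy)
```

With a chain presented as last proxy, first proxy, EE, the request would then
list the EE cert first, so a responder that only looks at the first entry
still tests the well-known certificate.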

We might also want to tell developers and OCSP client configurers
OCSP queries on proxy certs are NOT RECOMMENDED and should 
be avoided, except by prior agreement and consultation with
specially configured OCSP responders that can deal with them.

On the responder side,
*Section 5.5 talks about support for proxy certs.  I obviously don't think it
is out of scope - we MUST talk about it - but it is indisputable we don't
know how to deal with them.  But punting completely is leaving a lot of
the potential value added of Grid OCSP out.

An additional sentence might be
It has been suggested that an OCSP responder could be configured to register
revocations of independent proxy certificates and return appropriate responses.
No known commercial OCSP service supports this kind of operation, but the
CRL management component of an OCSP server could be augmented with an additional
database and management interface.  This experimental service should be confined
to a Trusted Local responder.  "nextUpdate" should not be set for proxy certificates. 
Further specification of this experimental service and management interface are
out of scope [or something else indicating we can't provide more direction].
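
Just to show how small the "additional database" could be, a toy sketch
follows.  Everything in it is hypothetical (class name, cert_id as an opaque
string, the dict-shaped response), and answering "unknown" for a proxy that
was never registered is my assumption, not something the draft says.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


class ProxyRevocationRegistry:
    """Toy in-memory store of revoked proxy certs for a Trusted Local responder."""

    def __init__(self) -> None:
        self._revoked: Dict[str, datetime] = {}

    def register_revocation(self, cert_id: str,
                            when: Optional[datetime] = None) -> None:
        # Management-interface entry point; access control is not shown.
        self._revoked[cert_id] = when or datetime.now(timezone.utc)

    def status(self, cert_id: str) -> dict:
        revoked_at = self._revoked.get(cert_id)
        return {
            "cert_id": cert_id,
            # Assumption: the responder never saw the proxy being issued, so
            # anything not registered as revoked is reported as "unknown".
            "status": "revoked" if revoked_at else "unknown",
            "revocation_time": revoked_at,
            "this_update": datetime.now(timezone.utc),
            "next_update": None,  # not set for proxy certificates
        }
```

A real service would need persistent storage plus the change control and
logging discussed under 5.3.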

*Section 5.3
The sources must be properly protected against malice use....
Suggest [maybe it's overload]
The OCSP responder database must be protected properly.  In most cases
the database will be updated automatically, and adequate change control
and logging must be used to ensure data is obtained and loaded from a 
trusted source in a timely fashion.  Signatures on CRLs must be checked
and CRLs must be refreshed in a timely fashion.   OCSP responders must
ensure that proper change control and access controls are in place to 
prevent unauthorized addition or removal of certificate status information
from the database.  This is particularly important to any OCSP service
providing experimental support for proxy certificate verification (see section 5.5).

*5.1
While we do not require the use of hardware protection, 
we RECOMMEND that the security of the OCSP responder key 
be in parity with the CA issuing key.

[This is a meaningless recommendation, because we have no single
standard for CA issuing keys.  Also, there are differences in the
way OCSP and CA issuers are handled - you can change the OCSP
key pair every hour if you want.  Disagree about HSM.  I'm not sure what to say, but...
Suggest we just lift EUGridPMA spirit for this document:]
Access to OCSP responder keys  must be carefully controlled.
In all cases system level access to OCSP responder systems must
be limited and logged.    Access to key backup media must also
be limited and logged.   For OCSP responders using software crypto
stores, we recommend that this key NOT be backed up.  We also
recommend the key be changed more frequently than end user signing keys.
These steps can reduce but not eliminate the demand for hardware
crypto stores; hardware security modules should be used for high
visibility OCSP responders (see also prev paragraph?). 
Service providers should also consider transponder configurations
to reduce the number of highly secured OCSP responder keys needed.

*The introduction - should it say something about providing direction for
developers and implementers of OCSP services in Grids?  Maybe it's self
evident but there is a lot of work for developers in particular in here
and it's not spelled out much.

*Fault tolerance &al is mentioned but I don't think it is discussed (maybe
indirectly in config rec on p 9)

In part, we can deal with this in the client by making them more robust
as discussed above.   Perhaps at the end of #5 a section about fault tolerance
or high availability:
something like
OCSP responders should be configured on a server with high availability
capability: redundant, failure-correcting/responding hardware components.
The OCSP responder system should be configured to automatically recover
and continue from a single failure of disks supporting the current
OCSP database, hardware security module, or other critical system component.
This might be particularly important for OCSP responders that operate in whole
or in part in transponder mode. [?]
In order to deal with site failures or network partitioning,  OCSP service
providers should provision multiple, topologically and geographically dispersed 
OCSP responders with mirrored OCSP databases and configuration.  If possible,
WAN high availability capability should be employed.