[occi-wg] Is HTTP the HTTP of cloud computing?

Mon May 25 11:14:58 CDT 2009

And finally the conclusion:

Is HTTP the HTTP of cloud
computing?<http://samj.net/2009/05/is-http-http-of-cloud-computing.html>

OK, so after asking "Is OCCI the HTTP of cloud
computing?"<http://samj.net/2009/05/is-occi-http-of-cloud-computing.html> I
realised that the position may have already been filled and that the
question was more "Is AtomPub already the HTTP of cloud
computing?"<http://samj.net/2009/05/is-atompub-already-http-of-cloud.html>

After all my strategy for OCCI was to follow Google's example with GData by
adding some necessary functionality (a search interface, caching directives,
resource-specific attributes, etc.). Most of the heavy lifting was actually
being done by AtomPub, thus avoiding a huge amount of tedious and
error-prone protocol writing (around 20,000 words of it) - something which
OGF and the OCCI working group aren't really geared up for anyway. This is
clearly a workable and well-proven approach as it has been adopted
strategically by both Microsoft and Google and also tactically by Salesforce
and IBM, among others. Best of all, adding things like
queries<http://code.google.com/apis/gdata/docs/2.0/reference.html#Queries> and
versioning<http://code.google.com/apis/gdata/docs/2.0/reference.html#ResourceVersioning>
is a manageable workload while starting from scratch is most certainly
not.

But what if there were an easier way? Recall that the problem we are trying
to solve is exposing a flexible interface to an arbitrarily large collection
of interconnected compute, storage and network resources. We need to be able
to describe and manipulate the resources (CRUD), associate them with each
other via rich links (e.g. links with attributes like local identifiers -
eth0, sda, etc.) and change their state (start, stop, restart, etc.), among
other things.

Representational State Transfer (REST)

Actually we're not talking about exposing the resources themselves (that
would be impossible) but various *representations* of those resources - like
Plato's shadows on the cave
walls<http://en.wikipedia.org/wiki/Allegory_of_the_cave> - this is the
"REpresentational" in "REpresentational State Transfer
(REST)". There's an infinite number of possible representations so it's
impossible to try to capture them all now, but here are some examples:

   - An Open Virtualisation Format (OVF) serialisation of a compute resource
   - A platform-specific descriptor file (e.g. VMX)
   - A complete archive of the virtual machine with its dependencies (OVA)
   - A graphical image of the console at a given point in time ('snapshot')
   - A video stream of the console for archiving/audit purposes (ala
   Citrix's Project
   Iris<http://www.brianmadden.com/blogs/brianmadden/archive/2005/06/08/project-iris-becoming-a-reality-tech-preview-coming-next-month.aspx>)
   - The console itself (e.g. SSH, ICA, RDP, VNC)
   - Build documentation (e.g. PDF, ODF)
   - Esoteric enterprise requirements (e.g. NMS configuration)

It doesn't take a rocket scientist to spot the correlation between this and
HTTP's existing content negotiation functionality (whereby a client can ask
for a specific representation of a given resource - e.g. HTML vs PDF) so
this is already pretty much solved for us (see HTTP's Accept:
header<http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html> for the
details). For bonus points this information should also be exposed in the
URI, as it's not always possible or convenient to set headers, ala:

   - http://example.com/.atom (using filename extensions)
   - http://example.com/;content-type=text/html (using the full Internet
   media type)
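
To make the idea concrete, here is a minimal Python sketch of how a server
might choose a representation, letting a URI override win over the Accept:
header. The extension table, the function names and the ";content-type="
handling are assumptions made for illustration, and wildcard ranges other
than */* are deliberately ignored.

```python
from urllib.parse import urlsplit

# Hypothetical mapping of filename extensions to Internet media types.
EXTENSIONS = {
    "atom": "application/atom+xml",
    "html": "text/html",
    "pdf": "application/pdf",
}


def parse_accept(header):
    """Return the media types in an Accept: header, highest q-value first."""
    ranked = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranked.append((q, parts[0]))
    # sorted() is stable, so equal q-values keep their header order.
    return [media for q, media in sorted(ranked, key=lambda r: -r[0]) if q > 0]


def choose_representation(url, accept, available):
    """Pick a media type: a URI override wins, then the Accept: header."""
    path = urlsplit(url).path
    # Override 1: the full media type as a path parameter.
    if ";content-type=" in path:
        wanted = path.split(";content-type=", 1)[1]
        return wanted if wanted in available else None
    # Override 2: a known filename extension such as ".atom".
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        wanted = EXTENSIONS.get(last_segment.rsplit(".", 1)[1])
        if wanted is not None:
            return wanted if wanted in available else None
    # Otherwise fall back to ordinary content negotiation.
    for media in parse_accept(accept):
        if media == "*/*":
            return available[0]
        if media in available:
            return media
    return None
```

A result of None would typically translate into a 406 Not Acceptable
response.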

Web Linking

But what about the links? As I explained
yesterday<http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html> the
web is built on links embedded in HTML documents using the A tag. Atom
also provides enhanced linking functionality via the LINK element, where it
is also possible to specify content types, languages, etc. In this case
however we want to allow resources to be arbitrary types and more often than
not we won't have the ability to link within the payload itself. This leaves
us with two options: put the links in the payload anyway by relying on a
meta-model like Atom (or one we roll ourselves) or find some way to
represent them within HTTP itself.

Enter HTTP headers, which are also extensible and, as it turns out, in the
process of being extended (or at least refined) to handle this very
requirement by a fellow from down under, Mark Nottingham <http://www.mnot.net/>.
See the "Web Linking" IETF Internet-Draft
(draft-nottingham-http-link-header<https://datatracker.ietf.org/drafts/draft-nottingham-http-link-header/>,
at the time of writing version
05<http://www.ietf.org/internet-drafts/draft-nottingham-http-link-header-05.txt>)
for the nitty gritty details and the ietf-http-wg
list<http://lists.w3.org/Archives/Public/ietf-http-wg/> for
some<http://lists.w3.org/Archives/Public/ietf-http-wg/2009AprJun/0211.html>
current<http://lists.w3.org/Archives/Public/ietf-http-wg/2009AprJun/0196.html>
discussions<http://lists.w3.org/Archives/Public/ietf-http-wg/2009AprJun/0103.html>.
Basically it clarifies the existing Link:
headers<http://tools.ietf.org/html/rfc2068#section-19.6.2.4> and the
result looks something like this:

Link: <http://example.com/TheBook/chapter2>; rel="previous";
      title="previous chapter"

The Link: header itself is also extensible so we can faithfully represent
our model by adding e.g. the local device name when linking storage and
network resources to compute resources and other requisite attributes. It
would be helpful if the content-type were also specified (Atom allows for
multiple links of the same relation provided the content-type differs for
example) but language is already covered by HTTP (it doesn't seem useful to
advertise French links to someone who already asked to speak English).
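
As a rough illustration of how such extended links could be produced and
consumed, here is a small Python sketch. The "device" attribute, the
"related" relation and the example URLs are assumptions for illustration
only (the draft does not define a device attribute), and the parser is
simplified to handle quoted parameter values only.

```python
import re


def format_link(target, rel, **attributes):
    """Serialise one link as a Link: header value.

    Values are assumed not to contain double quotes.
    """
    params = [f'rel="{rel}"']
    params += [f'{name}="{value}"' for name, value in attributes.items()]
    return f"<{target}>; " + "; ".join(params)


# One <target> followed by zero or more ;name="value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_links(header):
    """Parse a Link: header value into (target, attributes) pairs."""
    return [
        (target, dict(PARAM_RE.findall(params)))
        for target, params in LINK_RE.findall(header)
    ]


# A compute resource linked to a disk and a network interface.
header = ", ".join([
    format_link("http://example.com/storage/123", "related", device="sda"),
    format_link("http://example.com/network/456", "related", device="eth0"),
])
links = parse_links(header)
```

Here links comes back as a list of two pairs, each carrying the target URL
and a dictionary with the rel and device attributes.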

It's also interesting to note that earlier versions of the HTTP RFCs
actually [poorly] specified both the Link:
headers<http://tools.ietf.org/html/rfc2068#section-19.6.2.4> as well as
LINK<http://tools.ietf.org/html/rfc2068#section-19.6.1.2> and
UNLINK<http://tools.ietf.org/html/rfc2068#section-19.6.1.3> methods for
maintaining links between web resources. John Pritchard had a
crack at clarification in the Efficient HyperLink Maintenance for
HTTP<http://ftp.ics.uci.edu/pub/ietf/http/draft-pritchard-http-links-00.txt> I-D
but like most I-Ds this one seems to have died after 6 months, and with
it the methods themselves. It seems to me that adding HTTP methods at this
time is a drastic (and almost certainly infeasible) action, especially for
something that could just as easily be accomplished via headers ala
Set-Cookie: (too bad the I-D doesn't specify how to add/delete/modify
links!). In the simplest sense a Link: header appearing in a PUT or POST
could replace the existing one(s) but something more elegant for acting on
individual links would be nice - probably a discussion worth having on
the ietf-http-wg list<http://lists.w3.org/Archives/Public/ietf-http-wg/>.

Organisation of Information

Looking back to Atom for a second we're still missing some key
functionality:

   - Atom id -> HTTP URL
   - Atom updated -> HTTP Last-Modified: Header
   - Atom title and summary -> Atom/HTTP Slug:
   Header<http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2> or
   equivalent
   - Atom link -> HTTP Link: Header
   - Atom category -> ???

Houston, we have a problem. OCCI use cases range from embedded hypervisors
exposing a single resource to a single entry-point for an entire enterprise
or the "Great Global Grid" - we need a way to organise, categorise and
search for the information, likely including:

   - Free text search via a Google-style "?q=firewall" syntax
   - Taxonomy <http://en.wikipedia.org/wiki/Taxonomy> via categories (already
   done <http://www.ibm.com/developerworks/xml/library/x-tipatom4.html> for
   Atom) for things like "Operating System" and "Data Center"
   - Folksonomy <http://en.wikipedia.org/wiki/Folksonomy> via [user] tags
   (already done <http://edward.oconnor.cx/2007/02/representing-tags-in-atom>
   for Atom and bearing in mind that tag
   spaces<http://microformats.org/wiki/rel-tag#Tag_Spaces> are cool) for
   things like "testlab"

Fortunately the good work already done in this area for Atom would be
relatively easy to port to a Category: HTTP header, following the
Link: header example above. In the meantime a standard search interface
(including category support) is trivial and, thanks to Google, already
done<http://code.google.com/apis/gdata/docs/2.0/reference.html#Queries>.
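
To give a feel for what this might look like, here is a speculative Python
sketch. No Category: header is specified anywhere yet, so the syntax below
simply borrows the term, scheme and label attributes of Atom's category
element. The "q" and "category" query parameters follow the GData style, and
the function names and scheme URL are hypothetical.

```python
from urllib.parse import urlencode


def format_category(term, scheme=None, label=None):
    """Serialise one category as a hypothetical Category: header value."""
    parts = [term]
    if scheme:
        parts.append(f'scheme="{scheme}"')
    if label:
        parts.append(f'label="{label}"')
    return "; ".join(parts)


def search_url(base, text=None, categories=()):
    """Build a search URL with free text and an optional category filter."""
    query = {}
    if text:
        query["q"] = text
    if categories:
        # Comma-separated list; urlencode() percent-encodes the commas.
        query["category"] = ",".join(categories)
    return f"{base}?{urlencode(query)}" if query else base


header_value = format_category(
    "linux",
    scheme="http://example.com/schemes/os",
    label="Operating System",
)
url = search_url("http://example.com/compute", "firewall", ["testlab"])
```

A taxonomy entry would carry a scheme identifying the controlled vocabulary,
while a free-form user tag could simply omit it.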

Structured Data Formats

HTML also resolves another pressing issue - what format to use for
submitting key-value pairs (which constitutes a large part of what we need
to do with OCCI). It gives us two options:

   - application/x-www-form-urlencoded<http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1>
   which is simple but quickly gets messy with encoding and non-ASCII
   characters
   - multipart/form-data<http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2>
   which is less efficient but will handle pretty much whatever you throw at it
   (including large files)

The advantages of being able to create a resource from a web form simply by
POSTing to the collection of resources (e.g. http://example.com/compute),
and with HTML 5 by PUTting the resource in place directly (e.g.
http://example.com/compute/<uuid>) are immediately obvious. Not only does
this help make the human and programmable web one and the same (which in
turn makes it much easier for developers/users to kick the tyres and
understand the API) but it means that scripting even advanced tasks with
curl/wget would be trivial. Plus there's no place for time-wasting religious
arguments about angle brackets (XML) over curly braces (JSON).
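
For example, a client could prepare such a request with nothing but Python's
standard library. This is only a sketch: the attribute names (cores, memory,
name) are made up for illustration, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_create_request(collection_url, attributes):
    """Prepare a POST that creates a resource from key-value pairs."""
    # urlencode() percent-encodes non-ASCII values as UTF-8 by default.
    body = urlencode(attributes).encode("ascii")
    return Request(
        collection_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_create_request(
    "http://example.com/compute",
    {"cores": 2, "memory": 4096, "name": "caf\u00e9 server"},
)
# urllib.request.urlopen(request) would actually send it.
```

The non-ASCII name illustrates the messiness mentioned above: the single
accented character becomes the two-byte sequence %C3%A9 on the wire.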

RESTful State Machines

Something else which had not sat well with me until I spent the weekend
ingesting the RESTful Web Services
book<http://oreilly.com/catalog/9780596529260/> (by Leonard
Richardson <http://www.crummy.com/> and Sam
Ruby<http://intertwingly.net/blog/>)
was the "actuator" concept we picked up from the Sun Cloud APIs. This breaks
away from RESTful principles by exposing an RPC-style API for triggering
state changes (e.g. start, stop, restart). Granted it's an improvement on
the alternative (GETting a resource and PUTting it back with an updated
state) as Tim Bray explains in RESTful
Casuistry<http://www.tbray.org/ongoing/When/200x/2009/03/20/Rest-Casuistry> (to
which Roy
Fielding <http://roy.gbiv.com/untangled/2009/it-is-okay-to-use-post> and Bill
de hÓra <http://www.dehora.net/journal/2009/02/03/just-use-post/> also
responded), but it still "feels funky". Sure it doesn't make any sense to
try to "force" a monitored
status<http://roy.gbiv.com/untangled/2009/it-is-okay-to-use-post#comment-966> to
some other value (for example setting a "state" attribute to "running"),
especially when we can't be sure that's the state we'll get to (maybe there
will be an error or the transition will be dependent on some outcome over
which we have no control). Similarly it doesn't make much sense to treat
states as nouns, for example adding a "running" state to a collection of
states (even if a resource can be "running" and "backing up" concurrently).
But is using URLs as "buttons" representing verbs/transitions the best
answer?

What makes more sense [to me] is to request a transition and check back for
updates (e.g. by polling or HTTP server
push<http://en.wikipedia.org/wiki/Push_technology>).
If it's RESTful to POST comments to an article (which in addition to its own
contents acts as a collection of zero or more comments) then POSTing a
request to change state to a [sub]resource also makes sense. As a bonus
these can be parametrised (for example a "resize" request can be accompanied
with a "size" parameter and a "stop" request sent with clarification as to
whether an "ACPI Off" or "Pull Cord" is required). Transitions that take a
while, like "format" on a storage resource, can simply return HTTP 202
Accepted so we've got support for asynchronous actions as well - indeed some
requests (e.g. "backup") may not even be started immediately. We may also
want to consider using something like Post Once Exactly
(POE)<http://www.mnot.net/drafts/draft-nottingham-http-poe-00.txt> to
ensure that requests like "restart" aren't executed repeatedly and that
we can cancel requests that the system hasn't had a chance to deal with yet.

Exactly how this should look in terms of URL layout I'm not sure (perhaps
http://example.com/<resource>/requests) but being able to enumerate the
possible actions as well as acceptable parameters (e.g. an enum for
variations on "stop" or a range for "resize") would be particularly useful
for clients.
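
The request-based approach can be modelled in a few lines of Python. This is
an in-memory sketch rather than a real server: the action table, the
parameter values, the status codes chosen for errors and the "requests/<id>"
locations are all assumptions for illustration. It uses 202 Accepted, the
status code HTTP defines for work that has been queued but not completed.

```python
import itertools

# Hypothetical actions and the parameter values each one accepts.
ACTIONS = {
    "start": {},
    "stop": {"method": {"acpi-off", "pull-cord"}},
    "restart": {},
}


class RequestQueue:
    """In-memory stand-in for a resource's "requests" sub-resource."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}

    def post(self, action, **params):
        """Handle a POST; return (status, location or None)."""
        allowed = ACTIONS.get(action)
        if allowed is None:
            return 400, None
        for name, value in params.items():
            if name not in allowed or value not in allowed[name]:
                return 400, None
        request_id = next(self._ids)
        self.pending[request_id] = (action, params)
        # 202 Accepted: queued, but not necessarily started yet.
        return 202, f"requests/{request_id}"

    def delete(self, request_id):
        """Cancel a request that has not been processed yet."""
        return 204 if self.pending.pop(request_id, None) else 404


queue = RequestQueue()
status, location = queue.post("stop", method="acpi-off")
```

A client would poll the returned location for progress, and exposing the
ACTIONS table itself (for example via GET on the requests URL) would give
clients the enumeration of actions and parameters described above.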

Collections

This is all well and good for individual resources, but collections are
still a serious problem. There are many use cases which involve retrieving
an arbitrarily large number of resources and making an HTTP request for each
(as well as requests for enumeration etc.) doesn't make sense. More
importantly, it doesn't scale - particularly in enterprise environments
where requests via proxies and filters can suffer from high latency (if not
low bandwidth).

One potential solution is to strap multiple HTTP message entities together
as a multipart document, but that's hardly clean and results in some hairy
coding on the client side (e.g. manual manipulation of HTTP messages that
would otherwise be fully automated). The best solution we currently have for
this problem (as evidenced by widespread deployment) is AtomPub so I'm still
fairly sure it's going to have to make an appearance somewhere, even if it
doesn't wrap all of the resources by default.
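
As a rough sketch of what wrapping a collection in Atom buys us, the Python
below returns any number of resources in a single response body. The
function name and the input tuples are assumptions for illustration, and the
feed is deliberately incomplete: a valid Atom feed also needs its own title
and updated elements.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def build_feed(collection_url, resources):
    """Wrap (url, title, updated) tuples in a single Atom feed document."""
    # Serialise Atom as the default namespace rather than as "ns0:".
    ET.register_namespace("", ATOM)
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = collection_url
    for url, title, updated in resources:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = url
        ET.SubElement(entry, f"{{{ATOM}}}title").text = title
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    return ET.tostring(feed, encoding="unicode")


document = build_feed(
    "http://example.com/compute",
    [
        ("http://example.com/compute/1", "web01", "2009-05-25T11:14:58Z"),
        ("http://example.com/compute/2", "web02", "2009-05-25T11:14:58Z"),
    ],
)
```

One GET on the collection then replaces one request per resource, which is
exactly the saving that matters behind high-latency proxies.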