[occi-wg] Voting result

Sam Johnston samj at samj.net
Fri May 8 23:58:06 CDT 2009


On Sat, May 9, 2009 at 1:46 AM, Tim Bray <Tim.Bray at sun.com> wrote:

> On May 8, 2009, at 3:55 PM, Sam Johnston wrote:
>
>>> Oh, here's a dirty secret: both XML and JSON are lousy "wrapper" formats.
>>> If you have different kinds of things, you're usually better off letting
>>> them stand alone and link back and forth; that hypertext thing.
>>
>> Right, so say I want to create a new resource, or move it from one service
>> to another, or manipulate its parameters, or any one of an infinite number
>> of other potential operations - and say this resource is an existing
>> virtual machine in a binary format (e.g. OVA) or an XML representation of
>> a complex storage subsystem or a flat text network device config or... you
>> get the point. It is far simpler for me to associate/embed the resource
>> with its descriptor directly than to have a two-phase process of uploading
>> it and then feeding a URL to another call.
>>
>
> By participating in this discussion I'm rapidly developing an obligation to
> go learn the use-cases and become actually well-informed on the specifics.
>  And I'm uncomfortable disagreeing with Sam so much, because the work that's
> being done here seems good.
>

Thanks, your input is very much appreciated.


> But... I just don't buy the argument above.  You can't package binary
> *anything* into XML (or JSON) without base-64-ing it, blecch.  And here's
> the dirty secret: a lot of times you can't even package XML into other XML
> safely, you break unique-id attributes and digital signatures and namespace
> prefixes and embedded XPaths and so on and so on.  The Web architecture
> really wants you to deal with resources as homogeneous blobs, that's why
> media-types are so important.
>

Payload transparency is a nice-to-have - that is, learning from OCCI that
the thing has 2 CPUs and 2GB of RAM, but then being able to peer into the
embedded OVF to determine more advanced parameters. Given that the vast
majority of the payloads we're likely to want to use will be XML-based
(e.g. OVF), this should work reasonably well most of the time, and is in
any case not critical for basic functionality.
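
To illustrate (a minimal sketch only - the occi:* markup and its namespace
are hypothetical placeholders, as we haven't settled on a rendering), an
entry might carry the simple OCCI attributes alongside the embedded OVF:

  <entry xmlns="http://www.w3.org/2005/Atom"
         xmlns:occi="http://example.org/occi"> <!-- hypothetical namespace -->
    <title>web-01</title>
    <occi:compute cores="2" memory="2048"/> <!-- the simple attributes -->
    <content type="application/xml">
      <Envelope xmlns="http://schemas.dmtf.org/ovf/envelope/1">
        <!-- full OVF descriptor, for clients that want to peer deeper -->
      </Envelope>
    </content>
  </entry>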

I'm not suggesting that someone embed a 40GB base64-encoded image into the
OCCI stream, but we can't assume that everything is always going to be flat
files (VMX) or XML (OVF). Of course Atom "alternate" link relations
elegantly solve much of this problem and can even expose situations where
the resource is available in multiple formats (e.g. VMX and OVF). For more
advanced use cases I've proposed a bulk transfer API that essentially
involves creating the resource by some other means and then passing it to
OCCI as an href (think regularly rsync'd virtual machines for disaster
recovery purposes, drag-and-drop WebDAV interfaces and other things that
implementors will, with any luck, implement).
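
A rough sketch of both ideas together (the URLs and media types here are
illustrative assumptions, not agreed values):

  <entry xmlns="http://www.w3.org/2005/Atom">
    <title>web-01</title>
    <!-- the same resource exposed in multiple formats -->
    <link rel="alternate" type="application/xml" href="http://example.com/vms/web-01.ovf"/>
    <link rel="alternate" type="text/plain" href="http://example.com/vms/web-01.vmx"/>
    <!-- bulk payload passed by reference rather than embedded -->
    <content type="application/octet-stream" src="http://example.com/vms/web-01.vmdk"/>
  </entry>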

In any case this approach serves the migration requirement very well - the
idea of being able to faithfully serialise and/or pass an arbitrarily
complex collection of machines between implementations seems like utopia,
but it's well within our reach. Then being able to encrypt and/or sign it
natively is just icing on the cake.

There's absolutely nothing to say that OCCI messages have to be ephemeral
<http://www.tbray.org/ongoing/When/200x/2006/12/21/JSON> and there are many
compelling use cases (from backups to virtual appliances) where treating
resources as documents and collections as feeds makes a lot of sense - and
few where it doesn't.
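
For example (hypothetical URLs again), an entire collection serialised as a
feed is itself a perfectly serviceable backup or migration document:

  <feed xmlns="http://www.w3.org/2005/Atom">
    <title>Production data center</title>
    <updated>2009-05-09T00:00:00Z</updated>
    <entry>
      <title>web-01</title>
      <link rel="alternate" type="application/xml" href="http://example.com/vms/web-01.ovf"/>
    </entry>
    <entry>
      <title>db-01</title>
      <link rel="alternate" type="application/xml" href="http://example.com/vms/db-01.ovf"/>
    </entry>
  </feed>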


> The wild success of the browser platform suggests that having to retrieve
> more than one resource to get a job done is not a particularly high hurdle,
> operationally.
>

Multiple requests are certainly a problem when you're dealing with a large
number of resources, as is the case when you want to display, report on or
migrate even a small virtual infrastructure/data center. This is
particularly true in enterprise environments where HTTP requests tend to
pass through a bunch of different systems which push latency through the
roof - I recently had to make some fairly drastic optimisations to a GData
client for Google Apps provisioning for exactly this reason, and the thing
would have been completely unusable had I made separate requests for each
object.
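
To put it concretely (paths invented for the sake of the example), the
difference is between a single round trip:

  GET /compute -> one feed describing every compute resource

and one high-latency round trip per resource:

  GET /compute/vm-0001
  GET /compute/vm-0002
  ... and so on for every machine in the data center.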


>
>> As an aside I wonder how many times there were conversations like this
>> previously (where working groups had the blinkers on with the usual "not in
>> the charter" cop out) and how significant a contributor this inability to
>> work together was to the WS-* train wreck...
>>
>
> I'm waiting for someone to write the definitive book on why WS-* imploded.
> Probably mostly biz and process probs as you suggest, but I suspect not
> enough credit is given to the use at its core of XSD and WSDL, which are
> both profoundly bad technologies.  But I digress.  Well, it's Friday.
>

I'd very much like to read this book but as an unbiased observer I do
believe the blinkers played a critical role. Short of creating ad-hoc links
between SSOs (as we will with SNIA next Wednesday) there aren't really any
good solutions... having one organisation handle the standardisation or
even coordination of same is another recipe for disaster. Certainly
choosing one format over another, especially when all markup ends up
looking like XML
<http://www.megginson.com/blogs/quoderat/2007/01/03/all-markup-ends-up-looking-like-xml/>,
is not going to prevent the same from recurring, while playing nice in the
sandpit may well be enough to avoid egregious offenses.


>
>> Remember also that Google are already using this in production for a
>> myriad of resources on a truly massive scale (and have already ironed out the
>> bugs in 2.0) - the same cannot be said of the alternatives and dropping by
>> their front door on the way could prove hugely beneficial.
>>
>
> I think you're having trouble convincing people that GData, which is pure
> resource CRUD, is relevant to cloud-infrastructure wrangling.  I'm a huge
> fan of GData.
>

CRUD plays a hugely important role in creating and maintaining virtual
infrastructure (most resources are, after all, very much document-like -
e.g. VMX/OVF/VMDK/etc.) - just think about the operations clients typically
need to perform, and the overwhelming majority of them are, in fact, CRUD.
The main addition is triggering state changes via actuators/controllers
(thanks for the great advice
<http://www.tbray.org/ongoing/When/200x/2009/03/16/Sun-Cloud> on this topic
by the way - very elegant), and this is something I believe we've done a
good job of courtesy of custom link relations - see the sketch below. The
main gaps currently relate to how to handle parametrised operations (e.g.
resizing a storage resource) and how to create arbitrary associations
between resources (e.g. from compute to its storage and network resources,
and less obvious ones like a logical volume back to its physical
container).
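
Something along these lines, say (the rel URIs are hypothetical
placeholders, not agreed values):

  <entry xmlns="http://www.w3.org/2005/Atom">
    <title>web-01</title>
    <!-- actuators: POST to these to trigger a state change -->
    <link rel="http://example.org/occi/start" href="http://example.com/compute/vm-0001/start"/>
    <link rel="http://example.org/occi/stop" href="http://example.com/compute/vm-0001/stop"/>
    <!-- associations to related resources -->
    <link rel="http://example.org/occi/storage" href="http://example.com/storage/vol-7"/>
    <link rel="http://example.org/occi/network" href="http://example.com/networks/dmz"/>
  </entry>
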
On the topic of performance, Google uses projections
<http://code.google.com/apis/calendar/docs/2.0/reference.html#Projection>
to limit the data returned - I was thinking we would return little more
than IDs and titles by default (think discovery - a very common use case)
and optionally provide a list of extensions (e.g. billing, performance
monitoring, etc.) that we want to hear back from... this is going to be
important for calls that take a long time (like summing up usage or
retrieving the CDATA content), and by default the feed should stream
without blocking.
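
For instance (projection names invented for illustration; GData's real ones
are documented at the link above):

  GET /compute?projection=basic -> IDs and titles only (fast discovery)
  GET /compute?projection=full -> complete representations
  GET /compute?projection=billing -> basic data plus the billing extension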

At the end of the day we can very easily create something (and essentially
already have, thanks in no small part to Google's pioneering work in this
area) that can represent anything from a contact or calendar entry to a
virtual machine or network. The advantages in terms of being able to handle
non-obvious but equally important tasks such as managing users
<http://code.google.com/apis/apps/gdata_provisioning_api_v2.0_reference.html#User_Account_URL_Table>
are huge.

JSON does make sense for many applications and I'd very much like to cater
to the needs of JSON users by way of a dedicated Atom-to-JSON
transformation (something others can contribute to and benefit from), but I
don't believe it's at all the right choice for this application. Its main
advantages were efficiency (much of which is lost thanks to remediating
security issues <http://en.wikipedia.org/wiki/JSON#Security_issues> with
regular expressions - this parser code <http://www.json.org/json2.js>
doesn't look any more performant than a native XML parser) and being able
to bypass browser security restrictions, both of which are non-issues for
us.
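
Such a transformation could start life as a simple XSLT stylesheet along
these lines (a bare-bones sketch: it ignores string escaping and extension
elements entirely):

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:atom="http://www.w3.org/2005/Atom">
    <xsl:output method="text"/>
    <!-- render the feed as a JSON object holding an array of entries -->
    <xsl:template match="/atom:feed">
      <xsl:text>{"title": "</xsl:text>
      <xsl:value-of select="atom:title"/>
      <xsl:text>", "entries": [</xsl:text>
      <xsl:apply-templates select="atom:entry"/>
      <xsl:text>]}</xsl:text>
    </xsl:template>
    <!-- one JSON object per entry; real code must escape quotes etc. -->
    <xsl:template match="atom:entry">
      <xsl:text>{"id": "</xsl:text>
      <xsl:value-of select="atom:id"/>
      <xsl:text>", "title": "</xsl:text>
      <xsl:value-of select="atom:title"/>
      <xsl:text>"}</xsl:text>
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:template>
  </xsl:stylesheet>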

Sam