[occi-wg] Votes: XML vs. JSON vs. TXT

Thu May 7 10:07:59 CDT 2009

Andre Merzky <andre at merzky.net> writes:

> +1 for TXT, as that is most simple, flat, and easy to map to
> anything else.  Well, you can screw up TXT, too, but we
> won't, right? ;-)

I'm standing back from the rest of the discussion as I've already advocated
maximum format simplicity ad nauseam. However, I'll just quickly point out
that the type of text format I have been proposing is a simple

  KEY VALUE
  KEY VALUE

format, which can be read by a trivial shell fragment

  while read K V; do [...]; done
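Below is a minimal, runnable sketch of that loop with a filled-in body that simply prints each pair. The function name `parse_pairs` and the sample input are hypothetical illustrations, not part of any provider's API. It uses `read -r`, which keeps backslashes literal. The first whitespace-separated word goes into K and the rest of the line, including any inner spaces, goes into V.

```shell
#!/bin/sh
# Hypothetical example: read KEY VALUE lines from stdin and print each pair.
# read -r keeps backslashes literal. The first word lands in K and the rest
# of the line, inner spaces included, lands in V.
parse_pairs() {
  while read -r K V; do
    [ -z "$K" ] && continue   # skip blank lines
    printf '%s=%s\n' "$K" "$V"
  done
}

# Illustrative input only; not real data from any provider.
parse_pairs <<'EOF'
name Test Server
cpu 4000

mem 2048
EOF
# prints:
# name=Test Server
# cpu=4000
# mem=2048
```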

Once it gets more complex than that, the sysadmin-friendliness starts to
evaporate very quickly and you might as well use JSON. (For example, I
remember another provider telling me that they'd offered CSV and nobody had
used it. Frankly I'm not surprised given how much of a pain it is to parse
correctly.)

In our own API, our 'native' format looks like this and it has worked very
well. The code is simple and clean, feedback from users has been very
positive, and people really do write five-line shell scripts to drive our
infrastructure in the way they might script their own machines. This was my
original design aim. Meanwhile, we're able to do admin across the cluster
with shell one-liners, which would require separate tools if our API were
more clumsy.

We've structured our keys hierarchically using colons as a separator, so a
text fragment

  ide:0:0 08c92dd5-70a0-4f51-83d2-835919d254df
  nic:0:dhcp 91.203.56.132
  nic:0:model e1000

might be viewed as equivalent to the flat JSON hash

  { "ide:0:0": "08c92dd5-70a0-4f51-83d2-835919d254df",
    "nic:0:dhcp": "91.203.56.132",
    "nic:0:model": "e1000"
  }

but equally to the nested JSON hash

  { "ide": {
      "0": {
        "0": "08c92dd5-70a0-4f51-83d2-835919d254df"
      }
    },
    "nic": {
      "0": {
        "dhcp": "91.203.56.132",
        "model": "e1000"
      }
    }
  }
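The flat equivalence can be sketched in a few lines of shell, assuming POSIX sh and sed and that keys and values contain no control characters. The names `to_flat_json` and `json_escape` are hypothetical helpers. This only produces the flat hash; building the nested form would also mean splitting keys on colons, which is probably better left to a real JSON library.

```shell
#!/bin/sh
# Hypothetical helpers: convert KEY VALUE lines on stdin into a flat JSON
# object. Backslashes and double quotes are escaped; control characters
# are assumed not to occur.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

to_flat_json() {
  sep='{ '
  while read -r K V; do
    [ -z "$K" ] && continue
    printf '%s"%s": "%s"' "$sep" "$(json_escape "$K")" "$(json_escape "$V")"
    sep=', '
  done
  # An empty input still yields a valid (empty) object.
  if [ "$sep" = '{ ' ]; then printf '{}\n'; else printf ' }\n'; fi
}

to_flat_json <<'EOF'
ide:0:0 08c92dd5-70a0-4f51-83d2-835919d254df
nic:0:dhcp 91.203.56.132
nic:0:model e1000
EOF
# prints (on one line):
# { "ide:0:0": "08c92dd5-70a0-4f51-83d2-835919d254df", "nic:0:dhcp": "91.203.56.132", "nic:0:model": "e1000" }
```

Because related keys share a colon-separated prefix, a subtree can be picked out with an ordinary grep before conversion, e.g. `grep '^nic:0:' | to_flat_json` for just the first NIC.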

The use of whitespace to separate keys from values and newlines to separate
records means that keys (but not values) cannot contain spaces, and that
neither keys nor values can have leading or trailing whitespace or embedded
newlines. In our environment this isn't a problem, but you would need to add
shell-friendly \ escaping to the mix if you wanted to allow demented keys and
values in a whitespace-separated text format.
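As a sketch of what such escaping could look like: the shell's `read` builtin already interprets backslashes when `-r` is not given, so "\ " is taken as a literal space rather than a field separator and "\\" as a literal backslash. The writer-side helper `escape_field` and the reader `read_escaped` are hypothetical names. Note that this covers spaces and backslashes only: a backslash before a newline is treated by `read` as a line continuation and removed, so embedded newlines would still need some separate convention.

```shell
#!/bin/sh
# Sketch: read without -r understands backslash escaping, so "\ " is a
# literal space (not a field separator) and "\\" is a literal backslash.

# Hypothetical writer side: escape backslashes first, then spaces.
escape_field() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/ /\\ /g'
}

# Reader side: deliberately no -r, so read interprets the escapes.
read_escaped() {
  while read K V; do
    [ -z "$K" ] && continue
    printf '[%s] -> [%s]\n' "$K" "$V"
  done
}

printf '%s %s\n' "$(escape_field 'my key')" 'some value' | read_escaped
# prints: [my key] -> [some value]
```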

Of course, if the standard goes with just JSON, it isn't a disaster from our
point of view. We'll probably end up offering a small translation binary,
linked against libcurl, that presents the JSON API in the flattened format
above. It seems a shame to force every casual user to compile and use a tool
just to make the API usable from their command line, though, when this isn't
currently the case with our API.

It'll be hard to sell something like this to end users as an improvement,
but a little JSON-parsing binary (which I could write standalone) is
definitely less hard to sell than a tool that also needs you to compile and
link against cumbersome XML and Atom libraries (which I wouldn't even think
of trying to write standalone). I guess my vote is therefore strongly
anti-XML/Atom, vaguely positive about JSON, and strongly positive about
simple plain text.

Cheers,

Chris.