2014-03-24 2:47 GMT+01:00 Peter Gutmann <pgut001@cs.auckland.ac.nz>:
> Their prime directive is that financial value can never be
> created or destroyed, so you can never have a situation in which a failure
> anywhere will result in one blob of financial value being recorded in two
> locations, or no locations.  Saying that you'll address this by rolling back
> transactions won't fly both because no standard database can handle the load
> they work at, and because the financial world isn't going to stop and wait
> while you perform a rollback.

So how do they do that? If there's a power failure on a specific box, what happens? Are all transactions synced to disk before they're acknowledged as committed, so that rollbacks stay minimal? A minimal rollback would only cover a tiny window of what could otherwise be lost when a box loses power. Or maybe several boxes have to acknowledge each transaction, so that expected failures can never bring the system down completely. Something like the sketch below is what I'm picturing.
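
To make that first guess concrete, here's a toy Python sketch of the fsync-before-acknowledge idea. This is purely my speculation about the design; the class and method names are made up:

    import os

    class WriteAheadLog:
        """Hypothetical durability scheme (my guess, not their design):
        append each transaction to a log and fsync() before acknowledging,
        so a power failure can only lose transactions that nobody was
        ever told had succeeded."""

        def __init__(self, path):
            # O_APPEND keeps concurrent appends from interleaving records
            self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)

        def append_committed(self, txn_bytes):
            os.write(self.fd, txn_bytes + b"\n")
            os.fsync(self.fd)  # block until the disk confirms the write
            # only now is it safe to acknowledge the commit to the client

If it works roughly like that, recovery after a power failure is just replaying the log, and the only transactions at risk are ones that were never acknowledged.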

I can imagine mitigating this by processing everything redundantly, in which case ordering must be maintained somehow, so I can't imagine per-transaction latency being ridiculously low. Maybe you mean the throughput is insane, because that would make more sense given the multiple months of CPU time being thrown at it. If you didn't, then caching would mostly just slow things down. A sketch of what I mean follows.
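
For the redundancy guess, I'm imagining a sequencer that stamps every transaction with a global order and fans the identical stream out to deterministic replicas. Again, this is only a sketch with invented names, not a claim about how any exchange actually works:

    import itertools

    class Replica:
        """Deterministic state machine: the same transactions applied in
        the same order always produce the same balances."""
        def __init__(self):
            self.balances = {}

        def apply(self, seq, txn):
            src, dst, amount = txn
            self.balances[src] = self.balances.get(src, 0) - amount
            self.balances[dst] = self.balances.get(dst, 0) + amount

    class Sequencer:
        """Assigns each transaction a global sequence number and fans it
        out, so losing one replica costs capacity but never value."""
        def __init__(self, replicas):
            self.next_seq = itertools.count(1)
            self.replicas = replicas

        def submit(self, txn):
            seq = next(self.next_seq)
            for r in self.replicas:
                r.apply(seq, txn)

    # e.g. three replicas all end up with identical balances:
    sequencer = Sequencer([Replica() for _ in range(3)])
    sequencer.submit(("alice", "bob", 100))

The sequencing step is exactly where I'd expect the latency cost to show up, which is why the throughput-not-latency reading makes more sense to me.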

Finance should run better on SSDs than on spinning disks, so I imagine this is an old story.

Overall this is a bit confusing, and I'd love some more details! For instance, why are they even using disks at all when fiber and RAM might be faster and similarly reliable?