2014-03-24 2:47 GMT+01:00 Peter Gutmann <pgut001@cs.auckland.ac.nz>:
Their prime directive is that financial value can never be created or destroyed, so you can never have a situation in which a failure anywhere will result in one blob of financial value being recorded in two locations, or no locations. Saying that you'll address this by rolling back transactions won't fly both because no standard database can handle the load they work at, and because the financial world isn't going to stop and wait while you perform a rollback.
So how do they do that? If there's a power failure on a specific box, what happens? Are all transactions synced to disk before commit, so that rollbacks stay minimal? A minimal rollback would cover only a small part of what happens when a box loses power. Maybe they have several boxes vouching for a single transaction, so that expected failures never crash the system completely. I can imagine mitigating this by processing everything redundantly, but then ordering must be preserved somehow, so I can't imagine it being ridiculously fast. Maybe you mean the throughput is insane, because that would make more sense given the multiple months of CPU being thrown at it. If you didn't, then caching would just slow things down (most of the time). Finance should run better on SSDs, so I imagine this is an old story. Overall it's a bit confusing, and I'd love some more details! For example, why are they even using disks when fiber and RAM might be faster and similarly reliable?
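
To make the "synced to disk before commit" question concrete, here is a rough Python sketch of what I have in mind. It is purely my own assumption, not a description of what the systems Peter mentions actually do. The names `Journal`, `commit` and `recover` are hypothetical, and amounts are integers (say, cents) to avoid rounding.

```python
import json
import os
import zlib


class Journal:
    """Append-only log: a transfer counts as committed only once it is on disk."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "ab")

    def commit(self, src, dst, amount):
        # Debit and credit live in ONE record, so a crash can never
        # leave the value recorded in two places or in none.
        payload = json.dumps({"src": src, "dst": dst, "amount": amount}).encode()
        # The checksum lets recovery detect a record torn by a crash mid-write.
        line = b"%08x " % zlib.crc32(payload) + payload + b"\n"
        self.f.write(line)
        self.f.flush()
        # Ask the OS to push the record to stable storage before we return.
        os.fsync(self.f.fileno())

    def close(self):
        self.f.close()


def recover(path, opening_balances):
    """Replay complete records; stop at the first torn or corrupt one."""
    balances = dict(opening_balances)
    with open(path, "rb") as f:
        for line in f:
            if not line.endswith(b"\n"):
                break  # partial trailing record: it was never committed
            crc, _, payload = line[:-1].partition(b" ")
            try:
                ok = int(crc, 16) == zlib.crc32(payload)
            except ValueError:
                ok = False
            if not ok:
                break
            rec = json.loads(payload)
            balances[rec["src"]] -= rec["amount"]
            balances[rec["dst"]] += rec["amount"]
    return balances
```

With this shape the "rollback" after a power failure is just dropping at most one half-written record at the tail of the log. The cost is one `fsync` per commit, which is exactly why I would not expect it to be fast on spinning disks.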