[ot][spam][crazy] journal: doubly append-only balanced trees
I remember there's an old mainstream project that does this, but I don't remember what it is off hand very well. Anyway it basically just used whatever the clearest approach was. First update is first root. Second update makes two leaves. The new root goes in the second update, with both leaves its children. Third update makes a third leaf and third root. The tree is binary and the third leaf has a sibling that is waiting for the 4th update. There is an internode between the third leaf and the third root that is the parent of the 4th root. With the original system, I believe this internode can be considered implicit, since everything is always a balanced binary tree. Hence, only the data and leaf index need be stored via the storage medium, in the extreme. Compare my approach. Ideally I would make the same simple balanced trees, but maybe with more internode information since leaves are not always at the end. We could set a norm for implicit internode information, if not provided, to optimize append-only cases further. Let's see if I can stabilize this in my mind a little ... in the previous way, a binary tree is built up from one side to the other, like a card tower. For updates that are two powers, a new root is made: like a new row of cards except in a hierarchical binary way. First one node. Then three. Then seven: two threes with a seventh as the root. That third root looks helpful for simple description. When the 4th node is added, it is basically saying there is a new root with 1 subtree and 1 leaf as children: anything with comparable topology or what it. Then the 5th node becomes a sibling to the 4th. What data would be in a 5th update?
A core point here is that the n/2 nodes and leaves are never touched. The subtree stays exactly the same until the max height increases by 1. This could be good to verify. Then, what data goes in updates after the first new leaf for a new root?
Thinking of lookup data. Say the root were included in every tip update. The root could reference the locations where its immediate child internodes are. The important ones would be the final updates that fill a subtree, as these have access to the location of every leaf. Before a subtree is filled, what to include seems a little hazy. The first leaf needs a reference to itself, and the left largest subtree, to pretend to be a new root. On the second subtree leaf, a new subroot is made to hold both of them. so it contains two indexing nodes: one to reference the last biggest subtree, and one to reference its pairing with its neighbor. Then the third subtree leaf actually only needs two again: a root that references the old root and the larger subroot, and then the larger subroot that references its sibling subtree and itself. In a graph, basically an index is needed for what .... Base root. Parenting subroot. It has a log2 in it somewhere i guess. The big question is how to build indexes based on last state, that develop the same minimal number of indexing entries. The idea comes from imagining balanced subtrees that never move and never change after being filled.
Thinking on balanced trees. What kind of update properties are needed to retain the balanced tree behavior, where few index updates are needed for append-to-end behavior? For one thing, there's a deep subtree that is copied up as the same reference repeatedly. If my code is functioning correctly, it hopefully does this. One thing my code does diffe
I had a really great brief ability to think of trees normally, and it seems the thing to do is to ensure that every included entry in an index is a full tree (# leaves = 1 << height) (since new leaves are flushed in the same data as the root, "balanced" trees have unbalanced shape with extra leaves on internodes), and I think treat each flush as if it is building a full tree. This could get complicated for random writes, but isn't for just appending. I got inhibited when working to accumulate children of last non-full tree so as to treat a subregion of it as a full tree. Not complex to implement, just inhibited.
O_ ! k what's up? i have a computer with a partial implementation on it i'm not sure why to select "full" subtrees but i understand it optimizes the case for appending or prepending data, producing balanced trees with few new index nodes.
ok here's what to do, i think: -> find the largest subtrees on both sides of a write -> start from each subtree and walk toward the write, accumulating the largest full subtrees possible this collects data that isn't changed together into a single reusable full tree to be quickly referenced in an index this can likely replace branch merging and such
Thinking about retaing memories around a behavior with some complexity to it, while repeating it excessively with changes to it.
I'm not sure of a good structure for a new flush class, but i recall it is very obvious if the parts are thought about together. I end up writing a new class structure, because so many of the parts move around when changing how the data is organized.
participants (1)
-
Undiscussed Horrific Abuse, One Victim of Many