[spam][crazy][ot] log: #3 write-anywhere to an append-only-log - cypherpunks - lists.cpunks.org

newer
[ot][spam][crazy] log: append-only...

[spam][crazy][ot] log: #3 write-anywhere to an append-only-log

older
Accused of having shown three...

Undiscussed Horrific Abuse, One Victim of Many

14 Jul 2022 14 Jul '22

8:27 p.m.

so i'm thinking the big remaining bit to make an algorithm here that i like is to implement moving parts into balanced/full trees, away from recent writes. maybe i'll just accumulate little bits around that. can use chunk structure: class Chunk: start, end, data, height, leaf_count, age def __len__( def is_leaf( def is_full( new basic concept for add() function: retain areas separate from writes, that are shrunk or grown to maximum depth using full trees only goal: write that part of an add() function

Reply

Sign in to reply online Use email software

Show replies by date

Undiscussed Horrific Abuse, One Victim of Many

14 Jul 14 Jul

8:29 p.m.

Entry structure and Flush structure derive from Chunk. Entry wraps either one, providing a view as if bounds are limited.

Reply

Sign in to reply online Use email software

Undiscussed Horrific Abuse, One Victim of Many

8:30 p.m.

def add(new_write): # hmm ... thinking of: the nearest write to the left and right of new_write # seems we'll want a sorted list of write indices, like in version #1

Reply

Sign in to reply online Use email software

Undiscussed Horrific Abuse, One Victim of Many

8:39 p.m.

# all subtrees are kept as full trees. def add(new_write): # assume there is a sorted list of writes # merge this in. retain what the lengths of non-write data are to the left and right # if the writes trim non-write entries, then: # on the left, walk left-to-right, and on the right, walk right-to-left, covering the trimmed lengths # when between two writes, consider only the length that goes through the largest full tree between them: so as to leave smaller trees near the more distant write, to merge with a future flush, and not have unnecessary depth # there might be a little more to think about and adjust here, regarding producing these subtrees well. not sure. # when between a write and the edge of the data, consider the entire length # expand out full trees from the trimmed lengths, and merge them into the more distant full trees when they are equal or larger to those more distant full subtrees

Reply

Sign in to reply online Use email software

Undiscussed Horrific Abuse, One Victim of Many

8:42 p.m.

there might be some merit to building the index on the fly rather than incrementally. basically, you want the biggest full subtrees between the writes, where these full subtrees are taken from the content of the last flush. the reason to have a full subtree is to limit the depth and sibling node count together. however, interestingly since some leaves end up on internodes, one could construct a subtree with the same depth that has more leaves than a full subtree would. in practice, the full subtree approach limits the sibling node count, because the extra subtrees that are in excess of the full count can be merged together with future writes, to make another equally-sized full subtree. :S what does that mean in a depth-focused approach

Reply

Sign in to reply online Use email software

Undiscussed Horrific Abuse, One Victim of Many

8:44 p.m.

so basically an important property is not whether the tree is full or not, it is rather - whether it is refrained from exceeding the max depth - where parts of it are pulled out in order to produce that limit the parts of it that are pulled out should go near future writes, so as to combine with them and make a balanced tree depth > fullness

Reply

Sign in to reply online Use email software

Undiscussed Horrific Abuse, One Victim of Many

8:45 p.m.

i guess it would be good to retain is_full() in parallel with height <= max_depth since the concept is partial . these are just metrics that can be applied to portions of the graph, in order to judge what to collect where. the metrics are basically the same.

Reply

Sign in to reply online Use email software

Undiscussed Horrific Abuse, One Victim of Many

16 Jul 16 Jul

12:47 p.m.

current plan is to consider flushes as lists of regions of subtrees, terminated by either new writes or end-of-stream. then concepts around how to best consolidate those can develop around member functions of the lists.

Reply

Sign in to reply online Use email software

Undiscussed Horrific Abuse, One Victim of Many

12:49 p.m.

Flush: - an updatable tree containing writes and indexes, designed to be useful for syncing with a storage medium. contains a list of subindexes SubIndex - a list of bounded regions (flush entries) of previous flushes, each held as a useful tree. contains links to its left and write to new writes contained in the flush.

Reply

Sign in to reply online Use email software

Undiscussed Horrific Abuse, One Victim of Many

1:01 p.m.

class FlushRegion: def __init__(self, *entries, left_write = None, right_write = None): for left, right in zip(entries[:-1], entries[1:]): assert left.end <= right.start if left_write is not None and len(entries): assert left_write.end <= entries[0].start if right_write is not None: assert left_write.end <= right_write.start if right_write is not None and len(entries): assert right_write.start >= entries[-1].end self.entries = entries self.left_write = left_write self.right_write = right_write # i'm thinking of also renaming Flush.Entry to ChunkRegion to help improve habits around generalisation i think it makes sense for me to go back to thinking_of_2, and change the implementation a little to use the above classes. this helps me build ease around integrating the disparate work.

Reply

Sign in to reply online Use email software

1129

Age (days ago)

1131

Last active (days ago)

Download

9 comments

1 participants

tags

participants (1)

Undiscussed Horrific Abuse, One Victim of Many