[ot][spam][crazy] in place sort of binary strings ie lines in a file

Undescribed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Thu Jan 19 19:14:20 PST 2023


put it together:

offsets = list of all linebreaks

then we form list of all byte ranges from the linebreaks

then sort list of byte ranges.

now make list of output byte ranges.
this can be done by starting at the start and summing the lengths.

then cut the lists so that there is no overlap at the edges. this
means looking in one of the inout/output lists, for the offsets of the
other while walking through it. that can be done with a binary search,
which is the bisect module in python. the byte ranges turn into
sequences.

then we can swap the sequences one at a time to reorder the data in place.
 it makes
sense to only swap the input list, which then represents how the data
currently is, and slowly turns into a misordered output list.

—> a problem remains: how to look up offsets to swap out written
areas, when the regions are fragmented,p. hence, the working list is
not ordered by lines, but rather is an ordered list of regions by
offset, and another list gives the regions for each line.

i think you do it with an output list and a working list of ordered regions.


More information about the cypherpunks mailing list