[spam][crazy][log] idea: relearning to write code

Undiscussed Groomed for Male Slavery, One Victim of Many gmkarl+brainwashingandfuckingupthehackerslaves at gmail.com
Tue Aug 16 08:02:28 PDT 2022

suddenly sent that. different kind of inhibition.

1004 pasting stuff during dyskinesia ;p

(Pdb) p index
{'capture': {'ditem': ['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw',
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'api_block': 997052}

curl -L https://arweave.net/lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y | python3 -m json.tool

 72                                         import pdb; pdb.set_trace()
 73  ->                                 self.channels.add(channel_name)
 74                                     length_sum = 0

            "capture": {
                "ditem": [
                "length": 20480

(Pdb) p stream_output_offset, expected_stream_output_offset
(0, 1110016)
after fixing the assertion mistake, i'm not finding an offset error.
i'm realising that the subindices are actually as wide as the whole
stream. i think i was manually calculating it wrongly.
> /home/ubuntu/src/log/download.py(70)iterate()
-> if type(channel_data) is dict and 'ditem' in channel_data:
            "capture": {
                "ditem": [
                "length": 20480

it appears to pass on from that breakpoint correctly. it then pops
back up to the root node, and likely proceeds with the third child.
1013 .

when it pops, it is at an unexpected offset ... possibly because i
made the same error in calculating it.

this might actually be a bug in the tree, unsure
(Pdb) n
> /home/ubuntu/src/log/download.py(95)iterate()
-> assert stream_output_offset == expected_stream_output_offset
(Pdb) p stream_output_offset, expected_stream_output_offset
(585728, 593920)


reasonable to diagnose. just 1 down from the root. 2nd child in.
length possibly mismatching.

width of child 1 = 499712
width of child 2 = 94208

(Pdb) p stream_output_offset, expected_stream_output_offset
(585728, 593920)
(Pdb) 499712 + 94208

it's like a bug with the downloader. the bounds specify to extract
exactly 94208 bytes.
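
a small worked check of the numbers above, as a sketch only. the widths and the observed offset are the values pasted from pdb; the variable names are made up for illustration.

```python
# Child widths as recorded in the root index (values from the pdb session).
child_widths = [499712, 94208]

# Offset the root expects the stream to have reached after each child.
expected_offsets = []
running = 0
for width in child_widths:
    running += width
    expected_offsets.append(running)

# stream_output_offset as reported by pdb when the assertion fails.
actual_offset = 585728

# How far short the real stream falls of what the root promises.
shortfall = expected_offsets[-1] - actual_offset

# Bytes that actually came out of the second child.
bytes_in_child_2 = actual_offset - child_widths[0]

print(expected_offsets, shortfall, bytes_in_child_2)
```

so the second child yields 8192 bytes fewer than the 94208 its reference claims.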

this is _hard_ but good practice! i'm planning to leave the system at
11:00 and try to do daily routine stuff.


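turns out it's a bug in the uploader. the stream is only 585728 bytes
long through the end of the second child, so that child holds 86016
bytes (585728 - 499712), not the 94208 the root claims.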


this could be helped by an assertion in the uploader. not sure what yet.

        lengths = sum((capture['length'] for capture in data.get('capture', [])))
        datas = {
            type: dict(
                ditem = [item['id'] for item in items],
                length = sum((item['length'] for item in items))
            )
            for type, items in data.items()
        }

i'm not sure how the tree is referencing a child with more data than
the child contains.
maybe i could add an assertion to the tree code.
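
one possible shape for such an assertion, as a hedged sketch. it does not use the real flat_tree structures: the nested dicts with 'size', 'children' and 'data' keys are a hypothetical stand-in, and actual_size is a made-up helper. the idea is only that every recorded size gets compared with what is really underneath it, at every level rather than just the root.

```python
def actual_size(node, path="root"):
    """Return the bytes really under `node`, checking each recorded size.

    `node` is a hypothetical dict: leaves carry 'data' (bytes), branches
    carry 'children' (list of nodes); both carry a recorded 'size'.
    """
    if "data" in node:
        size = len(node["data"])
    else:
        size = sum(
            actual_size(child, f"{path}/{i}")
            for i, child in enumerate(node["children"])
        )
    if size != node["size"]:
        raise ValueError(f"{path}: recorded {node['size']} bytes, found {size}")
    return size


# A consistent toy tree: 5 + 10 bytes.
good = {
    "size": 15,
    "children": [
        {"size": 5, "data": b"x" * 5},
        {"size": 10, "children": [{"size": 10, "data": b"y" * 10}]},
    ],
}

# Same tree, but the second child is referenced as longer than it is.
bad = {
    "size": 17,
    "children": [
        {"size": 5, "data": b"x" * 5},
        {"size": 12, "children": [{"size": 10, "data": b"y" * 10}]},
    ],
}

good_total = actual_size(good)
try:
    actual_size(bad)
    bad_error = None
except ValueError as exc:
    bad_error = str(exc)
```

the failure is reported at the deepest node whose recorded size is wrong, which is the kind of pointer that would have helped here.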


            running_size = 0
            running_leaf_count = 0
    def _insert(self, last_publish, *ordered_splices):
        # a reasonable next step is to provide for truncation appends, where a tail of the data is replaced with new data

        # currently only performs 1 append
        assert len(ordered_splices) == 1

        for spliced_out_start, spliced_out_stop, spliced_in_size, spliced_in_data in ordered_s
            #new_node_leaf_count = self.leaf_count # + 1

            new_leaf_count = self.leaf_count
            new_size = self.size

            for idx, (branch_leaf_count, branch_offset, branch_size, branch_id) in enumerate(self):
                if branch_leaf_count * self.degree <= new_leaf_count:

            self[idx:] = (
running_size, spliced_out_start - running_size, last_publish),
                (new_leaf_count, running_size, new_size, last_publish),
                (-1, 0, spliced_in_size, spliced_in_data)

maybe here at self[idx:] is where an assert would go
how was the root updated, to include a partial index?
new_size must have been wrong?

            assert self.size == sum((size for leaf_count, offset, size, value in self))
this happens at the end of every mutation.
it addresses the root only, not its children.

            self[idx:] = (
running_size, spliced_out_start - running_size, last_publish),
                (new_leaf_count, running_size, new_size, last_publish),
                (-1, 0, spliced_in_size, spliced_in_data)

adding this:
            assert new_size == sum((size for leaf_count, offset, size, value in self[idx:]))


I guess I'll try to make code to recreate the tree while downloading
it, so as to test the creation of this tree from its data.

(Pdb) p 585728-499712
(Pdb) p 561152 + 4096 + 20480

from flat_tree import flat_tree

    index.append(id, len(chunk), chunk)
                        comparison.append(comparison.size, index_subsize, index)

(Pdb) p comparison.leaf_count


(Pdb) p comparison.snap()
[(27, 27, 0, 499712), (3, 30, 499712, 28672), (3, 33, 528384, 32768),
(1, 34, 561152, 4096), (-1, {'capture': {'ditem':
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'api_block': 997052}, 0, 20480)]

  (27, 27, 0, 499712),
  (3, 30, 499712, 28672),
  (3, 33, 528384, 32768),
  (1, 34, 561152, 4096),
  (-1, {'capture': {'ditem':
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'api_block': 997052}, 0, 20480)

the root is different because it hasn't added the later data yet :/
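
a quick check of what this snapshot adds up to, as a sketch. the tuple order (leaf_count, value, offset, size) is my reading of the snap() output above, not something confirmed from the flat_tree source, and the embedded capture dict is replaced by a placeholder string.

```python
# Entries copied from comparison.snap(); the last value is a placeholder
# for the embedded capture dict.
entries = [
    (27, 27, 0, 499712),
    (3, 30, 499712, 28672),
    (3, 33, 528384, 32768),
    (1, 34, 561152, 4096),
    (-1, "embedded capture", 0, 20480),
]

# For the branch entries (leaf_count >= 0), each offset should equal the
# running total of the sizes before it.
running = 0
for leaf_count, value, offset, size in entries:
    if leaf_count >= 0:
        assert offset == running, (value, offset, running)
    running += size

# Bytes covered by the branch references alone, and by everything.
branch_total = sum(size for leaf_count, _, _, size in entries if leaf_count >= 0)
total = sum(size for *_, size in entries)

print(branch_total, total)
```

the total agrees with the 585728 the downloader actually reached, not with the 593920 the published root expects.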

OK. what i can remember is that every state of the tree was already
uploaded. it's retained and referenced. also, the flat_tree class is
easy to make import old data. noted also it would be more interesting
to compare if it used the whole trees as the references.
(Pdb) p comparison.snap()
[(27, 27, 0, 499712), (3, 30, 499712, 28672), (3, 33, 528384, 32768),
(1, 34, 561152, 4096), (-1, {'capture': {'ditem':
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'api_block': 997052}, 0, 20480)]

so at what point did the length issue develop, if it is there?

i went back as far as _AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc so far.
it contains the 94208 length reference, and then 20480 tacked on the
end embedded.

the only index prior to that is the one that is only 565248 bytes long
so i guess i would want to reproduce that 565248 one, and tack the
extra 20480 onto it, and see what kind of index it makes. it seems to
me it is an error to make the one with the 94208 length. then i can
make an assert for it and/or fix it or whatnot.

lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y is 565248 bytes long

_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc is on top of it, and
references it as if it is 593920

i'm worried the most likely situation here is that some data happened
between them and was dropped. but i could try this.

maybe i'll go to the block explorer and see the sequence of transactions.

the txs are ordered alphabetically by the block explorer. they are
bundled into a larger transaction with id
lUx1VzFzykYepB44NfrD_GqJLZj4fD9vb6rd0IxBWH4 . i'll use my code to see
their order within it.

>>> import ar
>>> peer = ar.Peer()
>>> stream = peer.stream('lUx1VzFzykYepB44NfrD_GqJLZj4fD9vb6rd0IxBWH4')
>>> header = ar.ANS104BundleHeader.fromstream(stream)

>>> header.length_by_id.keys()
dict_keys(['GEZeoe9DMmxtVi4Jqx-q-g9yIMYOw7vWb2fF9GjkVkQ', '-yL3L6w9ysIWrcg8ZSXwV_DxdBOr4PjEJWjnxOYqIU0', 'KLjPJ3JGVxHhtSLzFK8-dlU_pTncyu-C6B3s0F5yBuc', 'zDRSNDKjL04CPFzhzxgmT3ODebBfTbI2RMH

these aren't alphabetical, so they might be in their original order.
they're big.

$ sudo swapon ~/extraswap
just in case

>>> bundle = Bundle.fromstream(stream)

i'm guessing it's paused loading it over the network?


47.78 MB

not sure what is taking so long.

$ sudo apt-get install jnettop



jnettop shows minimal transfer, with no reverse lookups that i
identify as associated with arweave.

^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 540, in fromstream
    header = ANS104BundleHeader.fromstream(stream)
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 87, in fromstream
    return cls({
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 87, in <dictcomp>
    return cls({
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 83, in <genexpr>
    (int.from_bytes(stream.read(32), 'little'), b64enc(stream.read(32)))
  File "/home/ubuntu/src/pyarweave/ar/utils/serialization.py", line 7, in b64enc
    return base64url_encode(data).decode()
  File "/home/ubuntu/.local/lib/python3.9/site-packages/jose/utils.py", line 88, in base64url_encode
    return base64.urlsafe_b64encode(input).replace(b"=", b"")
  File "/usr/lib/python3.9/base64.py", line 111, in urlsafe_b64encode
    def urlsafe_b64encode(s):

it looks like it was actually processing them.
maybe i can do it manually and put it in tqdm.
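
a minimal sketch of reading the header by hand. the layout assumed here (a 32-byte little-endian item count, then one 32-byte little-endian length plus one 32-byte id per item) is my understanding of ANS-104 and matches the int.from_bytes / b64enc line in the traceback; read_bundle_header is a made-up name, not pyarweave's API. it runs against a synthetic two-item header so it needs no network.

```python
import base64
import io


def read_bundle_header(stream):
    """Return {item id: length} in the order the entries appear in the bundle."""
    count = int.from_bytes(stream.read(32), "little")
    length_by_id = {}
    for _ in range(count):
        length = int.from_bytes(stream.read(32), "little")
        raw_id = stream.read(32)
        # Unpadded base64url, like the 43-character ids seen above.
        item_id = base64.urlsafe_b64encode(raw_id).rstrip(b"=").decode()
        length_by_id[item_id] = length
    return length_by_id


# Synthetic header with two entries, for illustration only.
ids = [bytes(range(32)), b"\xff" * 32]
lengths = [20480, 4096]
raw = len(ids).to_bytes(32, "little")
for item_length, item_id in zip(lengths, ids):
    raw += item_length.to_bytes(32, "little") + item_id

header = read_bundle_header(io.BytesIO(raw))
print(list(header.items()))
```

since python dicts keep insertion order, the keys come back in bundle order, and the loop is an easy place to hang a progress bar.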


looks like there's some bug in Bundle.fromstream, which I will ignore
for the moment.

>>> dataitems = [ar.DataItem.fromstream(stream, length=length) for length in tqdm.tqdm(header.length_by_id.values())]
100%|███████████████████████████████████████████████████████| 679/679 [00:14<00:00, 47.57it/s]

>>> idx_by_id = {dataitem.header.id: idx for idx, dataitem in enumerate(dataitems)}
>>> idx_by_id['lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y']
>>> idx_by_id['_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc']

>>> my_ditems = [dataitem for dataitem in dataitems if dataitem.header.owner == dataitems[413].header.owner]
>>> len(my_ditems)
i have 1/3rd of the ditems in that tx ;p

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y' is not in list
>>> my_ditems.index('_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: '_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc' is not in list

i did something wrong.
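
a likely cause, shown as a sketch with stand-in classes: list.index() compares the id string against the data item objects themselves, so it can never match. Header and DataItem below are hypothetical dataclasses for illustration, not the pyarweave types; only the header.id / header.owner attribute names are taken from the session above.

```python
from dataclasses import dataclass


@dataclass
class Header:
    id: str
    owner: str


@dataclass
class DataItem:
    header: Header


dataitems = [
    DataItem(Header("a", "me")),
    DataItem(Header("b", "someone else")),
    DataItem(Header("c", "me")),
]
my_ditems = [d for d in dataitems if d.header.owner == "me"]

# Searching the list of objects for a string fails, as in the traceback.
try:
    my_ditems.index("c")
    error = None
except ValueError as exc:
    error = str(exc)

# Looking items up by their id attribute works instead.
my_idx_by_id = {d.header.id: idx for idx, d in enumerate(my_ditems)}
print(error, my_idx_by_id)
```

the same dict-by-id approach used for idx_by_id earlier would give the positions within my_ditems.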

stepping away 1101 .

I'm hunting down an incorrect length in my last published test. the
second root child is referenced as longer than it is. i was taking
some time to look to see if any intermediate roots were dropped.

More information about the cypherpunks mailing list