[spam][crazy][log] idea: relearning to write code

Tue Aug 16 08:02:28 PDT 2022

1002
suddenly sent that. different kind of inhibition.

1004 pasting stuff during dyskinesia ;p

(Pdb) p index
{'capture': {'ditem': ['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw',
'-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs',
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'pG4gRSc03l2js77IfpfUvkTx2zRQFE5capCxY7rSjZ5UWT-5NqeV6U0bvlu_uxW0'],
'api_block': 997052}

curl -L https://arweave.net/lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y
| python3 -m json.tool

 72                                         import pdb; pdb.set_trace()
 73  ->                                 self.channels.add(channel_name)
 74                                     length_sum = 0

        561152,
        4096
    ],
    [
        -1,
        {
            "capture": {
                "ditem": [
                    "495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw",
                    "-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs",
                    "aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU"
                ],
                "length": 20480

1007
1008
(Pdb) p stream_output_offset, expected_stream_output_offset
(0, 1110016)
1009
after fixing assertion mistake, not finding offset error
i'm realising that the subindices are actually as wide as the whole
stream. i think i was manually calculating it wrongly.
1011
1012
(Pdb)
> /home/ubuntu/src/log/download.py(70)iterate()
-> if type(channel_data) is dict and 'ditem' in channel_data:
(Pdb)
        {
            "capture": {
                "ditem": [
                    "495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw",
                    "-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs",
                    "aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU"
                ],
                "length": 20480
            },

it appears to pass on from that breakpoint correctly. it then pops
back up to the root node, and likely proceeds with the third child.
1013 .

when it pops, it is at an unexpected offset ... possibly because i
made the same error in calculating it.

this might actually be a bug in the tree, unsure
(Pdb) n
AssertionError
> /home/ubuntu/src/log/download.py(95)iterate()
-> assert stream_output_offset == expected_stream_output_offset
(Pdb) p stream_output_offset, expected_stream_output_offset
(585728, 593920)

1015

reasonable to diagnose. just 1 down from the root. 2nd child in.
length possibly mismatching.

width of child 1 = 499712
width of child 2 = 94208

(Pdb) p stream_output_offset, expected_stream_output_offset
(585728, 593920)
(Pdb) 499712 + 94208
593920

it looks like a bug in the downloader. the bounds specify to extract
exactly 94208 bytes.
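
the arithmetic from the pdb session, written out as a small sketch. the
variable names are mine, not from download.py; the numbers are the ones
printed above.

```python
# widths the root claims for its first two children (from the pdb session)
child_widths = [499712, 94208]
expected_offset = sum(child_widths)

# where the output stream really was after the second child
actual_offset = 585728

# bytes that were referenced but never produced
shortfall = expected_offset - actual_offset

# bytes the second child really delivered
delivered_by_child_2 = actual_offset - child_widths[0]

print(expected_offset, shortfall, delivered_by_child_2)  # 593920 8192 86016
```

so the second child is referenced as 8192 bytes wider than what came out of it.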
1017

this is _hard_ but good practice! i'm planning to leave the system at
11:00 and try to do daily routine stuff.

1019

turns out it's a bug in the uploader. the data in the second child is
only 585728 bytes long.

1019.

1021
this could be helped by an assertion in the uploader. not sure what yet.

        lengths = sum((capture['length'] for capture in data.get('capture', [])))
        datas = {
            type: dict(
                ditem = [item['id'] for item in items],
                length = sum((item['length'] for item in items))
            )
            for type, items in data.items()
        }
        indices.append(
            prev,
            lengths,
            dict(
                **datas,

i'm not sure how the tree is referencing a child with more data than
the child contains.
maybe i could add an assertion to the tree code.

1023

            running_size = 0
            running_leaf_count = 0
1023
    def _insert(self, last_publish, *ordered_splices):
        # a reasonable next step is to provide for truncation appends,
        # where a tail of the data is replaced with new data

        # currently only performs 1 append
        assert len(ordered_splices) == 1

        for spliced_out_start, spliced_out_stop, spliced_in_size, spliced_in_data in ordered_s
1024
            #new_node_leaf_count = self.leaf_count # + 1

            new_leaf_count = self.leaf_count
            new_size = self.size

            for idx, (branch_leaf_count, branch_offset, branch_size, branch_id) in enumerate(self):
                if branch_leaf_count * self.degree <= new_leaf_count: #proposed_leaf_count
                    break

            self[idx:] = (
                #(leaf_count_of_partial_index_at_end_tmp, running_size, spliced_out_start - running_size, last_publish),
                (new_leaf_count, running_size, new_size, last_publish),
                (-1, 0, spliced_in_size, spliced_in_data)
            )

maybe here at self[idx:] is where an assert would go
how was the root updated, to include a partial index?
new_size must have been wrong?

1025
            assert self.size == sum((size for leaf_count, offset, size, value in self))
this happens at the end of every mutation.
it addresses the root only, not its children.
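
a minimal sketch of what a check that also covers the children might look
like. this is not the flat_tree code. it assumes entries are
(leaf_count, offset, size, value) tuples, that leaf_count == -1 marks
embedded data, and that a reference means "bytes [offset, offset+size) of
the child's stream". check_sizes, load_index and the store dict are
hypothetical names, and the toy trees just borrow numbers from this log.

```python
def check_sizes(entries, load_index):
    """Return the total width of entries, checking every child reference.

    entries: list of (leaf_count, offset, size, value) tuples (assumed layout)
    load_index: hypothetical callable mapping a child id to its entries
    """
    total = 0
    for leaf_count, offset, size, value in entries:
        if leaf_count != -1:
            # a reference must not reach past the end of the child it points at
            child_width = check_sizes(load_index(value), load_index)
            assert offset + size <= child_width, (value, offset, size, child_width)
        total += size
    return total


# toy data: one previously published index that is 565248 bytes wide
store = {"prev": [(-1, 0, 565248, b"...")]}
root_ok = [(1, 0, 499712, "prev"), (1, 499712, 65536, "prev"), (-1, 0, 20480, b"...")]
root_bad = [(1, 0, 499712, "prev"), (1, 499712, 94208, "prev"), (-1, 0, 20480, b"...")]

ok_total = check_sizes(root_ok, store.__getitem__)
try:
    check_sizes(root_bad, store.__getitem__)
    bad_detected = False
except AssertionError:
    bad_detected = True

print(ok_total, bad_detected)  # 585728 True
```

walking the children on every mutation would mean fetching them, so this
might fit better in the downloader or in a test than in the uploader.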

            self[idx:] = (
                #(leaf_count_of_partial_index_at_end_tmp, running_size, spliced_out_start - running_size, last_publish),
                (new_leaf_count, running_size, new_size, last_publish),
                (-1, 0, spliced_in_size, spliced_in_data)
            )

adding this:
            assert new_size == sum((size for leaf_count, offset, size, value in self[idx:]))

1028

I guess I'll try to make code to recreate the tree while downloading
it, so as to test the creation of this tree from its data.

old:
(Pdb) p 585728-499712
86016
(Pdb) p 561152 + 4096 + 20480
585728

newer:
from flat_tree import flat_tree

1030
    index.append(id, len(chunk), chunk)
1031
                        comparison.append(comparison.size, index_subsize, index)
1032
                        comparison.append(comparison.leaf_count, index_subsize, index)

(Pdb) p comparison.leaf_count
35

1034

(Pdb) p comparison.snap()
[(27, 27, 0, 499712), (3, 30, 499712, 28672), (3, 33, 528384, 32768),
(1, 34, 561152, 4096), (-1, {'capture': {'ditem':
['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw',
'-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs',
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'pG4gRSc03l2js77IfpfUvkTx2zRQFE5capCxY7rSjZ5UWT-5NqeV6U0bvlu_uxW0'],
'api_block': 997052}, 0, 20480)]

[
  (27, 27, 0, 499712),
  (3, 30, 499712, 28672),
  (3, 33, 528384, 32768),
  (1, 34, 561152, 4096),
  (-1, {'capture': {'ditem':
['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw',
'-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs',
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'pG4gRSc03l2js77IfpfUvkTx2zRQFE5capCxY7rSjZ5UWT-5NqeV6U0bvlu_uxW0'],
'api_block': 997052}, 0, 20480)
]

the root is different because it hasn't added the later data yet :/
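
a quick sanity check of a snapshot like this one, as a sketch. i'm assuming
the last two fields of each tuple are (offset, size) and that -1 marks the
embedded leaf, whose offset is relative to its own data. the embedded dict
is replaced by a placeholder string here.

```python
snap = [
    (27, 27, 0, 499712),
    (3, 30, 499712, 28672),
    (3, 33, 528384, 32768),
    (1, 34, 561152, 4096),
    (-1, "<embedded capture metadata>", 0, 20480),
]

running = 0
for leaf_count, value, offset, size in snap:
    if leaf_count != -1:
        # in this snapshot each reference starts where the previous one ended
        assert offset == running, (offset, running)
    running += size

print(running)  # 585728
```

the total matches the 585728 the downloader actually reached, not the
593920 the uploaded root expects.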

OK. what i can remember is that every state of the tree was already
uploaded. it's retained and referenced. also, it is easy to make the
flat_tree class import old data. noted also that the comparison would be
more interesting if it used the whole trees as the references.
1036
(Pdb) p comparison.snap()
[(27, 27, 0, 499712), (3, 30, 499712, 28672), (3, 33, 528384, 32768),
(1, 34, 561152, 4096), (-1, {'capture': {'ditem':
['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw',
'-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs',
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'pG4gRSc03l2js77IfpfUvkTx2zRQFE5capCxY7rSjZ5UWT-5NqeV6U0bvlu_uxW0'],
'api_block': 997052}, 0, 20480)]

1037
so at what point did the length issue develop, if it is there?

1039
i went back as far as _AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc so far.
it contains the 94208 length reference, and then 20480 bytes of
embedded data tacked on the end.

1041
the only index prior to that is the one that is only 565248 bytes long
so i guess i would want to reproduce that 565248 one, and tack the
extra 20480 onto it, and see what kind of index it makes. it seems to
me it is an error to make the one with the 94208 length. then i can
make an assert for it and/or fix it or whatnot.
1042

lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y is 565248 bytes long

_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc is on top of it, and
references it as if it is 593920 bytes long

i'm worried the most likely situation here is that some data happened
between them and was dropped. but i could try this.

maybe i'll go to the block explorer and see the sequence of transactions.

1045
the txs are ordered alphabetically by the block explorer. they are
bundled into a larger transaction with id
lUx1VzFzykYepB44NfrD_GqJLZj4fD9vb6rd0IxBWH4 . i'll use my code to see
their order within it.

>>> import ar
>>> peer = ar.Peer()
>>> stream = peer.stream('lUx1VzFzykYepB44NfrD_GqJLZj4fD9vb6rd0IxBWH4')
>>> header = ar.ANS104BundleHeader.fromstream(stream)

1047
>>> header.length_by_id.keys()
dict_keys(['GEZeoe9DMmxtVi4Jqx-q-g9yIMYOw7vWb2fF9GjkVkQ', '-yL3L6w9ysIWrcg8ZSXwV_DxdBOr4PjEJWjnxOYqIU0', 'KLjPJ3JGVxHhtSLzFK8-dlU_pTncyu-C6B3s0F5yBuc', 'zDRSNDKjL04CPFzhzxgmT3ODebBfTbI2RMH

these aren't alphabetical, so they might be in submission order.
they're big.

1050
$ sudo swapon ~/extraswap
just in case

>>> bundle = Bundle.fromstream(stream)

i'm guessing it's paused loading it over the network?

https://viewblock.io/arweave/tx/lUx1VzFzykYepB44NfrD_GqJLZj4fD9vb6rd0IxBWH4

Size
47.78 MB

not sure what is taking so long.

$ sudo apt-get install jnettop

1051

1052

jnettop shows minimal transfer, with no reverse lookups that i
identify as associated with arweave.

^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 540, in fromstream
    header = ANS104BundleHeader.fromstream(stream)
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 87, in fromstream
    return cls({
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 87, in <dictcomp>
    return cls({
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 83, in <genexpr>
    (int.from_bytes(stream.read(32), 'little'), b64enc(stream.read(32)))
  File "/home/ubuntu/src/pyarweave/ar/utils/serialization.py", line 7, in b64enc
    return base64url_encode(data).decode()
  File "/home/ubuntu/.local/lib/python3.9/site-packages/jose/utils.py", line 88, in base64url_encode
    return base64.urlsafe_b64encode(input).replace(b"=", b"")
  File "/usr/lib/python3.9/base64.py", line 111, in urlsafe_b64encode
    def urlsafe_b64encode(s):
KeyboardInterrupt

it looks like it was actually processing them.
maybe i can do it manually and put it in tqdm.
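
a sketch of doing the header manually, stdlib only. the 32-byte
little-endian length followed by a 32-byte id comes straight from the
bundle.py line in the traceback. the leading 32-byte item count is my
recollection of the ANS-104 layout, so treat it as an assumption.
read_bundle_header is a hypothetical name, not a pyarweave function.

```python
import base64
import io


def read_bundle_header(stream):
    """Parse a bundle header into {id: length}, keeping the on-wire order."""
    count = int.from_bytes(stream.read(32), "little")  # assumed: item count first
    length_by_id = {}
    for _ in range(count):  # range(count) could be wrapped in tqdm for progress
        length = int.from_bytes(stream.read(32), "little")
        raw_id = stream.read(32)
        # base64url without padding, like the ids shown in this log
        item_id = base64.urlsafe_b64encode(raw_id).rstrip(b"=").decode()
        length_by_id[item_id] = length
    return length_by_id


# synthetic two-entry header, just to exercise the parser
raw = (2).to_bytes(32, "little")
for length, raw_id in ((100, bytes([1]) * 32), (200, bytes([2]) * 32)):
    raw += length.to_bytes(32, "little") + raw_id

header = read_bundle_header(io.BytesIO(raw))
print(list(header.values()))  # [100, 200]
```

since dicts keep insertion order, the keys come back in the order they
appear in the bundle.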

1053

looks like there's some bug in Bundle.fromstream, which I will ignore
for the moment.

>>> dataitems = [ar.DataItem.fromstream(stream, length=length) for length in tqdm.tqdm(header.length_by_id.values())]
100%|███████████████████████████████████████████████████████| 679/679 [00:14<00:00, 47.57it/s]

1058
>>> idx_by_id = {dataitem.header.id: idx for idx, dataitem in enumerate(dataitems)}
>>> idx_by_id['lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y']
413
>>> idx_by_id['_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc']
632

>>> my_ditems = [dataitem for dataitem in dataitems if dataitem.header.owner == dataitems[413].header.owner]
>>> len(my_ditems)
246
i have 1/3rd of the ditems in that tx ;p
1059

>>> my_ditems.index('lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y' is not in list
>>> my_ditems.index('_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: '_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc' is not in list

i did something wrong.
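
one possible cause, as a guess: my_ditems holds dataitem objects, and
list.index() was given an id string, which never equals an object.
looking up positions in a list of ids would avoid that. the sketch below
uses SimpleNamespace stand-ins with made-up ids and owners rather than
real ar.DataItem objects.

```python
from types import SimpleNamespace


def make_item(item_id, owner):
    # stand-in with the same .header.id / .header.owner shape used above
    return SimpleNamespace(header=SimpleNamespace(id=item_id, owner=owner))


dataitems = [make_item("aaa", "me"), make_item("bbb", "other"), make_item("ccc", "me")]
my_ditems = [d for d in dataitems if d.header.owner == "me"]

# compare ids with ids, not objects with strings
my_ids = [d.header.id for d in my_ditems]
position = my_ids.index("ccc")

print(position)  # 1
```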

stepping away 1101 .

I'm hunting down an incorrect length in my last published test. the
second root child is referenced as longer than it is. i was taking
some time to look to see if any intermediate roots were dropped.