[spam][crazy][log] idea: relearning to write code
Undiscussed Groomed for Male Slavery, One Victim of Many
gmkarl+brainwashingandfuckingupthehackerslaves at gmail.com
Tue Aug 16 08:02:28 PDT 2022
1002
suddenly sent that. different kind of inhibition.
1004 pasting stuff during dyskinesia ;p
(Pdb) p index
{'capture': {'ditem': ['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw',
'-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs',
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'pG4gRSc03l2js77IfpfUvkTx2zRQFE5capCxY7rSjZ5UWT-5NqeV6U0bvlu_uxW0'],
'api_block': 997052}
curl -L https://arweave.net/lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y
| python3 -m json.tool
72 import pdb; pdb.set_trace()
73 -> self.channels.add(channel_name)
74 length_sum = 0
561152,
4096
],
[
-1,
{
"capture": {
"ditem": [
"495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw",
"-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs",
"aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU"
],
"length": 20480
1007
1008
(Pdb) p stream_output_offset, expected_stream_output_offset
(0, 1110016)
1009
after fixing assertion mistake, not finding offset error
i'm realising that the subindices are actually as wide as the whole
stream. i think i was manually calculating it wrongly.
1011
1012
(Pdb)
> /home/ubuntu/src/log/download.py(70)iterate()
-> if type(channel_data) is dict and 'ditem' in channel_data:
(Pdb)
{
"capture": {
"ditem": [
"495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw",
"-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs",
"aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU"
],
"length": 20480
},
it appears to pass on from that breakpoint correctly. it then pops
back up to the root node, and likely proceeds with the third child.
1013 .
when it pops, it is at an unexpected offset ... possibly because i
made the same error in calculating it.
this might actually be a bug in the tree, unsure
(Pdb) n
AssertionError
> /home/ubuntu/src/log/download.py(95)iterate()
-> assert stream_output_offset == expected_stream_output_offset
(Pdb) p stream_output_offset, expected_stream_output_offset
(585728, 593920)
1015
reasonable to diagnose. just 1 down from the root. 2nd child in.
length possibly mismatching.
width of child 1 = 499712
width of child 2 = 94208
(Pdb) p stream_output_offset, expected_stream_output_offset
(585728, 593920)
(Pdb) 499712 + 94208
593920
it's like a bug with the downloader. the bounds specify to extract
exactly 94208 bytes.
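the numbers from this pdb session, worked through as a small sketch. the variable names are mine for illustration, not names from download.py. it only restates the arithmetic: what the root's bounds promise versus what actually came out of the stream.

```python
# numbers copied from the pdb session above
child1_width = 499712
child2_width = 94208
actual_offset = 585728  # stream_output_offset when the assert fired

# what the root's bounds promise after the first two children
expected_offset = child1_width + child2_width

# bytes that were promised but never produced
shortfall = expected_offset - actual_offset

# bytes the second child actually yielded
child2_actual = actual_offset - child1_width

print(expected_offset, shortfall, child2_actual)
```

so the second child is referenced as 94208 bytes wide but only yields 86016, which is the same 86016 that shows up further down when subtracting by hand.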
1017
this is _hard_ but good practice! i'm planning to leave the system at
11:00 and try to do daily routine stuff.
1019
turns out it's a bug in the uploader. the data in the second child is
only 585728 bytes long.
1019.
1021
this could be helped by an assertion in the uploader. not sure what yet.
lengths = sum((capture['length'] for capture in
data.get('capture', [])))
datas = {
    type: dict(
        ditem = [item['id'] for item in items],
        length = sum((item['length'] for item in items))
    )
    for type, items in data.items()
}
indices.append(
prev,
lengths,
dict(
**datas,
i'm not sure how the tree is referencing a child with more data than
the child contains.
maybe i could add an assertion to the tree code.
1023
running_size = 0
running_leaf_count = 0
1023
def _insert(self, last_publish, *ordered_splices):
    # a reasonable next step is to provide for truncation appends,
    # where a tail of the data is replaced with new data
    # currently only performs 1 append
    assert len(ordered_splices) == 1
    for spliced_out_start, spliced_out_stop, spliced_in_size, spliced_in_data in ordered_s
1024
#new_node_leaf_count = self.leaf_count # + 1
new_leaf_count = self.leaf_count
new_size = self.size
for idx, (branch_leaf_count, branch_offset, branch_size, branch_id) in enumerate(self):
    if branch_leaf_count * self.degree <= new_leaf_count: #proposed_leaf_count
        break
self[idx:] = (
    #(leaf_count_of_partial_index_at_end_tmp, running_size, spliced_out_start - running_size, last_publish),
    (new_leaf_count, running_size, new_size, last_publish),
    (-1, 0, spliced_in_size, spliced_in_data)
)
maybe here at self[idx:] is where an assert would go
how was the root updated, to include a partial index?
new_size must have been wrong?
1025
assert self.size == sum((size for leaf_count, offset,
size, value in self))
this happens at the end of every mutation.
it addresses the root only, not its children.
self[idx:] = (
    #(leaf_count_of_partial_index_at_end_tmp, running_size, spliced_out_start - running_size, last_publish),
    (new_leaf_count, running_size, new_size, last_publish),
    (-1, 0, spliced_in_size, spliced_in_data)
)
adding this:
assert new_size == sum((size for leaf_count, offset, size,
value in self[idx:]))
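since the existing assert addresses the root only, a check that also walks the children might look something like this. this is a sketch under assumptions, not the flat_tree code: the nested-list layout, `total_size` and `check_node` are all hypothetical names of mine. each entry is `(leaf_count, offset, size, value)`, where `value` is raw bytes for a leaf (`leaf_count == -1`) or a nested list standing in for an uploaded child index. it assumes a child index spans its whole stream from 0, so a reference `(offset, size)` must fit inside the child's total.

```python
def total_size(node):
    """Sum of the sizes of every entry in a (hypothetical) index node."""
    return sum(size for _leaf_count, _offset, size, _value in node)


def check_node(node, path="root"):
    """Recursively assert that no reference reaches past its child's data."""
    for i, (leaf_count, offset, size, value) in enumerate(node):
        where = f"{path}[{i}]"
        if leaf_count == -1:
            # leaf: value holds the data itself
            available = len(value)
        else:
            # index: value is the child node, spanning [0, total_size(child))
            available = total_size(value)
        assert offset + size <= available, (
            f"{where} references bytes {offset}..{offset + size} "
            f"but the child only holds {available}"
        )
        if leaf_count != -1:
            check_node(value, where)


# a child holding 8 bytes in two leaves
child = [(-1, 0, 4, b"abcd"), (-1, 0, 4, b"efgh")]

# a root that references the child correctly, plus one embedded leaf
good_root = [(2, 0, 8, child), (-1, 0, 2, b"zz")]
check_node(good_root)

# a root that claims more of the child than exists, like the 94208 reference
bad_root = [(2, 4, 8, child)]
try:
    check_node(bad_root)
except AssertionError as error:
    print("caught:", error)
```

the real tree references children by id rather than holding them in memory, so a version of this in the uploader would have to carry each child's total size alongside its id.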
1028
I guess I'll try to make code to recreate the tree while downloading
it, so as to test the creation of this tree from its data.
old:
(Pdb) p 585728-499712
86016
(Pdb) p 561152 + 4096 + 20480
585728
newer:
from flat_tree import flat_tree
1030
index.append(id, len(chunk), chunk)
1031
comparison.append(comparison.size, index_subsize, index)
1032
comparison.append(comparison.leaf_count,
index_subsize, index)
(Pdb) p comparison.leaf_count
35
1034
(Pdb) p comparison.snap()
[(27, 27, 0, 499712), (3, 30, 499712, 28672), (3, 33, 528384, 32768),
(1, 34, 561152, 4096), (-1, {'capture': {'ditem':
['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw',
'-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs',
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'pG4gRSc03l2js77IfpfUvkTx2zRQFE5capCxY7rSjZ5UWT-5NqeV6U0bvlu_uxW0'],
'api_block': 997052}, 0, 20480)]
[
(27, 27, 0, 499712),
(3, 30, 499712, 28672),
(3, 33, 528384, 32768),
(1, 34, 561152, 4096),
(-1, {'capture': {'ditem':
['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw',
'-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs',
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'pG4gRSc03l2js77IfpfUvkTx2zRQFE5capCxY7rSjZ5UWT-5NqeV6U0bvlu_uxW0'],
'api_block': 997052}, 0, 20480)
]
the root is different because it hasn't added the later data yet :/
OK. what i can remember is that every state of the tree was already
uploaded. it's retained and referenced. also, the flat_tree class is
easy to make import old data. noted also it would be more interesting
to compare if it used the whole trees as the references.
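the snapshot printed above can be sanity-checked on its own. this is a sketch with assumptions: i'm reading each tuple as `(leaf_count, value, offset, size)` based on the printed output, the embedded dict is replaced by a placeholder string, and `check_snapshot` is a made-up helper, not part of flat_tree. it checks that in this particular snapshot each index entry's offset equals the sizes before it, and totals up bytes and leaves.

```python
# entries copied from comparison.snap(), with the embedded dict elided
snap = [
    (27, 27, 0, 499712),
    (3, 30, 499712, 28672),
    (3, 33, 528384, 32768),
    (1, 34, 561152, 4096),
    (-1, "embedded leaf (dict elided)", 0, 20480),
]


def check_snapshot(entries):
    """Return (total_size, total_leaves), asserting index offsets are cumulative."""
    running_size = 0
    running_leaves = 0
    for leaf_count, _value, offset, size in entries:
        if leaf_count == -1:
            # embedded leaf: its offset is relative to its own data
            running_leaves += 1
        else:
            assert offset == running_size, (offset, running_size)
            running_leaves += leaf_count
        running_size += size
    return running_size, running_leaves


print(check_snapshot(snap))
```

the totals come out to 585728 bytes and 35 leaves, matching `comparison.leaf_count` and the 561152 + 4096 + 20480 sum from earlier.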
1036
(Pdb) p comparison.snap()
[(27, 27, 0, 499712), (3, 30, 499712, 28672), (3, 33, 528384, 32768),
(1, 34, 561152, 4096), (-1, {'capture': {'ditem':
['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw',
'-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs',
'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480},
'min_block': [996652,
'pG4gRSc03l2js77IfpfUvkTx2zRQFE5capCxY7rSjZ5UWT-5NqeV6U0bvlu_uxW0'],
'api_block': 997052}, 0, 20480)]
1037
so at what point did the length issue develop, if it is there?
1039
i went back as far as _AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc so far.
it contains the 94208 length reference, and then 20480 tacked on the
end embedded.
1041
the only index prior to that is the one that is only 565248 bytes long
so i guess i would want to reproduce that 565248 one, and tack the
extra 20480 onto it, and see what kind of index it makes. it seems to
me it is an error to make the one with the 94208 length. then i can
make an assert for it and/or fix it or whatnot.
1042
lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y is 565248 bytes long
_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc is on top of it, and
references it as if it is 593920
i'm worried the most likely situation here is that some data happened
between them and was dropped. but i could try this.
maybe i'll go to the block explorer and see the sequence of transactions.
1045
the txs are ordered alphabetically by the block explorer. they are
bundled into a larger transaction with id
lUx1VzFzykYepB44NfrD_GqJLZj4fD9vb6rd0IxBWH4 . i'll use my code to see
their order within it.
>>> import ar
>>> peer = ar.Peer()
>>> stream = peer.stream('lUx1VzFzykYepB44NfrD_GqJLZj4fD9vb6rd0IxBWH4')
>>> header = ar.ANS104BundleHeader.fromstream(stream)
1047
>>> header.length_by_id.keys()
dict_keys(['GEZeoe9DMmxtVi4Jqx-q-g9yIMYOw7vWb2fF9GjkVkQ', '-yL3L6w9ysIWrcg8ZSXwV_DxdBOr4PjEJWjnxOYqIU0', 'KLjPJ3JGVxHhtSLzFK8-dlU_pTncyu-C6B3s0F5yBuc', 'zDRSNDKjL04CPFzhzxgmT3ODebBfTbI2RMH
these aren't alphabetical, so they might be ordered.
they're big.
1050
$ sudo swapon ~/extraswap
just in case
>>> bundle = Bundle.fromstream(stream)
i'm guessing it's paused loading it over the network?
https://viewblock.io/arweave/tx/lUx1VzFzykYepB44NfrD_GqJLZj4fD9vb6rd0IxBWH4
Size
47.78 MB
not sure what is taking so long.
$ sudo apt-get install jnettop
1051
1052
jnettop shows minimal transfer, with no reverse lookups that i
identify as associated with arweave.
^CTraceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 540, in fromstream
header = ANS104BundleHeader.fromstream(stream)
File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 87, in fromstream
return cls({
File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 87, in <dictcomp>
return cls({
File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 83, in <genexpr>
(int.from_bytes(stream.read(32), 'little'), b64enc(stream.read(32)))
File "/home/ubuntu/src/pyarweave/ar/utils/serialization.py", line 7, in b64enc
return base64url_encode(data).decode()
File "/home/ubuntu/.local/lib/python3.9/site-packages/jose/utils.py", line 88, in base64url_encode
return base64.urlsafe_b64encode(input).replace(b"=", b"")
File "/usr/lib/python3.9/base64.py", line 111, in urlsafe_b64encode
def urlsafe_b64encode(s):
KeyboardInterrupt
it looks like it was actually processing them.
maybe i can do it manually and put it in tqdm.
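a sketch of what reading the header manually could look like, without pyarweave or the network. it assumes the ANS-104 header layout as i understand it: a 32-byte little-endian item count, then for each item a 32-byte little-endian length followed by a 32-byte id. the traceback above is consistent with that. `b64url`, `read_bundle_header` and `make_fake_header` are names i made up, and the input is synthetic bytes rather than the real bundle.

```python
import base64
import io


def b64url(data: bytes) -> str:
    """URL-safe base64 without padding, the form arweave ids are printed in."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def read_bundle_header(stream):
    """Return [(item_id, length), ...] in the order they appear in the header."""
    count = int.from_bytes(stream.read(32), "little")
    entries = []
    for _ in range(count):
        length = int.from_bytes(stream.read(32), "little")
        item_id = b64url(stream.read(32))
        entries.append((item_id, length))
    return entries


def make_fake_header(items):
    """Build header bytes from (raw_id, length) pairs, for testing only."""
    out = len(items).to_bytes(32, "little")
    for raw_id, length in items:
        out += length.to_bytes(32, "little") + raw_id
    return out


fake_items = [(bytes([1]) * 32, 100), (bytes([2]) * 32, 250)]
entries = read_bundle_header(io.BytesIO(make_fake_header(fake_items)))
for position, (item_id, length) in enumerate(entries):
    print(position, item_id, length)
```

keeping the entries as a list preserves their position in the bundle, which is what matters for seeing the order of the transactions. wrapping the `range(count)` loop in tqdm would give the progress bar.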
1053
looks like there's some bug in Bundle.fromstream, which I will ignore
for the moment.
>>> dataitems = [ar.DataItem.fromstream(stream, length=length) for length in tqdm.tqdm(header.length_by_id.values())]
100%|███████████████████████████████████████████████████████| 679/679
[00:14<00:00, 47.57it/s]
1058
>>> idx_by_id = {dataitem.header.id: idx for idx, dataitem in enumerate(dataitems)}
>>> idx_by_id['lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y']
413
>>> idx_by_id['_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc']
632
>>> my_ditems = [dataitem for dataitem in dataitems if dataitem.header.owner == dataitems[413].header.owner]
>>> len(my_ditems)
246
i have 1/3rd of the ditems in that tx ;p
1059
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 'lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y' is not in list
>>> my_ditems.index('_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: '_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc' is not in list
i did something wrong.
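a likely cause, assuming `my_ditems` still holds DataItem objects as built in the list comprehension above: `list.index` compares the id string against the objects themselves, so it never matches. a sketch of looking up by `header.id` instead, using stand-in classes (`FakeHeader`, `FakeDataItem` and `position_of` are hypothetical, not pyarweave names):

```python
from dataclasses import dataclass


@dataclass
class FakeHeader:
    id: str
    owner: str


@dataclass
class FakeDataItem:
    header: FakeHeader


def position_of(items, wanted_id):
    """Index of the first item whose header.id equals wanted_id."""
    for position, item in enumerate(items):
        if item.header.id == wanted_id:
            return position
    raise ValueError(f"{wanted_id!r} is not in list")


items = [
    FakeDataItem(FakeHeader("aaa", "me")),
    FakeDataItem(FakeHeader("bbb", "someone else")),
    FakeDataItem(FakeHeader("ccc", "me")),
]
mine = [item for item in items if item.header.owner == "me"]

# searching for the string directly fails, as in the session above
try:
    mine.index("ccc")
except ValueError as error:
    print("list.index failed:", error)

# comparing against header.id finds it
print(position_of(mine, "ccc"))
```

the position within `mine` gives the order among one owner's data items, which is what is needed to see whether anything sits between the two indices.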
stepping away 1101 .
I'm hunting down an incorrect length in my last published test. the
second root child is referenced as longer than it is. i was taking
some time to look to see if any intermediate roots were dropped.