1002

suddenly sent that. different kind of inhibition.

1004

pasting stuff during dyskinesia ;p

(Pdb) p index
{'capture': {'ditem': ['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw', '-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs', 'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480}, 'min_block': [996652, 'pG4gRSc03l2js77IfpfUvkTx2zRQFE5capCxY7rSjZ5UWT-5NqeV6U0bvlu_uxW0'], 'api_block': 997052}

curl -L https://arweave.net/lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y | python3 -m json.tool

 72             import pdb; pdb.set_trace()
 73  ->         self.channels.add(channel_name)
 74             length_sum = 0

        561152,
        4096
    ],
    [
        -1,
        {
            "capture": {
                "ditem": [
                    "495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw",
                    "-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs",
                    "aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU"
                ],
                "length": 20480

1007
1008

(Pdb) p stream_output_offset, expected_stream_output_offset
(0, 1110016)

1009

after fixing the assertion mistake and not finding the offset error, i'm realising that the subindices are actually as wide as the whole stream. i think i was manually calculating it wrongly.

1011
1012

(Pdb)
/home/ubuntu/src/log/download.py(70)iterate()
-> if type(channel_data) is dict and 'ditem' in channel_data:
(Pdb)

{
    "capture": {
        "ditem": [
            "495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw",
            "-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs",
            "aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU"
        ],
        "length": 20480
    },
it appears to pass on from that breakpoint correctly. it then pops back up to the root node, and likely proceeds with the third child.

1013

when it pops, it is at an unexpected offset ... possibly because i made the same error in calculating it. this might actually be a bug in the tree, unsure.

(Pdb) n
AssertionError
/home/ubuntu/src/log/download.py(95)iterate()
-> assert stream_output_offset == expected_stream_output_offset
(Pdb) p stream_output_offset, expected_stream_output_offset
(585728, 593920)
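for my own reference, the shape of the check that's failing -- a minimal sketch with hypothetical names, not the real download.py:

import itertools  # not needed, just standard library; this sketch is plain python

# children: (declared_offset, declared_size, data) entries as a parent node records them
def iterate_node(children):
    stream_output_offset = 0
    for declared_offset, declared_size, data in children:
        yield data
        stream_output_offset += len(data)
        expected_stream_output_offset = declared_offset + declared_size
        # invariant: the bytes actually produced for a child end exactly where
        # the parent said they would
        assert stream_output_offset == expected_stream_output_offset, \
            (stream_output_offset, expected_stream_output_offset)

# toy reproduction: the second child declares 94208 bytes but only carries 86016
children = [
    (0, 499712, b'a' * 499712),
    (499712, 94208, b'b' * 86016),
]
for chunk in iterate_node(children):
    pass   # raises AssertionError: (585728, 593920) on the second child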
1015

reasonable to diagnose. just 1 down from the root. 2nd child in. length possibly mismatching.

width of child 1 = 499712
width of child 2 = 94208

(Pdb) p stream_output_offset, expected_stream_output_offset
(585728, 593920)
(Pdb) 499712 + 94208
593920

it's like a bug with the downloader. the bounds specify to extract exactly 94208 bytes.

1017

this is _hard_ but good practice! i'm planning to leave the system at 11:00 and try to do daily routine stuff.

1019

turns out it's a bug in the uploader. the data in the second child is only 585728 bytes long.

1021

this could be helped by an assertion in the uploader. not sure what yet.

lengths = sum((capture['length'] for capture in data.get('capture', [])))
datas = {
    type: dict(
        ditem = [item['id'] for item in items],
        length = sum((item['length'] for item in items))
    )
    for type, items in data.items()
}
indices.append(
    prev, lengths, dict(
        **datas,

i'm not sure how the tree is referencing a child with more data than the child contains. maybe i could add an assertion to the tree code.

1023

running_size = 0
running_leaf_count = 0

1023

def _insert(self, last_publish, *ordered_splices):
    # a reasonable next step is to provide for truncation appends, where a tail of the data is replaced with new data
    # currently only performs 1 append
    assert len(ordered_splices) == 1
    for spliced_out_start, spliced_out_stop, spliced_in_size, spliced_in_data in ordered_splices:

1024

        #new_node_leaf_count = self.leaf_count # + 1
        new_leaf_count = self.leaf_count
        new_size = self.size
        for idx, (branch_leaf_count, branch_offset, branch_size, branch_id) in enumerate(self):
            if branch_leaf_count * self.degree <= new_leaf_count: #proposed_leaf_count
                break
        self[idx:] = (
            #(leaf_count_of_partial_index_at_end_tmp, running_size, spliced_out_start - running_size, last_publish),
            (new_leaf_count, running_size, new_size, last_publish),
            (-1, 0, spliced_in_size, spliced_in_data)
        )

maybe here at self[idx:] is where an assert would go. how was the root updated, to include a partial index? new_size must have been wrong?

1025

assert self.size == sum((size for leaf_count, offset, size, value in self))

this happens at the end of every mutation. it addresses the root only, not its children. (a toy illustration of how that can miss a lying child is sketched after this entry.)

self[idx:] = (
    #(leaf_count_of_partial_index_at_end_tmp, running_size, spliced_out_start - running_size, last_publish),
    (new_leaf_count, running_size, new_size, last_publish),
    (-1, 0, spliced_in_size, spliced_in_data)
)

adding this:

assert new_size == sum((size for leaf_count, offset, size, value in self[idx:]))

1028

I guess I'll try to make code to recreate the tree while downloading it, so as to test the creation of this tree from its data.
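a toy illustration of the gap (made-up structure, not the real flat_tree layout): the root's own sizes can sum up fine while a child slot claims more bytes than the child actually holds.

# toy slots in the (leaf_count, offset, size, value) shape used above; value here is
# either raw bytes or a nested list of slots standing in for a published child index
child = [(1, 0, 86016, b'x' * 86016)]          # the child really holds 86016 bytes
root = [
    (27, 0, 499712, b'a' * 499712),
    (1, 499712, 94208, child),                 # ...but the root references it as 94208
]

# the end-of-mutation assert only inspects the root's own slots, so it passes:
size = 499712 + 94208
assert size == sum(s for _, _, s, _ in root)

# a check that descends one level catches the mismatch:
declared = root[1][2]
actual = sum(s for _, _, s, _ in root[1][3])
print(declared, actual)                        # 94208 vs 86016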
old:

(Pdb) p 585728-499712
86016
(Pdb) p 561152 + 4096 + 20480
585728

newer:

from flat_tree import flat_tree

1030

index.append(id, len(chunk), chunk)

1031

comparison.append(comparison.size, index_subsize, index)

1032

comparison.append(comparison.leaf_count, index_subsize, index)

(Pdb) p comparison.leaf_count
35

1034

(Pdb) p comparison.snap()
[
    (27, 27, 0, 499712),
    (3, 30, 499712, 28672),
    (3, 33, 528384, 32768),
    (1, 34, 561152, 4096),
    (-1, {'capture': {'ditem': ['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw', '-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs', 'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480}, 'min_block': [996652, 'pG4gRSc03l2js77IfpfUvkTx2zRQFE5capCxY7rSjZ5UWT-5NqeV6U0bvlu_uxW0'], 'api_block': 997052}, 0, 20480)
]

the root is different because it hasn't added the later data yet :/

OK. what i can remember is that every state of the tree was already uploaded. it's retained and referenced. also, it's easy to make the flat_tree class import old data. noted also that it would be more interesting to compare if it used the whole trees as the references.

1036

(Pdb) p comparison.snap()
[(27, 27, 0, 499712), (3, 30, 499712, 28672), (3, 33, 528384, 32768), (1, 34, 561152, 4096), (-1, {'capture': {'ditem': ['495FZqKXSr9cCPObKGVuNHShJA79enrwHDk-xcMOBVw', '-m-6k-usTx0RUbRI9EEDRaiA2vIapMObgKz3S1bB2Vs', 'aA2go7KTnc4ArkqcjDN-pg4A-c97_bypml5C01eS5ZU'], 'length': 20480}, 'min_block': [996652, 'pG4gRSc03l2js77IfpfUvkTx2zRQFE5capCxY7rSjZ5UWT-5NqeV6U0bvlu_uxW0'], 'api_block': 997052}, 0, 20480)]

1037

so at what point did the length issue develop, if it is there?

1039

i went back as far as _AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc so far. it contains the 94208 length reference, and then 20480 tacked on the end embedded.

1041

the only index prior to that is the one that is only 565248 bytes long, so i guess i would want to reproduce that 565248 one, tack the extra 20480 onto it, and see what kind of index it makes. it seems to me it is an error to make the one with the 94208 length. then i can make an assert for it and/or fix it or whatnot.

1042

lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y is 565248 bytes long.
_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc is on top of it, and references it as if it is 593920.
(a quick gateway re-check of those lengths is sketched at the end of this entry.)

i'm worried the most likely situation here is that some data happened between them and was dropped. but i could try this. maybe i'll go to the block explorer and see the sequence of transactions.

1045

the txs are ordered alphabetically by the block explorer. they are bundled into a larger transaction with id lUx1VzFzykYepB44NfrD_GqJLZj4fD9vb6rd0IxBWH4. i'll use my code to see their order within it.
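for the record, a quick way to re-check those lengths straight from the gateway -- a rough sketch, assuming arweave.net serves these data items by id the way the curl near the top did:

import urllib.request

def gateway_length(txid):
    # fetch the data item and measure it; assumes the gateway serves bundled
    # data items by id, like the earlier curl command
    with urllib.request.urlopen('https://arweave.net/' + txid) as response:
        return len(response.read())

# expecting 565248 for the older index, versus the 593920 it is referenced as
print(gateway_length('lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y'))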
import ar
peer = ar.Peer()
stream = peer.stream('lUx1VzFzykYepB44NfrD_GqJLZj4fD9vb6rd0IxBWH4')
header = ar.ANS104BundleHeader.fromstream(stream)
1047
header.length_by_id.keys()
dict_keys(['GEZeoe9DMmxtVi4Jqx-q-g9yIMYOw7vWb2fF9GjkVkQ', '-yL3L6w9ysIWrcg8ZSXwV_DxdBOr4PjEJWjnxOYqIU0', 'KLjPJ3JGVxHhtSLzFK8-dlU_pTncyu-C6B3s0F5yBuc', 'zDRSNDKjL04CPFzhzxgmT3ODebBfTbI2RMH
these aren't alphabetical, so they might be ordered. they're big.

1050

$ sudo swapon ~/extraswap

just in case.
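side note: the order might already be readable straight off the header, without parsing every item -- a sketch, assuming the header object from above and that length_by_id keeps the items in bundle order:

# assumption: length_by_id is keyed by id and preserves the order the data items
# appear in the bundle (python dicts keep insertion order)
order = list(header.length_by_id)
print(order.index('lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y'),
      order.index('_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc'))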
bundle = Bundle.fromstream(stream)
i'm guessing it's paused loading it over the network?

https://viewblock.io/arweave/tx/lUx1VzFzykYepB44NfrD_GqJLZj4fD9vb6rd0IxBWH4
Size 47.78 MB

not sure what is taking so long.

$ sudo apt-get install jnettop

1051
1052

jnettop shows minimal transfer, with no reverse lookups that i identify as associated with arweave.

^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 540, in fromstream
    header = ANS104BundleHeader.fromstream(stream)
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 87, in fromstream
    return cls({
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 87, in <dictcomp>
    return cls({
  File "/home/ubuntu/src/pyarweave/ar/bundle.py", line 83, in <genexpr>
    (int.from_bytes(stream.read(32), 'little'), b64enc(stream.read(32)))
  File "/home/ubuntu/src/pyarweave/ar/utils/serialization.py", line 7, in b64enc
    return base64url_encode(data).decode()
  File "/home/ubuntu/.local/lib/python3.9/site-packages/jose/utils.py", line 88, in base64url_encode
    return base64.urlsafe_b64encode(input).replace(b"=", b"")
  File "/usr/lib/python3.9/base64.py", line 111, in urlsafe_b64encode
    def urlsafe_b64encode(s):
KeyboardInterrupt

it looks like it was actually processing them. maybe i can do it manually and put it in tqdm.

1053

looks like there's some bug in Bundle.fromstream, which I will ignore for the moment.
import tqdm
dataitems = [ar.DataItem.fromstream(stream, length=length) for length in tqdm.tqdm(header.length_by_id.values())]
100%|███████████████████████████████████████████████████████| 679/679 [00:14<00:00, 47.57it/s]
1058
idx_by_id = {dataitem.header.id: idx for idx, dataitem in enumerate(dataitems)}
idx_by_id['lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y']
413
idx_by_id['_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc']
632
my_ditems = [dataitem for dataitem in dataitems if dataitem.header.owner == dataitems[413].header.owner]
len(my_ditems)
246

i have 1/3rd of the ditems in that tx ;p

1059
my_ditems.index('lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y' is not in list
my_ditems.index('_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: '_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc' is not in list
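ah -- my_ditems holds DataItem objects, not id strings, so .index on the raw ids can't match. what i probably meant (sketch):

# my_ditems contains DataItem objects, so look the ids up via dataitem.header.id instead
my_ids = [dataitem.header.id for dataitem in my_ditems]
print(my_ids.index('lZ9z6x0_XFj9xASzqmCE8Dkm8F3p55t0CaNjzw2gQ3Y'),
      my_ids.index('_AdSfr-AHdtWF20eR9ThV8NEOey7QydTsIbUpRX6GIc'))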
i did something wrong. stepping away.

1101

I'm hunting down an incorrect length in my last published test. the second root child is referenced as longer than it is. i was taking some time to look to see if any intermediate roots were dropped.