I got one done this morning; the output is at the end below. The tip (tail, or root) ditem is 7nQhX2NCh7FypOcdutGtHjpxgRZBCRO4mGELSHg6UOo and the first is CwVPl5va1OL1pk5tC8KJrMMeONJfXdgLFMBAPgWpXt8. The format is a merkle tree of ditems with data ditems alongside. I haven't noted here which transactions the ditems are in, nor verified that they all made it, but given it's a merkle tree I find it a big accomplishment. The "min_block" heights and hashes show where to start scraping blocks to find them; these were the tip blocks at the time of transmission, and they're usually a few blocks behind.

A ditem (dataitem) is a format for "bundling" potentially unrelated data transactions together into a larger arweave transaction. I'll find the spec and paste it at the very end. It's at https://github.com/ArweaveTeam/arweave-standards/blob/master/ans/ANS-104.md .

On Wednesday, August 30, 2023, mailbombbin <mailbombbin@gmail.com> wrote:
Trying to arweave cypherpunks.mbox on this old free phone and service. Each attempt is below; the last one is still streaming. I've only gotten about 1% up, the ETA is multiple days, and it restarts if interrupted :/ fun! Maybe I can upgrade to other data if I do this. The raw tx hashes are the first tx of each stream, and the JSON is the latest tip.
l2q0jo9erSsW_MiTk2YrxhIoBsqfmi-nfKPqYzLej5A {"ditem": ["CVkfCiCTsObB1sHjudqiDMF9-cX5P6xzWLZvoABrAbc"], "min_block": [1079670, "OwPhY-8h61vNiMNTtIRF39vijB-q8vVNLWNFr5p0TlJl-kmHtHipt5rtfsis0EP-"], "api_timestamp": 1671329083973}
EranJaRX9WUoceGqHuGwQRkB8dqlJVdF_MxD0fFg6xI {"ditem": ["Enae4U8a85xGpCqaFGSGVpqqiuUPCRExzS2xHKn7Vis"], "min_block": [1251083, "fn1pKdTNVozz4CoJz9OzTNm76s7o6ThBUQXZtzRJu2qw7ILvAYNrnxrVf1SV98rC"], "api_timestamp": 1693378286195}
UNDylqCRfc3nF3b1McE2UnG_5eadZyfJ1pvzpVPzjlQ-snapshot {"ditem": ["V3LCuJsoFtsGXdwl99oXOZRYPntwhVR6q-a2FSs47s0"], "min_block": [1251096, "1l3NSawiDMe5VchF5r8GMRalPe8wvfu6DBkG_URocl7OGdvnDAcMnDAy02SItQ3O"], "api_timestamp": 1693379952192}
UNDylqCRfc3nF3b1McE2UnG_5eadZyfJ1pvzpVPzjlQ {"ditem": ["pmZv23ozBAtg2OqfEM0WbMO0CKQhF31xaVrcn4gHu4g"], "min_block": [1251103, "sILWXTqVWEJCig3DqdIJ4LDN-yOOt7MtCgvKCqMBCFhUasT_YznAlu_hBiUSDfhW"], "api_timestamp": 1693381163640}
Pj2OVDb1Qy_c-sEUDJ7YDXiknyf0_XHyau2haSvXZy4 {"ditem": ["EB2Vx1qRTZD69pgOUW1qKJlsNCGiGek7xYHxXWVVL0w"], "min_block": [1251107, "xYpeonExDVykF9l1HpdFCY1WeecKTn8_nvI8VA6XEbEdWwQ9Zfqfll5MKWlrTr6i"], "api_timestamp": 1693381651624}
aT7mJyb7S_Fup7hXLOX594AAZB5kbdNmbb7dcITPHz8 {"ditem": ["r2RxgqBgY4_SJRACoXW6y7FJZiVQuFv5hA5fpG4sLuw"], "min_block": [1251116, "Wzlxc6biOHcclEnXZcjkLkXt6dIrJ0LqZq3XkOZnU9a6UJVmtTHeMSIYoNcnAxm0"], "api_timestamp": 1693382647170}
6qUZ8Elu4VHU8x0KVEXxdBBRa6wf22SQeIcaIfOd8pk-snapshot {"ditem": ["H2eGB9xaqLj9GKcNyayfuXbybaNsGxYEAqCHGYPOCGo"], "min_block": [1251172, "hsLwXBo0jkJmD3PB8-4noeEqw6XxLz99BlVG3hVFy17Sqgl8EELYJzL1WoVJPKOi"], "api_timestamp": 1693389485899}
6qUZ8Elu4VHU8x0KVEXxdBBRa6wf22SQeIcaIfOd8pk {"ditem": ["AgeO4GGdQJTXEt61gPQkUIZQsspPRtv5QCj5XWQRQPg"], "min_block": [1251186, "SKo8T7D2Q_adY6wotI06T40JzISSNfalk_58pVludWB-udypIUH159nHOAO_18S1"], "api_timestamp": 1693390764511}
NUYGqDns5GlPzkwnRf6R3WqbTSCOiVaqFj4RZKWj20s {"ditem": ["V5OQCgpDsyZRgJRQdOfTXDBsaxt3LssGzF337yaOZ8w"], "min_block": [1251188, "4gbxsXtclQzMF6XTgh3zmt9KCUine9JFEn52NxGAXQ4vL-6omn1J4B2O5OfrrPSt"], "api_timestamp": 1693391084207}
Ao5L_Z3mOsWIZLn98659_M1M1ovjshisErNFtSk1vf0 {"ditem": ["RSe6xlixVjIyW6NJjDn5QgCr9ynqiipw9x36HFhDGR0"], "min_block": [1251213, "WZE35niQMMn0YoIfEfQzqitJUr_6kOLIGteuknFPDF1wPAw7zPrgjR7_PKoYWzeq"], "api_timestamp": 1693394676328}
~/log $ cat Ao5L_Z3mOsWIZLn98659_M1M1ovjshisErNFtSk1vf0 ; echo
{"ditem": ["E0gGVPzBBQpkYhCcn5VEURR-I4VMHiRsUeH7mkXkmeU"], "min_block": [1251336, "oBo7VhTTM0i6xYjdRR8HknzYx7cfPfCaHSp-SI0YzGw6bRCeC6B-mzLlWkVAAvNf"], "api_timestamp": 1693410208571}
~/log $ cat CwVPl5va1OL1pk5tC8KJrMMeONJfXdgLFMBAPgWpXt8 ; echo
{"ditem": ["7nQhX2NCh7FypOcdutGtHjpxgRZBCRO4mGELSHg6UOo"], "min_block": [1251868, "xJGt7DzzCGesuPrG_M3T6foidYTA6XvSoibIMOqA5FVcNzD79eMdiwsOoW2Nt07B"], "api_timestamp": 1693481248830}
~/log $ curl -L https://arweave.net/7nQhX2NCh7FypOcdutGtHjpxgRZBCRO4mGELSHg6UOo
[[2187, {"ditem": ["f0aNoAC0eE5z_UXwAXdVKMNpML5QAqUFqNSgeuIWG_E"], "min_block": [1251652, "Qkzbe8_gXbqDBppn4TptrCYwSaUWrbpcvDMs3_NH9V2nTAwMWGctlRx9M2agVpja"], "api_timestamp": 1693450606642}, 0, 726928100],
 [729, {"ditem": ["847TVDQZUSRFPljN7gk0Sjajmpvt3umrFNx0gg-nQBE"], "min_block": [1251757, "lULiu4xpAZEcR53gBkLpUnkcKcMbTi4KhX6uJMUbwwk738RT_-opNmumOOW1rPtR"], "api_timestamp": 1693465798522}, 726928100, 272660798],
 [243, {"ditem": ["bb5i5avE77Nr2McNpcbiXmH52nM36ZkODw-nFImYPb4"], "min_block": [1251806, "eMVtsJ8ol_0NIidrQL_sAd8aE1MzHJk3qxU6seQrRamVYUgBIM4d1uCmIP9gXLzh"], "api_timestamp": 1693471537811}, 999588898, 113715760],
 [243, {"ditem": ["gUW2R1MtGAmMg35S_Ct6DZPtU8fT9SzmLrNOFDL6wsk"], "min_block": [1251853, "hAh6YJQBe_V7CAcwMy-8RMIcau4vIuWQY0_sZ1KzoTfJa8kdruXJ3gZ8IwLIKwXd"], "api_timestamp": 1693477520732}, 1113304658, 126017816],
 [81, {"ditem": ["iQZ_3uPaOW4BqKGXTYPIHp3k7tFt-GzE7yb2Ndgi-E8"], "min_block": [1251858, "3sCog7HD3JgpCl4N0b0FDLiyP1T3gH7OF3gDwtyK95lbGyuZm9mmTQmaek43emH8"], "api_timestamp": 1693479825617}, 1239322474, 45691520],
 [27, {"ditem": ["fqZb1FmGPUjcBvNZ4a2xaSK3IjCNyo86DY9h8tqYIZ8"], "min_block": [1251865, "Zwb9WGXhKq1vR2MNmjZtzURDNszjJBBS3vZqOaVizvVk3m1RzRohZ7FhemT_-nEH"], "api_timestamp": 1693480582178}, 1285013994, 15500288],
 [9, {"ditem": ["1ivR53vUE6rt6Nm_F60oGpNHSifiLmYciTnSbD_NW0c"], "min_block": [1251867, "aKc7hDDVrfl24Is9vQ98602jiOCVBD4biEH_wiXwNaALUkgh3HEqXBUroEsT8dib"], "api_timestamp": 1693480837502}, 1300514282, 4727808],
 [9, {"ditem": ["nZGtKgMR9pg84hJ13wXtqjRbxjzhY5dakLjQSPhGlJE"], "min_block": [1251868, "xJGt7DzzCGesuPrG_M3T6foidYTA6XvSoibIMOqA5FVcNzD79eMdiwsOoW2Nt07B"], "api_timestamp": 1693481158117}, 1305242090, 4947904],
 [3, {"ditem": ["xEqs03RdJsH8d3Lv2Ya8Pel8K2ZN-JSgh3LdatGB4Ao"], "min_block": [1251868, "xJGt7DzzCGesuPrG_M3T6foidYTA6XvSoibIMOqA5FVcNzD79eMdiwsOoW2Nt07B"], "api_timestamp": 1693481238950}, 1310189994, 1675904],
 [-1, {"capture": {"ditem": ["iOyknMHhADaHK_xkpQlNoEZrrvP99c3IXfe3WewRFrw", "bP8L0uD7KHQjyI9enD0jP0tNOIdmIk5VnLHjKgBvb1o", "5-fbgNyugGqfZoa6fyQPo-rgBQXgyBjJVfDapwQX4Io", "a-eU5TMG-SQOboZTF_Z7R8Ohqvn_Kz7L3y7T2pawvyc", "mMNDc3Ww2i4Wk0VPeXfYVUgIEKfyvh7rOJjyY-WjDFU", "8rD1d-DDAhVBztEsEv-GqGvBqE1wbZp-9bsu0Pcojhw", "ppNEmZM7y8SZJasxqhRvEdvbDu3aG-liwCi3c8VFnOA", "1N9xwdWwVKjTg3zOWjgB0-Hdx25h6HeSyVQ5VQ8SK8w", "SKsp4gmX8NNyOQPu-I-I8TG93B8Y6nz6iKR8Bt5eg_o", "x4Tu6HsoQWfGN25iE5Fvpa82fGpB28rrnYjdNxAez24", "SOdFdHBaZwhojok6Q9WOPNpXgCdmKxyUmKE3cW51T2M", "rEHm5RnvNvD9Y5qaruorlvAyOkXpuhqyF8ISRmkHhPQ", "0xCnVMOVBlwT-VSe_g1-BGJ1E6ejxqecP_XGCxtpRQI", "DxIPFNNdgrx4Wke2DtZB78tVtukkSw61n6QBg5NPHB8", "CEkUvjVy_BHcK56g4kespacaQFdGXfYm0IdFFtkm-EU", "8KCcnfhuOJH7ZediIK_qrjBatuJ2pVPli17eMaMqIoU", "AMX_Lw3taJALfPAaQJg8dsE9ja7Sz-j9uPB8fawc7i8", "TRTir-wSyVWzU0KlZyWjh6ho8Tu8TsP3fQBSA9n-8Mw", "d58UH4--pAwnlQpGk-IvSOkA9w0f3fIcODQo7rsXZ54", "spdeHzjHVhvCp__zTODoJAMal_0-oIG_0Fn-cifCY1k", "60P5mPNH0mMCLflTTrgU5B-xOmTUJa4HRqAAuFsW4jQ"], "time": [1693474598.4573686, 1693474598.457753, 1693474598.9385505, 1693474598.939684, 1693474599.2417426, 1693474606.5606308, 1693474606.5718164, 1693474606.5777023, 1693474606.5899456, 1693474606.6084876, 1693474606.9194317, 1693474606.920894, 1693474607.2051167, 1693474607.2071836, 1693474607.592393, 1693474607.5929785, 1693474607.776161, 1693474607.7766247, 1693474608.067779, 1693474608.412459, 1693474608.6029866]},
  "min_block": [1251868, "xJGt7DzzCGesuPrG_M3T6foidYTA6XvSoibIMOqA5FVcNzD79eMdiwsOoW2Nt07B"], "api_timestamp": 1693481247742, "dropped": null}, 1311865898, 258469]]

# ANS-104: Bundled Data v2.0 - Binary Serialization

Status: Standard

## Abstract

This document describes the data format and directions for reading and writing bundled binary data. Bundled data is a way of writing multiple independent data transactions (referred to as DataItems in this document) into one top level transaction. A DataItem shares many of the same properties as a normal data transaction, in that it has an owner, data, tags, target, signature, and id. It differs in that it has no ability to transfer tokens, and no reward, as the top level transaction pays the reward for all bundled data.

## Motivation

Bundling multiple data transactions into one transaction provides a number of benefits:

- Allow delegation of payment for a DataItem to a 3rd party, while maintaining the identity and signature of the person who created the DataItem, without them needing to have a wallet with funds
- Allow multiple DataItems to be written as a group
- Increase the throughput of logically independent data-writes to the Arweave network

## Reference Implementation

There is a reference implementation for the creation, signing, and verification of DataItems and working with bundles in [TypeScript](https://github.com/ArweaveTeam/arweave-data).

## Specification

### 1. Transaction Format

#### 1.1 Transaction Tags

A bundle of DataItems MUST have the following two tags present:

- `Bundle-Format` a string describing the bundling format. The format for this standard is `binary`
- `Bundle-Version` a version string.
The version referred to in this standard is `2.0.0`. Version changes may occur due to a change in encoding algorithm in the future.

#### 1.2 Transaction Body Format

The transaction body is binary data in the following byte format, where `N` is the number of DataItems:

| Bytes           | Purpose                                                            |
| --------------- | ------------------------------------------------------------------ |
| 32              | Number of data items                                               |
| `N` x 64        | Pairs of size and entry ids [size (32 bytes), entry ID (32 bytes)] |
| Remaining bytes | Binary encoded data items in bundle                                |

#### 1.3 DataItem Format

A DataItem is a binary encoded object that has similar properties to a transaction.

| Field               | Description                                    | Encoding | Length (in bytes)         | Optional           |
| ------------------- | ---------------------------------------------- | -------- | ------------------------- | ------------------ |
| signature type      | Type of key format used for the signature      | Binary   | 2                         | :x:                |
| signature           | A signature produced by owner                  | Binary   | Depends on signature type | :x:                |
| owner               | The public key of the owner                    | Binary   | 512                       | :x:                |
| target              | An address that this DataItem is being sent to | Binary   | 32 (+ presence byte)      | :heavy_check_mark: |
| anchor              | A value to prevent replay attacks              | Binary   | 32 (+ presence byte)      | :heavy_check_mark: |
| number of tags      | Number of tags                                 | Binary   | 8                         | :x:                |
| number of tag bytes | Number of bytes used for tags                  | Binary   | 8                         | :x:                |
| tags                | An avro array of tag objects                   | Binary   | Variable                  | :x:                |
| data                | The data contents                              | Binary   | Variable                  | :x:                |

All optional fields will have a leading byte which describes whether the field is present (`1` for present, `0` for _not_ present). Any other value for this byte makes the DataItem invalid.

A tag object is an Apache Avro encoded stream representing an object `{ name: string, value: string }`. Prefixing the tag objects with their byte length means decoders may skip them if they wish.
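The fixed field order in the DataItem table can be walked with a short TypeScript sketch. Two details are assumptions, not stated in this text: multi-byte integers are read as little-endian, and signature type `1` is taken to mean a 512-byte signature (any other type is rejected rather than guessed). `parseDataItem` and `DataItemFields` are illustrative names. The tags come back as raw bytes, which is what the byte-length prefix makes possible.

```typescript
// Sketch: walk the fields of an encoded DataItem in the order of the 1.3 table.
// ASSUMPTIONS (not stated in this text): integers are little-endian, and
// signature type 1 means a 512-byte signature. Other types are rejected.
const SIGNATURE_LENGTHS: Record<number, number> = { 1: 512 };
const OWNER_LENGTH = 512;

interface DataItemFields {
  signatureType: number;
  signature: Uint8Array;
  owner: Uint8Array;
  target: Uint8Array | null; // null when the presence byte is 0
  anchor: Uint8Array | null;
  tagCount: number;
  tagBytes: Uint8Array; // raw Avro-encoded tags, skipped via their byte length
  data: Uint8Array;
}

// Unsigned little-endian integer of `width` bytes, limited to safe integers.
function readUintLE(buf: Uint8Array, offset: number, width: number): number {
  let value = 0;
  for (let i = width - 1; i >= 0; i--) value = value * 256 + buf[offset + i];
  if (!Number.isSafeInteger(value)) throw new Error("integer too large");
  return value;
}

function parseDataItem(buf: Uint8Array): DataItemFields {
  let pos = 0;
  // Consume exactly n bytes or fail on truncated input.
  const take = (n: number): Uint8Array => {
    if (pos + n > buf.length) throw new Error("truncated DataItem");
    const out = buf.subarray(pos, pos + n);
    pos += n;
    return out;
  };
  // Optional 32-byte field behind a presence byte: 1 = present, 0 = absent.
  const takeOptional = (): Uint8Array | null => {
    const flag = take(1)[0];
    if (flag === 0) return null;
    if (flag === 1) return take(32);
    throw new Error(`invalid presence byte ${flag}`);
  };

  const signatureType = readUintLE(take(2), 0, 2);
  const sigLength = SIGNATURE_LENGTHS[signatureType];
  if (sigLength === undefined) throw new Error(`unsupported signature type ${signatureType}`);
  const signature = take(sigLength);
  const owner = take(OWNER_LENGTH);
  const target = takeOptional();
  const anchor = takeOptional();
  const tagCount = readUintLE(take(8), 0, 8);
  const tagBytes = take(readUintLE(take(8), 0, 8));
  const data = buf.subarray(pos); // everything after the tags is the payload
  return { signatureType, signature, owner, target, anchor, tagCount, tagBytes, data };
}
```

Rejecting presence bytes other than `0` and `1` mirrors the validity rule stated for optional fields.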
The `anchor` and `target` fields in DataItem are optional. The `anchor` is an arbitrary value to allow bundling gateways to provide protection from replay attacks against them or their users.

##### 1.3.1 Tag format

Parsing the tags is optional, as they are prefixed by their byte length.

To conform with deployed bundles, the tag format is [Apache Avro](https://avro.apache.org/docs/current/spec.html) with the following schema:

```json
{
  "type": "array",
  "items": {
    "type": "record",
    "name": "Tag",
    "fields": [
      { "name": "name", "type": "bytes" },
      { "name": "value", "type": "bytes" }
    ]
  }
}
```

Usually the name and value fields are UTF-8 encoded strings, in which case `"string"` may be specified as the field type rather than `"bytes"`, and avro will automatically decode them.

To encode field and list sizes, avro uses a `long` datatype that is first zig-zag encoded, and then variable-length integer encoded, using existing encoding specifications. When encoding arrays, avro provides for a streaming approach that separates the content into blocks.

##### 1.3.1.1 ZigZag coding

[ZigZag](https://code.google.com/apis/protocolbuffers/docs/encoding.html#types) is an integer format where the sign bit is in the 1s place, such that small negative numbers have no high bits set. In surrounding code, normal integers are almost always stored in a twos-complement manner instead, which can be converted as below.

Converting to ZigZag:

```
// shift left; negative values are then bit-inverted so the sign ends up in the 1s bit
zigzag = twos_complement << 1;
if (zigzag < 0) zigzag = ~zigzag;
```

Converting from ZigZag:

```
// an odd value encodes a negative number: invert it back before shifting right
if (zigzag & 1) zigzag = ~zigzag;
twos_complement = zigzag >> 1;
```

##### 1.3.1.2 Variable-length integer coding

[Variable-length integer](https://lucene.apache.org/java/3_5_0/fileformats.html#VInt) is a 7-bit little-endian integer format, where the 8th bit of each byte indicates whether another byte (of 7 bits greater significance) follows in the stream.
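Combining the ZigZag step with the VInt step, an Avro `long` can be written and read as in the following TypeScript sketch. It uses plain arithmetic rather than the bit shifts of the pseudocode, because JavaScript bitwise operators work on 32-bit integers; that choice, the function names, and the safe-integer range limit are assumptions of this sketch, not part of the spec.

```typescript
// Sketch: Avro `long` = ZigZag mapping followed by VInt bytes (1.3.1.1-1.3.1.2).
// Plain arithmetic replaces the pseudocode's bit shifts because JavaScript
// bitwise operators truncate to 32 bits; inputs must satisfy |n| < 2^52.

function zigzagEncode(n: number): number {
  // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
  return n >= 0 ? 2 * n : -2 * n - 1;
}

function zigzagDecode(z: number): number {
  // Even values are non-negative, odd values are negative.
  return z % 2 === 0 ? z / 2 : -(z + 1) / 2;
}

// Emit 7 bits per byte, least significant group first; the 0x80 bit marks
// that another byte follows.
function vintEncode(z: number): number[] {
  const out: number[] = [];
  do {
    let byte = z % 128;
    z = Math.floor(z / 128);
    if (z > 0) byte |= 0x80;
    out.push(byte);
  } while (z > 0);
  return out;
}

// Returns [value, offset just past the VInt].
function vintDecode(buf: Uint8Array, offset: number): [number, number] {
  let z = 0;
  let scale = 1;
  for (;;) {
    if (offset >= buf.length) throw new Error("truncated VInt");
    const byte = buf[offset++];
    z += (byte & 0x7f) * scale;
    scale *= 128;
    if ((byte & 0x80) === 0) return [z, offset];
  }
}

function writeLong(n: number): number[] {
  return vintEncode(zigzagEncode(n));
}

function readLong(buf: Uint8Array, offset: number): [number, number] {
  const [z, next] = vintDecode(buf, offset);
  return [zigzagDecode(z), next];
}
```

For example, `writeLong(64)` gives `[0x80, 0x01]` and `writeLong(-64)` gives `[0x7f]`, the same bytes the Avro specification lists for those values.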
Converting to VInt:

```
// writes 'zigzag' to 'vint' buffer
offset = 0;
do {
  vint_byte = zigzag & 0x7f;
  zigzag >>= 7;
  if (zigzag)
    vint_byte |= 0x80;
  vint.writeUInt8(vint_byte, offset);
  offset += 1;
} while(zigzag);
```

Converting from VInt:

```
// constructs 'zigzag' from 'vint' buffer
zigzag = 0;
offset = 0;
do {
  vint_byte = vint.readUInt8(offset);
  zigzag |= (vint_byte & 0x7f) << (offset*7);
  vint_byte &= 0x80;
  offset += 1;
} while(vint_byte);
```

##### 1.3.1.3 Avro tag array format

[Avro arrays](https://avro.apache.org/docs/current/spec.html#array_encoding) may arrive split into more than one sequence of items. Each sequence is prefixed by its length, which may be negative, in which case a byte length is inserted between the length and the sequence content. This is used in schemas of larger data to provide for seeking. The end of the array is indicated by a sequence of length zero.

The complete tags format is a single avro array, consisting solely of blocks of the below format. The sequence is terminated by a block with a count of 0. The size field is only present if the count is negative, in which case the absolute value of the count should be used.

| Field | Description                | Encoding    | Length   | Optional           |
| ----- | -------------------------- | ----------- | -------- | ------------------ |
| count | Number of items in block   | ZigZag VInt | Variable | :x:                |
| size  | Number of bytes if count<0 | ZigZag VInt | Variable | :heavy_check_mark: |
| block | Concatenated tag items     | Binary      | size     | :x:                |

##### 1.3.1.4 Avro tag item format

Each item of the avro array is a pair of avro strings or bytes objects, a name and a value, each prefixed by their length.
| Field      | Description              | Encoding    | Length     | Optional |
| ---------- | ------------------------ | ----------- | ---------- | -------- |
| name_size  | Number of bytes in name  | ZigZag VInt | Variable   | :x:      |
| name       | Name of the tag          | Binary      | name_size  | :x:      |
| value_size | Number of bytes in value | ZigZag VInt | Variable   | :x:      |
| value      | Value of the tag         | Binary      | value_size | :x:      |

### 2. DataItem signature and id

The signature and id for a DataItem are built in a manner similar to Arweave 2.0 transaction signing. It uses the Arweave 2.0 deep-hash algorithm. The 2.0 deep-hash algorithm operates on arbitrarily nested arrays of binary data, i.e. a recursive type of `DeepHashChunk = Uint8Array | DeepHashChunk[]`.

There are reference implementations for the deep-hash algorithm in [TypeScript](https://github.com/ArweaveTeam/arweave-js/blob/b1c4b2e378a1eb7dc1fbfaeee4149...) and [Erlang](https://github.com/ArweaveTeam/arweave/blob/b316173cd42a53a59036241f8e164b61...).

To generate a valid signature for a DataItem, the contents of the DataItem and static version tags are passed to the deep-hash algorithm to obtain a message. This message is signed by the owner of the DataItem to produce the signature. The id of the DataItem is the SHA-256 digest of this signature.

The exact structure and content passed into the deep-hash algorithm to obtain the message to sign is as follows:

```
[
  utf8Encoded("dataitem"),
  utf8Encoded("1"),
  owner,
  target,
  anchor,
  [
    ... [ tag.name, tag.value ],
    ... [ tag.name, tag.value ],
    ...
  ],
  data
]
```

#### 2.1 Verifying a DataItem

DataItem verification is key to maintaining consistency within the bundle standard.
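Verification starts from the same message that was signed. Below is a minimal synchronous TypeScript sketch using Node's `crypto` module, assuming the deep-hash behaves as in the arweave-js reference linked in section 2: SHA-384 throughout, leaves tagged with `"blob"` plus their byte length, lists tagged with `"list"` plus their item count. `dataItemMessage` follows the structure listed in section 2 literally and passes an absent target or anchor as an empty byte array; that encoding of absence is an assumption, so check it against the reference implementation before relying on it. `dataItemId` covers the first validity condition (id from the SHA-256 of the signature) and prints it as unpadded base64url, the usual textual form of Arweave ids. Signature checking against the owner key is not shown.

```typescript
import { createHash } from "crypto";

// Sketch of the Arweave 2.0 deep-hash, assuming the arweave-js behaviour:
// SHA-384 everywhere, leaves tagged "blob"+byteLength, lists tagged "list"+count.
type DeepHashChunk = Uint8Array | DeepHashChunk[];

function sha384(...parts: Uint8Array[]): Uint8Array {
  const h = createHash("sha384");
  for (const p of parts) h.update(p);
  return new Uint8Array(h.digest());
}

const utf8 = (s: string): Uint8Array => Buffer.from(s, "utf8");

function deepHash(chunk: DeepHashChunk): Uint8Array {
  if (Array.isArray(chunk)) {
    // Start from the hash of the list tag, then fold in each child in order.
    let acc = sha384(utf8("list" + chunk.length));
    for (const child of chunk) acc = sha384(acc, deepHash(child));
    return acc;
  }
  // Leaf: SHA-384( SHA-384(tag) || SHA-384(data) ).
  return sha384(sha384(utf8("blob" + chunk.byteLength)), sha384(chunk));
}

// Message to sign, following the section 2 structure literally.
// ASSUMPTION: an absent target or anchor is passed as an empty byte array.
function dataItemMessage(
  owner: Uint8Array,
  target: Uint8Array,
  anchor: Uint8Array,
  tags: Array<[Uint8Array, Uint8Array]>,
  data: Uint8Array,
): Uint8Array {
  return deepHash([
    utf8("dataitem"),
    utf8("1"),
    owner,
    target,
    anchor,
    tags.map(([name, value]) => [name, value]),
    data,
  ]);
}

// id = SHA-256(signature), shown as unpadded base64url like other Arweave ids.
function dataItemId(signature: Uint8Array): string {
  return createHash("sha256")
    .update(signature)
    .digest("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}
```

Tagging each node with its kind and length keeps an empty list and an empty blob (or a list and the concatenation of its items) from hashing to the same value.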
A DataItem is valid iff.<sup>1</sup>:

- id matches the signature (via SHA-256 of the signature)
- signature matches the owner's public key
- tags are all valid
- an anchor isn't more than 32 bytes

A tag object is valid iff.:

- there are <= 128 tags
- each key is <= 1024 bytes
- each value is <= 3072 bytes
- only contains a key and value
- both the key and value are non-empty strings

### 3. Writing a Bundle of DataItems

To write a bundle of DataItems, each DataItem should be constructed, signed, encoded, and placed in a transaction with the transaction body format and transaction tags specified in Section 1.

#### 3.1 Nested bundle

Arweave Transactions and DataItems have analogous specifications for tagging and bearing of a binary payload. As such, the ANS-104 Bundle Transaction tagging and binary data format specification can be applied to the tags and binary data payload of a DataItem. Assembling a DataItem this way provides for the nesting of ANS-104 Bundles with one-to-many relationships between "parent" and "child" bundles and theoretically unbounded levels of nesting. Additionally, nested DataItem Bundles can be mixed heterogeneously with non-Bundle DataItems at any depth in the Bundle tree.

To construct an ANS-104 DataItem as a nested Bundle:

- Add tags to the DataItem as described by the specification in [section 1.1](#11-transaction-tags)
- Provide a binary payload for the DataItem matching the Bundle Transaction Body Format described in [section 1.2](#12-transaction-body-format), i.e. the Bundle header outlining the count, size, and IDs of the subsequent, nested DataItems, each of which should be verifiable using the method described in [section 2.1](#21-verifying-a-dataitem).

Gateway GQL queries for DataItem headers should, upon request, contain a `bundledIn` field whose value indicates the parent-child relationship of the DataItem to its immediate parent.
Any nested bundle should be traceable to a base layer Arweave Transaction by recursively following the `bundledIn` field up through the chain of parents.

### 4. Reading a Bundle of DataItems

To read a bundle of DataItems, the list of bytes representing the DataItems can be partitioned using the size in each pair: each DataItem starts where the previous one ends. Subsequently, each partition can be parsed to a DataItem object (`struct` in languages such as Rust/Go etc. or `JSON` in TypeScript). This allows for querying of a singleton or a bundle as a whole.

#### 4.1 Indexing DataItems

This format allows for indexing of specific fields in `O(N)` time. Some form of caching or indexing could be performed by gateways to improve lookup times.

<sup>1 - if and only if</sup>
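The partitioning step of section 4, combined with the header layout of section 1.2, can be sketched in TypeScript as below. The spec text here does not state the byte order of the 32-byte count and size fields; this sketch assumes little-endian, and `splitBundle`, `readUint256LE`, and `BundleEntry` are illustrative names. A single pass over the header is enough to locate every item, which is where the `O(N)` indexing in 4.1 comes from.

```typescript
// Sketch: split a bundle body into DataItems (sections 1.2 and 4).
// ASSUMPTION: the 32-byte count and size fields are little-endian unsigned integers.

interface BundleEntry {
  id: Uint8Array;    // 32-byte entry ID from the header pair
  offset: number;    // where the item starts inside the body
  bytes: Uint8Array; // the encoded DataItem
}

// Reads a 32-byte little-endian integer, refusing values beyond 2^53 - 1.
function readUint256LE(buf: Uint8Array, offset: number): number {
  let value = 0;
  for (let i = 31; i >= 0; i--) value = value * 256 + buf[offset + i];
  if (!Number.isSafeInteger(value)) throw new Error("field too large");
  return value;
}

function splitBundle(body: Uint8Array): BundleEntry[] {
  if (body.length < 32) throw new Error("missing item count");
  const count = readUint256LE(body, 0);
  const headerEnd = 32 + count * 64;
  if (headerEnd > body.length) throw new Error("truncated header");

  const entries: BundleEntry[] = [];
  let offset = headerEnd; // the first item starts right after the header
  for (let i = 0; i < count; i++) {
    const pair = 32 + i * 64;
    const size = readUint256LE(body, pair);
    if (offset + size > body.length) throw new Error(`item ${i} overruns the bundle`);
    entries.push({
      id: body.subarray(pair + 32, pair + 64),
      offset,
      bytes: body.subarray(offset, offset + size),
    });
    offset += size; // running sum of sizes gives the next item's offset
  }
  return entries;
}
```

Each `bytes` slice can then be handed to a DataItem parser, and the header id compared with the SHA-256 of that item's signature as required in section 2.1.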