Here are some specifics on the data types, surprises, and questions:

The originating party requested data services with 100% guaranteed onload, specifically indicating that the source is an analogue signal digitisation system without sufficient buffer capacity. Initial requests were for a linear buffer, but these then changed to block file storage and public NAS capability. Many services also fielded a similar request for SQL or distributed database storage in cloud hosting.

Data structures are standard floats in spherical coordinates for 4D vectors, include some reference table indexes in most of the formats, and have some distinct ranges in a "small" selection of sample data:

Time is an offset (not unix) close to a western military standard, but varies in density.

Precision of floats in the 3D vector is trimmed, indicating a specific physical resolution.

One of the electronic signal log files includes a standard signal characteristic for antenna direction in addition to the location vector, typical of cell and e-war systems. It also includes values that may be rate of signaling or CPU processor speed(?).

Most of the data uses index values; the range is linear 0..count.

Some of the data uses both an index and a unique identifier. Another set uses a large bit scope value assumed to be a hash, but its structure has been identified as a structured tree, possibly a known standard (described below).

For each structure type, there are additional values related to the signal characteristics and some indexing/classifier, but none related to an identifiable pattern other than sequentially loaded index tables.

We are very concerned about the consistency of the data. One must assume that a full SPOOF is possible with calculated generation; however, some selections map accurately into adjusted-coordinate 3D structures such as office buildings, houses, and viable speed tracking on highways. A party with direct access is preparing maps.
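
As a rough illustration of how a record of "floats in spherical coordinates" plus a time offset could be mapped into Cartesian space for comparison against 3D structures, here is a minimal Python sketch. Everything about the layout is an assumption made for illustration: the field names, the field order, the units (metres, degrees, seconds) and the latitude/longitude angle convention are hypothetical and are not taken from the actual table dumps.

```python
import math
from typing import NamedTuple, Tuple


class SphericalRecord(NamedTuple):
    # Hypothetical layout: the real column order and units are unknown.
    radius_m: float    # distance from the origin, assumed metres
    lat_deg: float     # elevation angle from the equatorial plane, assumed degrees
    lon_deg: float     # azimuth angle, assumed degrees
    t_offset_s: float  # time as an offset from an unknown epoch, assumed seconds


def to_cartesian(rec: SphericalRecord) -> Tuple[float, float, float, float]:
    """Convert one spherical record to (x, y, z, t), leaving time untouched."""
    lat = math.radians(rec.lat_deg)
    lon = math.radians(rec.lon_deg)
    x = rec.radius_m * math.cos(lat) * math.cos(lon)
    y = rec.radius_m * math.cos(lat) * math.sin(lon)
    z = rec.radius_m * math.sin(lat)
    return (x, y, z, rec.t_offset_s)
```

If the real data uses a polar angle measured from the pole rather than a latitude, the sine and cosine of that angle swap roles, so the convention would need to be confirmed against known locations first.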
Our interest is to prepare distributed processing techniques to consolidate rendering of the entire snapshot.

One set is obviously electronic device data, another is most likely EM(?) tracking of coin and currency objects, and another includes more precise vectors and a large unique identifier value and is extremely concerning.

There is no statistical anomaly of missing data per region (coverage is of the entire planet). The density of records is consistent, and in all small selections the data has high correlation with physical locations, including terrain and structures, aircraft routes, highway speeds, and typical patterns, at an accuracy that would require the same knowledge to artificially generate.

More importantly: the coin & currency tracking data maps FAR TOO CLEARLY into reasonable commerce patterns: coins into and out of *registers*, bank trucks and storage. Faking this would require a full 3D model and a huge computational effort to simulate global commerce, so it is more likely that a high precision radar system or sigint capability is actually tracking these targets.

The large bit scope and header reference of one data set is especially concerning: 10-12 billion unique identifiers using standard genetic expression encoding values in tree form, with a related signal characteristic profile. Tracked at 0.25 m resolution. With signals.

Log density may be due to A/D sampling resolution. Data is historical, mid-year 2014.

On Wed, Jun 10, 2015 at 10:37 AM, Troy Benjegerdes <hozer@hozed.org> wrote:
You don't keep 120+ Gbps running without some government backing you.
I can only think this is some sort of major political statement, by some people with significant political (and real) capital to spend.
Who's got the influence and money to do this, and why? I can only imagine it's some sort of reaction to the USA Freedom Act.
So if you think your data collection system might now be illegal, do you open source it because it'll spill the beans on the banksters who double-crossed you?
Regardless of why, how do you manage data integrity of such a large dump so you are not looking at intentionally manipulated data?
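
One partial answer, sketched here in Python, is to hash each dump file in chunks and compare the resulting digests across independently obtained copies. This is only a sketch under stated assumptions: the helper names `sha256_of_file` and `mismatched_files` are hypothetical, and the manifests are assumed to be plain dictionaries mapping file name to hex digest. It can show that two mirrors disagree, but it cannot show whether the data was fabricated at the source.

```python
import hashlib
from typing import Dict, List


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large dumps never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(manifest_a: Dict[str, str],
                     manifest_b: Dict[str, str]) -> List[str]:
    """Return file names whose digests differ or that appear in only one manifest."""
    names = set(manifest_a) | set(manifest_b)
    return sorted(n for n in names if manifest_a.get(n) != manifest_b.get(n))
```

Building one manifest per mirror and running `mismatched_files` on each pair would flag files that were altered on, or missing from, one of the copies.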
On Wed, Jun 10, 2015 at 09:17:59AM -0400, Wilfred Guerin wrote:
Files are standard DB table dumps (packed), loading from a cluster of VPNs over torrent and NAS protocols through Central Europe (entry providers are all in privacy-sensitive countries), and are intended to be a distributed database service; there is simply nothing big enough to handle this onload directly (at 120+ Gbps bursts). Some of the services are posting public torrent data and open SQL database access. Table files are set up as redundant master with cross-population and standard distribution techniques. Some of the tracking data appears to have 1 inch resolution target vectors.
On Wed, Jun 10, 2015 at 8:52 AM, Griffin Boyce <griffin@cryptolab.net> wrote:
Wilfred Guerin wrote:
Some huge *meaning close to exabyte size* data sets have been circulating in storage clouds this last week and appear to be snapshots of signals intelligence metadata, including vector tracking of signals targets (possibly cell phones, based on movement vectors) and cross-associated metadata for their communications. Indications are that these are recon signal dumps of the American sigint system, loaded by a major organized crime syndicate, and cover most of last year. There is also a set of organic tracking signals, presumably covert agent communications, and another set that appears to be all American and European cash money transactions(???).
Links to more info? Are these intended to be public, or some kind of config failure?
-- ---------------------------------------------------------------------------- Troy Benjegerdes 'da hozer' hozer@hozed.org 7 elements earth::water::air::fire::mind::spirit::soul grid.coop
Never pick a fight with someone who buys ink by the barrel, nor try to buy a hacker who makes money by the megahash