On Wed, Aug 12, 2015 at 7:45 PM, Mike Perry <mikeperry@torproject.org> wrote:
At what resolution is this type of netflow data typically captured?
Routers originally exported at 100% coverage. Then many of them started supporting sampling at various rates (because routers were choking and buggy anyway, and netheads were happy with averages), and some only do sampling. Plug flow probes into network taps and you can do whatever you want (netsec loves this and other tools).
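To make the sampling point concrete, here is a rough sketch of deterministic 1-in-N packet sampling with scaled-up counters. The function name, the (key, size) packet shape and the scaling step are my own illustration, not any particular router's implementation; real exporters differ (random sampling, per-interface rates, and so on).

```python
from collections import defaultdict

def sample_flows(packets, rate):
    """Keep every `rate`-th packet and scale its counters by `rate`.

    `packets` is an iterable of (flow_key, size_in_bytes) pairs.
    Hypothetical helper for illustration only.
    """
    est = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for i, (key, size) in enumerate(packets):
        if i % rate != 0:
            continue  # packet not sampled, the exporter never sees it
        est[key]["packets"] += rate        # one sampled packet stands for `rate` packets
        est[key]["bytes"] += size * rate   # same scaling for the byte count
    return dict(est)

# A long flow "a" and a short flow "b" that falls between sample points.
traffic = [("a", 100)] + [("b", 50)] * 5 + [("a", 100)] * 14
full = sample_flows(traffic, 1)      # 100% coverage: exact counters for both flows
sampled = sample_flows(traffic, 10)  # 1-in-10: "a" is estimated, "b" is never seen
```

Averages over big flows come out fine, which is why capacity people are happy with it. Short flows can be missed entirely or badly over-estimated.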
Are we talking about all connection 5-tuples, bidirectional/total transfer byte totals, and open and close timestamps, or more (or less) detail than this?
Are timestamps always included? Are bidirectional transfer bytecounts always included? Are subsampled packet headers (or contents) sometimes/often included?
What about UDP sessions? IPv6?
Information about how UDP is treated would also be useful if/when we manage to switch to a UDP transport protocol, independent of any padding.
All of the above depends on which flow export version / aggregation you choose, until you get to v9 and IPFIX, for which you can define your own fields. In short... yes. Flow end time is the last matching packet seen, but a flow can span records when the mandatory expiry timers hit (they limit time, and therefore space, i.e. RAM). UDP is expired via those timers, TCP usually via flags. Records can also span flows for which other semantic keys may not exist, as is often the case with UDP. But DPI can also be used in the exporter to do all sorts of fun stuff and enable other downstream uses (obviously TLS / IPsec / crypto break some things there). Tor already bundles multiple logical flows (only TCP for users today) into some number of physical TCP flows, so a UDP transport there might not need anything special. But consider looking at average flow lifetimes on the internet. There may be a case for going longer, bundling or turfing across a range of ports to falsely trigger a record / bloat, packet switching and so forth.
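A minimal sketch of the expiry behaviour described above, assuming a toy flow cache keyed on the 5-tuple (src, sport, dst, dport, proto). The class names, the record layout and the default timer values are illustrative assumptions of mine, not taken from any specific exporter. It only aims to show why one long-lived flow ends up as several records, and why TCP and UDP are closed out differently.

```python
from dataclasses import dataclass

FIN, RST = 0x01, 0x04   # TCP flag bits
TCP = 6                 # IP protocol number for TCP

@dataclass
class Flow:
    key: tuple          # (src, sport, dst, dport, proto)
    first: float        # timestamp of first packet in this record
    last: float         # timestamp of last matching packet seen
    packets: int = 0
    octets: int = 0

class FlowCache:
    """Toy exporter cache; timer defaults are placeholders, not a standard."""

    def __init__(self, active_timeout=1800.0, inactive_timeout=15.0):
        self.active = active_timeout
        self.inactive = inactive_timeout
        self.flows = {}
        self.records = []   # exported (key, first, last, packets, octets, reason)

    def _export(self, key, reason):
        f = self.flows.pop(key)
        self.records.append((f.key, f.first, f.last, f.packets, f.octets, reason))

    def expire(self, now):
        # Idle flows go out on the inactive timer; long-running flows are
        # cut on the active timer so they cannot hold cache memory forever.
        for key, f in list(self.flows.items()):
            if now - f.last >= self.inactive:
                self._export(key, "inactive")
            elif now - f.first >= self.active:
                self._export(key, "active")

    def packet(self, ts, key, size, tcp_flags=0):
        self.expire(ts)
        f = self.flows.get(key)
        if f is None:
            f = self.flows[key] = Flow(key, ts, ts)
        f.last = ts
        f.packets += 1
        f.octets += size
        # TCP can be closed out as soon as FIN or RST is seen;
        # UDP has no such signal and has to wait for a timer.
        if key[4] == TCP and tcp_flags & (FIN | RST):
            self._export(key, "tcp-flags")

# One steady UDP flow, a packet every 10 seconds, with a 60 second active timer.
cache = FlowCache(active_timeout=60, inactive_timeout=15)
udp = ("10.0.0.1", 5000, "10.0.0.2", 53, 17)
for t in range(0, 80, 10):
    cache.packet(t, udp, 100)
cache.expire(100)   # flush the idle remainder
```

With these made-up numbers the single UDP flow comes out as two records: one cut by the active timer and one flushed by the inactive timer. That splitting is the thing to keep in mind when thinking about flow lifetimes and padding.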
and having more information about what is typically recorded in these cases would be very useful to inform how we might want to design padding and connection usage against this and other issues.
"Typical" is really defined by the use case of whoever needs the flows, be it provisioning, engineering, security, operations, billing, bigdata, etc. And only limited by the available formats, storage, postprocessing, and customization. IPFIX and
https://en.wikipedia.org/wiki/NetFlow
https://en.wikipedia.org/wiki/IP_Flow_Information_Export
https://www.google.com/search?q=(netflow|IPFIX)+(probe|exporter|parser)
http://www.freebsd.org/cgi/man.cgi?query=ng_netflow&sektion=4
I think for various reasons (including this one), we're soon going to want some degree of padding traffic on the Tor network at some point relatively soon
Really? I can haz cake nao? Or only after I pump in this 3k email and watch 3k come out the other side to someone otherwise idling ;) https://cdn.plixer.com/images/slider-3-icon.png ... and/or some other bigdata systems ...