Huge collection of Security Data Science papers

21 Jul 2014

      (looks useful)

https://pay.reddit.com/r/netsec/comments/2b8pk2/huge_collection_of_security_... 

Over the past several years I have collected and read many security research papers/slides and have started a small catalog of sorts. The topics of these papers include intrusion detection, anomaly detection, machine learning/data mining, Internet-scale data collection, malware analysis, and intrusion/breach reports. I figured this collection might be useful to others. All links lead to PDFs hosted here.

I hope to clean this up (add author info, date, and publication) when I get some more time, and to add some detailed notes I have on the various features, models, algorithms, and datasets used in many of these papers.

Here are some of my favorites (nice uses of machine learning, graph analytics, and/or anomaly detection to solve interesting security problems):

CAMP - Content Agnostic Malware Protection
Notos - Building a Dynamic Reputation System for DNS
Kopis - Detecting malware domains at the upper dns hierarchy
Pleiades - From Throw-away Traffic To Bots - Detecting The Rise Of DGA-based Malware
EXPOSURE - Finding Malicious Domains Using Passive DNS Analysis
Polonium - Tera-Scale Graph Mining for Malware Detection
Nazca - Detecting Malware Distribution in Large-Scale Networks
PAYL - Anomalous Payload-based Network Intrusion Detection
Anagram - A Content Anomaly Detector Resistant to Mimicry Attack

Here is the entire collection:

Intrusion Detection

A Close Look on n-Grams in Intrusion Detection - Anomaly Detection vs. Classification
A Kill Chain Analysis of the 2013 Target Data Breach
A Lone Wolf No More - Supporting Network Intrusion Detection with Real-Time Intelligence
A Machine-learning Approach for Classifying and Categorizing Android Sources and Sinks
Acquiring Digital Evidence from Botnet Attacks: Procedures and Methods (PhD Thesis)
ALERT-ID - Analyze Logs of the network Element in Real Time for Intrusion Detection
Anagram - A Content Anomaly Detector Resistant to Mimicry Attack
Anomaly-based Intrusion Detection in Software as a Service
Back to Basics - Beyond Network Hygiene
Beehive - Large-Scale Log Analysis for Detecting Suspicious Activity in Enterprise Networks
Behavioral Clustering of HTTP-based Malware and Signature Generation Using Malicious Network Traces
Beheading Hydras - Performing Effective Botnet Takedowns
Bloodhound - Searching Out Malicious Input in Network Flows for Automatic Repair Validation
Boosting the Scalability of Botnet Detection Using Adaptive Traffic Sampling
CAMP - Content Agnostic Malware Protection
Casting out demons - Sanitizing training data for anomaly sensors
CloudFence - Data Flow Tracking as a Cloud Service
Comparing anomaly detection techniques for HTTP
Cujo - Efficient detection and prevention of drive-by-download attacks
Decoy Document Deployment for Effective Masquerade Attack Detection
Detecting Spammers with SNARE - Spatio-temporal Network-level Automatic Reputation Engine
Detecting Unknown Network Attacks Using Language Models
Early Detection of Malicious Flux Networks via Large-Scale Passive DNS Traffic Analysis
Effective Anomaly Detection with Scarce Training Data
Efficient Multidimensional Aggregation for Large Scale Monitoring
EFFORT - Efficient and Effective Bot Malware Detection
ExecScent - Mining for New C and C Domains in Live Networks with Adaptive Control Protocol Templates - slides
ExecScent - Mining for New C and C Domains in Live Networks with Adaptive Control Protocol Templates
EXPOSURE - Finding Malicious Domains Using Passive DNS Analysis
FiG - Automatic Fingerprint Generation
Filtering Spam with Behavioral Blacklisting
FLIPS - Hybrid Adaptive Intrusion Prevention
HMMPayl - An Intrusion Detection System Based on Hidden Markov Models
Kopis - Detecting malware domains at the upper dns hierarchy
Large-Scale Malware Analysis, Detection, and Signature Generation
Leveraging Honest Users - Stealth Command-and-Control of Botnets - slides
Leveraging Honest Users - Stealth Command-and-Control of Botnets
Local System Security via SSHD Instrumentation
Machine Learning In Adversarial Environments
Malware vs. Big Data (Umbrella Labs)
McPAD - A Multiple Classifier System for Accurate Payload-based Anomaly Detection
Measuring and Detecting Malware Downloads in Live Network Traffic
Mining Botnet Sink Holes - slides
MISHIMA - Multilateration of Internet hosts hidden using malicious fast-flux agents
Monitoring the Initial DNS Behavior of Malicious Domains
N-Gram against the Machine - On the Feasibility of the N-Gram Network Analysis for Binary Protocols
Nazca - Detecting Malware Distribution in Large-Scale Networks
Netgator - Malware Detection Using Program Interactive Challenges - slides
Network Traffic Characterization Using (p, n)-grams Packet Representation
Notos - Building a Dynamic Reputation System for DNS
On the Feasibility of Online Malware Detection with Performance Counters
On the Infeasibility of Modeling Polymorphic Shellcode
On the Mismanagement and Maliciousness of Networks
Outside the Closed World - On Using Machine Learning For Network Intrusion Detection
PAYL - Anomalous Payload-based Network Intrusion Detection
PAYL2 - Anomalous Payload-based Worm Detection and Signature Generation
Pleiades - From Throw-away Traffic To Bots - Detecting The Rise Of DGA-based Malware
Practical Comprehensive Bounds on Surreptitious Communication Over DNS - slides
Practical Comprehensive Bounds on Surreptitious Communication Over DNS
Privacy-preserving Payload-based Correlation for Accurate Malicious Traffic Detection
Revealing Botnet Membership Using DNSBL Counter-Intelligence
Revolver - An Automated Approach to the Detection of Evasive Web-based Malware
Self-organized Collaboration of Distributed IDS Sensors
SinkMiner - Mining Botnet Sinkholes for Fun and Profit
Spamming Botnets - Signatures and Characteristics
Spectrogram - A Mixture of Markov Chain models for Anomaly Detection in Web Traffic
The Security of Machine Learning
Toward Stealthy Malware Detection
Traffic Aggregation for Malware Detection
Understanding the Domain Registration Behavior of Spammers
Understanding the Network-Level Behavior of Spammers
VAST - Network Visibility Across Space and Time

Malware

A static, packer-agnostic filter to detect similar malware samples
A study of malcode-bearing documents
A survey on automated dynamic malware-analysis techniques and tools
APT1 Technical backstage (malware.lu hack backs of APT1 servers)
Automatic Analysis of Malware Behavior using Machine Learning
BitShred - Fast, Scalable Code Reuse Detection in Binary Code
BitShred - Fast, Scalable Malware Triage
Deobfuscating Embedded Malware using Probable-Plaintext Attacks
Escape from Monkey Island - Evading High-Interaction Honeyclients
Eureka - A framework for enabling static malware analysis
Extraction of Statistically Significant Malware Behaviors
Fast Automated Unpacking and Classification of Malware
FIRMA - Malware Clustering and Network Signature Generation with Mixed Network Behaviors
FuncTracker - Discovering Shared Code (to aid malware forensics) - slides
FuncTracker - Discovering Shared Code to Aid Malware Forensics Extended Abstract
Malware files clustering based on file geometry and visualization using R language
Mobile Malware Detection Based on Energy Fingerprints - A Dead End
Polonium - Tera-Scale Graph Mining for Malware Detection
Putting out a HIT - Crowdsourcing Malware Installs
Scalable Fine-grained Behavioral Clustering of HTTP-based Malware
SigMal - A Static Signal Processing Based Malware Triage
Tracking Memory Writes for Malware Classification and Code Reuse Identification
Using File Relationships in Malware Classification
VAMO - Towards a Fully Automated Malware Clustering Validity Analysis

Data Collection

Crawling BitTorrent DHTs for Fun and Profit
CyberProbe - Towards Internet-Scale Active Detection of Malicious Servers
Demystifying service discovery - Implementing an internet-wide scanner
gitDigger - Creating useful wordlists from GitHub
PoisonAmplifier - A Guided Approach of Discovering Compromised Websites through Reversing Search Poisoning Attacks
ZMap - Fast Internet-Wide Scanning and its Security Applications - slides
ZMap - Fast Internet-Wide Scanning and its Security Applications

Vulnerability Analysis/Reversing

A Preliminary Analysis of Vulnerability Scores for Attacks in Wild
Attacker Economics for Internet-scale Vulnerability Risk Assessment
Detecting Logic Vulnerabilities in E-Commerce Applications
ReDeBug - Finding Unpatched Code Clones in Entire OS Distributions
The Classification of Valuable Data in an Assumption of Breach Paradigm
Toward Black-Box Detection of Logic Flaws in Web Applications
Vulnerability Extrapolation - Assisted Discovery of Vulnerabilities using Machine Learning - slides
Vulnerability Extrapolation - Assisted Discovery of Vulnerabilities using Machine Learning

Anonymity/Privacy/OPSEC/Censorship

Anonymous Hacking Group – #OpNewblood Super Secret Security Handbook
Detecting Traffic Snooping in Tor Using Decoys
Risks and Realization of HTTPS Traffic Analysis
Selling Off Privacy at Auction
The Sniper Attack - Anonymously Deanonymizing and Disabling the Tor Network
The Velocity of Censorship - High-Fidelity Detection of Microblog Post Deletions - slides
The Velocity of Censorship - High-Fidelity Detection of Microblog Post Deletions
Tor vs. NSA

Data Mining

An Exploration of Geolocation and Traffic Visualization Using Network Flows to Aid in Cyber Defense
DSpin - Detecting Automatically Spun Content on the Web
Gyrus - A Framework for User-Intent Monitoring of Text-Based Networked Applications
Indexing Million of Packets per Second using GPUs
Multi-Label Learning with Millions of Labels - Recommending Advertiser Bid Phrases for Web Pages
Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework
Shingled Graph Disassembly - Finding the Undecideable Path
Synoptic Graphlet - Bridging the Gap between Supervised and Unsupervised Profiling of Host-level Network Traffic

Cyber Crime

Connected Colors - Unveiling the Structure of Criminal Networks
Image Matching for Branding Phishing Kit Images - slides
Image Matching for Branding Phishing Kit Images
Inside a Targeted Point-of-Sale Data Breach
Investigating Advanced Persistent Threat 1 (APT1)
Measuring pay-per-install - the Commoditization of Malware Distribution
Scambaiter - Understanding Targeted Nigerian Scams on Craigslist
Sherlock Holmes and the Case of the Advanced Persistent Threat
The Role of the Underground Market in Twitter Spam and Abuse
The Tangled Web of Password Reuse
Trafficking Fraudulent Accounts - The Role of the Underground Market in Twitter Spam and Abuse

CND/CNA/CNE/CNO

Amplification Hell - Revisiting Network Protocols for DDoS Abuse
Defending The Enterprise, the Russian Way
Protecting a Moving Target - Addressing Web Application Concept Drift
Timing of Cyber Conflict
–Jason 
@jason_trost

Eugen Leitl