[spam] A Longitudinal Study of Cryptographic API – a Decade of Android Malware
I started pasting this in to try to give myself better marketing karma for the list, but I stopped when I realized that it is a huge paper about how to profile malware based on its use of cryptographic libraries, which seems like a rather poor association and likely produces results that are biased against p2p apps. Still, it's fun to see all the words together, and the math and shit is cool. https://arxiv.org/pdf/2205.05573.pdf

ABSTRACT

Cryptography has been extensively used in Android applications to guarantee secure communications, conceal critical data from reverse engineering, or ensure mobile users' privacy. Various system-based and third-party libraries for Android provide cryptographic functionalities, and previous works mainly explored the misuse of cryptographic API in benign applications. However, the role of cryptographic API has not yet been explored in Android malware. This paper performs a comprehensive, longitudinal analysis of cryptographic API in Android malware. In particular, we analyzed 603 937 Android applications (half of them malicious, half benign) released between 2012 and 2020, gathering more than 1 million cryptographic API expressions. Our results reveal intriguing trends and insights on how and why cryptography is employed in Android malware. For instance, we point out the widespread use of weak hash functions and the late transition from insecure DES to AES. Additionally, we show that cryptography-related characteristics can help to improve the performance of learning-based systems in detecting malicious applications.

1 INTRODUCTION

The Android operating system has spread worldwide during the last decade, reaching almost 3 billion users in 2021 (BusinessOfApps, 2022). At the same time, security threats against Android have multiplied, as recent reports showed (McAfee Labs, 2019). Most Android applications employ cryptographic primitives to conceal critical information and securely carry out communication with internal components, applications, and web services. At the same time, it is natural to imagine that malware authors may leverage cryptography in a plethora of artful ways to serve their malevolent objectives. For instance, cryptography equips attackers with the ability to fingerprint the parameters of an infected device, encrypt users' media files, establish a secure connection with a command-and-control server, or manage ransom payments carried out by victims infected by, e.g., ransomware.

Previous research devoted significant effort to analyzing cryptography in benign applications. The focus was mainly on the misuse of cryptographic application programming interfaces (API) in benign Android applications, i.e., on finding and eliminating vulnerabilities in the employed crypto-routines that may allow attackers to obtain sensitive information (Egele et al., 2013; Muslukhov et al., 2018; Chatzikonstantinou et al., 2016; Shuai et al., 2014). To the best of our knowledge, however, no study explored how cryptography is currently employed in malicious applications. In this paper, we answer two important research questions related to cryptography and Android malware:

1. RQ.1: Are there significant differences in how cryptography is employed in benign and malicious applications?
2. RQ.2: Can information about cryptography improve Android malware detection?

We believe that answering these questions will shed more light on the mechanisms of Android malware, providing new insights for its analysis, characterization, and detection.
In this paper, we propose two main contributions. First, we deliver a comprehensive comparison of how cryptography is employed in 603 937 malicious and benign applications released in the last decade. Such a comparison is carried out with an open-source[1], scalable approach that inspects (among others) the usage of hash functions, symmetric and public-key encryption, PRNGs, etc. In total, we inspect over 10^6 cryptographic API expressions. Second, we show that cryptographic features can be used to augment the performance of state-of-the-art malware detectors, demonstrating their discriminant power in distinguishing malicious and benign applications. We also employ techniques inherited from the interpretation of learning models to point out possible connections between cryptographic API and malicious actions.

1: The code is accessible from github.com/adamjanovsky/AndroidMalwareCrypto.

The attained results show many intriguing and surprising trends. For example, in contrast to benign applications, malware authors do not typically resort to strong cryptography to perform their actions. We show that malware often favors cryptographically defeated primitives, e.g., the weak hash functions MD5 (Wang and Yu, 2005) and SHA-1 (Stevens et al., 2017), or the symmetric encryption scheme DES (Biham and Shamir, 1991). These insights can also be especially useful to learning-based models, which can leverage such cryptographic trends to improve the detection rate of malware. We believe that the results presented in this work can constitute a seminal step to foster additional research on the relationship between cryptography and Android malware.

The paper is organized as follows: In Section 2, we describe the methodology of our analysis. In Section 3, we answer the first research question. Section 4 discusses cryptographic API in relation to malware detection. The limitations of our study are discussed in Section 5. Section 6 describes the related work, and Section 7 closes the paper with the conclusions.

2 METHODOLOGY

This section describes the methodology we employed to extract and analyze the cryptographic API embedded in Android applications. We start by formalizing the problem, properly defining its domain and various constraints. We then show how we implemented this formalism in practice by discussing our analysis framework. Our findings are based on the static analysis of the Java source code obtained by decompiling the Android executables.

2.1 Problem Formalization

We organize the problem formalization into two parts: part one defines the crypto-routines of interest for our analysis; part two describes the process of locating those routines in the application source code.

I. Definition of Crypto-Routines. Given a set of Android applications, we denote the set of all possible functions F contained in their source code as

F = U ∪ S ∪ T = C ∪ C^c,

where U represents the set of functions defined by the user, S is the set of system-related functions contained in the Android SDK, and T is the set of functions belonging to third-party libraries. Given a set of known crypto-related functions C (with complement C^c), our goal is to study the intersection of C and S, denoted as Fcs = C ∩ S. In other words, Fcs is the set of cryptography-related functions that are defined in the system package (in Android, represented by the JCA functions). Notably, in this analysis we discard custom cryptographic functions that users or third parties may implement.
Automatic detection of such functions would be a complex task in the context of a large-scale analysis and could lead to false positives (or negatives) without further manual inspection. In our study, we solely aim to determine which functions from Fcs malware authors favor.
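To make the notation above concrete, here is a toy illustration (not taken from the paper) of the set relations, with hypothetical function names standing in for U, S, T, and C:

```python
# Toy illustration of the formalization above; all function names are hypothetical.
# F = U ∪ S ∪ T, and Fcs = C ∩ S is the crypto-related subset of the system API.

U = {"com.example.app.Util.encryptToken"}            # user-defined functions
S = {"javax.crypto.Cipher.getInstance",              # Android SDK (system) functions
     "java.security.MessageDigest.getInstance",
     "android.util.Log.d"}
T = {"okhttp3.OkHttpClient.newCall"}                 # third-party library functions

F = U | S | T                                        # all functions in the app

# Known crypto-related functions, whether system, user, or third-party code
C = {"javax.crypto.Cipher.getInstance",
     "java.security.MessageDigest.getInstance",
     "com.example.app.Util.encryptToken"}            # a custom crypto routine

Fcs = C & S   # crypto functions exposed by the system API (JCA) -- the study's focus
print(Fcs)    # the two JCA entry points; the custom routine in U ∩ C is out of scope
```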
From the cryptographic perspective, the functions contained in Fcs can be divided into the following categories:
(i) Hash functions. Cryptographic hash functions such as MD5, SHA-1, or SHA-2;
(ii) Symmetric encryption. Symmetric cipher primitives such as AES, DES, or RC4;
(iii) Public-key encryption. Asymmetric primitives, in Android represented by the RSA cryptosystem;
(iv) Digital signature algorithms. Primitives that empower digital signatures, e.g., ECDSA;
(v) MAC algorithms. Primitives that construct Message Authentication Codes (MACs);
(vi) PRNG. Functions that run pseudo-random number generators (PRNGs);
(vii) Key agreement protocols. Algorithms for key exchange, in JCA represented by the Diffie-Hellman protocol;
(viii) Others. Functions that do not fall into any of the previous categories.

II. Locating Cryptographic API. All functions in Fcs are available through two Java packages in the Android API: javax.crypto and java.security. Our research aims to reveal which cryptographic functions have been chosen and directly employed by the application authors. Notably, Android applications typically contain third-party packages that invoke crypto functions. We aim to exclude those packages from our analysis, as the application authors did not contribute to them. Thus, for each Android sample, we are interested in extracting the cryptographic API Fa ⊆ Fcs that is invoked from user-defined functions U. To obtain the functions belonging to Fa, we perform two steps: (i) we automatically detect the classes that belong to third-party or system libraries and exclude them from the set of classes to be explored, thereby establishing the list of user-implemented functions U; (ii) we extract all references to crypto-related functions Fcs that are invoked directly from U.

The first step is motivated by the discovery (Wang et al., 2015) that more than 60% of Android APK[2] code (on average) originates from third-party packages. To study user-authored code, it is therefore critical to differentiate, with reasonable certainty, whether a class belongs to a third-party library or not. This task can be extremely challenging and has been extensively studied, e.g., by (Wang et al., 2015; Ma et al., 2016b; Backes et al., 2016). It does not suffice to merely search for import clauses in the decompiled source code, since non-system packages can be renamed. This scenario is especially frequent in malicious applications, whose authors aim to hinder forensic analysis. Inspired by a systematic review of third-party package detectors (Zhan et al., 2020), we opted to tackle this with LibRadar, a popular third-party library detection tool that uses clustering and complex signatures to recognize such libraries (Ma et al., 2016b). In that review, LibRadar achieves the highest precision and second-highest recall while taking approximately 5 seconds to evaluate an APK on average; the runner-up requires over 80 seconds per APK, which would be unsuitable for large-scale analysis. LibRadar was trained on a large dataset of Android applications and can reliably fingerprint more than 29 000 third-party libraries without relying on package names. Consequently, LibRadar can identify obfuscated packages. Using LibRadar[3], we exclude the identified third-party packages of an APK from the subsequent cryptographic API analysis. Prior to doing so, we compared the output of both tools on a small subset and found that this decision has a negligible effect on the number of detected libraries.

2: Android Application Package, an archive that encapsulates the whole Android application.
3: Since LibRadar requires a large Redis database to run (preventing parallelization), we actually leveraged its lightweight version, LiteRadar.
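To give a rough idea of what step (ii) might look like in practice, here is a minimal sketch that scans decompiled Java sources for JCA call sites while skipping packages flagged as third-party (e.g., by LibRadar). The regexes, category map, paths, and prefix list are illustrative assumptions, not the authors' actual implementation:

```python
import re
from pathlib import Path

# Illustrative regexes for common JCA entry points; the paper's real patterns may differ.
JCA_PATTERNS = {
    "hash":       re.compile(r'MessageDigest\.getInstance\(\s*"([^"]+)"'),
    "symmetric":  re.compile(r'Cipher\.getInstance\(\s*"(AES|DESede|DES|RC4|Blowfish)[^"]*"'),
    "public_key": re.compile(r'Cipher\.getInstance\(\s*"(RSA)[^"]*"'),
    "signature":  re.compile(r'Signature\.getInstance\(\s*"([^"]+)"'),
    "mac":        re.compile(r'Mac\.getInstance\(\s*"([^"]+)"'),
    "prng":       re.compile(r'new\s+SecureRandom\('),
    "key_agree":  re.compile(r'KeyAgreement\.getInstance\(\s*"([^"]+)"'),
}

def is_third_party(java_file: Path, src_root: Path, third_party_prefixes: list[str]) -> bool:
    """Map a source file to its package path and check it against known library prefixes."""
    package = ".".join(java_file.relative_to(src_root).parent.parts)
    return any(package.startswith(prefix) for prefix in third_party_prefixes)

def extract_crypto_api(src_root: Path, third_party_prefixes: list[str]) -> dict:
    """Collect crypto call sites (per category) that appear in user-authored code only."""
    counts = {category: [] for category in JCA_PATTERNS}
    for java_file in src_root.rglob("*.java"):
        if is_third_party(java_file, src_root, third_party_prefixes):
            continue  # skip code the app authors did not write
        source = java_file.read_text(errors="ignore")
        for category, pattern in JCA_PATTERNS.items():
            counts[category].extend(m.group(1) if m.groups() else "SecureRandom"
                                    for m in pattern.finditer(source))
    return counts

# Example usage (hypothetical decompiled-source path and library list):
# report = extract_crypto_api(Path("decompiled/app"), ["com.google.gson", "okhttp3"])
# print(report["hash"])  # e.g. ['MD5', 'SHA-1', ...]
```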
2.2 Crypto API Extraction Pipeline

Given an application dataset, our system generates a comprehensive report of the embedded cryptographic API. As input, it takes configuration files for the experiment to be conducted; among other settings, these files contain a list of APKs that can be loaded from disk or downloaded from the Internet. The APKs are then processed in parallel, and each sample traverses the following pipeline:

1. Pre-processor. This module decompiles the APK to obtain its Java source code. The third-party packages of the APK are then identified, and the whole Java source code is extracted.
2. Crypto-extractor. This module extracts and analyzes the cryptographic function call sites in the application source code; the call sites are filtered by matching pre-defined regular expressions. Additionally, the crypto-extractor detects both Java and native third-party cryptographic libraries.
3. Evaluator. This module stores, organizes, and aggregates the information retrieved from the analyzed APKs into a JSON record, outputting a report of the cryptographic usage for each APK.

We designed the system in a modular fashion so that one can alter its inner workings to extract further valuable insights from the APKs.

2.3 Dataset

To gain an all-around view of the cryptographic API landscape in Android applications, we leverage the Androzoo dataset (Allix et al., 2016). Androzoo is currently the largest available dataset of Android applications, containing more than 15 million APKs. We sampled 302 039 benign and 301 898 malicious applications released in the years 2012-2020, striving for a uniform distribution of samples across the studied timeline. For 2018 and 2020, however, we could only collect a limited number of malicious samples: 19 305 and 10 039, respectively. To speed up the computation, we only gathered APKs smaller than 20 MB (approximately 89% of malicious APKs in Androzoo fulfill this criterion). To discriminate malicious files accurately, we consider an APK malicious if it was flagged by at least five antivirus scanners from the VirusTotal service[4], which should reliably eliminate benign files deemed malicious, as reported by Salem (Salem, 2020). Our samples predominantly originate from three distinct sources: Google Play (60%), Anzhi (19%), and Appchina (13%). Note that the samples were deduplicated on a per-market basis (Allix et al., 2016) to avoid over-counting.

4: virustotal.com. The number of VirusTotal positive flags is already contained in the Androzoo dataset.

2.4 Cryptography and Machine Learning

Statistics about cryptographic usage are undoubtedly helpful in pointing out differences between benign and malicious applications. Another intriguing question is whether such statistics can effectively help to recognize malicious samples. To answer this question, we propose three methods that employ machine learning techniques, described in the following.

2.4.1 Cryptographic Learning Model

The first technique consists of defining a learning-based system whose structure is inspired by other popular detection systems (Daniel et al., 2014; Chen et al., 2016; Maiorca et al., 2017).
In particular, the proposed system performs the following steps: (i) it takes an Android application as input and extracts its cryptographic API usage with the pipeline described in Section 2.2; (ii) it encodes these statistics into a feature vector; (iii) it trains a machine-learning classifier to predict a benign/malicious label. The features can be categorized into three sets:

• Set A: flags indicating the use of third-party cryptographic libraries (both Java and native).
• Set B: frequencies of specific cryptographic API constructors and imports of crypto-related classes, e.g., the number of DES constructors in a sample.
• Set C: aggregated statistics of call sites and imports related to categories of cryptographic primitives (hash functions, symmetric encryption schemes, and so forth), e.g., how many distinct hash functions a sample uses.

By joining these sets, we obtain 300 potentially informative features, which are further filtered with a feature selection algorithm. The dataset with candidate features is split in a 9:1 ratio into training and test sets. We then apply two feature selection methods to drop uninformative features. First, we examine all possible pairs of features; if a pair exhibits a Pearson correlation coefficient higher than 0.95, we drop one feature of the pair at random. Second, we remove the features that are deemed uninformative by Boruta (Kursa et al., 2010), a supervised algorithm that iteratively replicates features, randomly permutes their values, trains a random forest, and removes redundant features based on their z-score. The feature selection process yields 189 features that are used for learning. To choose the best family of models for discriminating between malicious and benign samples on our dataset, we experimented with naive Bayes, logistic regression, support vector machines with a linear kernel, random forests, gradient-boosted decision trees, and a multilayer perceptron. We tuned the classifiers' hyperparameters using 10-fold cross-validation on the training set, optimizing for the F1 score. The subsequent evaluation identified a random forest (a majority-voting ensemble of decision trees trained on different subsets of the data) as the best-performing classifier with respect to the F1 score. For brevity, we do not report the entire analysis here and stick to the random forest in the rest of the paper.
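For readers who want a feel for how such a pipeline could be wired together, below is a minimal sketch using scikit-learn and the BorutaPy package. The feature matrix, hyperparameter grid, and the deterministic choice of which correlated feature to drop are illustrative assumptions, not the authors' code (their implementation lives in the repository linked above):

```python
import numpy as np
import pandas as pd
from boruta import BorutaPy                       # pip install Boruta (assumed available)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import f1_score

def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one feature from every pair whose |Pearson correlation| exceeds the threshold.
    (The paper drops a random member of the pair; here we drop deterministically.)"""
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

def train_crypto_detector(X: pd.DataFrame, y: np.ndarray) -> RandomForestClassifier:
    """X: ~300 candidate crypto features per APK (hypothetical), y: 0 = benign, 1 = malicious."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1,
                                                        stratify=y, random_state=42)
    # Step 1: correlation filter at the 0.95 threshold used in the paper.
    X_train = drop_correlated(X_train)

    # Step 2: Boruta feature selection driven by a random forest.
    rf = RandomForestClassifier(n_jobs=-1, class_weight="balanced", max_depth=5)
    boruta = BorutaPy(rf, n_estimators="auto", random_state=42)
    boruta.fit(X_train.values, y_train)
    selected = X_train.columns[boruta.support_]

    # Step 3: hyperparameter tuning with 10-fold CV, optimizing the F1 score.
    grid = GridSearchCV(RandomForestClassifier(random_state=42),
                        param_grid={"n_estimators": [100, 300], "max_depth": [None, 20]},
                        scoring="f1", cv=10, n_jobs=-1)
    grid.fit(X_train[selected], y_train)

    # Held-out evaluation on the 10% test split.
    preds = grid.best_estimator_.predict(X_test[selected])
    print("test F1:", f1_score(y_test, preds))
    return grid.best_estimator_
```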