[spam] A Longitudinal Study of Cryptographic API – a Decade of Android Malware

Wed May 18 11:22:30 PDT 2022

I started pasting this in to try to give myself better marketing karma
for the list, but I stopped when I realized that it is a huge paper
about how to profile malware based on their use of cryptographic
libraries, which seems kind of a rather poor association, and likely
produces results that are biased against p2p apps.

Still, it's fun to see all the words together, and the math and shit is cool.

https://arxiv.org/pdf/2205.05573.pdf

ABSTRACT

Cryptography has been extensively used in Android applications to
guarantee secure communications, conceal critical data from reverse
engineering, or ensure mobile users’ privacy. Various system-based and
third-party libraries for Android provide cryptographic
functionalities, and previous works mainly explored the misuse of
cryptographic API in benign applications. However, the role of
cryptographic API has not yet been explored in Android malware. This
paper performs a comprehensive, longitudinal analysis of cryptographic
API in Android malware. In particular, we analyzed 603 937 Android
applications (half of them malicious, half benign) released between
2012 and 2020, gathering more than 1 million cryptographic API
expressions. Our results reveal intriguing trends and insights on how
and why cryptography is employed in Android malware. For instance, we
point out the widespread use of weak hash functions and the late
transition from insecure DES to AES. Additionally, we show that
cryptography-related characteristics can help to improve the
performance of learning-based systems in detecting malicious
applications.

1 INTRODUCTION

The Android operating system has spread worldwide
during the last decade, reaching almost 3 billion users
in 2021 (BusinessOfApps, 2022). At the same time,
security threats against Android have multiplied, as
recent reports showed (McAfee Labs, 2019).
Most Android applications employ cryptographic
primitives to conceal critical information and securely
carry out communication with internal components,
applications, and web services. At the same time, it
is natural to imagine that malware authors may leverage
cryptography in a plethora of artful ways to serve
their malevolent objectives. For instance, cryptography
equips attackers with the ability to fingerprint the
parameters of an infected device, encrypt users’ media
files, establish a secure connection with a
command-and-control server, or manage ransom payments
made by victims of, e.g., ransomware.

Previous research devoted significant effort to
analyzing cryptography in benign applications. The
focus was mainly related to the misuse of cryptographic
application programming interface (API) in benign
Android applications, i.e., on finding and eliminating
vulnerabilities in the employed crypto-routines that may
allow attackers to obtain sensitive information (Egele
et al., 2013; Muslukhov et al., 2018; Chatzikonstantinou
et al., 2016; Shuai et al., 2014).

To the best of our knowledge, however, no study
explored how cryptography is currently employed in
malicious applications. In this paper, we answer two
important research questions related to cryptography
and Android malware:

1. RQ.1: Are there significant differences in how
cryptography is employed in benign and malicious
applications?

2. RQ.2: Can information about cryptography improve
Android malware detection?

We believe that answering these questions will
shed more light on the mechanisms of Android
malware, providing new insights for its analysis,
characterization, and detection. In this paper, we make
two main contributions. First, we deliver a comprehensive
comparison of how cryptography is employed
in 603 937 malicious and benign applications released
in the last decade. Such a comparison is carried out
with an open-source[1], scalable approach that inspects
(among others) the usage of hash functions, symmetric
and public-key encryption, PRNGs, etc. In total, we
inspect over 10^6 cryptographic API expressions.

1: The code is accessible from
github.com/adamjanovsky/AndroidMalwareCrypto.
arXiv:2205.05573v1 [cs.CR] 11 May 20

Second, we show that cryptographic features can
be used to improve the performance of state-of-the-art
malware detectors, demonstrating their discriminant
power in distinguishing malicious and benign
applications. We also employ techniques inherited
from the interpretation of learning models to point out
possible connections between cryptographic API and
malicious actions.

The attained results show many intriguing and surprising
trends. For example, in contrast to benign
applications, malware authors do not typically resort
to strong cryptography to perform their actions. We
show that malware often favors the use of cryptographically
defeated primitives, e.g., weak hash functions
MD5 (Wang and Yu, 2005) or SHA-1 (Stevens et al.,
2017), or symmetric encryption scheme DES (Biham
and Shamir, 1991). These insights can also be
especially useful to learning-based models, which can
leverage these cryptographic trends to improve the
detection rate of malware. We believe that the results
presented in this work can constitute a seminal step to
foster additional research on the relationship between
cryptography and Android malware.

The paper is organized as follows: In Section 2,
we describe the methodology of our analysis. In Section
3, we answer the first research question. Section 4
discusses cryptographic API in relation to malware
detection. The limitations of our study are discussed
in Section 5. Section 6 describes the related work and
Section 7 closes the paper with the conclusions.

2 METHODOLOGY

This section describes the methodology we employed
to extract and analyze the cryptographic API embedded
in Android applications. We start by formalizing
the problem, properly defining its domain and various
constraints. We then show how we implemented
this formalism in practice by discussing our developed
analysis framework. Our findings are based on the
static analysis of the Java source code obtained by the
decompilation of the Android executables.

2.1 Problem Formalization

We organize the problem formalization in two parts:
part one treats the definition of the crypto-routines of
interest for our analysis, part two describes the process
of locating those routines in the application source
code.

I. Definition of Crypto-Routines.

Given a set of Android applications, we denote the set of
all possible functions F contained in their source code as:

F = U ∪ S ∪ T = C ∪ C^c,

where U represents the set of functions defined by the
user, S is the set of system-related functions contained
in the Android SDK, and T is the set of functions
belonging to third-party libraries. Given a set of known
crypto-related functions C, our goal is to study the
intersection of C and S, denoted as Fcs. In other words,
Fcs is the set of cryptography-related functions that are
defined in the system package (in Android represented
by JCA functions). Notably, in this analysis, we discard
custom cryptographic functions that users or third
parties may implement. Automatic detection of such
functions would be a complex task in the context of
a large-scale analysis, which may lead to false positives
(or negatives) without further manual inspection.

In our study, we solely aim to answer what functions
from Fcs the malware authors favor.
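
The set relations above can be made concrete with Python sets. This is only a toy sketch: the function names placed in U, S, T and C are hand-picked placeholders, not data from the paper.

```python
# Illustrative only: tiny hand-picked sets standing in for U, S, T and C.
U = {"com.example.app.Main.encryptFiles"}                       # user-defined
S = {"javax.crypto.Cipher.getInstance",
     "java.security.MessageDigest.getInstance",
     "android.util.Log.d"}                                      # Android SDK
T = {"org.bouncycastle.crypto.engines.AESEngine.processBlock"}  # third party

# F: every function found in the applications' source code.
F = U | S | T

# C: functions known to be crypto-related (an assumption for this sketch).
C = {"javax.crypto.Cipher.getInstance",
     "java.security.MessageDigest.getInstance",
     "org.bouncycastle.crypto.engines.AESEngine.processBlock"}
C_complement = F - C

# F_cs: crypto-related functions that are defined in the system package.
F_cs = C & S

# C and its complement together cover F, as in the formula above.
assert C | C_complement == F
print(sorted(F_cs))
```

The third-party crypto function is in C but not in S, so it falls outside Fcs, which mirrors the decision to discard custom and third-party implementations.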

From a cryptographic perspective, the functions
contained in Fcs can be divided into the following categories:

(i) Hash functions. Cryptographic
hash functions such as MD5, SHA-1, or SHA-2;

(ii) Symmetric encryption. Symmetric cipher primitives
such as AES, DES, or RC4;

(iii) Public-key encryption. Asymmetric primitives, in Android
represented by the RSA cryptosystem;

(iv) Digital signature algorithms. Primitives that empower
digital signatures, e.g., ECDSA;

(v) MAC algorithms. Primitives that construct Message
Authentication Codes, also called MACs;

(vi) PRNG. Functions to run pseudo-random number
generators (PRNG);

(vii) Key agreement protocols. Algorithms for key
exchange, in JCA represented by Diffie-Hellman protocol;

(viii) Others. Functions that do not fall into any of the previous categories.
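
As a rough illustration of this categorization, the sketch below maps algorithm strings to the categories listed above. The table and the `categorize` helper are hypothetical and cover only a handful of names; they are not the mapping used by the paper's tooling.

```python
# Hypothetical, deliberately small mapping from algorithm names to the
# categories (i)-(vii) above; anything not listed falls into "other".
CATEGORY_BY_ALGORITHM = {
    "MD5": "hash", "SHA-1": "hash", "SHA-256": "hash",
    "AES": "symmetric", "DES": "symmetric", "RC4": "symmetric",
    "RSA": "public-key",
    "SHA256withECDSA": "signature",
    "HmacSHA256": "mac",
    "SHA1PRNG": "prng",
    "DH": "key-agreement",
}


def categorize(transformation: str) -> str:
    """Return the category of an algorithm or transformation string.

    A transformation such as "AES/CBC/PKCS5Padding" is reduced to its
    algorithm part ("AES") before the case-insensitive lookup.
    """
    algorithm = transformation.split("/")[0].strip().lower()
    for name, category in CATEGORY_BY_ALGORITHM.items():
        if name.lower() == algorithm:
            return category
    return "other"
```

For example, `categorize("AES/CBC/PKCS5Padding")` yields "symmetric", while an unlisted name such as "Blowfish" lands in "other" until the table is extended.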

II. Locating Cryptographic API.

All functions in Fcs are available through two Java
packages in Android API: javax.crypto and java.security.
Our research aims to reveal which cryptographic functions
have been chosen and directly employed by the
authors. Notably, Android applications typically contain
third-party packages that invoke crypto functions. We
aim to exclude those packages from our analysis as the
application authors did not contribute to them.

Thus, for each Android sample, we are interested
in extracting the cryptographic API Fa ⊆ Fcs that is
invoked from user-defined functions U. To obtain the
functions belonging to Fa, we perform two steps:

(i) We automatically detect the classes that belong to
third-party or system libraries, and we exclude them from
the set of classes that should be explored. By doing so,
we establish the list of user-implemented functions U;

(ii) We extract all references to crypto-related functions
Fcs that are invoked directly from U.
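
The two steps can be sketched as follows. This is a simplified illustration, assuming the library detector has already produced a list of package prefixes to exclude and that a call map (caller class to invoked functions) is available; both inputs and the helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# The two Java packages through which the system crypto API is exposed.
CRYPTO_PACKAGES = ("javax.crypto.", "java.security.")


def user_classes(all_classes: Iterable[str],
                 library_prefixes: Iterable[str]) -> Set[str]:
    """Step (i): drop classes under detected third-party/system packages."""
    prefixes = tuple(library_prefixes)
    return {c for c in all_classes if not c.startswith(prefixes)}


def crypto_api_from_user_code(calls: Dict[str, List[str]],
                              library_prefixes: Iterable[str]) -> Set[str]:
    """Step (ii): collect crypto API invoked directly from user classes.

    `calls` maps a caller class name to the fully qualified functions
    that the class invokes (an assumed input format for this sketch).
    """
    users = user_classes(calls, library_prefixes)
    return {callee
            for caller in users
            for callee in calls[caller]
            if callee.startswith(CRYPTO_PACKAGES)}


calls = {
    "com.example.app.Locker": ["javax.crypto.Cipher.getInstance",
                               "android.util.Log.d"],
    "com.squareup.okhttp.Cache": ["java.security.MessageDigest.getInstance"],
}
print(crypto_api_from_user_code(calls, ["com.squareup.okhttp."]))
```

Here only `javax.crypto.Cipher.getInstance` is reported: the MessageDigest call is made from a class attributed to a third-party package, so it is not counted towards Fa.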

The first step is motivated by the discovery (Wang
et al., 2015) that more than 60% of Android APK[2]
code (on average) originates from third-party packages.
To study user-authored code, it is therefore critical to
differentiate, with reasonable certainty, whether a class
belongs to a third-party library or not. This task can
be extremely challenging and was extensively studied,
e.g., by (Wang et al., 2015; Ma et al., 2016b; Backes
et al., 2016). It does not suffice to merely search for the
import clauses in the decompiled source code since
the non-system packages could be renamed. This
scenario is especially frequent in malicious applications,
as the authors aim to defend against any forensics.
Inspired by the systematic review of third-party package
detectors (Zhan et al., 2020), we opted to tackle this
with LibRadar, a popular third-party libraries detection
tool that utilizes clustering and complex signatures
to recognize such libraries (Ma et al., 2016b). In this
review, LibRadar achieves the highest precision and
second-highest recall while it takes approx. 5 seconds
to evaluate an APK on average. The runner-up requires
over 80 seconds per APK, which would be unsuitable
for large-scale analysis. LibRadar was trained on a
large dataset of Android applications and can reliably
fingerprint more than 29 000 third-party libraries, not
relying on package names. Consequently, LibRadar
can identify obfuscated packages. Using LibRadar[3],
we filter the identified third-party packages of an APK
from subsequent cryptographic API analysis.

2: Android Application Package, an archive that
encapsulates the whole Android application.

3: Since LibRadar requires a large Redis database to
run (preventing parallelization), we actually leveraged its
lightweight version LiteRadar. Prior to doing so, we compared
the output of both tools on a small subset to find out
that this decision has a negligible effect on the number of
detected libraries.

2.2 Crypto API Extraction Pipeline

Given an application dataset, our system generates a
comprehensive report of the embedded cryptographic API.
As input, it takes configuration files for the experiment
to be conducted. Among other settings, the files contain a
list of APKs that can be loaded from disk or downloaded
from the Internet.

The APKs are then processed in parallel, and each
sample traverses the following pipeline:

1. Pre-processor. This module decompiles the
APKs to obtain their Java source code. Then, the
third-party packages of the APKs are identified,
and the whole Java source code of the APKs is
extracted.

2. Crypto-extractor. This module extracts and
analyzes the cryptographic function call sites in the
application source code. Their filtering is achieved
by matching pre-defined regular expressions.
Additionally, the crypto-extractor also detects both
Java and native third-party cryptographic libraries.

3. Evaluator. This module stores, organizes, and
aggregates the information retrieved from the analyzed
APKs into a JSON record.

The evaluator outputs a report of the cryptographic
usage for each APK. We designed the system in a
modular fashion so that one can alter its inner workings
to extract further valuable insights from the APKs.
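
A minimal sketch of the extractor and evaluator stages might look like the following. It assumes the decompiled Java sources are already in memory, and the regular expression is an assumption made for this sketch; the project's actual expressions and report layout may differ.

```python
import json
import re
from collections import Counter

# Assumed pattern: getInstance("...") call sites on a few JCA classes.
CALL_SITE = re.compile(
    r'\b(Cipher|MessageDigest|Mac|Signature|KeyAgreement)'
    r'\.getInstance\(\s*"([^"]+)"'
)


def extract_crypto_calls(java_source: str) -> Counter:
    """Count (class, algorithm) pairs found at getInstance call sites."""
    return Counter(CALL_SITE.findall(java_source))


def evaluate(apk_name: str, sources: dict, third_party_prefixes: tuple) -> str:
    """Aggregate call sites from non-library sources into a JSON record.

    `sources` maps a class name to its decompiled Java source text.
    """
    totals = Counter()
    for class_name, source in sources.items():
        if class_name.startswith(third_party_prefixes):
            continue  # skip code attributed to third-party packages
        totals.update(extract_crypto_calls(source))
    record = {
        "apk": apk_name,
        "crypto_calls": [
            {"class": cls, "algorithm": alg, "count": n}
            for (cls, alg), n in sorted(totals.items())
        ],
    }
    return json.dumps(record, indent=2)
```

Matching on string literals keeps the sketch short, but it misses algorithm names that are built at run time or passed through variables, which is one reason a real extractor needs a richer set of expressions.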

2.3 Dataset

To gain an all-around view of the cryptographic API
landscape in Android applications, we leverage the
Androzoo dataset (Allix et al., 2016). Currently, Androzoo
is the largest available dataset of Android applications,
containing more than 15 million APKs.

We sampled 302 039 benign applications and 301 898
malicious applications from Androzoo released in the
years 2012-2020. We strived for a uniform distribution
of samples in the studied timeline. Yet, for the years 2018
and 2020, we could only collect a limited number of malicious
samples – 19 305 and 10 039, respectively. To
speed up the computation, we only gathered APKs
smaller than 20MB (approximately 89% of malicious
APKs in the Androzoo fulfill this criterion).

To accurately discriminate malicious files, we consider
an APK as malicious if it was flagged malicious
by at least five antivirus scanners from the VirusTotal
service[4], which should reliably eliminate benign
files deemed malicious, as reported by Salem (Salem,
2020). Our samples predominantly originate
from three distinct sources: Google Play (60%), Anzhi
(19%), and Appchina (13%). Note that the samples
were deduplicated on a per-market basis (Allix et al.,
2016) to avoid over-counting.
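
The selection criteria above can be expressed as a small labeling function. This is a sketch under stated assumptions: the `ApkRecord` fields are hypothetical, "20MB" is read as 20 * 1024 * 1024 bytes, and treating zero detections as benign is an assumption of this example, since the text only defines the malicious threshold.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SIZE_BYTES = 20 * 1024 * 1024  # "smaller than 20MB" (binary MB assumed)
MIN_DETECTIONS = 5                 # flagged by at least five scanners


@dataclass
class ApkRecord:
    sha256: str
    size_bytes: int
    vt_detections: int  # number of VirusTotal scanners flagging the APK
    year: int


def label(record: ApkRecord) -> Optional[str]:
    """Return 'malicious', 'benign', or None if the sample is not used."""
    if record.size_bytes >= MAX_SIZE_BYTES:
        return None
    if not 2012 <= record.year <= 2020:
        return None
    if record.vt_detections >= MIN_DETECTIONS:
        return "malicious"
    if record.vt_detections == 0:
        return "benign"  # assumption: the benign criterion is not stated above
    return None  # 1-4 detections: ambiguous, excluded in this sketch
```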

4: virustotal.com. The number of VirusTotal positive flags
is already contained in the Androzoo dataset.

2.4 Cryptography and Machine Learning

Statistics about cryptographic usage are undoubtedly
helpful in pointing out differences between benign and
malicious applications. Another intriguing question to
explore is whether such statistics can be used to
effectively distinguish malicious samples from benign ones.

To answer this question, we propose three methods
that employ machine learning techniques, described in
the following.

2.4.1 Cryptographic Learning Model

The first technique consists of defining a learning-based system whose
structure is inspired by other popular detection systems (Daniel et
al., 2014; Chen et al., 2016; Maiorca et al., 2017). In particular, the
proposed system performs the following steps: (i) it takes as an input
an Android application and extracts its cryptographic API usage with
the pipeline described in Section 2.2; (ii) it encodes these statistics
into a vector of features; (iii) it trains a machine-learning classifier
to predict a benign/malicious label.

The feature vector includes features that can be
categorized into three sets:

• Set A: flags indicating the use of third-party cryptographic
libraries (both Java and native).

• Set B: frequencies of specific cryptographic API
constructors and imports of crypto-related classes,
e.g., number of DES constructors in a sample.

• Set C: aggregated statistics of call sites and
imports related to categories of cryptographic
primitives: hash functions, symmetric encryption
schemes, and so forth. For example: how many
distinct hash functions a sample uses.
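
To make the three sets more tangible, here is a toy encoder that turns one per-APK report into a flat feature dictionary. The report layout, the library names and the feature names are all hypothetical and far smaller than the feature space described in the text.

```python
from collections import Counter

HASHES = {"MD5", "SHA-1", "SHA-256"}
SYMMETRIC = {"AES", "DES", "RC4"}
KNOWN_LIBS = ("bouncycastle", "openssl")  # placeholder library names


def encode_features(report: dict) -> dict:
    """Turn one per-APK report into a flat feature dictionary.

    `report` is a hypothetical structure with two keys:
      "third_party_libs": list of detected crypto library names
      "algorithms": list of algorithm names, one entry per call site
    """
    counts = Counter(report["algorithms"])
    features = {}
    # Set A: flags for third-party cryptographic libraries.
    for lib in KNOWN_LIBS:
        features[f"lib_{lib}"] = int(lib in report["third_party_libs"])
    # Set B: frequencies of specific algorithms.
    for alg in sorted(HASHES | SYMMETRIC):
        features[f"count_{alg}"] = counts[alg]
    # Set C: aggregated per-category statistics.
    features["distinct_hashes"] = len(HASHES & set(counts))
    features["distinct_symmetric"] = len(SYMMETRIC & set(counts))
    return features
```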

By joining these sets, we obtain 300 potentially informative features.
These features are further filtered with a feature selection algorithm.
The dataset with candidate features is split in a 9:1 ratio into
training/test sets. Then, we apply two feature selection methods to
drop uninformative features. First, we examine all possible pairs of
features. If a pair exhibits Pearson’s correlation coefficient higher
than 0.95, we drop a random feature of such a pair. Second, we remove
the features that are deemed uninformative by Boruta (Kursa et al.,
2010). Boruta is a supervised algorithm that iteratively replicates
features, randomly permutes their values, trains a random forest, and
removes redundant features based on the z-score. The feature selection
process yields 189 features that are used for learning.
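
The first of the two selection steps, the pairwise correlation filter, can be sketched in plain Python as below. The helper names and the fixed seed are assumptions of this example, and the Boruta step is not reproduced here.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.95, seed=0):
    """Drop one random feature from every pair correlated above `threshold`.

    `columns` maps a feature name to its list of values, computed on the
    training split only so that the test split stays untouched.
    """
    rng = random.Random(seed)
    kept = dict(columns)
    for a, b in combinations(list(columns), 2):
        if a in kept and b in kept and pearson(kept[a], kept[b]) > threshold:
            del kept[rng.choice((a, b))]
    return kept
```

Following the text, the filter compares the signed coefficient against 0.95; a variant that also removes strongly negatively correlated pairs would compare the absolute value instead.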

To choose the best family of models for discriminating between
malicious and benign samples on our dataset, we experimented with naive
Bayes, logistic regression, support vector machines with linear kernel,
random forest, gradient boosted decision trees, and multilayer
perceptron. We tuned the classifiers’ hyperparameters using 10-fold
cross-validation on the training dataset, optimizing for the F1 score.
Subsequent evaluation identified a random forest (which works as a
majority-voting ensemble of decision trees trained on different subsets
of the data) as the best-performing classifier (w.r.t. F1 score). We do
not report the entire analysis here for brevity, and we stick to the
random forest in the rest of the paper.
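
For readers unfamiliar with the evaluation setup, the two ingredients named above (the F1 score and 10-fold cross-validation) can be written out in a few lines. This is a bare-bones sketch with hypothetical helper names; it does no shuffling or stratification and is not the authors' tuning code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation.

    Sample i is assigned to fold i % k, so every sample is used for
    validation exactly once across the k rounds.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]
```

A hyperparameter setting would then be scored by training on each `train` split, predicting on the matching validation split, and averaging the resulting F1 values.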