cypherpunks
October 2022: 11 participants, 365 discussions
Denis Beau: Between mounting risks and financial innovation - the fintech ecosystem at a crossroads
by Gunnar Larson 21 Oct '22
Brody Larson collects 16 highlights from a speech by Mr Denis Beau, First
Deputy Governor of the Bank of France, at the FinTech R:Evolution 2022,
organised by France FinTech, Paris, 20 October 2022.
https://drive.google.com/file/d/1NrhFTWsACdDokgsJg9cef08JRXr4q30r/view?usp=…
Musk's Tweets Irk Biden But Offer Scant Room for Security Review
https://www.bloomberg.com/news/articles/2022-10-21/white-house-plan-to-rein…
Crypto's $2 Trillion Wipeout Is Coming for the C-Suite
https://www.bloomberg.com/news/articles/2022-10-21/crypto-companies-ftx-kra…
https://reason.com/2022/10/21/prison-slavery-up-for-a-vote-in-5-states/
wow - such a progressive cesspool, I mean 'society'.
ps: yeah teh poor non human jews are being oppressed!!!
[random] A Paper on Measuring Attackability of Code Generation
by Undescribed Horrific Abuse, One Victim & Survivor of Many 21 Oct '22
https://arxiv.org/abs/2206.00052
CodeAttack: Code-based Adversarial Attacks for Pre-Trained
Programming Language Models
Akshita Jha
Virginia Tech, Arlington, VA
akshitajha(a)vt.edu
Chandan K. Reddy
Virginia Tech, Arlington, VA
reddy(a)cs.vt.edu
Abstract
Pre-trained programming language (PL) models (such as CodeT5, CodeBERT, GraphCodeBERT, etc.) have the potential to automate software engineering tasks involving code understanding and code generation. However, these models are not robust to changes in the input and thus are potentially susceptible to adversarial attacks. We propose CodeAttack, a simple yet effective black-box attack model that uses code structure to generate imperceptible, effective, and minimally perturbed adversarial code samples. We demonstrate the vulnerabilities of the state-of-the-art PL models to code-specific adversarial attacks. We evaluate the transferability of CodeAttack on several code-code (translation and repair) and code-NL (summarization) tasks across different programming languages. CodeAttack outperforms state-of-the-art adversarial NLP attack models to achieve the best overall performance while being more efficient and imperceptible.
1 Introduction
There has been a recent surge in the development
of general purpose programming language (PL)
models (such as CodeT5 (Wang et al., 2021), CodeBERT (Feng et al.,
2020), GraphCodeBERT (Guo
et al., 2020), and PLBART (Ahmad et al., 2021))
which can capture the relationship between natural
language and programming language, and potentially automate software
engineering development
tasks involving code understanding (clone detection, defect detection)
and code generation (code-code translation, code-code refinement, code-NL summarization). However, given the data-driven
pre-training of these PL models on massive code
data, their robustness and vulnerabilities need careful investigation.
In this work, we demonstrate the
vulnerability of the state-of-the-art programming
language models by generating adversarial samples
that leverage code structure.
Figure 1: CodeAttack makes a slight modification
to the input code snippet (red) which causes significant
changes to the code summary obtained from the SOTA
pre-trained programming language models. Keywords
are highlighted in blue and comments in green.
Adversarial attacks are characterized by imperceptible changes in the
input that result in incorrect
predictions from a neural network. For PL models,
they are important for three primary reasons: (i) Exposing system
vulnerabilities: As a form of stress
test to understand the model’s limitations. For example, adversarial
samples can be used to bypass
a defect detection filter that classifies a given code
as vulnerable or not (Zhou et al., 2019), (ii) Evaluating model
robustness: Analyze the PL model’s
sensitivity to imperceptible perturbations. For example, a small
change in the input programming
language (akin to a typo or a spelling mistake in
the NL scenario) might trigger the code summarization model to
generate a gibberish natural language
code summary (Figure 1), and (iii) Model interpretability: Help
understand what PL models learn.
For example, adversarial samples can be used to
inspect the tokens pre-trained PL models attend to.
A successful adversarial attack for code should
have the following properties: (i) Minimal and imperceptible
perturbations: Akin to spelling mistakes or synonym replacement in NL
that misleads
the neural models, (ii) Code Consistency: Perturbed code is consistent
with the original input,
and (iii) Code fluency: Follows the syntax of the
original programming language. The current NL
adversarial attack models fall short on all three
fronts. Therefore, we propose CodeAttack (code will be made publicly available), a simple yet effective black-box attack model for generating adversarial samples for any input code snippet, irrespective of the programming language. CodeAttack
operates in a realistic scenario, where the adversary does not have
access to
model parameters but only to the test queries and
the model prediction. CodeAttack uses a pretrained masked CodeBERT PL
model (Feng et al.,
2020) as the adversarial code generator. We leverage the code
structure to generate imperceptible
and effective adversarial attacks through minimal
perturbations constrained to follow the syntax of
the original code. Our primary contributions are as
follows:
• To the best of our knowledge, we are the first
ones to detect the vulnerability of pre-trained programming language
models to adversarial attacks
on different code generation tasks. We propose
a simple yet effective realistic black-box attack
method, CodeAttack, that generates adversarial samples for a code
snippet irrespective of the
input programming language.
• We design a general purpose black-box attack
method for sequence-to-sequence PL models
that is transferable across different downstream
tasks like code translation, repair, and summarization. This can also
be extended to sequence-to-sequence tasks in other domains.
• We demonstrate the effectiveness of
CodeAttack over existing NLP adversarial models through an extensive empirical
evaluation. CodeAttack outperforms the
NLP baselines when considering both the attack
quality and its efficacy.
2 Related Work
Adversarial Attacks in NLP. Adversarial attacks have been used to
analyze the robustness of
NLP models. Black-box adversarial attacks like
BERT-Attack (Li et al., 2020) use BERT, with subword expansion, for
attacking vulnerable words.
BAE (Garg and Ramakrishnan, 2020) also uses
BERT for replacement or insertion around vulnerable words. TextFooler
(Jin et al., 2020) and PWWS
(Ren et al., 2019) use synonyms and part-of-speech
(POS) tagging to replace important tokens. Deepwordbug (Gao et al.,
2018) and TextBugger (Li
et al., 2019) use character insertion, deletion, and
replacement, and constrain their attacks using edit
distance and cosine similarity, respectively. Some
use a greedy search and replacement strategy to
generate adversarial examples (Hsieh et al., 2019;
Yang et al., 2020). Genetic Attack (GA) (Alzantot
et al., 2018) uses genetic algorithm for search with
language model perplexity and word embedding
distance as substitution constraints. Some adversarial models assume
white-box access and use
model gradients to find substitutes for the vulnerable tokens
(Ebrahimi et al., 2018; Papernot et al.,
2016; Pruthi et al., 2019). None of these methods
have been designed specifically for programming languages, which are more structured than natural language.
Adversarial Attacks for PL. Yang et al. (2022)
focus on making adversarial examples more natural by using greedy
search and genetic algorithm
for replacement. Zhang et al. (2020) generate adversarial examples by
renaming identifiers using
a Metropolis-Hastings sampling based technique
(Metropolis et al., 1953). Yefet et al. (2020) use
gradient based exploration for attacks. Some also
propose metamorphic transformations to generate
adversarial examples (Applis et al., 2021; Ramakrishnan et al., 2020).
The above models focus on
code understanding tasks like defect detection and
clone detection. Although some works do focus
on generating adversarial examples for code summarization
(Ramakrishnan et al., 2020; Zhou et al.,
2021), they do not address the transferability of these attacks to different tasks and different models. Our model, CodeAttack, assumes black-box access to the state-of-the-art PL models for
generating adversarial attacks for code generation
tasks like code translation, code repair, and code
summarization using a constrained code-specific
greedy algorithm to find meaningful substitutes for
vulnerable tokens.
3 CodeAttack
We describe the capabilities, knowledge, and the
goal of the proposed CodeAttack model, and
provide details on how it detects vulnerabilities in
the state-of-the-art pre-trained PL models.
3.1 Threat Model
Adversary’s Capabilities. The adversary is capable of perturbing the
test queries given as input
to a pre-trained PL model to generate adversarial samples. We follow
the existing literature for
generating natural language adversarial examples
and allow for two types of perturbations for the
input code sequence: (i) token-level perturbations,
and (ii) character-level perturbations. The adversary is allowed to
perturb only a certain number
of tokens/characters and must ensure a high similarity between the
original code and the perturbed
code. Formally, for a given input sequence X ∈ 𝒳, where 𝒳 is the input space, a valid adversarial example X_adv satisfies the following requirements:

  X ≠ X_adv                                   (1)
  X_adv ← X + δ,  s.t. ||δ|| < θ              (2)
  Sim(X_adv, X) ≥ ε                           (3)

where θ is the maximum allowed adversarial perturbation, Sim(·) is a similarity function that takes into account the syntax of the input code and the adversarial code sequence, and ε is the similarity threshold. We describe the perturbation constraints and the similarity functions in more detail in Section 3.2.2.
Adversary’s Knowledge. We assume black-box
access to realistically assess the vulnerabilities and
robustness of existing pre-trained PL models. In
this setting, the adversary does not have access to
the model parameters, model architecture, model
gradients, training data, or the loss function. The
adversary can only query the pre-trained PL model
with input sequences and get their corresponding
output probabilities. This is more practical than a
white-box scenario that assumes access to all the
above, which might not always be the case.
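To make this query-only setting concrete, the sketch below (in Python, matching the paper's PyTorch implementation) wraps a sequence-to-sequence PL model behind a black-box interface: the adversary submits code and reads back the generated sequence and its per-step scores, nothing else. The checkpoint name and the wrapper structure are illustrative assumptions, not the exact implementation used in the paper.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

class BlackBoxVictim:
    """Query-only access to a pre-trained PL model (no gradients, no weights)."""
    def __init__(self, name="Salesforce/codet5-base"):   # illustrative checkpoint
        self.tok = AutoTokenizer.from_pretrained(name)
        self.model = AutoModelForSeq2SeqLM.from_pretrained(name).eval()

    @torch.no_grad()
    def query(self, code: str):
        """Return the generated sequence and the logit of each chosen output token."""
        inputs = self.tok(code, return_tensors="pt", truncation=True)
        out = self.model.generate(**inputs, max_length=128,
                                  output_scores=True, return_dict_in_generate=True)
        # out.scores is one [1, vocab] score tensor per generated step;
        # the max over the vocabulary is the score of the emitted token.
        step_logits = torch.stack([s.max(dim=-1).values for s in out.scores]).squeeze(-1)
        text = self.tok.decode(out.sequences[0], skip_special_tokens=True)
        return text, step_logits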
Adversary’s Goal. Given an input code sequence as query, the
adversary’s goal is to degrade the quality of the generated output
sequence
through imperceptibly modifying the query. The
generated output sequence can either be a code
snippet (code translation, code repair) or natural
language text (code summarization). Formally, given a pre-trained PL model F : 𝒳 → 𝒴, where 𝒳 is the input space and 𝒴 is the output space, the goal of the adversary is to generate an adversarial sample X_adv for an input sequence X such that

  F(X_adv) ≠ F(X)                             (4)
  Q(F(X)) − Q(F(X_adv)) ≥ φ                   (5)

where Q(·) measures the quality of the generated output and φ is the specified drop in quality. This is in addition to the constraints applied on X_adv earlier. We formulate our final problem of generating adversarial samples as follows:

  ∆_atk = argmax_δ [Q(F(X)) − Q(F(X_adv))]    (6)

In the above optimization, X_adv is a minimally perturbed adversary subject to the constraints on the perturbations δ (Eqs. 1-5). CodeAttack searches for a perturbation ∆_atk that maximizes the difference in the quality Q(·) of the output sequence generated from the original input code snippet X and that generated from the perturbed code snippet X_adv.
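As a rough illustration of Eqs. (1)-(5), the following sketch checks whether a candidate X_adv qualifies as a successful adversarial example. The quality and similarity functions are placeholders for CodeBLEU/BLEU and a code-aware similarity measure, the perturbation count is a crude token-diff proxy for ||δ||, and the thresholds are illustrative, not the paper's values.

def is_successful_attack(x, x_adv, victim, quality, similarity,
                         theta=4, eps=0.5, phi=5.0):
    if x_adv == x:                                   # Eq. (1): must actually change
        return False
    # crude proxy for ||delta||: number of whitespace tokens that differ
    n_perturbed = sum(a != b for a, b in zip(x.split(), x_adv.split()))
    if n_perturbed > theta:                          # Eq. (2): bounded perturbation
        return False
    if similarity(x, x_adv) < eps:                   # Eq. (3): similarity threshold
        return False
    y, _ = victim.query(x)
    y_adv, _ = victim.query(x_adv)
    drop = quality(y) - quality(y_adv)               # Eqs. (4)-(5): quality must drop
    return drop >= phi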
3.2 Attack Methodology
CodeAttack’s attack methodology can be broken down into two primary
steps: (i) Finding the
most vulnerable tokens, and (ii) Substituting these
vulnerable tokens (subject to code specific constraints), to generate
adversarial samples.
3.2.1 Finding Vulnerable Tokens
Some input tokens contribute more towards the final prediction than others; therefore, 'attacking' these highly influential or highly vulnerable tokens increases the probability of altering the model predictions more significantly than attacking non-vulnerable tokens. Under a black-box setting, the model gradients are unavailable and the adversary only has access to the output logits of the pre-trained PL model. We define 'vulnerable tokens' as tokens that have a high influence on the output logits of the model.

Let F be an encoder-decoder pre-trained PL model. The given input sequence is denoted by X = [x_1, ..., x_i, ..., x_m], where x_1, ..., x_m are the input tokens. The output is a sequence of vectors:

  O = F(X) = [o_1, ..., o_n]
  y_t = argmax(o_t)

where o_t is the output logit for the correct output token y_t at time step t. Without loss of generality, we can also write the output sequence as Y = F(X) = [y_1, ..., y_l]. Y can either be a sequence of code tokens or natural language tokens.

To find the vulnerable input tokens, we replace a token x_i with [MASK], s.t. X\x_i = [x_1, ..., x_{i-1}, [MASK], x_{i+1}, ..., x_m], and get its output logits. The output vectors are now

  O\x_i = F(X\x_i) = [o'_1, ..., o'_q]

where o'_t is the new output logit for the correct prediction Y. We calculate the influence score for the token x_i as follows:

  I_{x_i} = Σ_{t=1..n} o_t − Σ_{t=1..q} o'_t    (7)
Token Class  | Description
Keywords     | Reserved words
Identifiers  | Variable, class name, method name
Arguments    | Integer, floating point, string, character
Operators    | Brackets ({}, (), []), symbols (+, *, /, -, %, ;, .)

Table 1: Token classes and their descriptions.
We rank all the input tokens in descending order of their influence score I_{x_i} to find the most vulnerable tokens V. We select only the top-k tokens to limit the number of perturbations and attack them iteratively, either by completely replacing them or by adding or deleting a character around them. We explain this in detail below.
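A minimal sketch of this masking-and-scoring procedure, assuming the BlackBoxVictim wrapper sketched earlier and a simple whitespace tokenization (the paper operates on proper code tokens), might look as follows.

def rank_vulnerable_tokens(code: str, victim, top_k: int = 4):
    """Score each token by the change in summed output logits (Eq. 7) and
    return the indices of the top-k most influential tokens."""
    tokens = code.split()                      # crude stand-in for code tokenization
    _, base_logits = victim.query(code)
    base_sum = base_logits.sum().item()
    scores = []
    for i in range(len(tokens)):
        masked = " ".join(tokens[:i] + ["[MASK]"] + tokens[i + 1:])
        _, logits = victim.query(masked)
        scores.append((base_sum - logits.sum().item(), i))   # influence I_{x_i}
    scores.sort(reverse=True)                                 # most influential first
    return [i for _, i in scores[:top_k]]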
3.2.2 Substituting Vulnerable Tokens
We adopt greedy search using a masked programming language model, subject to code-specific constraints, to find substitutes S for the vulnerable tokens V, s.t. they are minimally perturbed and have the maximal probability of causing an incorrect prediction.

Search Method. In a given input sequence, we mask a vulnerable token v_i and use the masked PL model to predict a meaningful contextualised token in its place. We use the top-k predictions for each of the masked vulnerable tokens as our initial search space. Let M denote a masked PL model. Given an input sequence X = [x_1, ..., v_i, ..., x_m], where v_i is a vulnerable token, M uses the WordPiece algorithm (Wu et al., 2016) for tokenization, which breaks uncommon words into sub-words, resulting in H = [h_1, h_2, ..., h_q]. We align and mask all the corresponding sub-words for v_i, and combine the predictions to get the top-k substitutes S' = M(H) for the vulnerable token v_i. This initial search space S' consists of l possible substitutes for a vulnerable token v_i. We then filter out substitute tokens to ensure minimal perturbation, code consistency, and code fluency of the generated adversarial samples, subject to the following constraints.
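As an illustration, the initial search space S' can be obtained from the public CodeBERT MLM checkpoint with a fill-mask query; the sub-word alignment step described above is omitted here, and the vulnerable token is treated as a single token.

from transformers import pipeline

# public CodeBERT masked-LM checkpoint used as the adversarial code generator
fill = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

def candidate_substitutes(tokens, i, k=50):
    """Mask token i and return the masked PL model's top-k predictions
    as the initial search space S' (sub-word alignment omitted)."""
    masked = " ".join(tokens[:i] + [fill.tokenizer.mask_token] + tokens[i + 1:])
    preds = fill(masked, top_k=k)
    return [p["token_str"].strip() for p in preds]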
Constraints. Since the tokens generated from a masked PL model may not be meaningful individual code tokens, we further use the CodeNet tokenizer (Puri et al., 2021) to break a token into its corresponding code tokens. CodeNet tokenizes the input tokens based on four primary code token classes, as shown in Table 1. If s_i is the substitute for the vulnerable token v_i as tokenized by M, and Op(·) denotes the operators present in a given token according to the CodeNet tokenizer, we allow the substitute tokens to have an extra or a missing operator (akin to making typos):

  |Op(v_i)| − 1 ≤ |Op(s_i)| ≤ |Op(v_i)| + 1    (8)

If C(·) denotes the code token classes (identifiers, keywords, and arguments) of a given token, we maintain the alignment between v_i and the potential substitute s_i as follows:

  C(v_i) = C(s_i) and |C(v_i)| = |C(s_i)|      (9)

These constraints maintain the syntactic structure of X_adv and significantly reduce the search space.
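A simplified sketch of this filtering step is shown below. A small regex classifier stands in for the CodeNet tokenizer, the keyword list is illustrative, and the class-count condition of Eq. (9) is collapsed into a single class comparison.

import re

OPERATORS = set("{}()[]+*/-%;.,=<>!&|")
KEYWORDS = {"public", "private", "static", "void", "return", "if", "for", "while", "new"}

def op_count(tok):                      # |Op(.)| in Eq. (8)
    return sum(ch in OPERATORS for ch in tok)

def token_class(tok):                   # C(.) in Eq. (9): keyword / argument / identifier
    if tok in KEYWORDS:
        return "keyword"
    if re.fullmatch(r"\d+(\.\d+)?|\".*\"|'.*'", tok):
        return "argument"
    return "identifier"

def filter_substitutes(original, candidates):
    """Keep only substitutes whose operator count differs by at most one (Eq. 8)
    and whose token class matches the original token (Eq. 9, simplified)."""
    keep = []
    for s in candidates:
        if abs(op_count(s) - op_count(original)) <= 1 and \
           token_class(s) == token_class(original):
            keep.append(s)
    return keep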
Substitutions. We allow two types of substitutions to generate adversarial examples: (i) token-level substitution, and (ii) operator (character) level substitution, where only an operator is added, replaced, or deleted. We iteratively substitute the vulnerable tokens with their corresponding substitute tokens/characters, using the reduced search space S, until the adversary's goal is met. We only allow replacing p% of the vulnerable tokens/characters to keep perturbations to a minimum, where p is a hyper-parameter. We also maintain the cosine similarity between the input text X and the adversarially perturbed text X_adv above a certain threshold (Equation 3). The complete procedure is shown in Algorithm 1. CodeAttack maintains minimal perturbation, code fluency, and code consistency between the input and the adversarial code snippet.
4 Experiments
4.1 Downstream Tasks and Datasets
We show the transferability of CodeAttack
across three different downstream tasks and
datasets – all in different programming languages.
Code Translation involves translating one programming language to another. The publicly available code translation datasets [2-5] consist of parallel functions between Java and C#. There are a total of 11,800 paired functions, of which 1,000 are used for testing. After tokenization, the average sequence length for Java functions is 38.51 tokens, and the average length for C# functions is 46.16 tokens.

[2] http://lucene.apache.org/
[3] http://poi.apache.org/
[4] https://github.com/eclipse/jgit/
[5] https://github.com/antlr/
Algorithm 1 CodeAttack: Generating adversarial examples for code
Input: Code X; victim model F; maximum perturbation θ; similarity threshold ε; performance drop φ
Output: Adversarial example X_adv
Initialize: X_adv ← X
// Find vulnerable tokens V
for x_i in M(X) do
    Calculate I_{x_i} according to Eq. (7)
end
V ← Rank(x_i) based on I_{x_i}
// Find substitutes S
for v_i in V do
    S ← Filter(v_i) subject to Eqs. (8), (9)
    for s_j in S do
        // Attack the victim model
        X_adv = [x_1, ..., x_{i-1}, s_j, ..., x_m]
        if Q(F(X)) − Q(F(X_adv)) ≥ φ and Sim(X, X_adv) ≥ ε and ||X_adv − X|| ≤ θ then
            return X_adv   // Success
        end
    end
    // Keep one perturbation and continue
    X_adv ← [x_1, ..., x_{i-1}, s_j, ..., x_m]
end
return
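For readers who prefer running code to pseudocode, the sketch below renders Algorithm 1 in Python using the helper sketches from the earlier sections (BlackBoxVictim, rank_vulnerable_tokens, candidate_substitutes, filter_substitutes, is_successful_attack); the thresholds are illustrative, not the paper's tuned values.

def code_attack(code, victim, quality, similarity, phi=5.0, eps=0.5, top_k=4):
    """Greedy rendering of Algorithm 1: try filtered substitutes for each
    vulnerable token and return the first perturbation that meets the goal."""
    tokens = code.split()
    x_adv = list(tokens)
    for i in rank_vulnerable_tokens(code, victim, top_k=top_k):
        cands = filter_substitutes(tokens[i], candidate_substitutes(x_adv, i))
        for s in cands:
            trial = x_adv[:i] + [s] + x_adv[i + 1:]
            trial_code = " ".join(trial)
            if is_successful_attack(code, trial_code, victim, quality,
                                    similarity, eps=eps, phi=phi):
                return trial_code                      # success
        if cands:                                      # keep one perturbation and continue
            x_adv = x_adv[:i] + [cands[0]] + x_adv[i + 1:]
    return None                                        # attack failed within budget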
Code Repair refines code by automatically fixing bugs. The publicly
available code repair dataset
(Tufano et al., 2019) consists of buggy Java functions as source and
their corresponding fixed functions as target. We use the small subset
of the data
with 46,680 train, 5,835 validation, and 5,835 test
samples (≤ 50 tokens in each function).
Code Summarization involves generating a natural language summary for a given code snippet. We use
the CodeSearchNet dataset (Husain et al., 2019)
which consists of code and their corresponding
summaries in natural language. We show the results of our model on
Python (252K/14K/15K),
Java (165K/5K/11K), and PHP (241K/13K/15K).
The numbers in the bracket denote the approximate
samples in train/development/test set, respectively.
4.2 Victim Models
We pick a representative method from different
categories as our victim models to attack.
• CodeT5 (Wang et al., 2021): A unified pretrained encoder-decoder
transformer-based PL
model that leverages code semantics by using
an identifier-aware pre-training objective. This
is the state-of-the-art on several sub-tasks in the
CodeXGlue benchmark (Lu et al., 2021).
• CodeBERT (Feng et al., 2020): A bimodal pre-trained programming language model that performs code-code and code-NL tasks.
• GraphCodeBERT (Guo et al., 2020): A pre-trained graph programming language model that leverages code structure through data flow graphs.
• RoBERTa (Liu et al., 2019): A pre-trained natural language model with state-of-the-art results on the GLUE (Wang et al., 2018), RACE (Lai et al., 2017), and SQuAD (Rajpurkar et al., 2016) datasets.
For our experiments, we use the publicly available fine-tuned checkpoints for CodeT5 and fine-tune CodeBERT, GraphCodeBERT, and RoBERTa on the related downstream tasks.
4.3 CodeAttack Configurations
The proposed CodeAttack model is implemented in PyTorch. For the
purpose of our experiments, we use the publicly available pre-trained
CodeBERT (MLM) masked PL model as the adversarial example generator.
We select the top 50
predictions for each vulnerable token as the initial
search space. On average, we only attack 2 to 4 vulnerable tokens for all the tasks to keep the perturbations to a minimum. The cosine similarity threshold
between the original code snippet and adversarially generated code is
0.5. Since
CodeAttack does not require any training, we
attack the victim models on the test set using a
batch-size of 256. All experiments were conducted
on a 48 GiB RTX 8000 GPU.
4.4 Evaluation Metric
Downstream Performance. We measure the
downstream performance using CodeBLEU (Ren
et al., 2020) and BLEU (Papineni et al., 2002) before and after the
attack. CodeBLEU measures
the quality of the generated code snippet for code
translation and code repair, and BLEU measures
the quality of the generated natural language code
summary. To measure the efficacy of the attack
model, we define

  ∆drop = Q_before − Q_after = Q(Y) − Q(Y_adv)

where Q ∈ {CodeBLEU, BLEU}, Y is the output sequence generated from the original code X, and Y_adv is the sequence generated from the perturbed code X_adv.

Task | Victim Model | Attack Model | Before | After | ∆drop | Attack% | #Query | CodeBLEUq | Overall (GMean)
Translate (Code-Code) | CodeT5 | TextFooler | 73.99 | 68.08 | 5.91 | 28.29 | 94.95 | 63.19 | 21.94
Translate (Code-Code) | CodeT5 | BERT-Attack | 73.99 | 48.59 | 25.40 | 83.12 | 186.1 | 51.11 | 47.61
Translate (Code-Code) | CodeT5 | CodeAttack | 73.99 | 61.72 | 12.27 | 89.3 | 36.84 | 65.91 | 41.64
Translate (Code-Code) | CodeBERT | TextFooler | 71.16 | 60.45 | 10.71 | 49.2 | 73.91 | 66.61 | 32.74
Translate (Code-Code) | CodeBERT | BERT-Attack | 71.16 | 58.80 | 12.36 | 97.1 | 48.76 | 59.90 | 41.58
Translate (Code-Code) | CodeBERT | CodeAttack | 71.16 | 54.14 | 17.03 | 97.7 | 26.43 | 66.89 | 48.09
Translate (Code-Code) | GraphCodeBERT | TextFooler | 66.80 | 46.51 | 20.29 | 38.70 | 83.17 | 63.62 | 36.83
Translate (Code-Code) | GraphCodeBERT | BERT-Attack | 66.80 | 36.54 | 30.26 | 97.33 | 41.30 | 57.41 | 55.30
Translate (Code-Code) | GraphCodeBERT | CodeAttack | 66.80 | 38.81 | 27.99 | 98 | 20.60 | 65.39 | 56.39
Repair (Code-Code) | CodeT5 | TextFooler | 61.13 | 57.59 | 3.53 | 58.84 | 90.50 | 69.53 | 24.36
Repair (Code-Code) | CodeT5 | BERT-Attack | 61.13 | 52.70 | 8.42 | 98.3 | 74.99 | 55.94 | 35.79
Repair (Code-Code) | CodeT5 | CodeAttack | 61.13 | 53.21 | 7.92 | 99.36 | 30.68 | 69.03 | 37.87
Repair (Code-Code) | CodeBERT | TextFooler | 61.33 | 53.55 | 7.78 | 81.61 | 45.89 | 68.16 | 35.11
Repair (Code-Code) | CodeBERT | BERT-Attack | 61.33 | 51.95 | 9.38 | 98.3 | 74.99 | 55.94 | 37.22
Repair (Code-Code) | CodeBERT | CodeAttack | 61.33 | 52.02 | 9.31 | 99.39 | 25.98 | 68.05 | 39.78
Repair (Code-Code) | GraphCodeBERT | TextFooler | 62.16 | 54.23 | 7.92 | 78.92 | 51.07 | 67.89 | 34.89
Repair (Code-Code) | GraphCodeBERT | BERT-Attack | 62.16 | 53.33 | 8.83 | 99.4 | 62.59 | 56.05 | 36.64
Repair (Code-Code) | GraphCodeBERT | CodeAttack | 62.16 | 51.97 | 10.19 | 99.52 | 24.67 | 66.16 | 40.63
Summarize (Code-NL) | CodeT5 | TextFooler | 20.06 | 14.96 | 5.70 | 64.6 | 410.15 | 53.91 | 27.08
Summarize (Code-NL) | CodeT5 | BERT-Attack | 20.06 | 11.96 | 8.70 | 90.4 | 1006.28 | 51.34 | 34.30
Summarize (Code-NL) | CodeT5 | CodeAttack | 20.06 | 11.06 | 9.59 | 82.8 | 314.87 | 52.67 | 34.71
Summarize (Code-NL) | CodeBERT | TextFooler | 19.76 | 14.38 | 5.37 | 61.1 | 358.43 | 54.10 | 26.10
Summarize (Code-NL) | CodeBERT | BERT-Attack | 19.76 | 11.30 | 8.35 | 93.74 | 695.03 | 50.31 | 34.16
Summarize (Code-NL) | CodeBERT | CodeAttack | 19.76 | 10.88 | 8.87 | 88.32 | 204.46 | 52.95 | 34.62
Summarize (Code-NL) | RoBERTa | TextFooler | 19.06 | 14.06 | 4.99 | 62.6 | 356.68 | 54.11 | 25.67
Summarize (Code-NL) | RoBERTa | BERT-Attack | 19.06 | 11.34 | 7.71 | 94.15 | 701.01 | 50.10 | 33.14
Summarize (Code-NL) | RoBERTa | CodeAttack | 19.06 | 10.98 | 8.08 | 87.51 | 183.22 | 53.03 | 33.47

Table 2: Results for adversarial attacks on translation (C#-Java), repair (Java-Java), and summarization (PHP) tasks. The downstream performance (Before/After/∆drop) for Code-Code tasks is measured in CodeBLEU, and for the Code-NL task in BLEU. Overall, CodeAttack outperforms significantly (p < 0.05).
Attack Quality. We automatically measure the attack quality using the following.
• Attack %: the percentage of successful attacks as measured by ∆drop. The higher the value, the more successful the attack.
• #Query: under a black-box setting, the adversary can query the victim model to check for changes in the output logits. The lower the average number of queries required per sample, the more efficient the adversary.
• #Perturbation: the number of tokens perturbed on average to generate an adversarial code. The lower the value, the more imperceptible the attack.
To measure the quality of the perturbed code, we calculate CodeBLEUq = CodeBLEU(X, X_adv). The higher the CodeBLEUq, the better the quality of the adversarial code. Since we want ∆drop to be as high as possible while maintaining the attack % and CodeBLEUq, we compute the geometric mean (GMean) of ∆drop, attack %, and CodeBLEUq to measure the overall performance.
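A small sketch of these metrics, with quality scores supplied externally (e.g. from a CodeBLEU/BLEU implementation), is given below; the success threshold is illustrative.

def delta_drop(q_before, q_after):
    """Per-sample quality drop: Q_before - Q_after."""
    return q_before - q_after

def attack_rate(drops, phi=5.0):
    """Percentage of samples whose quality drop meets the (illustrative) threshold."""
    return 100.0 * sum(d >= phi for d in drops) / len(drops)

def overall_gmean(avg_drop, attack_pct, codebleu_q):
    """Geometric mean of average drop, attack %, and CodeBLEUq (the overall score)."""
    return (avg_drop * attack_pct * codebleu_q) ** (1.0 / 3.0)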
4.5 Results
The results for attacking pre-trained PL models
for (i) Code Translation, (ii) Code Repair, and (iii)
Code Summarization are shown in Table 2. Due
to lack of space, we only show results for the C#
to Java translation task and for the PHP code summarization task
(refer to Appendix A for results
on Java-C# translation and code summarization
results on Python and Java). We use the metrics described in Section
4.4 and compare our model with
two state-of-the-art adversarial NLP baselines: (i)
TextFooler (Jin et al., 2020), and (ii) BERT-Attack
(Li et al., 2020).
Downstream Performance Drop. The average ∆drop using CodeAttack is at least 20% for the code translation task and 10% for both the code repair and code summarization tasks for all three pre-trained models. ∆drop is higher for BERT-Attack for the translation and repair tasks, but its attack quality (described later) is the lowest. CodeAttack has the best ∆drop for summarization.

Table 3: Qualitative examples of perturbed code from TextFooler, BERT-Attack, and CodeAttack on the code translation task.

Example 1 (CodeBLEU_before: 77.09)
Original Code:
  public string GetFullMessage() {
    ...
    if (msgB < 0){return string.Empty;}
    ...
    return RawParseUtils.Decode(enc, raw, msgB, raw.Length);
  }
TextFooler (∆drop: 18.84; CodeBLEUq: 95.11): replaces "public" with "citizenship"; the rest is unchanged.
BERT-Attack (∆drop: 15.09; CodeBLEUq: 57.46): replaces "public" with "loop" and rewrites the return call as "return [UNK][UNK].[UNK](x) raw, msgB, raw.Length);".
CodeAttack (∆drop: 21.04; CodeBLEUq: 88.65): changes "if (msgB < 0)" to "if (msgB = 0)"; the rest is unchanged.

Example 2 (CodeBLEU_before: 100)
Original Code:
  public override void WriteByte(byte b) {
    if (outerInstance.upto == outerInstance.blockSize) { ... }
  }
TextFooler (∆drop: 5.74; CodeBLEUq: 63.28): rewrites the signature as "audiences revoked canceling WriteByte(byte b)".
BERT-Attack (∆drop: 27.26; CodeBLEUq: 49.87): rewrites the signature as "public override void [UNK][UNK]() b)".
CodeAttack (∆drop: 20.04; CodeBLEUq: 91.69): rewrites the signature as "public override void WriteByte((bytes b)".

Figure 2: Effectiveness of the attack models on CodeT5 for the code translation task (C#-Java). Panels: (a) CodeBLEU_after, (b) CodeBLEUq, (c) average #Query, (d) Attack%.
Attack Quality. We observe that CodeAttack
has the highest attack success % for code translation and the code
repair tasks; and the second
best success rate for the code summarization task.
CodeAttack is more efficient as it has the lowest average query number
per sample. This shows
that it successfully attacks more samples with less
querying. Table 3 presents some qualitative examples of the generated
adversarial code snippets
from different attack models. TextFooler has the
best CodeBLEUq (as seen in Table 2) but it replaces keywords with
closely related natural language words (‘public’:
‘citizenship’/‘audiences’;
‘override’: ‘revoked’, ‘void’: ‘cancelling’). BERT-Attack has the lowest CodeBLEUq and substitutes tokens with either a special ‘[UNK]’ token or with
other seemingly random words. This is expected
since both TextFooler and BERT-Attack have not
been designed for programming languages. On
the other hand, although CodeAttack has the
second best CodeBLEUq, it generates more meaningful adversarial
samples by replacing variables
and operators which are imperceptible.
Effectiveness. To study the effectiveness of CodeAttack, we limit the number of perturbations (Figure 2). From Figure 2a, we observe that as the
perturbation % increases, the CodeBLEUafter for
CodeAttack decreases but remains constant for
TextFooler and slightly increases for BERT-Attack.
We also observe that although CodeBLEUq for
CodeAttack is the second best (Figure 2b), it
has the highest attack success rate (Figure 2d)
and the lowest number of required queries (Figure 2c) throughout. This
shows the efficiency of
CodeAttack and the need for code specific adversarial attacks.
Overall Performance. Overall, CodeAttack
has the best performance when we consider the
geometric mean (GMean) between ∆drop, attack
%, and CodeBLEUq together. These results are
generalizable across different input programming
languages and different downstream tasks (C# in
case of code translation; Java in case of code repair,
PHP in case of code summarization).
4.6 Ablation Study
We conduct an ablation study to evaluate the importance of selecting vulnerable tokens (V) and applying constraints (C) to maintain the syntax of the perturbed code. Figure 3 shows the results for the ablation study on the code translation task from C#-Java. See Appendix A for qualitative examples.

Figure 3: Ablation study for code translation (C#-Java): performance of CodeAttack with (+) and without (-) the vulnerable tokens (V) and the two constraints (C): (i) operator level (C1), and (ii) token level (C2). Panels: (a) performance drop, (b) CodeBLEUq, (c) #Query, (d) average success rate.
Importance of Vulnerable Tokens. We define a variant, CodeAttack+V-C, which finds vulnerable tokens based on logit information (Section 3.2.1) and substitutes them, albeit without any constraints. We create another variant, CodeAttack-V-C, which randomly samples tokens from the input code to attack. As can be seen from Figure 3a, the latter attack is not as effective, as its ∆drop is less than the former's for the same CodeBLEUq (Figure 3b) and attack% (Figure 3d).
Importance of Constraints. We substitute the
vulnerable tokens using the predictions from a
masked PL model with (+C) and without (-C)
any code specific constraints. We apply two
types of constraints: (i) Operator level constraint (CodeAttack+V+C1),
and (ii) Token
level constraint (CodeAttack +V+C1+C2) (Section 3.2.2). Only applying
the first constraint results in lower attack success % (Figure 3d) and
∆drop (Figure 3a) but a much higher CodeBLEUq.
On applying both the constraints together, the
∆drop and the attack success % improve. Overall,
the final model, CodeAttack+V+C1+C2, has
the best tradeoff between the ∆drop, attack success
%, CodeBLEUq, and #Queries required.
Human Evaluation. We sample 50 original and
perturbed Java and C# code samples and shuffle
them to create a mix. We ask 3 human annotators,
familiar with the two programming languages, to
classify the code as either original or adversarial.
We also ask them to rate the syntactic correctness of the code on a scale of 1 to 5, where 1 is completely incorrect syntax and 5 is perfect syntax. On average, 72.10% of the codes were classified as original, and the average syntactic correctness was 4.14 for the adversarial code. Additionally, we provided the annotators with pairs of original and adversarial codes and asked them to rate the 'visual' similarity between them on a scale of 0 to 1, where 0 is not similar at all, 0.5 is somewhat similar, and 1 is very similar. On average, the similarity was 0.71.
5 Discussions and Limitations
We observe that it is easier to attack the code translation task than
the code repair or code summarization tasks. Since code repair aims to
fix bugs
in the given code snippet, attacking it is more
challenging. For code summarization, the BLEU
score drops by almost 50%. For all three tasks,
CodeT5 is the most robust whereas GraphCodeBERT is the most
susceptible to attacks using
CodeAttack. CodeT5 has been pre-trained on the task of Masked Identifier Prediction, or deobfuscation (Lachaux et al., 2021), where changing the identifier names does not have an impact on the code semantics. This helps the model avoid the attacks which involve changing the identifier names, and in turn makes it more robust. GraphCodeBERT, on the other hand, uses data flow graphs in its pre-training, which relies on predicting the relationships between the identifiers. Since CodeAttack modifies the identifiers and perturbs the relationships between them, it proves extremely effective on GraphCodeBERT. This results in a more significant ∆drop on GraphCodeBERT compared to other models for the code translation task.
CodeAttack, although effective, has a few limitations. These adversarial attacks can be avoided if the pre-trained models choose to compile the input code before processing it. The PL models can also be made more robust by additionally pre-training or fine-tuning them on the generated adversarial examples. Incorporating more tasks such as code obfuscation in the pre-training stage might also help with the robustness of the models.
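As an illustration of the first mitigation, a model could gate its inputs on a parse check before processing them. The sketch below does this for Python inputs with the standard-library parser; the paper's tasks (Java, C#, PHP) would need the corresponding language front ends instead.

import ast

def passes_syntax_gate(code: str) -> bool:
    """Reject inputs that do not parse before feeding them to the PL model."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False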
6 Conclusion
We introduce a black-box adversarial attack model,
CodeAttack, to detect vulnerabilities of the
state-of-the-art programming language models.
CodeAttack finds the most vulnerable tokens
in the given code snippet and uses a greedy search
mechanism to identify contextualised substitutes
subject to code-specific constraints. Our model incorporates the syntactic information of the input code to generate adversarial examples that are effective and imperceptible, and that maintain code fluency and consistency. We perform an extensive empirical
and human evaluation to demonstrate the transferability of CodeAttack
on several code-code and
code-NL tasks across different programming languages. CodeAttack
outperforms the existing
state-of-the-art adversarial NLP models when both
the performance drop and the attack quality are
taken together. CodeAttack uses fewer queries
and is more efficient, highlighting the need for
code-specific adversarial attacks.
References
Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and
Kai-Wei Chang. 2021. Unified pre-training for program understanding
and generation. In Proceedings
of the 2021 Conference of the North American Chapter of the
Association for Computational Linguistics:
Human Language Technologies, pages 2655–2668.
Moustafa Alzantot, Yash Sharma, Ahmed Elgohary,
Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang.
2018. Generating natural language adversarial examples. In Proceedings
of the 2018 Conference on
Empirical Methods in Natural Language Processing,
pages 2890–2896.
Leonhard Applis, Annibale Panichella, and Arie van
Deursen. 2021. Assessing robustness of ml-based
program analysis tools using metamorphic program
transformations. In 2021 36th IEEE/ACM International Conference on
Automated Software Engineering (ASE), pages 1377–1381. IEEE.
Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing
Dou. 2018. Hotflip: White-box adversarial examples for text
classification. In Proceedings of the
56th Annual Meeting of the Association for Computational Linguistics
(Volume 2: Short Papers), pages
31–36.
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming
Gong, Linjun Shou, Bing Qin,
Ting Liu, Daxin Jiang, et al. 2020. Codebert: A
pre-trained model for programming and natural languages. In Findings
of the Association for Computational Linguistics: EMNLP 2020, pages
1536–1547.
Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. 2018.
Black-box generation of adversarial
text sequences to evade deep learning classifiers. In
2018 IEEE Security and Privacy Workshops (SPW),
pages 50–56. IEEE.
Siddhant Garg and Goutham Ramakrishnan. 2020.
Bae: Bert-based adversarial examples for text classification. In
Proceedings of the 2020 Conference on
Empirical Methods in Natural Language Processing
(EMNLP), pages 6174–6181.
Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu
Tang, LIU Shujie, Long Zhou, Nan Duan, Alexey
Svyatkovskiy, Shengyu Fu, et al. 2020. Graphcodebert: Pre-training
code representations with data
flow. In International Conference on Learning Representations.
Yu-Lun Hsieh, Minhao Cheng, Da-Cheng Juan, Wei
Wei, Wen-Lian Hsu, and Cho-Jui Hsieh. 2019. On
the robustness of self-attentive models. In Proceedings of the 57th
Annual Meeting of the Association
for Computational Linguistics, pages 1520–1529.
Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis
Allamanis, and Marc Brockschmidt. 2019. Codesearchnet challenge:
Evaluating the state of semantic code search. arXiv preprint
arXiv:1909.09436.
Di Jin, Zhijing Jin, Joey Tianyi Zhou, and Peter
Szolovits. 2020. Is bert really robust? a strong baseline for natural
language attack on text classification
and entailment. In Proceedings of the AAAI conference on artificial
intelligence, volume 34, pages
8018–8025.
Marie-Anne Lachaux, Baptiste Roziere, Marc
Szafraniec, and Guillaume Lample. 2021. Dobf: A
deobfuscation pre-training objective for programming languages.
Advances in Neural Information
Processing Systems, 34.
Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang,
and Eduard Hovy. 2017. Race: Large-scale reading
comprehension dataset from examinations. In Proceedings of the 2017
Conference on Empirical Methods in Natural Language Processing, pages
785–
794.
J Li, S Ji, T Du, B Li, and T Wang. 2019. Textbugger:
Generating adversarial text against real-world applications. In 26th
Annual Network and Distributed
System Security Symposium.
Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue,
and Xipeng Qiu. 2020. Bert-attack: Adversarial attack against bert
using bert. In Proceedings of the
2020 Conference on Empirical Methods in Natural
Language Processing (EMNLP), pages 6193–6202.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi
Chen, Omer Levy, Mike Lewis,
Luke Zettlemoyer, and Veselin Stoyanov. 2019.
Roberta: A robustly optimized bert pretraining approach. arXiv
preprint arXiv:1907.11692.
Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey
Svyatkovskiy, Ambrosio Blanco, Colin B. Clement,
Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou,
Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel
Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie
Liu. 2021. Codexglue: A machine learning benchmark dataset for code
understanding and generation.
CoRR, abs/2102.04664.
Nicholas Metropolis, Arianna W Rosenbluth, Marshall N Rosenbluth,
Augusta H Teller, and Edward
Teller. 1953. Equation of state calculations by
fast computing machines. The journal of chemical
physics, 21(6):1087–1092.
Nicolas Papernot, Fartash Faghri, Nicholas Carlini, Ian
Goodfellow, Reuben Feinman, Alexey Kurakin, Cihang Xie, Yash Sharma,
Tom Brown, Aurko Roy,
et al. 2016. Technical report on the cleverhans
v2. 1.0 adversarial examples library. arXiv preprint
arXiv:1610.00768.
Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2002.
Bleu: a method for automatic evaluation of machine translation. In
Proceedings of
the 40th Annual Meeting of the Association for Computational
Linguistics, pages 311–318, Philadelphia,
Pennsylvania, USA. Association for Computational
Linguistics.
Danish Pruthi, Bhuwan Dhingra, and Zachary C Lipton. 2019. Combating
adversarial misspellings with
robust word recognition. In Proceedings of the
57th Annual Meeting of the Association for Computational Linguistics,
pages 5582–5591.
Ruchir Puri, David S Kung, Geert Janssen, Wei
Zhang, Giacomo Domeniconi, Vladmir Zolotov, Julian Dolby, Jie Chen,
Mihir Choudhury, Lindsey
Decker, et al. 2021. Project codenet: a large-scale
ai for code dataset for learning a diversity of coding
tasks.
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and
Percy Liang. 2016. SQuAD: 100,000+ questions for
machine comprehension of text. In Proceedings of
the 2016 Conference on Empirical Methods in Natural Language
Processing, pages 2383–2392, Austin,
Texas. Association for Computational Linguistics.
Goutham Ramakrishnan, Jordan Henkel, Zi Wang,
Aws Albarghouthi, Somesh Jha, and Thomas Reps.
2020. Semantic robustness of models of source code.
arXiv preprint arXiv:2002.03043.
Shuhuai Ren, Yihe Deng, Kun He, and Wanxiang Che.
2019. Generating natural language adversarial examples through
probability weighted word saliency.
In Proceedings of the 57th Annual Meeting of the
Association for Computational Linguistics, pages
1085–1097, Florence, Italy. Association for Computational Linguistics.
Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie
Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai
Ma. 2020. Codebleu: a
method for automatic evaluation of code synthesis.
arXiv preprint arXiv:2009.10297.
Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta,
Martin White, and Denys Poshyvanyk. 2019. An empirical study on
learning bugfixing patches in the wild via neural machine translation.
ACM Transactions on Software Engineering
and Methodology (TOSEM), 28(4):1–29.
Alex Wang, Amanpreet Singh, Julian Michael, Felix
Hill, Omer Levy, and Samuel Bowman. 2018. Glue:
A multi-task benchmark and analysis platform for
natural language understanding. In Proceedings
of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting
Neural Networks for NLP,
pages 353–355.
Yue Wang, Weishi Wang, Shafiq Joty, and Steven CH
Hoi. 2021. Codet5: Identifier-aware unified pretrained encoder-decoder
models for code understanding and generation. In Proceedings of the
2021
Conference on Empirical Methods in Natural Language Processing, pages 8696–8708.
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V
Le, Mohammad Norouzi, Wolfgang Macherey,
Maxim Krikun, Yuan Cao, Qin Gao, Klaus
Macherey, et al. 2016. Google’s neural machine
translation system: Bridging the gap between human and machine
translation. arXiv preprint
arXiv:1609.08144.
Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling
Wang, and Michael I Jordan. 2020. Greedy attack
and gumbel attack: Generating adversarial examples
for discrete data. J. Mach. Learn. Res., 21(43):1–36.
Zhou Yang, Jieke Shi, Junda He, and David Lo. 2022.
Natural attack for pre-trained models of code. arXiv
preprint arXiv:2201.08698.
Noam Yefet, Uri Alon, and Eran Yahav. 2020. Adversarial examples for
models of code. Proceedings of the ACM on Programming Languages,
4(OOPSLA):1–30.
Huangzhao Zhang, Zhuo Li, Ge Li, Lei Ma, Yang Liu,
and Zhi Jin. 2020. Generating adversarial examples for holding
robustness of source code processing models. In Proceedings of the
AAAI Conference
on Artificial Intelligence, volume 34, pages 1169–
1176.
Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning
Du, and Yang Liu. 2019. Devign: Effective vulnerability identification
by learning comprehensive
program semantics via graph neural networks. Advances in neural
information processing systems, 32.
Yu Zhou, Xiaoqing Zhang, Juanjuan Shen, Tingting
Han, Taolue Chen, and Harald Gall. 2021. Adversarial robustness of
deep code comment generation.
arXiv preprint arXiv:2108.00213.
A Appendix
A.1 Results
Downstream Performance and Attack Quality.
We report BLEU, ∆BLEU, EM, ∆EM, CodeBLEU, and ∆CodeBLEU to measure the downstream performance for code-code tasks (code repair and code translation). The programming languages used are C#-Java and Java-C# for the translation tasks, and Java for the code repair task (Table 4 and Table 5). We measure the code-NL task of code summarization in BLEU and ∆BLEU. We show the results for three programming languages: Python, Java, and PHP (Table 6). We measure the quality of the attacks using the metrics defined in Section 4.4 and additionally report BLEUq, which measures the BLEU score between the original and the perturbed code. The results follow a similar pattern to that seen in Section 4.5.
Ablation Study: Qualitative Analysis Table 7
shows the adversarial examples generated using the
variants described in Section 4.6.
Task | Victim | Attack | BLEU (∆BLEU) | EM (∆EM) | CodeBLEU (∆CB)
Java-C# | CodeT5 | Original | 85.37 | 67.9 | 87.03
Java-C# | CodeT5 | TextFooler | 77.47 (7.9) | 47.0 (20.9) | 79.83 (7.19)
Java-C# | CodeT5 | BERT-Attack | 59.38 (25.99) | 13.2 (54.7) | 66.92 (20.11)
Java-C# | CodeT5 | CodeAttack | 64.85 (20.52) | 5.2 (62.7) | 68.81 (18.21)
Java-C# | CodeBERT | Original | 81.81 | 62.5 | 83.48
Java-C# | CodeBERT | TextFooler | 71.26 (10.55) | 30.4 (32.1) | 73.52 (9.95)
Java-C# | CodeBERT | BERT-Attack | 54.25 (27.56) | 3.1 (59.4) | 54.69 (28.79)
Java-C# | CodeBERT | CodeAttack | 65.31* (16.5) | 9 (53.5) | 66.99* (16.49)
Java-C# | GraphCodeBERT | Original | 80.35 | 59.4 | 82.4
Java-C# | GraphCodeBERT | TextFooler | 72.15 (8.2) | 35.24 (24.16) | 74.32 (8.07)
Java-C# | GraphCodeBERT | BERT-Attack | 54.54 (25.81) | 2.4 (57) | 54.47 (27.93)
Java-C# | GraphCodeBERT | CodeAttack | 62.35* (18) | 9.4 (50) | 64.87 (17.52)
C#-Java | CodeT5 | Original | 81.54 | 70.6 | 73.99
C#-Java | CodeT5 | TextFooler | 72.62 (8.92) | 50 (20.6) | 68.08 (5.91)
C#-Java | CodeT5 | BERT-Attack | 45.71 (35.83) | 15.54 (55.06) | 48.59 (25.40)
C#-Java | CodeT5 | CodeAttack | 58.05 (23.49) | 11 (59.6) | 61.72 (12.27)
C#-Java | CodeBERT | Original | 77.24 | 62 | 71.16
C#-Java | CodeBERT | TextFooler | 67.31 (9.93) | 34 (28) | 60.45 (10.71)
C#-Java | CodeBERT | BERT-Attack | 48.74 (28.5) | 2 (60) | 58.80 (12.36)
C#-Java | CodeBERT | CodeAttack | 59.41 (17.83) | 2.4 (59.6) | 54.14 (17.02)
C#-Java | GraphCodeBERT | Original | 70.97 | 56.5 | 66.80
C#-Java | GraphCodeBERT | TextFooler | 62.49 (8.48) | 37.33 (19.17) | 46.51 (20.29)
C#-Java | GraphCodeBERT | BERT-Attack | 43.2 (55.39) | 1.11 (30.26) | 36.54 (27.77)
C#-Java | GraphCodeBERT | CodeAttack | 48.83 (22.14) | 2.6 (53.9) | 38.81 (27.99)
Repair | CodeT5 | Original | 78.11 | 19.9 | 61.13
Repair | CodeT5 | TextFooler | 73.23 (4.88) | 1.4 (18.5) | 57.59 (3.53)
Repair | CodeT5 | BERT-Attack | 65.86 (12.25) | 1.2 (18.7) | 52.70 (8.42)
Repair | CodeT5 | CodeAttack | 66.1 (12.01) | 1.28 (18.62) | 53.21 (7.92)
Repair | CodeBERT | Original | 78.66 | 15.64 | 61.33
Repair | CodeBERT | TextFooler | 67.06 (11.6) | 4.48 (11.16) | 53.55 (7.78)
Repair | CodeBERT | BERT-Attack | 60.52 (18.14) | 3.4 (12.24) | 51.95 (9.38)
Repair | CodeBERT | CodeAttack | 61.23 (17.43) | 4.05 (11.59) | 52.02 (9.31)
Repair | GraphCodeBERT | Original | 79.73 | 15.05 | 62.16
Repair | GraphCodeBERT | TextFooler | 66.19 (13.54) | 2.5 (12.55) | 54.23 (7.92)
Repair | GraphCodeBERT | BERT-Attack | 64.1 (15.63) | 3.5 (11.55) | 53.33 (8.83)
Repair | GraphCodeBERT | CodeAttack | 64.7 (15.03) | 4.38 (10.67) | 51.97 (10.19)

Table 4: Downstream performance on code-code tasks.
Task | Victim | Attack | Attack% | Query# | BLEUq | CodeBLEUq
Java-C# | CodeT5 | TextFooler | 32.3 | 62.9 | 78.95 | 81.28
Java-C# | CodeT5 | BERT-Attack | 85.6 | 112.5 | 54.95 | 69.48
Java-C# | CodeT5 | CodeAttack | 94.8 | 19.85 | 68.08 | 75.21*
Java-C# | CodeBERT | TextFooler | 55.9 | 38.57 | 82.75 | 83.93
Java-C# | CodeBERT | BERT-Attack | 95.39 | 46.09 | 54.75 | 76.18
Java-C# | CodeBERT | CodeAttack | 91.1 | 24.42 | 66.89 | 76.77*
Java-C# | GraphCodeBERT | TextFooler | 51.21 | 39.33 | 82.41 | 82.45
Java-C# | GraphCodeBERT | BERT-Attack | 96.2 | 38.29* | 53.46 | 73.55
Java-C# | GraphCodeBERT | CodeAttack | 90.8* | 23.22 | 68.42 | 77.33
C#-Java | CodeT5 | TextFooler | 28.29 | 94.95* | 69.47 | 63.19*
C#-Java | CodeT5 | BERT-Attack | 83.12 | 186.1 | 41.45 | 51.11
C#-Java | CodeT5 | CodeAttack | 89.3 | 36.84 | 67.09 | 65.91
C#-Java | CodeBERT | TextFooler | 49.2 | 73.91 | 73.19 | 66.61
C#-Java | CodeBERT | BERT-Attack | 97.1 | 48.76 | 52.33 | 59.90
C#-Java | CodeBERT | CodeAttack | 97.7 | 26.43 | 66.24 | 66.89
C#-Java | GraphCodeBERT | TextFooler | 38.70 | 83.17 | 70.61 | 63.62
C#-Java | GraphCodeBERT | BERT-Attack | 97.33 | 41.30 | 55.71 | 57.41
C#-Java | GraphCodeBERT | CodeAttack | 98 | 20.60 | 68.07 | 65.39
Repair | CodeT5 | TextFooler | 58.84 | 90.50 | 93.37 | 69.53
Repair | CodeT5 | BERT-Attack | 98.1 | 121.1 | 80.56 | 55.49
Repair | CodeT5 | CodeAttack | 99.36 | 30.68 | 88.95* | 69.03*
Repair | CodeBERT | TextFooler | 81.61 | 45.89 | 91.82 | 68.16
Repair | CodeBERT | BERT-Attack | 98.3 | 74.99 | 80.7 | 55.94
Repair | CodeBERT | CodeAttack | 99.39 | 25.98 | 87.83 | 68.05
Repair | GraphCodeBERT | TextFooler | 78.92 | 51.07 | 91.4 | 67.89
Repair | GraphCodeBERT | BERT-Attack | 99.4 | 62.59 | 81.73 | 56.05
Repair | GraphCodeBERT | CodeAttack | 99.52 | 24.67 | 85.68 | 66.16

Table 5: Attack quality on code-code tasks.
Task | Victim | Attack | BLEU (∆BLEU) | Attack% | Query# | BLEUq | CodeBLEUq
Java | CodeT5 | Original | 19.77 | - | - | - | -
Java | CodeT5 | TextFooler | 14.06 (5.71) | 67.8 | 291.82 | 75.33 | 92.82
Java | CodeT5 | BERT-Attack | 11.94 (7.82) | 93.34 | 541.43 | 54.47 | 48.35
Java | CodeT5 | CodeAttack | 11.21 (8.56) | 80.8 | 198.11 | 68.51 | 90.04
Java | CodeBERT | Original | 17.65 | - | - | - | -
Java | CodeBERT | TextFooler | 16.84 (1.20) | 42.4 | 400.78 | 65.88 | 90.29
Java | CodeBERT | BERT-Attack | 12.87 (4.77) | 84.6 | 826.71 | 34.51 | 83.82
Java | CodeBERT | CodeAttack | 14.69 (2.85) | 73.7 | 340.99 | 59.74 | 59.37
Java | RoBERTa | Original | 16.47 | - | - | - | -
Java | RoBERTa | TextFooler | 13.23 (3.23) | 44.9 | 383.36 | 67.9 | 90.87
Java | RoBERTa | BERT-Attack | 12.67 (3.8) | 72.93 | 901.01 | 28.18 | 42.09
Java | RoBERTa | CodeAttack | 11.74 (4.73) | 50.14 | 346.07 | 32.63 | 48.48
PHP | CodeT5 | Original | 20.66 | - | - | - | -
PHP | CodeT5 | TextFooler | 14.96 (5.70) | 64.6 | 410.15 | 78.11 | 53.91
PHP | CodeT5 | BERT-Attack | 11.96 (8.70) | 90.4 | 1006.28 | 49.3 | 51.34
PHP | CodeT5 | CodeAttack | 11.06 (9.59) | 82.8 | 314.87 | 66.02 | 52.67
PHP | CodeBERT | Original | 19.76 | - | - | - | -
PHP | CodeBERT | TextFooler | 14.38 (5.37) | 61.1 | 358.43 | 79.81 | 54.10
PHP | CodeBERT | BERT-Attack | 11.30 (8.45) | 93.74 | 695.03 | 50.77 | 50.31
PHP | CodeBERT | CodeAttack | 10.88 (8.87) | 88.32 | 204.46 | 69.11 | 52.95
PHP | RoBERTa | Original | 19.06 | - | - | - | -
PHP | RoBERTa | TextFooler | 14.06 (4.99) | 62.6 | 356.68 | 79.73 | 54.11
PHP | RoBERTa | BERT-Attack | 11.34 (7.71) | 94.15 | 701.01 | 51.49 | 50.10
PHP | RoBERTa | CodeAttack | 10.98 (8.08) | 87.51 | 183.22 | 70.33 | 53.03
Python | CodeT5 | Original | 20.26 | - | - | - | -
Python | CodeT5 | TextFooler | 12.11 (8.24) | 90.47 | 400.06 | 86.84 | 77.59
Python | CodeT5 | BERT-Attack | 8.22 (12.13) | 99.81 | 718.07 | 77.11 | 64.66
Python | CodeT5 | CodeAttack | 7.79 (12.38) | 98.50 | 174.05 | 87.04 | 69.17
Python | CodeBERT | Original | 78.66 | - | - | - | -
Python | CodeBERT | TextFooler | 20.76 (5.40) | 68.5 | 966.19 | 76.56 | 75.15
Python | CodeBERT | BERT-Attack | 18.95 (7.21) | 93.72 | 1414.67 | 55.22 | 52.31
Python | CodeBERT | CodeAttack | 18.69 (7.47) | 86.63 | 560.58 | 63.84 | 59.11
Python | RoBERTa | Original | 17.01 | - | - | - | -
Python | RoBERTa | TextFooler | 10.72 (6.29) | 63.34 | 788.25 | 70.48 | 74.05
Python | RoBERTa | BERT-Attack | 10.66 (6.35) | 89.64 | 1358.85 | 51.74 | 56.75
Python | RoBERTa | CodeAttack | 9.5 (7.51) | 76.09 | 661.75 | 55.45 | 61.22

Table 6: Downstream performance and attack quality on the Code-NL (summarization) task for different programming languages.
Table 7: Qualitative examples for the ablation study on CodeAttack: attacking vulnerable tokens (V) without any constraints (-C), with operator level constraints (+C1), and with token level constraints (+C2), on the code translation task.

Example 1 (CodeBLEU_before: 76.3)
Original Code:
  public void AddMultipleBlanks(MulBlankRecord mbr) {
    for (int j = 0; j < mbr.NumColumns; j++) {
      BlankRecord br = new BlankRecord();
      br.Column = j + mbr.FirstColumn;
      br.Row = mbr.Row;
      br.XFIndex = (mbr.GetXFAt(j));
      InsertCell(br);
    }
  }
CodeAttack+V-C (∆drop: 7.21; CodeBLEUq: 43.85): replaces "public" with "((" and "=" with "?" in the loop initializer:
  ((void AddMultipleBlanks(MulBlankRecord mbr) {
    for (int j ? 0; j < mbr.NumColumns; j++) { ... }
  }
CodeAttack+V+C1 (∆drop: 5.85; CodeBLEUq: 69.61): perturbs operators only:
  public void AddMultipleBlanks(MulBlankRecord mbr) {
    for (int j > 0; j < mbr.NumColumns; j++) {
      BlankRecord br = -new BlankRecord();
      ...
      br.XFIndex > (mbr.GetXFAt(j));
      InsertCell(br);
    }
  }
CodeAttack+V+C1+C2 (∆drop: 12.96; CodeBLEUq: 59.29): replaces "public" with "static" and "j" with "jj" in the loop condition:
  static void AddMultipleBlanks(MulBlankRecord mbr) {
    for (int j > 0; jj < mbr.NumColumns; j++) { ... }
  }

Example 2 (CodeBLEU_before: 77.09)
Original Code:
  public string GetFullMessage() {
    byte[] raw = buffer;
    int msgB = RawParseUtils.TagMessage(raw, 0);
    if (msgB < 0) {
      return string.Empty;
    }
    Encoding enc = RawParseUtils.ParseEncoding(raw);
    return RawParseUtils.Decode(enc, raw, msgB, raw.Length);
  }
CodeAttack+V-C (∆drop: 10.42; CodeBLEUq: 64.19): prepends "˘0120" before "public" and replaces "raw" with "RAW.." in the Decode call.
CodeAttack+V+C1 (∆drop: 21.93; CodeBLEUq: 87.25): changes "if (msgB < 0)" to "if (msgB = 0)".
CodeAttack+V+C1+C2 (∆drop: 22.8; CodeBLEUq: 71.30): replaces "public" with "static", drops the closing parenthesis in "if (msgB < 0 {", and replaces "msgB" with "MsgB" in the Decode call.
xNY.io Bank.org Response to Justice Department Concerns about Potentially Illegal Interlocking Directorates
by Gunnar Larson 21 Oct '22
Dear Madam or Sir:
xNY.io - Bank.org is inspired by the DOJ's recent innovative approach to
interlocking directors.
From our New York and Cyprus headquarters, xNY.io - Bank.org embodies the
notion that rather than a merry-go-round, the process of change and
innovation is like a slide.
Today’s memo aims to signal xNY.io - Bank.org's market experience and
correspondence with the SpaceX Corporation directorate who potentially has
interlocked with aim to manipulate markets that could impact cross-border
digital asset innovation, free speech in the United States of America, the
global ESG economy and Moon exploration.
1: We do not see this as a stereotypical democrat versus republican battle.
xNY.io - Bank.org has further asked SpaceX to disclose a potential director
enterprise, self-titled the "PayPal Mafia."
2: To be very clear, we have communicated with SpaceX and Twitter's
prospective LBO dealmakers.
3: Our concern specific to the intent of Twitter’s LBO deal makers is party
to our work in New York to usurp Goldman Sachs and JP Morgan Chase board
directors who may be potentially engaged in similar market manipulation
(referenced attached):
https://docs.google.com/document/d/1lv9Wt6y1ld4bNap0z3epIxttEGiP_oVpfxH0GTS…
xNY.io - Bank.org aims to earn the Justice Department's dialogue on
protecting digital asset innovation cross border. Meanwhile, we plan to
launch a full court press on interlocking directors and seek DOJ approval to
engage the False Claims Act as a tool.
Thank you,
Gunnar
--
*Gunnar Larson *
*xNY.io <http://www.xny.io/> - Bank.org <http://bank.org/>*
MSc
<https://www.unic.ac.cy/blockchain/msc-digital-currency/?utm_source=Google&u…>
-
Digital Currency
MBA
<https://www.unic.ac.cy/business-administration-entrepreneurship-and-innovat…>
- Entrepreneurship
and Innovation (ip)
G(a)xNY.io
+1-646-454-9107
New York, New York 10001
---------- Forwarded message ---------
From: Gunnar Larson <g(a)xny.io>
Date: Wed, Oct 19, 2022, 2:04 PM
Subject: Directors Resign from the Boards of Five Companies in Response to
Justice Department Concerns about Potentially Illegal Interlocking
Directorates
To: cypherpunks <cypherpunks(a)lists.cpunks.org>
https://www.justice.gov/opa/pr/directors-resign-boards-five-companies-respo…
Wednesday, October 19, 2022
Directors Resign from the Boards of Five Companies in Response to Justice
Department Concerns about Potentially Illegal Interlocking Directorates
Resignations Reflect Antitrust Division’s Efforts to Reinvigorate
Enforcement and Deter Violations of Section 8 of the Clayton Act
WASHINGTON - The Justice Department announced today that seven directors
have resigned from corporate board positions in response to concerns by the
Antitrust Division that their roles violated the Clayton Act’s prohibition
on interlocking directorates. Section 8 of the Clayton Act (Section 8)
prohibits directors and officers from serving simultaneously on the boards
of competitors, subject to limited exceptions. Over the last several
months, the Division announced its intent to reinvigorate Section 8
enforcement. This announcement is the first in a broader review of
potentially unlawful interlocking directorates.
“Section 8 is an important, but underenforced, part of our antitrust laws.
Congress made interlocking directorates a *per se* violation of the
antitrust laws for good reason. Competitors sharing officers or directors
further concentrates power and creates the opportunity to exchange
competitively sensitive information and facilitate coordination – all to
the detriment of the economy and the American public,” said Assistant
Attorney General Jonathan Kanter of the Justice Department’s Antitrust
Division. “The Antitrust Division is undertaking an extensive review of
interlocking directorates across the entire economy and will enforce the
law.”
By eliminating the opportunity to coordinate – explicitly or implicitly –
through interlocking directorates, Section 8 is also intended to prevent
other violations of the antitrust laws before they occur. In response to
the Division’s competition concerns, the following companies and directors
unwound the interlocks without admitting to liability:
1. *Definitive Healthcare Corp. and ZoomInfo Technologies Inc. *–
Definitive and ZoomInfo operate go-to-market information and intelligence
platforms used by third-party sales, marketing, operations, and recruiting
teams across the United States. One director served simultaneously on the
boards of both companies and resigned from Definitive’s board in response
to the Division’s concerns about the alleged interlock.
2. *Maxar Technologies Inc. and Redwire Corp.* – Maxar and Redwire are
providers of space infrastructure and communications products and services.
One director served simultaneously on the boards of both companies and
resigned from Redwire’s board in response to the Division’s concerns about
the alleged interlock.
3. *Littelfuse Inc. and CTS Corp.* – Littelfuse and CTS are
manufacturers of components and technologies for use in transportation
applications, including sensors and switches for use in passenger and
commercial vehicles. One director served simultaneously on the boards of
both companies and resigned from CTS’s board in response to the Division’s
concerns about the alleged interlock.
4. *Skillsoft Corp. and Udemy Inc.* – Skillsoft and Udemy are providers
of online corporate education services. One director served simultaneously
on the boards of both companies, as did the investment firm Prosus, through
that director, because he represented Prosus on both boards at the same
time. The director resigned from Udemy’s board in response to the
Division’s concerns about the alleged interlock.
5. *Solarwinds Corp. and Dynatrace, Inc.* – Solarwinds and Dynatrace are
providers of Application Performance Monitoring (APM) software. One
director served simultaneously on the boards of both companies, as did the
investment firm Thoma Bravo, through this director, because he represented
Thoma Bravo on both boards at the same time. Two additional directors also
represented Thoma Bravo on the Solarwinds board. All three directors
resigned from Solarwinds’s board in response to the Division’s concerns
about the alleged interlock.
Companies, officers, and board members should expect that enforcement of
Section 8 will continue to be a priority for the Antitrust Division. Anyone
with information about potential interlocking directorates or any other
potential violations of the antitrust laws is encouraged to contact the
Antitrust Division’s Citizen Complaint Center at 1-888-647-3258 or
antitrust.complaints(a)usdoj.gov.
1
0
My Recent Experience at an AI-Cybersecurity Marketing-Workshop
by Undescribed Horrific Abuse, One Victim & Survivor of Many 21 Oct '22
by Undescribed Horrific Abuse, One Victim & Survivor of Many 21 Oct '22
21 Oct '22
I received marketing a few days ago for a course track in
cybersecurity focused on machine learning, and I attended the free
introductory lecture that advertised their school.
- The lecturer said that security corps have all the technology to do
emissions and other advanced, more reliable forms of information
security, but they are subject to "limitations" such that the software and
hardware that do this are rarely sold.
- The school was called "flatiron" and they teach AI-based
cybersecurity in their cybersecurity track. The training was heavily
business-focused, kind of irritating to attend. The people seemed to
know what I was trying to talk about, and seemed interested in talking
about it but mostly indirectly rather than clearly.
- The intro material was focused on SIEM platforms, which are
corporate products that do security work for the user. The content
focused on automated log analysis.
I was the oblivious rude nerd who asked advanced questions without
sufficient experience nor understanding of where others were at, and
then gave frustrated responses. Because of my struggles, I actually
missed most of the replies, sadly.
The presenters promised a recording of the lecture, from which I hoped
to extract the replies I missed. I have yet to receive this recording
or find where it is posted.
I also received personal e-mail, phone, and text message followup from
the presenters, likely hoping to help me apply to their course track.
1
0
[random] Paper on Simulating Coders
by Undescribed Horrific Abuse, One Victim & Survivor of Many 21 Oct '22
by Undescribed Horrific Abuse, One Victim & Survivor of Many 21 Oct '22
21 Oct '22
This is just a random paper I bumped into today. I'm not totally sure
what it's about.
The source is not released yet, but I believe it also links to prior
work if the topic is interesting.
Apologies for the barebones copypaste without formatting.
https://arxiv.org/abs/2210.08332
Code Recommendation for Open Source Software Developers
Yiqiao Jin
yjin328(a)gatech.edu
Georgia Institute of Technology
Atlanta, GA, USA
Yunsheng Bai
yba(a)cs.ucla.edu
University of California, Los Angeles
Los Angeles, CA, USA
Yanqiao Zhu
yzhu(a)cs.ucla.edu
University of California, Los Angeles
Los Angeles, CA, USA
Yizhou Sun
yzsun(a)cs.ucla.edu
University of California, Los Angeles
Los Angeles, CA, USA
Wei Wang
weiwang(a)cs.ucla.edu
University of California, Los Angeles
Los Angeles, CA, USA
ABSTRACT
Open Source Software (OSS) is forming the spines of technology
infrastructures, attracting millions of talents to contribute. Notably,
it is challenging and critical to consider both the developers’
interests and the semantic features of the project code to recommend
appropriate development tasks to OSS developers. In this paper,
we formulate the novel problem of code recommendation, whose
purpose is to predict the future contribution behaviors of developers
given their interaction history, the semantic features of source
code, and the hierarchical file structures of projects. Considering
the complex interactions among multiple parties within the system,
we propose CODER, a novel graph-based code recommendation
framework for open source software developers. CODER jointly
models microscopic user-code interactions and macroscopic user-project
interactions via a heterogeneous graph and further bridges
the two levels of information through aggregation on file-structure
graphs that reflect the project hierarchy. Moreover, due to the lack
of reliable benchmarks, we construct three large-scale datasets to
facilitate future research in this direction. Extensive experiments
show that our CODER framework achieves superior performance
under various experimental settings, including intra-project,
cross-project, and cold-start recommendation. We will release all the
datasets, code, and utilities for data retrieval upon the acceptance
of this work.
CCS CONCEPTS
• Information systems → Collaborative filtering; Web and social media
search; Social recommendation; Personalization.
KEYWORDS
Code recommendation; recommender system; open source software
development; multimodal recommendation; graph neural networks
1 INTRODUCTION
Open Source Software (OSS) is becoming increasingly popular in
software engineering [22, 45]. As contribution to OSS projects
is highly democratized [62], these projects attract millions of
developers with diverse expertise and efficiently crowd-source the
project development to a larger community of developers beyond
the project’s major personnel [22, 32]. For instance, GitHub, one
of the most successful platforms for developing and hosting OSS
projects, has over 83 million users and 200 million repositories [12].
Figure 1: An example of the transformers repository (src/transformers with
bert/ and roberta/ subdirectories containing modeling_*.py and
tokenization_*.py files). OSS projects under similar topics usually adopt
similar naming conventions and file structures, which can be seen as
knowledge transferable across projects.
Community support and teamwork are major driving forces behind open
source projects [32]. OSS projects are usually developed
in a collaborative manner [2], whereas collaboration in OSS is
especially challenging. OSS projects are of large scales and usually
contain numerous project files written in diverse programming
languages [4]. According to statistics, the most popular 500 GitHub
projects contain an average of 2,582 project files, 573 directories, and
360 contributors. Meanwhile, there are more than 300 programming
languages on GitHub, 67 of which are actively being used [10, 11].
For project maintainers, it is both difficult and time-consuming
to find competent contributors within a potentially large candidate
pool. For OSS developers, recommending personalized development tasks
according to their project experience and expertise can
significantly boost their motivation and reduce their cognitive loads
of manually checking the project files. As contribution in OSS is
voluntary, developers that fail to find meaningful tasks are likely
to quit the project development [42]. Therefore, an efficient system
for automatically matching source code with potential contributors
is being called for by both the project core team and the potential
contributors to reduce their burden.
To solve the above issues, in this paper, we for the first time
introduce the novel problem of code recommendation for OSS developers.
As shown in Fig. 2, this task recommends code in the form
of project files to potentially suitable contributors. It is noteworthy
that code recommendation has several unique challenges such that
traditional recommender models are not directly applicable.
Firstly, OSS projects contain multimodal interactions among
users, projects, and code files. For example, OSS development contains
user-code interactions, such as commits that depict microscopic
behaviors of users, and user-project interactions, such as
forks and stars that exhibit users’ macroscopic preferences and
interests on projects. Also, the contribution relationships are often
extremely sparse, due to the significant efforts required to make a
single contribution to OSS projects. Therefore, directly modeling
the contribution behavior as in traditional collaborative filtering
approaches will inevitably lead to inaccurate user/item
representations and suboptimal performances.
Secondly, in the software engineering domain, code files in a
project are often organized in a hierarchical structure [61]. Fig. 1
shows an example of the famous huggingface/transformers repository
[50]. The src directory usually contains the major source
code for a project. The data and models subdirectories usually
include functions for data generation and model implementations,
respectively. Such a structural organization of the OSS project
reveals semantic relations among code snippets, which are helpful
for developers to transfer existing code from other projects to their
development. Traditional methods usually ignore such item-wise
hierarchical relationships and, as a result, are incapable of
connecting rich semantic features in code files with their
project-level
structures, which is required for accurate code recommendation.
Thirdly, most existing benchmarks involving recommendation
for software only consider limited user-item behaviors [5, 20], are
of small scales [36, 37], or contain only certain languages such as
Python [19, 34, 46] or Java [5, 20, 37], which renders the evaluation
of different recommendation models difficult or not realistic.
To overcome the above challenges, we propose CODER, a CODE
Recommendation framework for open source software developers
that matches project files with potential contributors. As shown in
Fig. 2, CODER treats users, code files, and projects as nodes and
jointly models the microscopic user-code interactions and macroscopic
user-project interactions in a heterogeneous graph. Furthermore, CODER
bridges these two levels of information through
message aggregation on the file structure graphs that reflect the
hierarchical relationships among graph nodes. Additionally, since
there is a lack of benchmark datasets for the code recommendation
task, we build three large-scale datasets from open software
development websites. These datasets cover diverse subtopics in
computer science and contain up to 2 million fine-grained user-file
interactions. Overall, our contributions are summarized as follows:
• We for the first time introduce the problem of code recommendation,
whose purpose is to recommend appropriate development
tasks to developers, given the interaction history of developers,
the semantic features of source code, and hierarchical structures of
projects.
• We propose CODER, an end-to-end framework that jointly
models structural and semantic features of source code as well as
multiple types of user behaviors for improving the matching task.
• We construct three large-scale multi-modal datasets for code
recommendation that cover different topics in computer science to
facilitate research on code recommendation.
• We conduct extensive experiments on massive datasets to
demonstrate the effectiveness of the proposed CODER framework
and its design choices.
2 PROBLEM FORMULATION
Before delving into our proposed CODER framework, we first formalize
our code recommendation task. We use the terms “repository” and
“project” interchangeably to refer to an open source
project. We define U, V, R as the sets of users, files, and repositories,
respectively. Each repository r_k ∈ R contains a subset of files V_k ⊊ V.
Both macroscopic project-level interactions and microscopic file-level
interactions are present in OSS development.
File-level behaviors. We define Y ∈ {0, 1}^{|U|×|V|} as the interaction
matrix between U and V for the file-level contribution behavior, where each
entry is denoted by y_ij; y_ij = 1 indicates that u_i has contributed to
v_j, and y_ij = 0 otherwise.
Project-level behaviors. Interactions at the project level are more
diverse. For example, the popular code hosting platform GitHub
allows users to star (publicly bookmark) interesting repositories
and watch (subscribe to) repositories for updates. We thus define
T as the set of user-project behaviors. Similar to Y, we define
S_t ∈ {0, 1}^{|U|×|R|} as the project-level interaction matrix for behavior
of type t. Our goal is to predict the future file-level contribution
behaviors of users based on their previous interactions. Formally, given
the training data Y^{tr}, we try to predict the interactions in the test
set y_ij ∈ Y^{ts} = Y \ Y^{tr}.
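As an editorial illustration of this formulation (not part of the paper), the short sketch below builds the binary interaction matrix Y from toy (user, file, timestamp) contribution records and splits it into Y^{tr} and Y^{ts} by a cutoff timestamp; the record fields, sizes, and cutoff value are assumptions chosen for the example.
# Minimal sketch (assumed toy data): building Y and a temporal Y^{tr}/Y^{ts} split.
import numpy as np
from scipy.sparse import csr_matrix

num_users, num_files = 3, 3
cutoff = 1602000000  # interactions before the cutoff form the training set
# (user_id, file_id, unix_timestamp) contribution records -- toy data
records = [(0, 0, 1540000000), (0, 2, 1610000000), (1, 1, 1545000000), (2, 2, 1620000000)]

def to_matrix(recs):
    """Build the binary |U| x |V| interaction matrix from (u, v, t) records."""
    if not recs:
        return csr_matrix((num_users, num_files), dtype=np.int8)
    rows = [r[0] for r in recs]
    cols = [r[1] for r in recs]
    m = csr_matrix((np.ones(len(recs), dtype=np.int8), (rows, cols)),
                   shape=(num_users, num_files))
    m.data[:] = 1  # collapse repeated contributions to y_ij = 1
    return m

Y_tr = to_matrix([r for r in records if r[2] < cutoff])
Y_ts = to_matrix([r for r in records if r[2] >= cutoff])
print(Y_tr.toarray())
print(Y_ts.toarray())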
3 METHODOLOGY
As shown in Fig. 2, we design CODER, a two-stage graph-based
recommendation framework. CODER considers 𝑢𝑖 ∈ U, 𝑣𝑗 ∈ V, 𝑟𝑘 ∈
R as graph nodes, and models the user-item interactions and the
item-item relations as edges. We use two sets of graphs to
characterize the heterogeneous information in code recommendation. One
is
the user-item interaction graphs that encompass the collaborative
signals. The other is the file-structure graphs that reveal file-file and
file-project relationships from the project hierarchy perspective.
The code recommendation problem is then formulated as a user-file
link prediction task.
CODER contains two major components: 1) Node Semantics
Modeling, which learns the fine-grained representations of project
files by fusing code semantics with their historical contributors,
and then aggregate project hierarchical information on the file
structure graph to learn the file and repository representation; 2)
Multi-behavioral Modeling, which jointly models the microscopic
user-file interactions and macroscopic user-project interactions.
Finally, CODER fuses the representations from multiple behaviors
for prediction. This way, node semantics modeling bridges the
coarse-grained and fine-grained interaction signals on the item
side. Therefore, CODER efficiently characterizes intra-project and
inter-project differences, eventually uncovering latent user and
item features that explain the interactions Y.
3.1 Node Semantics Modeling
Node semantics modeling aims to learn file and repository
representation. The challenge is how to inherently combine the
semantic features of each project file with its interacted users and
the project hierarchy.
Figure 2: Our proposed CODER framework for code recommendation (panels:
(a) Code-User Modality Fusion, (b) Structural-Level Aggregation, (c)
File-Level Aggregation, (d) Project-Level Aggregation). CODER jointly
considers project file structures, code semantics, and user behaviors.
CODER models the microscopic file-level interactions and macroscopic
project-level interactions through Multi-Behavioral Modeling, and bridges
the micro/macroscopic signals through Node Semantics Modeling.
To address this challenge, we first use a code-user modality fusion
mechanism (Fig. 2a) to fuse the file content
modality and the historical users at the code level. Then, we embed
the fine-grained collaborative signals from user-file interactions
into the file representations. Next, we employ structural-level
aggregation (Fig. 2b), which explicitly models the project structures
as hierarchical graphs to enrich the file/repository representation
with structural information. This step produces representations for each
file v_j and repository r_k, which serve as the input for user behavior
modeling in Sec. 3.2.
3.1.1 Code-User Modality Fusion. A project file is characterized by
diverse semantic features including multiple method declarations
and invocations, which are useful for explaining why a contributor
is interested in it. We therefore use pretrained CodeBERT [7], a
bimodal language model for programming languages, to encode
the rich semantic features of each file. CodeBERT is shown to
generalize well to programming languages not seen in the pretraining
stage, making it suitable for our setting where project files
are written in diverse programming languages. Here, a straightforward
way is to directly encode each file into a per-file latent
representation. Such an encoding scheme has two issues. Firstly, a
file may contain multiple classes and function declarations that
are semantically distinct. Fig. 1 shows the file structure of the
huggingface/transformers [50] repository as an example. The
modeling_bert.py file contains not only various implementations
of the BERT language model for different NLP tasks, but also utilities
for parameter loading and word embeddings. These implementations are
distributed among several code segments in the same
project file, and file-level encoding can fail to encode such semantic
correlations. Secondly, the property of a project file is influenced
by the historical contributors’ features. A user’s contribution can
be viewed as injecting her/his own attributes, including programming
style and domain knowledge, into the interacted file. Such
contribution behaviors make it more likely to be interacted again
by users with similar levels of expertise than random contributors.
Therefore, we propose a code-user modality fusion strategy
to embed both code semantics and user characteristics into the file
representation. Specifically, for each file, we partition its source code
into N_C code segments and encode each of them into a code-segment-level
representation c_i. This produces a feature map C = [c_1, c_2, ..., c_{N_C}],
C ∈ R^{N_C×d}, where d is the embedding size. Similarly, we sample N_Q
historical users of the file and encode them into a feature map
Q = [u_1, u_2, ..., u_{N_Q}], Q ∈ R^{N_Q×d}. Please refer to Appendix ?? for
details on encoding C and Q. Inspired by the success of co-attention
[30, 63], we transform the user attention space to the code attention space
by calculating a code-user affinity matrix L ∈ R^{N_C×N_Q}:
L = tanh(C W_O Q^T),   (1)
where W_O ∈ R^{d×d} is a trainable weight matrix. Next, we compute the
attention weight a ∈ R^{N_C} of the code segments to select salient
features from C. We treat the affinity matrix as a feature and learn an
attention map H with N_H representations:
H = tanh(W_C C^T + W_Q (L Q)^T),   (2)
a = softmax(w_H^T H),   (3)
where W_C, W_Q ∈ R^{N_H×d} and w_H ∈ R^{N_H} are the weight parameters.
Finally, the file attention representation h is calculated as the weighted
sum of the code feature map:
h = a^T C.   (4)
The file attention representation serves as a starting point to further
aggregate file structural features.
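As an editorial aside, a minimal PyTorch sketch of Eqs. (1)-(4) follows: it computes the code-user affinity matrix L, the attention map H, the attention weights a, and the fused file representation h. The layer sizes, initialization scheme, and random inputs are assumptions made for illustration, not details taken from the paper.
# Sketch of the code-user modality fusion in Eqs. (1)-(4); toy dimensions.
import torch
import torch.nn as nn

class CodeUserFusion(nn.Module):
    def __init__(self, d: int, n_hidden: int):
        super().__init__()
        self.W_O = nn.Parameter(torch.randn(d, d) * 0.01)         # Eq. (1) affinity weights
        self.W_C = nn.Parameter(torch.randn(n_hidden, d) * 0.01)  # Eq. (2) code-side weights
        self.W_Q = nn.Parameter(torch.randn(n_hidden, d) * 0.01)  # Eq. (2) user-side weights
        self.w_H = nn.Parameter(torch.randn(n_hidden) * 0.01)     # Eq. (3) attention vector

    def forward(self, C, Q):
        L = torch.tanh(C @ self.W_O @ Q.T)                     # (N_C, N_Q)  Eq. (1)
        H = torch.tanh(self.W_C @ C.T + self.W_Q @ (L @ Q).T)  # (N_H, N_C)  Eq. (2)
        a = torch.softmax(self.w_H @ H, dim=-1)                # (N_C,)      Eq. (3)
        return a @ C                                           # (d,)        Eq. (4)

C = torch.randn(8, 32)   # N_C = 8 code-segment embeddings, d = 32 (toy values)
Q = torch.randn(4, 32)   # N_Q = 4 historical-user embeddings
h = CodeUserFusion(d=32, n_hidden=16)(C, Q)
print(h.shape)  # torch.Size([32])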
3.1.2 Structural-Level Aggregation. Projects are organized in a
hierarchical way such that nodes located closer on the file structure
graph are more closely related in terms of semantics and
functionality. For example, in Fig. 1, both files under the bert/
directory contain source code for the BERT [24] language model, and
files under
roberta/ contains implementation for the RoBERTa [29] model.
The file modeling_bert.py is therefore more closely related to
tokenization_bert.py in functionality than to tokenization_
roberta.py.
To exploit such structural clues, we model each repository as a
hierarchical heterogeneous graph 𝐺𝑆 consisting of file, directory,
and repository nodes. Each node is connected to its parent node
through an edge, and nodes at the first level are directly connected
to the virtual root node representing the project. To encode the
features of directory nodes, we partition the directory names into
meaningful words according to underscores and letter capitalization,
then encoded the nodes by their TF-IDF features. Our encoding
scheme is motivated by the insight that the use of standard directory
names (e.g., doc, test, models) is correlated with project popularity
among certain groups of developers [2, 69]. Repository features
are encoded by their project owners, creation timestamps, and their
top-5 programming languages. The repository and directory
representations are mapped to the same latent space as the file nodes.
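For readers who want to see the directory-name encoding concretely, here is a small sketch (an editorial illustration, not the authors' code) that splits directory names on underscores and letter capitalization and encodes them with TF-IDF; the helper name and the toy directory list are assumptions.
# Sketch: split directory names into words, then encode each directory with TF-IDF.
import re
from sklearn.feature_extraction.text import TfidfVectorizer

def split_name(name: str) -> str:
    """'modeling_bert' -> 'modeling bert', 'DataUtils' -> 'data utils'."""
    words = []
    for part in re.split(r"_+", name):
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+", part))
    return " ".join(w.lower() for w in words if w)

directories = ["models", "modeling_bert", "tokenization_roberta", "DataUtils", "tests"]
corpus = [split_name(d) for d in directories]

vectorizer = TfidfVectorizer()
dir_features = vectorizer.fit_transform(corpus)  # one TF-IDF row per directory node
print(vectorizer.get_feature_names_out())
print(dir_features.toarray().round(2))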
Then, we pass the representation h through multiple GNN layers to aggregate
the features of each node from its neighbors on G_S:
h̃ = f_GNN(h, G_S),   (5)
where h̃ is the structure-enhanced node representation. The aggregation
function f_GNN(·) can be chosen from a wide range of GNN operators, such as
GCN [26], GraphSAGE [15], and GIN [54]. In practice, we employ a 3-layer
Graph Attention Network (GAT) [44].
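A minimal PyTorch Geometric sketch of Eq. (5) follows as an editorial illustration; the toy file-structure edges, feature sizes, and single-head GAT configuration are assumptions, with the three stacked GAT layers mirroring the setting stated above.
# Sketch of Eq. (5): structure-enhanced representations via stacked GAT layers.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class StructuralAggregator(torch.nn.Module):
    def __init__(self, dim: int, num_layers: int = 3):
        super().__init__()
        self.layers = torch.nn.ModuleList(
            [GATConv(dim, dim, heads=1, concat=False) for _ in range(num_layers)]
        )

    def forward(self, h, edge_index):
        for conv in self.layers:
            h = F.elu(conv(h, edge_index))
        return h  # structure-enhanced representations h-tilde

# Toy graph: node 0 = repository root, 1-2 = directories, 3-5 = files.
edge_index = torch.tensor([[1, 2, 3, 4, 5], [0, 0, 1, 1, 2]])  # child -> parent edges
h = torch.randn(6, 32)                                         # initial node features
h_tilde = StructuralAggregator(dim=32)(h, edge_index)
print(h_tilde.shape)  # torch.Size([6, 32])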
3.2 Multi-behavioral Modeling
Direct modeling of the sparse contribution behavior potentially
leads to inaccurate user/item representations and aggravates the
cold-start issue. Instead, we jointly model the microscopic user-file
contribution in File-level Aggregation (Fig. 2c) and macroscopic
user-project interactions in Project-level Aggregation (Fig. 2d) to
learn user preferences and address the sparsity issue. Then, the
representations learned from multi-level behaviors are combined
to form the user and item representations for prediction.
3.2.1 File-level Aggregation. We model the project files and their
contributors as an undirected user-file bipartite graph G_F consisting of
users u_i ∈ U, files v_j ∈ V, and their interactions. The initial embedding
matrix of users/items is denoted by E^{(0)}, which serves as an initial
state for end-to-end optimization:
E^{(0)} = [u_1^{(0)}, ..., u_{|U|}^{(0)}, v_1^{(0)}, ..., v_{|V|}^{(0)}],   (6)
where the first |U| columns are user embeddings and the remaining |V|
columns are item embeddings; u_i^{(0)} is the initial embedding for user
u_i, and v_j^{(0)} is the initial embedding for file v_j, set equal to its
structure-enhanced representation h̃ (Sec. 3.1.2). We adopt the simple
weighted sum aggregator in LightGCN [17] in the propagation rule:
u_i^{(l)} = Σ_{v_j ∈ N_i} 1/√(|N_i||N_j|) · v_j^{(l-1)},   (7)
v_j^{(l)} = Σ_{u_i ∈ N_j} 1/√(|N_j||N_i|) · u_i^{(l-1)},   (8)
where u_i^{(l)} and v_j^{(l)} are the embeddings for user u_i and file v_j
at layer l, and N_i and N_j indicate the neighbors of user u_i and file v_j.
The symmetric normalization term 1/√(|N_i||N_j|) is set to the graph
Laplacian norm to avoid the increase of GCN embedding sizes [17, 26]. In
matrix form, the propagation rule of file-level aggregation is expressed as:
E^{(l)} = D^{-1/2} A D^{-1/2} E^{(l-1)},   A = [[0, Y^{tr}], [Y^{tr T}, 0]],   (9)
where A ∈ R^{(|U|+|V|)×(|U|+|V|)} is the affinity matrix and D is the
diagonal degree matrix in which each entry D_ii indicates the number of
non-zero entries on the i-th row of A. By stacking multiple layers, each
user/item node aggregates information from its higher-order neighbors.
Propagation through L layers yields a set of representations
{E^{(l)}}_{l=0}^{L}. Each E^{(l)} emphasizes the messages from its l-hop
neighbors. We apply mean-pooling over all E^{(l)} to derive the user and
file representations u_i^* and v_j^* from different levels of user/item
features:
u_i^* = 1/(L+1) Σ_{l=0}^{L} u_i^{(l)},   v_j^* = 1/(L+1) Σ_{l=0}^{L} v_j^{(l)}.   (10)
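The following editorial sketch shows the matrix-form propagation of Eq. (9) and the layer-wise mean-pooling of Eq. (10) with small dense tensors; real implementations would use sparse operations, and all sizes here are assumptions.
# Sketch of Eqs. (9)-(10): symmetric-normalized propagation + mean-pooling over layers.
import torch

num_users, num_files, d, L = 4, 5, 8, 4
Y_tr = (torch.rand(num_users, num_files) < 0.3).float()   # toy interaction matrix

# A = [[0, Y^tr], [Y^tr^T, 0]]
A = torch.zeros(num_users + num_files, num_users + num_files)
A[:num_users, num_users:] = Y_tr
A[num_users:, :num_users] = Y_tr.T

deg = A.sum(dim=1).clamp(min=1.0)                 # avoid division by zero for isolated nodes
A_hat = torch.diag(deg.pow(-0.5)) @ A @ torch.diag(deg.pow(-0.5))  # D^{-1/2} A D^{-1/2}

E = torch.randn(num_users + num_files, d)         # E^{(0)}
layers = [E]
for _ in range(L):
    E = A_hat @ E                                 # Eq. (9)
    layers.append(E)

E_star = torch.stack(layers).mean(dim=0)          # Eq. (10): mean over all layers
u_star, v_star = E_star[:num_users], E_star[num_users:]
print(u_star.shape, v_star.shape)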
3.2.2 Project-Level Aggregation. OSS development is characterized
by both microscopic contribution behaviors and multiple types
of macroscopic project-level behaviors. For example, developers
usually find relevant projects and reuse their functions and explore
ideas of possible features [19, 21]. In particular, GitHub users
can star (bookmark) interesting repositories and discover projects
under similar topics. This way, developers can adapt code
implementation of these interesting projects into their own
development
later. Hence, project-level macroscopic interactions are conducive
for extracting users’ broad interests.
For each behavior t, we propagate the user and repository embeddings on its
project-level interaction graph G_P^t. The initial embedding matrix Z^{(0)}
is shared by all t ∈ T and is composed of the initial user representations
identical to Eq. 6 and the repository embeddings from the
structure-enhanced representation h̃ in Eq. 5:
Z^{(0)} = [z_1^{(0)}, z_2^{(0)}, ..., z_{|U|}^{(0)}, r_1^{(0)}, r_2^{(0)}, ..., r_{|R|}^{(0)}],   (11)
Z_t^{(l)} = D_t^{-1/2} Λ_t D_t^{-1/2} Z_t^{(l-1)},   (12)
where the first |U| columns of Z^{(0)} are user embeddings with
z_i^{(0)} = u_i^{(0)}, the remaining |R| columns are repository embeddings,
and Λ_t ∈ R^{(|U|+|R|)×(|U|+|R|)} is the affinity matrix for behavior t,
constructed similarly to A in Eq. 9. Agg(·) is an aggregation function.
With the representations {Z_t^{(l)}}_{l=0}^{L} obtained from multiple
layers, we derive the combined user and repository representations for
behavior t as
z_{t,i}^* = 1/(L+1) Σ_{l=0}^{L} z_{t,i}^{(l)},   r_{t,i}^* = 1/(L+1) Σ_{l=0}^{L} r_{t,i}^{(l)}.   (13)
3.3 Prediction
For file-level prediction, we aggregate the macroscopic signals z_{t,i}^*,
r_{t,i}^* from each behavior t into u_i, v_j:
z_i^* = Agg(z_{t,i}^*, t ∈ T),   r_k^* = Agg(r_{t,k}^*, t ∈ T),   (14)
u_i = MLP([u_i^* || z_i^*]),   v_j = MLP([v_j^* || r_{φ(j)}^*]),   (15)
where MLP(·) is a multilayer perceptron, φ(·) : V → R maps the index of
each project file to its repository, and || is the concatenation operator.
On the user side, both macroscopic interests and micro-level interactions
are injected into the user representations. On the item side, the semantics
of each file is enriched by its interacted users and the repository
structural information.
For computational efficiency, we employ the inner product to calculate user
u_i's preference towards each file v_j:
s_F(i, j) = u_i^T v_j,   (16)
where s_F is the scoring function for the file-level behavior. Similarly,
for each user-project pair, we derive a project-level score for each
behavior t using the project-level scoring function s_P:
s_P(i, k, t) = z_{t,i}^{*T} r_{t,k}^*.   (17)
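Below is an editorial sketch of Eqs. (14)-(17): per-behavior signals are aggregated (mean aggregation is an assumption for Agg), fused with the file-level representations through MLPs, and scored with inner products; all dimensions are toy values.
# Sketch of Eqs. (14)-(17): aggregate per-behavior signals, fuse with MLPs, score.
import torch
import torch.nn as nn

d, num_behaviors = 32, 2
u_star = torch.randn(d)                  # u*_i from file-level aggregation
v_star = torch.randn(d)                  # v*_j from file-level aggregation
z_t = torch.randn(num_behaviors, d)      # z*_{t,i} for each project-level behavior t
r_t = torch.randn(num_behaviors, d)      # r*_{t,phi(j)} for each behavior t

z_star = z_t.mean(dim=0)                 # Eq. (14), with Agg = mean (assumption)
r_star = r_t.mean(dim=0)

mlp_u = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))
mlp_v = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))
u_i = mlp_u(torch.cat([u_star, z_star])) # Eq. (15)
v_j = mlp_v(torch.cat([v_star, r_star]))

s_F = torch.dot(u_i, v_j)                # Eq. (16): file-level score
s_P = torch.dot(z_t[0], r_t[0])          # Eq. (17): project-level score for behavior t = 0
print(float(s_F), float(s_P))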
3.4 Optimization
We employ the Bayesian Personalized Ranking (BPR) [39] loss, which
encourages the prediction of an observed user-item interaction to be
greater than an unobserved one:
L_F = Σ_{(i, j+, j-) ∈ O} -log(sigmoid(s_F(i, j+) - s_F(i, j-))),   (18)
L_P^t = Σ_{(i, k+, k-) ∈ O} -log(sigmoid(s_P(i, k+, t) - s_P(i, k-, t))),   (19)
where L_F is the file-level BPR loss and L_P^t is the project-level BPR
loss for behavior t. O denotes the pairwise training data; j+ indicates an
observed interaction between user u_i and item v_{j+}, and j- indicates an
unobserved one. As high-order neighboring relations within contributors are
also useful for recommendations, we enforce users to have similar
representations as their structural neighbors through the
structure-contrastive learning objective [28]:
L_C^U = Σ_{u_i ∈ U} -log [ exp(u_i^{(η)} · u_i^{(0)} / τ) / Σ_{u_j ∈ U} exp(u_i^{(η)} · u_j^{(0)} / τ) ],   (20)
Here, η is set to an even number so that each user node can aggregate
signals from other user nodes. τ is the temperature hyperparameter.
Similarly, the contrastive loss is applied to each v_i:
L_C^V = Σ_{v_i ∈ V} -log [ exp(v_i^{(l)} · v_i^{(0)} / τ) / Σ_{v_j ∈ V} exp(v_i^{(l)} · v_j^{(0)} / τ) ].   (21)
The overall optimization objective is
L = L_F + λ1 Σ_{t ∈ T} L_P^t + λ2 (L_C^U + L_C^V) + λ3 ||Θ||_2,   (22)
where Θ denotes all trainable model parameters and λ1, λ2, λ3 are
hyperparameters.
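As an editorial illustration of Eq. (18) and Eq. (20), the sketch below computes a batched BPR loss and a user-side structure-contrastive (InfoNCE-style) loss; the batch construction, temperature, and weighting are assumptions, and the project-level term of Eq. (19) and the file-side term of Eq. (21) follow the same pattern.
# Sketch of the BPR loss (Eq. 18) and the user-side structure-contrastive loss (Eq. 20).
import torch
import torch.nn.functional as F

def bpr_loss(s_pos, s_neg):
    """-log sigmoid(s_F(i, j+) - s_F(i, j-)), averaged over the batch."""
    return -F.logsigmoid(s_pos - s_neg).mean()

def structure_contrastive_loss(u_eta, u_0, tau: float = 0.1):
    """Contrast each user's layer-eta embedding with its own layer-0 embedding
    against all other users' layer-0 embeddings (softmax / InfoNCE form of Eq. 20)."""
    logits = (u_eta @ u_0.T) / tau           # (|U|, |U|) similarity matrix
    targets = torch.arange(u_eta.size(0))    # the positive pair sits on the diagonal
    return F.cross_entropy(logits, targets)

# Toy batch: scores of observed / unobserved pairs and two user embedding tables
s_pos, s_neg = torch.randn(64), torch.randn(64)
u_eta, u_0 = torch.randn(16, 32), torch.randn(16, 32)
loss = bpr_loss(s_pos, s_neg) + 1e-6 * structure_contrastive_loss(u_eta, u_0)
print(float(loss))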
Table 1: Summary of the datasets. The “#Files” column
shows the number of files with observed interactions instead of all
existing files in the projects.
Dataset #Files #Users #Interactions Density
ML 239,232 21,913 663,046 1.26E-4
DB 415,154 30,185 1,935,155 1.54E-4
FS 568,972 51,664 1,512,809 5.14E-5
4 EXPERIMENTS
4.1 Experimental Settings
4.1.1 Datasets. We collected 3 datasets covering diverse topics in computer
science including machine learning (ML), fullstack (FS), and database (DB),
using the GitHub API¹ and the PyGithub² package. We retain repositories
with ≥ 250 stars and ≥ 3 contributors to exclude repositories intended for
private usage [2]. We include projects with a contribution history of at
least 3 months according to their commit history. To ensure that our model
generalizes on a wide range of topics, popularity, and project scales, we
first select 3 subsets of repositories using their GitHub topics³, which
are project labels created by the project owners. Then, we randomly sample
300 repositories from each subset considering their numbers of project
files and stars. We use the unix timestamps 1550000000 and 1602000000 to
partition the datasets into train/val/test sets.
This way, all interactions before the timestamp are used as the
training data. We retain the users with at least 1 interaction in both
train and test set. More details about dataset construction are in
the appendix.
4.1.2 Implementation Details. We implemented our CODER model in PyTorch
[38] and PyG [8]. For all models, we set the embedding size to 32 and
perform Xavier initialization [13] on the model parameters. We use the Adam
optimizer [25] with a batch size of 1024. For Node Semantics Modeling
(Sec. 3.1), we set N_C = 8 and N_Q = 4. The code encoder we use is the
pretrained CodeBERT [7] model with 6 layers, 12 attention heads, and
768-dimensional hidden states. For Multi-Behavioral Modeling (Sec. 3.2), we
set the number of convolution layers L = 4 for both intra- and inter-level
aggregation. For prediction and optimization, we search the hyperparameter
λ3 in [1e-4, 1e-3, 1e-2] and λ1 in [1e-2, 1e-1, 1]. For the structure
contrastive loss [28], we adopt the hyperparameter setting from the
original implementation and set λ2 = 1e-6, η = 2 without further tuning.
For the baseline models, the hyperparameters are set to the optimal
settings as reported in their original papers. For all models, we search
the learning rate in [1e-4, 3e-4, 1e-3, 3e-3, 1e-2].
4.1.3 Baselines. We compare our methods with 3 groups of methods,
including: (G1) factorization-based methods including MF [39];
(G2): neural-network-based methods including MLP [41] and NeuMF [18];
(G3): Graph-based methods that model user-item interactions as graphs,
including NGCF [47], LightGCN [17], and NCL [28].
¹ https://docs.github.com/en/rest
² https://github.com/PyGithub/PyGithub.git
³ https://github.com/topics
Table 2: The overall performance on 3 datasets. The best performance
is marked in bold. The second best is underlined.
Metric MF MLP NeuMF NGCF LightGCN NCL CODER Impr.
ML:
NDCG@5 0.065 0.073 0.076 0.091 0.106 0.119 0.132 11.2%
Hit@5 0.162 0.189 0.189 0.237 0.291 0.276 0.351 20.5%
MRR@5 0.098 0.113 0.114 0.137 0.164 0.201 0.211 5.0%
NDCG@10 0.066 0.075 0.081 0.093 0.109 0.118 0.136 14.8%
Hit@10 0.229 0.250 0.263 0.310 0.386 0.337 0.440 14.0%
MRR@10 0.106 0.121 0.124 0.147 0.177 0.209 0.223 6.7%
NDCG@20 0.072 0.081 0.084 0.100 0.116 0.120 0.141 17.9%
Hit@20 0.324 0.343 0.346 0.407 0.457 0.466 0.540 15.8%
MRR@20 0.113 0.127 0.130 0.154 0.185 0.213 0.230 8.2%
DB:
NDCG@5 0.085 0.079 0.085 0.099 0.082 0.124 0.160 29.0%
Hit@5 0.205 0.191 0.206 0.263 0.237 0.316 0.390 23.2%
MRR@5 0.130 0.118 0.128 0.162 0.132 0.252 0.260 3.2%
NDCG@10 0.086 0.079 0.085 0.100 0.084 0.123 0.159 29.4%
Hit@10 0.267 0.251 0.276 0.361 0.324 0.380 0.488 28.4%
MRR@10 0.138 0.126 0.137 0.175 0.144 0.260 0.273 4.9%
NDCG@20 0.088 0.083 0.088 0.103 0.091 0.125 0.160 27.3%
Hit@20 0.335 0.338 0.362 0.454 0.422 0.437 0.588 29.5%
MRR@20 0.143 0.132 0.143 0.182 0.150 0.264 0.280 6.0%
FS:
NDCG@5 0.063 0.063 0.067 0.082 0.089 0.106 0.146 37.1%
Hit@5 0.168 0.178 0.179 0.231 0.245 0.283 0.374 31.9%
MRR@5 0.100 0.100 0.107 0.132 0.146 0.170 0.226 33.0%
NDCG@10 0.063 0.065 0.068 0.085 0.092 0.106 0.144 35.6%
Hit@10 0.231 0.244 0.249 0.319 0.332 0.361 0.467 29.3%
MRR@10 0.109 0.110 0.117 0.144 0.157 0.180 0.239 32.3%
NDCG@20 0.067 0.070 0.073 0.090 0.095 0.110 0.146 32.7%
Hit@20 0.307 0.321 0.335 0.406 0.414 0.451 0.559 23.9%
MRR@20 0.114 0.115 0.122 0.150 0.163 0.187 0.245 31.4%
As the code recommendation task is to predict users’ file-level
contribution, file-level behavior modeling is the most critical
component. Thus, we use file-level contribution behaviors as the
supervision signals as in Eq. 16. For brevity, we use repository
identity
to refer to the information of which repository a file belongs to.
As the baselines do not explicitly leverage the repository identities
of files, we encode their repository identities as a categorical
feature through one-hot encoding during embedding construction.
To ensure fairness of comparison, we incorporate the project-level
interaction signals into the user representations by applying multihot
encoding on the repositories each user has interacted with. All
the baseline models use the same pretrained CodeBERT embeddings as
CODER to leverage the rich semantic features in the source
code.
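For concreteness, an editorial sketch of the one-hot repository-identity feature for files and the multi-hot encoding of the repositories each user has interacted with is shown below; the toy ids are assumptions.
# Sketch: one-hot repository identity for files and multi-hot user features (toy data).
import torch

num_repos = 4
file_repo = torch.tensor([0, 2, 2, 3])   # repository id of each file
file_onehot = torch.nn.functional.one_hot(file_repo, num_classes=num_repos).float()

user_repo_interactions = [[0, 2], [3], [1, 2, 3]]   # repositories each user interacted with
user_multihot = torch.zeros(len(user_repo_interactions), num_repos)
for i, repos in enumerate(user_repo_interactions):
    user_multihot[i, repos] = 1.0

print(file_onehot)
print(user_multihot)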
4.1.4 Evaluation Metrics. Following previous works [17, 18, 47,
68], we choose Mean Reciprocal Rank (MRR@K), Normalized Discounted
Cumulative Gain (NDCG@K), Recall@K (Rec@K) and
Hit@K as the evaluation metrics.
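As a quick reference for these metrics, the editorial sketch below computes Hit@K, MRR@K, and NDCG@K for a single ranked list with one relevant item; the single-positive simplification (under which Recall@K coincides with Hit@K) is an assumption made for brevity.
# Sketch: ranking metrics for one ranked list containing a single relevant item.
import math

def hit_at_k(rank: int, k: int) -> float:
    # With a single relevant item, Recall@K coincides with Hit@K.
    return 1.0 if rank <= k else 0.0

def mrr_at_k(rank: int, k: int) -> float:
    return 1.0 / rank if rank <= k else 0.0

def ndcg_at_k(rank: int, k: int) -> float:
    # With one relevant item, the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

rank = 3  # position of the ground-truth file in the ranked candidate list
print(hit_at_k(rank, 5), mrr_at_k(rank, 5), round(ndcg_at_k(rank, 5), 3))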
4.2 Performance
4.2.1 Intra-Project Recommendation. In this setting, we evaluate the
model's ability to recommend development tasks within the repositories a
user has already interacted with. For each user u_i, we rank the
interactions under repositories s/he has interacted with in the training set.
This setting corresponds to the scenario in which project maintainers
recommend new development tasks to existing contributors
based on their previous contribution. As shown in Tab. 2, CODER
consistently outperforms the baselines by a large margin. On the
ML dataset, CODER outperforms the best baseline by 17.9% on
NDCG@20, 15.8% on Hit@20, and 8.2% on MRR@20. On the DB
dataset, CODER achieves performance improvements of 27.3% on
NDCG@20, 29.5% on Hit@20, and 6.0% on MRR@20. Notably, the
greatest performance improvement is achieved on the FS dataset,
which has the greatest sparsity. CODER achieves a maximum performance
improvement of 37.1% on NDCG@5 and 35.6% on NDCG@10.
The results show that CODER achieves significant performance
improvement over the baseline, and is especially useful when the
observed interactions are scarce.
Among the baselines, graph-based method (G3) achieves better
performances than (G1), (G2) as they can model the high-order
relations between users and items through the interaction graph and
the embedding function. LightGCN [17] underperforms NGCF [47]
on the DB dataset, whose training set has the greatest density, and
outperforms NGCF on the ML and FS datasets. This implies that
the message passing scheme of NGCF, which involves multiple
linear transformation and non-linear activation, is more effective
for denser interactions. Such results justify our design choice in
multi-behavioral modeling, which uses the LightGCN propagation
scheme. NCL exhibits the strongest performance, demonstrating the
importance of the contrastive learning loss in modeling differences
among homogeneous types of nodes, which is also included in our
model design. Neural-network-based methods (G2) generally outperform
matrix factorization (G1), as they leverage multiple feature
transformations to learn the rich semantics in the file embeddings
and the user-file interactions.
4.2.2 Cold-Start Recommendation. User contribution is usually
sparse due to the considerable workload and voluntary nature of
OSS development. In this sense, it is important to accurately capture
the users’ preferences with few observed interactions. Our
model is thus designed according to this principle. We define cold-start
users as users with ≤ 2 interactions in the training set. To evaluate the
model's performance with fewer interactions, we choose NDCG@K, Recall@K,
and Hit@K, where K ∈ {3, 5}. The strongest 4 baselines in Tab. 2 are
evaluated for comparison.
As observed from Tab. 3, performance for cold-start users is
worse than that for all users in Tab. 2. Notably, CODER is able to
achieve even greater performance improvement over the baseline
models. It can be attributed to the following aspects: 1) CODER
learns more accurate representations by fusing the fine-grained
semantics of project files with their interacted users, which
facilitates
the learning of user preferences even in the absence of dense user
interactions. 2) By explicitly modeling multiple types of project-level
behaviors, CODER effectively models the users’ interests to
complement the sparse file-level contribution relations, which is
more effective than encoding the project-level interactions in the
embedding space.
4.2.3 Cross-Project Recommendation. Although 91% of developers
in our dataset focused on 1 project throughout their development,
active contributors can work on multiple projects. For these
contributors, the project core team can recommend development tasks
based on their development experiences in previous projects.
During evaluation, we rank the interactions in projects each
user has not yet interacted with in the training set. This setting is
considerably more challenging than intra-project recommendation
since the candidate item pool is significantly larger. According to
the results in Fig. 4, CODER consistently achieves superior
performance by a large margin with respect to the baselines,
especially
for 𝐾 ≥ 20. The results show that CODER jointly learns interproject
differences to choose the correct repositories and characterize
intra-project distinctions to recommend the correct files within
the chosen repositories.
To further validate the above observation, we randomly sample
10 repositories in the ML dataset and visualize their file embeddings
in Fig. 5 using t-SNE [43]. A maximum of 300 files are displayed
per repository for clarity. The file embeddings are obtained using 4
models: (a) Matrix Factorization [39], (b) LightGCN [17], (c)
NCL [28], and (d) our CODER framework. We observe that models
with the contrastive learning objective (NCL and CODER) manifest
better clustering structures. In particular, file embeddings learned
by our CODER framework demonstrate the best cross-repository
differences. The results further prove that CODER jointly models
the intra-project and inter-project differences among files,
effectively distinguishing the files under the same OSS project, which
is
more efficient than directly encoding the repository identity in the
embedding space.
Table 3: File-level link prediction results for cold-start users.
“LGN” stands for the baseline “LightGCN”. The best performance is
marked in bold. The second best is underlined.
Metric NeuMF NGCF LGN NCL CODER Impr.
ML:
NDCG@3 0.059 0.067 0.068 0.090 0.126 40.9%
Hit@3 0.106 0.123 0.161 0.211 0.224 5.9%
MRR@3 0.081 0.087 0.089 0.119 0.165 38.3%
NDCG@5 0.068 0.078 0.088 0.105 0.132 25.8%
Hit@5 0.161 0.162 0.230 0.261 0.273 4.8%
MRR@5 0.093 0.097 0.105 0.130 0.177 36.0%
DB:
NDCG@3 0.078 0.063 0.055 0.075 0.119 53.0%
Hit@3 0.152 0.114 0.128 0.165 0.238 44.4%
MRR@3 0.102 0.089 0.070 0.095 0.157 54.0%
NDCG@5 0.086 0.061 0.064 0.086 0.127 47.5%
Hit@5 0.195 0.132 0.165 0.220 0.287 30.6%
MRR@5 0.112 0.093 0.079 0.106 0.168 50.2%
FS:
NDCG@3 0.079 0.075 0.085 0.092 0.128 38.7%
Hit@3 0.171 0.165 0.179 0.179 0.242 35.6%
MRR@3 0.110 0.095 0.104 0.116 0.171 48.0%
NDCG@5 0.086 0.086 0.085 0.095 0.137 44.3%
Hit@5 0.230 0.220 0.202 0.222 0.313 36.2%
MRR@5 0.124 0.106 0.109 0.125 0.187 49.4%
4.2.4 Ablation Studies. In Fig. 3, we compare the performance of
our model (abbreviated as CD) among its 5 variants. CD-F removes
the code-user modality fusion strategy in Eq. (1). CD-C excludes
the structural contrastive learning objective in Eq. 20-Eq. 21. CD-E
does not use the pretrained CodeBERT embeddings and instead
applies TF-IDF encoding on the source code, a common approach
in project recommendation models [55]. CD-P removes the project-level
aggregation in Sec. 3.2.2. CD-S disables the structural-level
aggregation in Sec. 3.1.2. The results on the ML dataset are shown
in Fig. 3. We have the following observations:
First, all 6 variants of CODER outperform NCL, among which
the full model (CD) performs the best, indicating the importance of
each component in our model design. The performance drops most
significantly when we disable project-level aggregation in CD-P,
indicating the importance of explicitly modeling user-project
interactions through graph structures. We also observe a considerable
decrease when we remove the structural-level aggregation (CD-S),
implying that the structural information of files has a significant
contribution towards the file representation. CD-E does not lead
to a more significant performance decrease, but is outperformed by
CD-F where fine-grained representations of the source code are
present. Thus, user behaviors and project structural clues are more
important than semantic features in code recommendation.
Figure 3: Results among variants of CODER and the best baseline model NCL
on the ML dataset (Rec@20, NDCG@20, Hit@20, MRR@20).
Figure 4: Cross-project performance of CODER and the 2 strongest baselines
(LGN, NCL) under various K, K ∈ [5, 100] (Recall, NDCG, Hit, MRR).
Figure 5: t-SNE visualization of the file embeddings on the ML dataset
produced by (a) Matrix Factorization; (b) LightGCN; (c) NCL; (d) CODER.
Files in the same color fall under the same repositories.
5 RELATED WORKS
5.1 Research in Open Source
Open source has grown into a standard practice for software engineering
[22] and attracts researchers to study social coding [62]. Analytical
studies focus on the users' motivation [9, 59], expertise [45], and
collaboration patterns [35], as well as factors that impact the popularity
[2] of projects. Methodological studies explore project classification
[66], code search [31], and connecting publications with projects [40].
Although previous works have explored the recommendation task in OSS
development settings such as automatic suggestions of API function calls
[19, 37], Good First Issues [1, 53], and data preparation steps [57], no
previous works have studied the challenging task of code recommendation,
which requires in-depth understanding of OSS projects written in multiple
programming languages and diverse user-item interactions.
5.2 Recommender Systems
The advances in deep learning have greatly facilitated the evolution
of recommender systems [3, 16, 49, 64, 65]. In particular, motivated
by the success of Graph Neural Networks (GNN) [15, 26, 70, 71],
a series of graph-based recommender systems [27, 49, 52] are proposed,
which organize user behaviors into heterogeneous interaction graphs.
These methods formulate item recommendation as
link prediction or representation learning tasks [48, 67], and utilize
high-order relationships to infer user preferences, item attributes,
and collaborative filtering signals [17, 40, 47, 51, 60]. Noticeably,
traditional recommendation models cannot be easily transferred to
code recommendation as they do not model unique signals in OSS
development, such as project hierarchies and code semantics.
6 CONCLUSION AND FUTURE WORKS
In this work, we are the first to formulate the task of code
recommendation for open source developers. We propose CODER,
a code recommendation model suitable for open source projects
written in diverse languages. Extensive experiments on 3 datasets
demonstrate the superior performances of our method. Currently,
our approach only considers recommending existing files to users.
As CODER harnesses the metadata and semantic features of files, it
cannot deal with users creating new files where such information of
the candidate files is absent. We plan to generalize our framework
by allowing users to initialize files under their interested
subdirectories. Meanwhile, our source code encoding scheme can be
further
improved by harnessing knowledge about programming languages.
For example, previous works explored the use of Abstract Syntax
Tree (AST) [36] and data flow [14, 19] (graphs that represent
dependency relation between variables) on language-specific tasks. Our
current encoding scheme is a computationally efficient way to deal
with the diversity of programming languages. Moreover, the user
representations can be further enhanced by modeling users’ social
relations [6, 33] and behaviors [23, 56, 58]. Overall, future works can
incorporate domain knowledge about programming languages and social
information about the users to improve the item and user representations
at a finer granularity.
REFERENCES
[1] Jan Willem David Alderliesten and Andy Zaidman. 2021. An Initial Exploration
of the “Good First Issue” Label for Newcomer Developers. In 2021 IEEE/ACM 13th
International Workshop on Cooperative and Human Aspects of Software Engineering
(CHASE). IEEE, 117–118.
[2] Hudson Borges, Andre Hora, and Marco Tulio Valente. 2016. Understanding
the factors that impact the popularity of GitHub repositories. In ICSME. IEEE,
334–344.
[3] Jin Chen, Defu Lian, Binbin Jin, Kai Zheng, and Enhong Chen. 2022. Learning
Recommenders for Implicit Feedback with Importance Resampling. In WWW.
1997–2005.
[4] Jailton Coelho, Marco Tulio Valente, Luciano Milen, and Luciana L
Silva. 2020. Is
this GitHub project maintained? Measuring the level of maintenance activity of
open-source projects. Information and Software Technology 122 (2020), 106274.
[5] Roberto Di Cosmo and Stefano Zacchiroli. 2017. Software Heritage:
Why and How
to Preserve Software Source Code. In iPRES 2017-14th International Conference
on Digital Preservation. 1–10.
[6] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin.
2019. Graph neural networks for social recommendation. In WWW. 417–426.
[7] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong,
Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al. 2020. CodeBERT: A
PreTrained Model for Programming and Natural Languages. In Findings of
EMNLP
2020. 1536–1547.
[8] Matthias Fey and Jan E. Lenssen. 2019. Fast Graph Representation
Learning with
PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and
Manifolds.
[9] Marco Gerosa, Igor Wiese, Bianca Trinkenreich, Georg Link, Gregorio Robles,
Christoph Treude, Igor Steinmacher, and Anita Sarma. 2021. The shifting sands
of motivation: Revisiting what drives contributors in open source. In
ICSE. IEEE,
1046–1058.
[10] GitHub. 2016. The State of the Octoverse.
https://octoverse.github.com/2016/
[11] GitHub. 2022. Collection: Programming Languages. https://github.com/
collections/programming-languages
[12] GitHub. 2022. Github Number of Repositories. https://github.com/search.
[13] Xavier Glorot and Yoshua Bengio. 2010. Understanding the
difficulty of training
deep feedforward neural networks. In AISTATS. JMLR Workshop and Conference
Proceedings, 249–256.
[14] Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long
Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, et al. 2021. GraphCodeBERT:
Pre-training Code Representations with Data Flow. In ICLR.
[15] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive
representation
learning on large graphs. NIPS 30 (2017).
[16] Junheng Hao, Tong Zhao, Jin Li, Xin Luna Dong, Christos Faloutsos, Yizhou
Sun, and Wei Wang. 2020. P-companion: A principled framework for diversified
complementary product recommendation. In CIKM. 2517–2524.
[17] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng
Wang. 2020. Lightgcn: Simplifying and powering graph convolution network for
recommendation. In SIGIR. 639–648.
[18] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng
Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th
international
conference on world wide web. 173–182.
[19] Xincheng He, Lei Xu, Xiangyu Zhang, Rui Hao, Yang Feng, and Baowen Xu.
2021. Pyart: Python api recommendation in real-time. In 2021 IEEE/ACM 43rd
International Conference on Software Engineering (ICSE). IEEE, 1634–1645.
[20] Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, and Zhi Jin. 2018. Summarizing
source code with transferred API knowledge. In IJCAI. 2269–2275.
[21] Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc
Brockschmidt. 2019. Codesearchnet challenge: Evaluating the state of semantic
code search. arXiv preprint arXiv:1909.09436 (2019).
[22] Jyun-Yu Jiang, Pu-Jen Cheng, and Wei Wang. 2017. Open source repository
recommendation in social coding. In SIGIR. 1173–1176.
[23] Yiqiao Jin, Xiting Wang, Ruichao Yang, Yizhou Sun, Wei Wang, Hao Liao, and
Xing Xie. 2022. Towards fine-grained reasoning for fake news detection. In AAAI,
Vol. 36. 5746–5754.
[24] Jacob Devlin Ming-Wei Chang Kenton and Lee Kristina Toutanova. 2019. BERT:
Pre-training of Deep Bidirectional Transformers for Language Understanding. In
NAACL.
[25] Diederik P Kingma and Jimmy Ba. 2015. Adam: A Method for
Stochastic Optimization. In ICLR (Poster).
[26] Thomas N Kipf and Max Welling. 2016. Semi-supervised
classification with graph
convolutional networks. In ICLR.
[27] Anchen Li, Bo Yang, Huan Huo, and Farookh Hussain. 2022. Hypercomplex
Graph Collaborative Filtering. In WWW. 1914–1922.
[28] Zihan Lin, Changxin Tian, Yupeng Hou, and Wayne Xin Zhao. 2022. Improving
Graph Collaborative Filtering with Neighborhood-enriched Contrastive Learning.
In Proceedings of the ACM Web Conference 2022. 2320–2329.
[29] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi,
Danqi Chen, Omer
Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A
robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692
(2019).
[30] Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. 2016. Hierarchical
question-image co-attention for visual question answering. NIPS 29 (2016).
[31] Sifei Luan, Di Yang, Celeste Barnaby, Koushik Sen, and Satish
Chandra. 2019.
Aroma: Code recommendation via structural code search. Proceedings of the ACM
on Programming Languages 3, OOPSLA (2019), 1–28.
[32] Nora McDonald and Sean Goggins. 2013. Performance and participation in open
source software on github. In CHI’13 extended abstracts on human factors in
computing systems. 139–144.
[33] Xin Mei, Xiaoyan Cai, Sen Xu, Wenjie Li, Shirui Pan, and Libin Yang. 2022.
Mutually reinforced network embedding: An integrated approach to research
paper recommendation. Expert Systems with Applications (2022), 117616.
[34] Antonio Valerio Miceli-Barone and Rico Sennrich. 2017. A Parallel Corpus of
Python Functions and Documentation Strings for Automated Code
Documentation and Code Generation. In Proceedings of the Eighth
International Joint
Conference on Natural Language Processing (Volume 2: Short Papers). 314–319.
[35] Nadia Nahar, Shurui Zhou, Grace Lewis, and Christian Kästner.
2022. Collaboration Challenges in Building ML-Enabled Systems:
Communication, Documentation, Engineering, and Process. Organization
1, 2 (2022), 3.
[36] Anh Tuan Nguyen, Michael Hilton, Mihai Codoban, Hoan Anh Nguyen, Lily Mast,
Eli Rademacher, Tien N Nguyen, and Danny Dig. 2016. API code recommendation
using statistical learning from fine-grained changes. In SIGSOFT. 511–522.
[37] Phuong T Nguyen, Juri Di Rocco, Davide Di Ruscio, Lina Ochoa, Thomas
Degueule, and Massimiliano Di Penta. 2019. Focus: A recommender system
for mining api function calls and usage patterns. In ICSE. IEEE, 1050–1060.
[38] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James
Bradbury, Gregory
Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga,
et al. 2019.
Pytorch: An imperative style, high-performance deep learning library. NIPS 32.
[39] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars
Schmidt-Thieme.
2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI. 452–
461.
[40] Huajie Shao, Dachun Sun, Jiahao Wu, Zecheng Zhang, Aston Zhang, Shuochao
Yao, Shengzhong Liu, Tianshi Wang, Chao Zhang, and Tarek Abdelzaher. 2020.
paper2repo: GitHub repository recommendation for academic papers. In WWW.
629–639.
[41] Nitish Srivastava and Russ R Salakhutdinov. 2012. Multimodal learning with
deep boltzmann machines. NIPS.
[42] Igor Steinmacher, Ana Paula Chaves, Tayana Uchoa Conte, and Marco Aurélio
Gerosa. 2014. Preliminary empirical identification of barriers faced
by newcomers
to Open Source Software projects. In 2014 Brazilian Symposium on Software
Engineering. IEEE, 51–60.
[43] Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing
data using t-SNE.
JMLR 9, 11 (2008).
[44] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana
Romero, Pietro
Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In ICLR.
[45] Rahul Venkataramani, Atul Gupta, Allahbaksh Asadullah, Basavaraju Muddu,
and Vasudev Bhat. 2013. Discovery of technical expertise from open source code
repositories. In WWW. 97–98.
[46] Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, and
Philip S Yu. 2018. Improving automatic source code summarization via deep
reinforcement learning. In ASE. 397–407.
[47] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019.
Neural graph collaborative filtering. In SIGIR. 165–174.
[48] Xiang Wang, Tinglin Huang, Dingxian Wang, Yancheng Yuan, Zhenguang Liu,
Xiangnan He, and Tat-Seng Chua. 2021. Learning intents behind interactions
with knowledge graph for recommendation. In WWW. 878–887.
[49] Xiting Wang, Kunpeng Liu, Dongjie Wang, Le Wu, Yanjie Fu, and
Xing Xie. 2022.
Multi-level recommendation reasoning over knowledge graphs with
reinforcement learning. In WWW. 2098–2108.
[50] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond,
Clement Delangue,
Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al.
2020. Transformers: State-of-the-art natural language processing. In EMNLP.
38–45.
[51] Le Wu, Peijie Sun, Yanjie Fu, Richang Hong, Xiting Wang, and Meng
Wang. 2019.
A neural influence diffusion model for social recommendation. In Proceedings
of the 42nd international ACM SIGIR conference on research and development in
information retrieval. 235–244.
[52] Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and
Tieniu Tan. 2019.
Session-based recommendation with graph neural networks. In AAAI, Vol. 33.
346–353.
[53] Wenxin Xiao, Hao He, Weiwei Xu, Xin Tan, Jinhao Dong, and Minghui Zhou. 2022. Recommending good first issues in GitHub OSS projects. In ICSE. 1830–1842.
[54] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How Powerful are Graph Neural Networks?. In ICLR.
[55] Wenyuan Xu, Xiaobing Sun, Xin Xia, and Xiang Chen. 2017. Scalable relevant project recommendation on GitHub. In Proceedings of the 9th Asia-Pacific Symposium on Internetware. 1–10.
[56] Weizhi Xu, Junfei Wu, Qiang Liu, Shu Wu, and Liang Wang. 2022. Mining Fine-grained Semantics via Graph Neural Networks for Evidence-based Fake News Detection. arXiv preprint arXiv:2201.06885 (2022).
[57] Cong Yan and Yeye He. 2020. Auto-suggest: Learning-to-recommend data preparation steps using data science notebooks. In SIGMOD. 1539–1554.
[58] Ruichao Yang, Xiting Wang, Yiqiao Jin, Chaozhuo Li, Jianxun Lian, and Xing Xie. 2022. Reinforcement Subgraph Reasoning for Fake News Detection. In KDD. 2253–2262.
[59] Yunwen Ye and Kouichi Kishida. 2003. Toward an understanding of the motivation of open source software developers. In ICSE. IEEE, 419–429.
[60] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In KDD. 974–983.
[61] Xueli Yu, Weizhi Xu, Zeyu Cui, Shu Wu, and Liang Wang. 2021. Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval. In WWW. 778–787.
[62] Yue Yu, Huaimin Wang, Gang Yin, and Tao Wang. 2016. Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment? Information and Software Technology 74 (2016), 204–218.
[63] Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, and Qi Tian. 2019. Deep modular co-attention networks for visual question answering. In CVPR. 6281–6290.
[64] Fajie Yuan, Xiangnan He, Haochuan Jiang, Guibing Guo, Jian Xiong, Zhezhao Xu, and Yilin Xiong. 2020. Future data helps training: Modeling future contexts for session-based recommendation. In WWW. 303–313.
[65] Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, and Liang Wang. 2021. Mining Latent Structures for Multimedia Recommendation. In ACM MM. 3872–3880.
[66] Yu Zhang, Frank F Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, and Jiawei Han. 2019. HiGitClass: Keyword-driven hierarchical classification of GitHub repositories. In ICDM. IEEE, 876–885.
[67] Yu Zheng, Chen Gao, Liang Chen, Depeng Jin, and Yong Li. 2021. DGCN: Diversified Recommendation with Graph Convolutional Networks. In WWW. 401–412.
[68] Yu Zheng, Chen Gao, Xiang Li, Xiangnan He, Yong Li, and Depeng Jin. 2021. Disentangling user interest and conformity for recommendation with causal embedding. In WWW. 2980–2991.
[69] Jiaxin Zhu, Minghui Zhou, and Audris Mockus. 2014. Patterns of folder use and project popularity: A case study of GitHub repositories. In Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 1–4.
[70] Yanqiao Zhu, Weizhi Xu, Jinghao Zhang, Qiang Liu, Shu Wu, and Liang Wang. 2021. Deep graph structure learning for robust representations: A survey. arXiv preprint arXiv:2103.03036 (2021).
[71] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. 2021. Graph contrastive learning with adaptive augmentation. In WWW. 2069–2080.
1
0
Cryptocurrency: Flawed Fundamentals to Soon Fail the Crypto Movement, Avalanche-AVAX
by professor rat 21 Oct '22
by professor rat 21 Oct '22
21 Oct '22
FREEDOM without anarchism is privilege, injustice and scumbag entitlement as has been tediously propagated here for years by Granpa.
He is Too Stupid To Be an Activist Let Alone Anarchist ( TSTBALAA )
Gramps wouldn't know globalized anarchist revolution from a hole in the ground.
No skin-in-the-game, no nous, no clue, no balls and no brains. " Grarpamps " for short.
Take your whiny circle-of-eunuchs - Karl and Juan - and fuck the Hell off then loser.
And don't let the door hit your ass on the way out.
1
0
Extended Discussion of Cryptocurrencies and How They Are Vital to Argentinians
by professor rat 21 Oct '22
by professor rat 21 Oct '22
21 Oct '22
https://www.econlib.org/roberts-and-zuegel-on-argentine-inflation-and-crypt…
Reposts are bad - good reposts are fatal
1
0