[random] Paper on Simulating Coders

Fri Oct 21 05:07:53 PDT 2022

This is just a random paper I bumped into today. I'm not totally sure
what it's about.
The source is not released yet, but I believe it also links to prior
work if the topic is interesting.
Apologies for the barebones copypaste without formatting.

https://arxiv.org/abs/2210.08332

Code Recommendation for Open Source Software Developers
Yiqiao Jin
yjin328 at gatech.edu
Georgia Institute of Technology
Atlanta, GA, USA
Yunsheng Bai
yba at cs.ucla.edu
University of California, Los Angeles
Los Angeles, CA, USA
Yanqiao Zhu
yzhu at cs.ucla.edu
University of California, Los Angeles
Los Angeles, CA, USA
Yizhou Sun
yzsun at cs.ucla.edu
University of California, Los Angeles
Los Angeles, CA, USA
Wei Wang
weiwang at cs.ucla.edu
University of California, Los Angeles
Los Angeles, CA, USA
ABSTRACT
Open Source Software (OSS) is forming the spines of technology
infrastructures, attracting millions of talents to contribute. Notably,
it is challenging and critical to consider both the developers’
interests and the semantic features of the project code to recommend
appropriate development tasks to OSS developers. In this paper,
we formulate the novel problem of code recommendation, whose
purpose is to predict the future contribution behaviors of developers
given their interaction history, the semantic features of source
code, and the hierarchical file structures of projects. Considering
the complex interactions among multiple parties within the system,
we propose CODER, a novel graph-based code recommendation
framework for open source software developers. CODER jointly
models microscopic user-code interactions and macroscopic user-project
interactions via a heterogeneous graph and further bridges
the two levels of information through aggregation on file-structure
graphs that reflect the project hierarchy. Moreover, due to the lack
of reliable benchmarks, we construct three large-scale datasets to
facilitate future research in this direction. Extensive experiments
show that our CODER framework achieves superior performance
under various experimental settings, including intra-project,
cross-project, and cold-start recommendation. We will release all the
datasets, code, and utilities for data retrieval upon the acceptance
of this work.
CCS CONCEPTS
• Information systems → Collaborative filtering; Web and social media
search; Social recommendation; Personalization.
KEYWORDS
Code recommendation; recommender system; open source software
development; multimodal recommendation; graph neural networks
1 INTRODUCTION
Open Source Software (OSS) is becoming increasingly popular in
software engineering [22, 45]. As contribution to OSS projects
is highly democratized [62], these projects attract millions of
developers with diverse expertise and efficiently crowd-source the
project development to a larger community of developers beyond
the project’s major personnel [22, 32]. For instance, GitHub, one
of the most successful platforms for developing and hosting OSS
projects, has over 83 million users and 200 million repositories [12].
[Figure 1 image: directory tree rooted at src/transformers showing data/ and models/, with bert/ (modeling_bert.py, tokenization_bert.py) and roberta/ (modeling_roberta.py, tokenization_roberta.py); code segments are labeled Param loading, Embeddings, and Models.]
Figure 1: An example of the transformers repository. OSS
projects under similar topics usually adopt similar naming
conventions and file structures, which can be seen as knowledge
transferable across projects.
Community support and teamwork are major driving forces behind open
source projects [32]. OSS projects are usually developed
in a collaborative manner [2], yet collaboration in OSS is
especially challenging. OSS projects are large in scale and usually
contain numerous project files written in diverse programming
languages [4]. According to statistics, the most popular 500 GitHub
projects contain an average of 2,582 project files, 573 directories, and
360 contributors. Meanwhile, there are more than 300 programming
languages on GitHub, 67 of which are actively being used [10, 11].
For project maintainers, it is both difficult and time-consuming
to find competent contributors within a potentially large candidate
pool. For OSS developers, recommending personalized development tasks
according to their project experience and expertise can
significantly boost their motivation and reduce their cognitive loads
of manually checking the project files. As contribution in OSS is
voluntary, developers that fail to find meaningful tasks are likely
to quit the project development [42]. Therefore, an efficient system
for automatically matching source code with potential contributors
is being called for by both the project core team and the potential
contributors to reduce their burden.
To solve the above issues, in this paper, we for the first time
introduce the novel problem of code recommendation for OSS developers.
As shown in Fig. 2, this task recommends code in the form
of project files to potentially suitable contributors. It is noteworthy
that code recommendation has several unique challenges such that
traditional recommender models are not directly applicable.
(arXiv:2210.08332v2 [cs.SE] 20 Oct 2022)
Firstly, OSS projects contain multimodal interactions among
users, projects, and code files. For example, OSS development contains
user-code interactions, such as commits that depict microscopic
behaviors of users, and user-project interactions, such as
forks and stars that exhibit users’ macroscopic preferences and
interests on projects. Also, the contribution relationships are often
extremely sparse, due to the significant efforts required to make a
single contribution to OSS projects. Therefore, directly modeling
the contribution behavior as in traditional collaborative filtering
approaches will inevitably lead to inaccurate user/item
representations and suboptimal performances.
Secondly, in the software engineering domain, code files in a
project are often organized in a hierarchical structure [61]. Fig. 1
shows an example of the famous huggingface/transformers repository
[50]. The src directory usually contains the major source
code for a project. The data and models subdirectories usually
include functions for data generation and model implementations,
respectively. Such a structural organization of the OSS project
reveals semantic relations among code snippets, which are helpful
for developers to transfer existing code from other projects to their
development. Traditional methods usually ignore such item-wise
hierarchical relationships and, as a result, are incapable of
connecting rich semantic features in code files with their project-level
structures, which is required for accurate code recommendation.
Thirdly, most existing benchmarks involving recommendation
for software only consider limited user-item behaviors [5, 20], are
of small scales [36, 37], or contain only certain languages such as
Python [19, 34, 46] or Java [5, 20, 37], which renders the evaluation
of different recommendation models difficult or not realistic.
To overcome the above challenges, we propose CODER, a CODE
Recommendation framework for open source software developers
that matches project files with potential contributors. As shown in
Fig. 2, CODER treats users, code files, and projects as nodes and
jointly models the microscopic user-code interactions and macroscopic
user-project interactions in a heterogeneous graph. Furthermore, CODER
bridges these two levels of information through
message aggregation on the file structure graphs that reflect the
hierarchical relationships among graph nodes. Additionally, since
there is a lack of benchmark datasets for the code recommendation
task, we build three large-scale datasets from open software
development websites. These datasets cover diverse subtopics in
computer science and contain up to 2 million fine-grained user-file
interactions. Overall, our contributions are summarized as follows:
• We for the first time introduce the problem of code recommendation,
whose purpose is to recommend appropriate development
tasks to developers, given the interaction history of developers,
the semantic features of source code, and hierarchical structures of
projects.
• We propose CODER, an end-to-end framework that jointly
models structural and semantic features of source code as well as
multiple types of user behaviors for improving the matching task.
• We construct three large-scale multi-modal datasets for code
recommendation that cover different topics in computer science to
facilitate research on code recommendation.
• We conduct extensive experiments on massive datasets to
demonstrate the effectiveness of the proposed CODER framework
and its design choices.
2 PROBLEM FORMULATION
Before delving into our proposed CODER framework, we first formalize
our code recommendation task. We use the terms “repository” and
“project” interchangeably to refer to an open source
project. We define $\mathcal{U}$, $\mathcal{V}$, $\mathcal{R}$ as the sets of users, files, and
repositories, respectively. Each repository $r_k \in \mathcal{R}$ contains a subset of
files $\mathcal{V}_k \subsetneq \mathcal{V}$. Both macroscopic project-level interactions and
microscopic file-level interactions are present in OSS development.
File-level behaviors. We define $\mathbf{Y} \in \{0, 1\}^{|\mathcal{U}| \times |\mathcal{V}|}$ as the
interaction matrix between $\mathcal{U}$ and $\mathcal{V}$ for the file-level contribution
behavior, where each entry is denoted by $y_{ij}$: $y_{ij} = 1$ indicates that
$u_i$ has contributed to $v_j$, and $y_{ij} = 0$ otherwise.
Project-level behaviors. Interactions at the project level are more
diverse. For example, the popular code hosting platform GitHub
allows users to star (publicly bookmark) interesting repositories
and watch (subscribe to) repositories for updates. We thus define
$\mathcal{T}$ as the set of user-project behaviors. Similar to $\mathbf{Y}$, we define
$\mathbf{S}_t \in \{0, 1\}^{|\mathcal{U}| \times |\mathcal{R}|}$ as the project-level interaction matrix for behavior
of type $t$. Our goal is to predict the future file-level contribution
behaviors of users based on their previous interactions. Formally,
given the training data $\mathbf{Y}^{tr}$, we try to predict the interactions in the
test set $y_{ij} \in \mathbf{Y}^{ts} = \mathbf{Y} \setminus \mathbf{Y}^{tr}$.
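The formulation above can be made concrete with a toy example. This is only an illustrative sketch: the user names, file names, and interactions are made up, and the matrices are stored as nested lists rather than in any format the authors use.

```python
# Toy illustration of the interaction matrix Y and the split Y^ts = Y \ Y^tr.
# All users, files, and interactions below are hypothetical.
users = ["u1", "u2"]
files = ["a.py", "b.py", "c.py"]

train_pairs = {("u1", "a.py"), ("u2", "b.py")}   # observed before the split
all_pairs = train_pairs | {("u1", "c.py")}       # every observed contribution


def interaction_matrix(pairs):
    """Binary |U| x |V| matrix: entry is 1 if the user contributed to the file."""
    return [[1 if (u, v) in pairs else 0 for v in files] for u in users]


Y = interaction_matrix(all_pairs)
Y_tr = interaction_matrix(train_pairs)

# The test interactions are the entries of Y that are not in the training data.
test_pairs = all_pairs - train_pairs
```

Here the task is to score the pair ("u1", "c.py") highly given only Y_tr.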
3 METHODOLOGY
As shown in Fig. 2, we design CODER, a two-stage graph-based
recommendation framework. CODER considers 𝑢𝑖 ∈ U, 𝑣𝑗 ∈ V, 𝑟𝑘 ∈
R as graph nodes, and models the user-item interactions and the
item-item relations as edges. We use two sets of graphs to
characterize the heterogeneous information in code recommendation. One is
the user-item interaction graphs that encompass the collaborative
signals. The other is the file-structure graphs that reveal file-file and
file-project relationships from the project hierarchy perspective.
The code recommendation problem is then formulated as a user-file
link prediction task.
CODER contains two major components: 1) Node Semantics
Modeling, which learns the fine-grained representations of project
files by fusing code semantics with their historical contributors,
and then aggregates project hierarchical information on the file
structure graph to learn the file and repository representations; 2)
Multi-behavioral Modeling, which jointly models the microscopic
user-file interactions and macroscopic user-project interactions.
Finally, CODER fuses the representations from multiple behaviors
for prediction. This way, node semantics modeling bridges the
coarse-grained and fine-grained interaction signals on the item
side. Therefore, CODER efficiently characterizes intra-project and
inter-project differences, eventually uncovering latent user and
item features that explain the interactions Y.
3.1 Node Semantics Modeling
Node semantics modeling aims to learn file and repository
representations. The challenge is how to inherently combine the
semantic features of each project file with its interacted users and
the project hierarchy. To address this challenge, we first use a code-user
modality fusion mechanism (Fig. 2a) to fuse the file content
modality and the historical users at the code level. Then, we embed
the fine-grained collaborative signals from user-file interactions
into the file representations. Next, we employ structural-level
aggregation (Fig. 2b), which explicitly models the project structures
as hierarchical graphs to enrich the file/repository representation
with structural information. This step produces representations for
each file $v_j$ and repository $r_k$, which serve as the input for user
behavior modeling in Sec. 3.2.
[Figure 2 image: pipeline from Input to Prediction over the File Structure Graph, the User-File Interaction Graph, and the User-Project Interaction Graphs. Node Semantics Modeling panels: (a) Code-User Modality Fusion, (b) Structural-Level Aggregation. Multi-Behavioral Modeling panels: (c) File-Level Aggregation, (d) Project-Level Aggregation.]
Figure 2: Our proposed CODER framework for code recommendation. CODER
jointly considers project file structures, code semantics, and user
behaviors. CODER models the microscopic file-level interactions and
macroscopic project-level interactions
through Multi-Behavioral Modeling, and bridges the micro/macro-scopic
signals through Node Semantics Modeling.
3.1.1 Code-User Modality Fusion. A project file is characterized by
diverse semantic features including multiple method declarations
and invocations, which are useful for explaining why a contributor
is interested in it. We therefore use pretrained CodeBERT [7], a
bimodal language model for programming languages, to encode
the rich semantic features of each file. CodeBERT is shown to
generalize well to programming languages not seen in the pretraining
stage, making it suitable for our setting where project files
are written in diverse programming languages. Here, a straightforward
way is to directly encode each file into a per-file latent
representation. Such an encoding scheme has two issues. Firstly, a
file may contain multiple classes and function declarations that
are semantically distinct. Fig. 1 shows the file structure of the
huggingface/transformers [50] repository as an example. The
modeling_bert.py file contains not only various implementations
of the BERT language model for different NLP tasks, but also utilities
for parameter loading and word embeddings. These implementations are
distributed among several code segments in the same
project file, and file-level encoding can fail to encode such semantic
correlations. Secondly, the property of a project file is influenced
by the historical contributors’ features. A user’s contribution can
be viewed as injecting her/his own attributes, including programming
style and domain knowledge, into the interacted file. Such
contribution behaviors make the file more likely to be interacted with again
by users with similar levels of expertise than by random contributors.
Therefore, we propose a code-user modality fusion strategy
to embed both code semantics and user characteristics into the file
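representation. Specifically, for each file, we partition its source
code into $N_C$ code segments and encode each of them into a
code-segment-level representation $\mathbf{c}_i$. This produces a feature map
$\mathbf{C} = [\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_{N_C}]$, $\mathbf{C} \in \mathbb{R}^{N_C \times d}$,
where $d$ is the embedding size. Similarly, we sample $N_Q$ historical
users of the file and encode them into a feature map
$\mathbf{Q} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{N_Q}]$, $\mathbf{Q} \in \mathbb{R}^{N_Q \times d}$. Please refer
to the appendix for details on encoding $\mathbf{C}$ and $\mathbf{Q}$. Inspired by the success
of co-attention [30, 63], we transform the user attention space
to code attention space by calculating a code-user affinity matrix
$\mathbf{L} \in \mathbb{R}^{N_C \times N_Q}$:
$$\mathbf{L} = \tanh\left(\mathbf{C} \mathbf{W}_O \mathbf{Q}^\top\right), \tag{1}$$
where $\mathbf{W}_O \in \mathbb{R}^{d \times d}$ is a trainable weight matrix. Next, we compute
the attention weight $\mathbf{a} \in \mathbb{R}^{N_C}$ of the code segments to select salient
features from $\mathbf{C}$. We treat the affinity matrix as a feature and learn
an attention map $\mathbf{H}$ with $N_H$ representations:
$$\mathbf{H} = \tanh\left(\mathbf{W}_C \mathbf{C}^\top + \mathbf{W}_Q (\mathbf{L}\mathbf{Q})^\top\right), \tag{2}$$
$$\mathbf{a} = \mathrm{softmax}\left(\mathbf{w}_H^\top \mathbf{H}\right), \tag{3}$$
where $\mathbf{W}_C, \mathbf{W}_Q \in \mathbb{R}^{N_H \times d}$ and $\mathbf{w}_H \in \mathbb{R}^{N_H}$ are the weight parameters. Finally, the file attention
representation $\mathbf{h}$ is calculated as the
weighted sum of the code feature map:
$$\mathbf{h} = \mathbf{a}^\top \mathbf{C}. \tag{4}$$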
The file attention representation serves as a starting point for further
aggregating file structural features.
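The shapes in Eqs. 1-4 are easy to lose track of, so here is a minimal pure-Python sketch of the co-attention step. It is not the authors' implementation: the weights are random and untrained, the helper names are made up, N_H and d are arbitrary, and only N_C = 8 and N_Q = 4 follow the values quoted in Sec. 4.1.2.

```python
import math
import random


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def tanh_matrix(A):
    return [[math.tanh(x) for x in row] for row in A]


def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def co_attention(C, Q, W_O, W_C, W_Q, w_H):
    """Return (a, h): attention over code segments and the fused file vector."""
    L = tanh_matrix(matmul(matmul(C, W_O), transpose(Q)))         # Eq. 1: N_C x N_Q
    H = tanh_matrix(add(matmul(W_C, transpose(C)),
                        matmul(W_Q, transpose(matmul(L, Q)))))    # Eq. 2: N_H x N_C
    a = softmax(matmul([w_H], H)[0])                              # Eq. 3: length N_C
    h = matmul([a], C)[0]                                         # Eq. 4: length d
    return a, h


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N_C, N_Q, N_H, d = 8, 4, 5, 6          # N_H and d are arbitrary toy sizes
C = rand_matrix(N_C, d, rng)           # stand-in for code-segment encodings
Q = rand_matrix(N_Q, d, rng)           # stand-in for sampled user encodings
a, h = co_attention(C, Q,
                    rand_matrix(d, d, rng),      # W_O
                    rand_matrix(N_H, d, rng),    # W_C
                    rand_matrix(N_H, d, rng),    # W_Q
                    [rng.uniform(-1, 1) for _ in range(N_H)])  # w_H
```

The attention vector a has one weight per code segment and sums to 1, and h has the same dimensionality d as a single segment encoding.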
3.1.2 Structural-Level Aggregation. Projects are organized in a
hierarchical way such that nodes located closer on the file structure
graph are more closely related in terms of semantics and
functionality. For example, in Fig. 1, both files under the bert/
directory contain source code for the BERT [24] language model, and
files under roberta/ contain implementations of the RoBERTa [29] model.
The file modeling_bert.py is therefore more closely related to
tokenization_bert.py in functionality than to
tokenization_roberta.py.
To exploit such structural clues, we model each repository as a
hierarchical heterogeneous graph 𝐺𝑆 consisting of file, directory,
and repository nodes. Each node is connected to its parent node
through an edge, and nodes at the first level are directly connected
to the virtual root node representing the project. To encode the
features of directory nodes, we partition the directory names into
meaningful words according to underscores and letter capitalization,
then encode the nodes by their TF-IDF features. Our encoding
scheme is motivated by the insight that the use of standard directory
names (e.g., doc, test, models) is correlated with project popularity
among certain groups of developers [2, 69]. Repository features
are encoded by their project owners, creation timestamps, and their
top-5 programming languages. The repository and directory
representations are mapped to the same latent space as the file nodes.
Then, we pass the representation $\mathbf{h}$ through multiple GNN layers
to aggregate the features of each node from its neighbors on $G_S$:
$$\tilde{\mathbf{h}} = f_{\mathrm{GNN}}(\mathbf{h}, G_S), \tag{5}$$
where $\tilde{\mathbf{h}}$ is the structure-enhanced node representation. The
aggregation function $f_{\mathrm{GNN}}(\cdot)$ can be chosen from a wide range of GNN
operators, such as GCN [26], GraphSAGE [15], and GIN [54]. In
practice, we employ a 3-layer Graph Attention Network (GAT) [44].
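The directory-name encoding described in this subsection (split names on underscores and capitalization, then compute TF-IDF features) can be sketched as follows. The text does not say which TF-IDF variant is used, so the smoothed formula below is an assumption, as are the function names and the example directory names.

```python
import math
import re
from collections import Counter

# Uppercase runs (acronyms), capitalized or lowercase words, and digit runs.
TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_name(name):
    """Split a directory name on underscores and capitalization boundaries."""
    tokens = []
    for part in name.split("_"):
        tokens.extend(tok.lower() for tok in TOKEN.findall(part))
    return tokens


def tfidf(docs):
    """Return (vocab, vectors) using a smoothed IDF (an assumed variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against names with no tokens
        vectors.append([
            (counts[t] / length) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t in vocab
        ])
    return vocab, vectors


names = ["test_utils", "dataLoader", "models", "test_models"]  # hypothetical
docs = [split_name(name) for name in names]
vocab, vectors = tfidf(docs)
```

Each directory node then gets one row of vectors as its initial feature, which would still need to be projected into the same latent space as the file nodes.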
3.2 Multi-behavioral Modeling
Direct modeling of the sparse contribution behavior potentially
leads to inaccurate user/item representations and aggravates the
cold-start issue. Instead, we jointly model the microscopic user-file
contribution in File-level Aggregation (Fig. 2c) and macroscopic
user-project interactions in Project-level Aggregation (Fig. 2d) to
learn user preferences and address the sparsity issue. Then, the
representations learned from multi-level behaviors are combined
to form the user and item representations for prediction.
3.2.1 File-level Aggregation. We model the project files and their
contributors as an undirected user-file bipartite graph $\mathcal{G}_F$ consisting
of users $u_i \in \mathcal{U}$, files $v_j \in \mathcal{V}$, and their interactions. The initial
embedding matrix of users/items is denoted by $\mathbf{E}^{(0)}$, which serves
as an initial state for end-to-end optimization:
$$\mathbf{E}^{(0)} = [\underbrace{\mathbf{u}_1^{(0)}, \cdots, \mathbf{u}_{|\mathcal{U}|}^{(0)}}_{\text{user embeddings}}, \underbrace{\mathbf{v}_1^{(0)}, \cdots, \mathbf{v}_{|\mathcal{V}|}^{(0)}}_{\text{item embeddings}}], \tag{6}$$
where $\mathbf{u}_i^{(0)}$ is the initial embedding for user $u_i$, and $\mathbf{v}_j^{(0)}$ is the
initial embedding for file $v_j$, equivalent to its structure-enhanced
representation $\tilde{\mathbf{h}}$ (Sec. 3.1.2). We adopt the simple weighted sum
aggregator in LightGCN [17] in the propagation rule:
$$\mathbf{u}_i^{(l)} = \sum_{v_j \in \mathcal{N}_i} \frac{1}{\sqrt{|\mathcal{N}_i| |\mathcal{N}_j|}} \mathbf{v}_j^{(l-1)}, \tag{7}$$
$$\mathbf{v}_j^{(l)} = \sum_{u_i \in \mathcal{N}_j} \frac{1}{\sqrt{|\mathcal{N}_j| |\mathcal{N}_i|}} \mathbf{u}_i^{(l-1)}, \tag{8}$$
where $\mathbf{u}_i^{(l)}$ and $\mathbf{v}_j^{(l)}$ are the embeddings for user $u_i$ and file $v_j$
at layer $l$. $\mathcal{N}_i$ and $\mathcal{N}_j$ indicate the neighbors of user $u_i$ and file
$v_j$. $1/\sqrt{|\mathcal{N}_i| |\mathcal{N}_j|}$ is the symmetric normalization term, set to the
graph Laplacian norm to keep the scale of the embeddings from growing
with each graph convolution [17, 26]. In matrix form, the propagation rule of file-level
aggregation is expressed as:
$$\mathbf{E}^{(l)} = \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2} \mathbf{E}^{(l-1)}, \quad \mathbf{A} = \begin{bmatrix} \mathbf{0} & \mathbf{Y}^{tr} \\ {\mathbf{Y}^{tr}}^\top & \mathbf{0} \end{bmatrix}, \tag{9}$$
where $\mathbf{A} \in \mathbb{R}^{(|\mathcal{U}| + |\mathcal{V}|) \times (|\mathcal{U}| + |\mathcal{V}|)}$ is the affinity matrix. $\mathbf{D}$ is the diagonal
degree matrix in which each entry $\mathbf{D}_{ii}$ indicates the number
of non-zero entries on the $i$-th row of $\mathbf{A}$. By stacking multiple layers,
each user/item node aggregates information from its higher-order
neighbors. Propagation through $L$ layers yields a set of representations
$\{\mathbf{E}^{(l)}\}_{l=0}^{L}$. Each $\mathbf{E}^{(l)}$ emphasizes the messages from its $l$-hop
neighbors. We apply mean-pooling over all $\mathbf{E}^{(l)}$ to derive the user
and file representations $\mathbf{u}_i^*$ and $\mathbf{v}_j^*$ from different levels of user/item
features:
$$\mathbf{u}_i^* = \frac{1}{L + 1} \sum_{l=0}^{L} \mathbf{u}_i^{(l)}, \quad \mathbf{v}_j^* = \frac{1}{L + 1} \sum_{l=0}^{L} \mathbf{v}_j^{(l)}. \tag{10}$$
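As a worked illustration of Eqs. 7, 8, and 10, the sketch below runs the LightGCN-style propagation on a tiny bipartite graph in pure Python. The function name, the toy embeddings, and the edge list are assumptions made for illustration; the authors' implementation is in PyTorch and PyG and is not reproduced here. The sketch also assumes the edge list contains no duplicates.

```python
import math


def lightgcn_propagate(user_emb, file_emb, edges, num_layers):
    """Propagate over a user-file bipartite graph and mean-pool the layers.

    edges is a list of (user_index, file_index) pairs from the training data.
    """
    dim = len(user_emb[0])
    user_nb = [[] for _ in user_emb]   # files each user contributed to
    file_nb = [[] for _ in file_emb]   # users who contributed to each file
    for i, j in edges:
        user_nb[i].append(j)
        file_nb[j].append(i)

    def aggregate(target_nb, source_nb, source_emb):
        out = []
        for nbrs in target_nb:
            vec = [0.0] * dim
            for s in nbrs:
                # Symmetric normalization 1 / sqrt(|N_target| * |N_source|).
                w = 1.0 / math.sqrt(len(nbrs) * len(source_nb[s]))
                vec = [x + w * y for x, y in zip(vec, source_emb[s])]
            out.append(vec)
        return out

    u_layers, v_layers = [user_emb], [file_emb]
    for _ in range(num_layers):
        u_prev, v_prev = u_layers[-1], v_layers[-1]
        u_layers.append(aggregate(user_nb, file_nb, v_prev))   # Eq. 7
        v_layers.append(aggregate(file_nb, user_nb, u_prev))   # Eq. 8

    def mean_pool(layers):
        # Average each node's embedding over layers 0..L (Eq. 10).
        return [[sum(layer[n][k] for layer in layers) / len(layers)
                 for k in range(dim)] for n in range(len(layers[0]))]

    return mean_pool(u_layers), mean_pool(v_layers)


users = [[1.0, 0.0], [0.0, 1.0]]
files = [[1.0, 1.0], [2.0, 0.0]]
u_star, v_star = lightgcn_propagate(users, files, [(0, 0), (0, 1), (1, 1)], num_layers=1)
```

Note that there are no weight matrices or activations in the propagation itself, which is what distinguishes this aggregator from NGCF-style message passing.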
3.2.2 Project-Level Aggregation. OSS development is characterized
by both microscopic contribution behaviors and multiple types
of macroscopic project-level behaviors. For example, developers
usually find relevant projects and reuse their functions and explore
ideas of possible features [19, 21]. In particular, GitHub users
can star (bookmark) interesting repositories and discover projects
under similar topics. This way, developers can adapt code
implementation of these interesting projects into their own
development
later. Hence, project-level macroscopic interactions are conducive
to extracting users’ broad interests.
For each behavior $t$, we propagate the user and repository embeddings
on its project-level interaction graph $\mathcal{G}_P^t$. The initial embedding matrix $\mathbf{Z}^{(0)}$
is shared by all $t \in \mathcal{T}$ and is composed of the initial user
representations identical to Eq. 6 and the repository embeddings
from the structure-enhanced representation $\tilde{\mathbf{h}}$ in Eq. 5:
$$\mathbf{Z}^{(0)} = [\underbrace{\mathbf{z}_1^{(0)}, \mathbf{z}_2^{(0)}, \ldots, \mathbf{z}_{|\mathcal{U}|}^{(0)}}_{\text{user embeddings}}, \underbrace{\mathbf{r}_1^{(0)}, \mathbf{r}_2^{(0)}, \ldots, \mathbf{r}_{|\mathcal{R}|}^{(0)}}_{\text{repository embeddings}}], \tag{11}$$
$$\mathbf{Z}_t^{(l)} = \mathbf{D}_t^{-1/2} \mathbf{\Lambda}_t \mathbf{D}_t^{-1/2} \mathbf{Z}_t^{(l-1)}, \tag{12}$$
where $\mathbf{z}_i^{(0)} = \mathbf{u}_i^{(0)}$. $\mathbf{\Lambda}_t \in \mathbb{R}^{(|\mathcal{U}| + |\mathcal{R}|) \times (|\mathcal{U}| + |\mathcal{R}|)}$ is the affinity matrix for behavior $t$,
constructed similarly to $\mathbf{A}$ in Eq. 9. Agg(·) is
an aggregation function. With representations $\{\mathbf{Z}_t^{(l)}\}_{l=0}^{L}$ obtained
from multiple layers, we derive the combined user and repository
representations for behavior $t$ as
$$\mathbf{z}_{t,i}^* = \frac{1}{L + 1} \sum_{l=0}^{L} \mathbf{z}_{t,i}^{(l)}, \quad \mathbf{r}_{t,k}^* = \frac{1}{L + 1} \sum_{l=0}^{L} \mathbf{r}_{t,k}^{(l)}. \tag{13}$$
3.3 Prediction
For file-level prediction, we aggregate the macroscopic signals
$\mathbf{z}_{t,i}^*$, $\mathbf{r}_{t,k}^*$ from each behavior $t$ into $\mathbf{u}_i$, $\mathbf{v}_j$:
$$\mathbf{z}_i^* = \mathrm{Agg}(\{\mathbf{z}_{t,i}^* : t \in \mathcal{T}\}), \quad \mathbf{r}_k^* = \mathrm{Agg}(\{\mathbf{r}_{t,k}^* : t \in \mathcal{T}\}), \tag{14}$$
$$\mathbf{u}_i = \mathrm{MLP}([\mathbf{u}_i^* \,\|\, \mathbf{z}_i^*]), \quad \mathbf{v}_j = \mathrm{MLP}([\mathbf{v}_j^* \,\|\, \mathbf{r}_{\phi(j)}^*]), \tag{15}$$
where MLP(·) is a multilayer perceptron, $\phi(\cdot) : \mathcal{V} \to \mathcal{R}$ maps the
index of each project file to its repository, and $\|$ is the concatenation
operator. On the user side, both macroscopic interests and micro-level
interactions are injected into the user representations. On the
item side, the semantics of each file is enriched by its interacted
users and the repository structural information.
For computational efficiency, we employ the inner product to calculate
user $u_i$’s preference towards each file $v_j$:
$$s_F(i, j) = \mathbf{u}_i^\top \mathbf{v}_j, \tag{16}$$
where $s_F$ is the scoring function for the file-level behavior. Similarly,
for each user-project pair, we derive a project-level score for each
behavior $t$ using the project-level scoring function $s_P$:
$$s_P(i, k, t) = {\mathbf{z}_{t,i}^*}^\top \mathbf{r}_{t,k}^*. \tag{17}$$
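To show how the inner-product score of Eq. 16 turns into a recommendation list, here is a small sketch that ranks candidate files for one user. It assumes the final vectors u_i and v_j have already been produced by the steps above; the vectors, file names, and function names are made up for illustration.

```python
def score(user_vec, file_vec):
    """Inner-product preference score, as in Eq. 16."""
    return sum(a * b for a, b in zip(user_vec, file_vec))


def top_k_files(user_vec, file_vecs, k):
    """Return the k file names with the highest scores, best first."""
    ranked = sorted(file_vecs,
                    key=lambda name: score(user_vec, file_vecs[name]),
                    reverse=True)
    return ranked[:k]


# Hypothetical final representations for one user and three candidate files.
user_vec = [0.2, 0.9]
file_vecs = {
    "modeling_bert.py": [0.1, 1.0],
    "setup.py": [1.0, 0.0],
    "tokenization_bert.py": [0.3, 0.6],
}
recommended = top_k_files(user_vec, file_vecs, k=2)
```

Because the score is a plain inner product, ranking every file for a user reduces to one matrix-vector product, which is the efficiency argument made in the text.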
3.4 Optimization
We employ the Bayesian Personalized Ranking (BPR) [39] loss, which
encourages the prediction of an observed user-item interaction to
be greater than an unobserved one:
$$\mathcal{L}_F = \sum_{(i, j^+, j^-) \in \mathcal{O}} -\log\left(\mathrm{sigmoid}\left(s_F(i, j^+) - s_F(i, j^-)\right)\right), \tag{18}$$
$$\mathcal{L}_P^t = \sum_{(i, k^+, k^-) \in \mathcal{O}} -\log\left(\mathrm{sigmoid}\left(s_P(i, k^+, t) - s_P(i, k^-, t)\right)\right), \tag{19}$$
where $\mathcal{L}_F$ is the file-level BPR loss, and $\mathcal{L}_P^t$ is the project-level
BPR loss for behavior $t$. $\mathcal{O}$ denotes the pairwise training data. $j^+$
indicates an observed interaction between user $u_i$ and item $v_{j^+}$,
and $j^-$ indicates an unobserved one. As high-order neighboring
relations within contributors are also useful for recommendations,
we enforce users to have similar representations as their structural
neighbors through the structure-contrastive learning objective [28]:
$$\mathcal{L}_C^U = \sum_{u_i \in \mathcal{U}} -\log \frac{\exp\left(\mathbf{u}_i^{(\eta)} \cdot \mathbf{u}_i^{(0)} / \tau\right)}{\sum_{u_j \in \mathcal{U}} \exp\left(\mathbf{u}_i^{(\eta)} \cdot \mathbf{u}_j^{(0)} / \tau\right)}. \tag{20}$$
Here, $\eta$ is set to an even number so that each user node can aggregate
signals from other user nodes. $\tau$ is the temperature hyperparameter.
Similarly, the contrastive loss is applied to each $v_i$:
$$\mathcal{L}_C^V = \sum_{v_i \in \mathcal{V}} -\log \frac{\exp\left(\mathbf{v}_i^{(\eta)} \cdot \mathbf{v}_i^{(0)} / \tau\right)}{\sum_{v_j \in \mathcal{V}} \exp\left(\mathbf{v}_i^{(\eta)} \cdot \mathbf{v}_j^{(0)} / \tau\right)}. \tag{21}$$
The overall optimization objective is
$$\mathcal{L} = \mathcal{L}_F + \lambda_1 \sum_{t \in \mathcal{T}} \mathcal{L}_P^t + \lambda_2 (\mathcal{L}_C^U + \mathcal{L}_C^V) + \lambda_3 \|\Theta\|_2, \tag{22}$$
where $\Theta$ denotes all trainable model parameters. $\lambda_1$, $\lambda_2$, $\lambda_3$ are
hyperparameters.
Table 1: Summary of the datasets. The “#Files” column
shows the number of files with observed interactions instead of all
existing files in the projects.
Dataset  #Files   #Users  #Interactions  Density
ML       239,232  21,913  663,046        1.26E-4
DB       415,154  30,185  1,935,155      1.54E-4
FS       568,972  51,664  1,512,809      5.14E-5
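The loss terms of Sec. 3.4 can be checked numerically with a short pure-Python sketch. It follows the formulas as written in the text (plain dot products, no embedding normalization); the function names and toy inputs are assumptions, and a real implementation would use a numerically stable log-sigmoid on tensors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(score_pairs):
    """BPR loss over (positive_score, negative_score) pairs.

    Uses the identity -log(sigmoid(x)) = log(1 + exp(-x)).
    """
    return sum(math.log1p(math.exp(neg - pos)) for pos, neg in score_pairs)


def contrastive_loss(layer_emb, base_emb, tau):
    """Structure-contrastive loss: each node's layer-eta embedding is pulled
    towards its own layer-0 embedding and away from the other nodes'."""
    loss = 0.0
    for i, z in enumerate(layer_emb):
        logits = [dot(z, e0) / tau for e0 in base_emb]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_denominator - logits[i]
    return loss


def total_loss(file_loss, project_losses, user_cl, file_cl, l2_norm,
               lam1, lam2, lam3):
    """Weighted sum of the individual terms, mirroring the overall objective."""
    return (file_loss + lam1 * sum(project_losses)
            + lam2 * (user_cl + file_cl) + lam3 * l2_norm)


identity = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(identity, identity, tau=0.1)
swapped = contrastive_loss([[0.0, 1.0], [1.0, 0.0]], identity, tau=0.1)
```

The aligned case gives a loss close to zero while the swapped case is heavily penalized, which is the behavior the contrastive term is meant to enforce. Likewise, bpr_loss shrinks as the positive score exceeds the negative one.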
4 EXPERIMENTS
4.1 Experimental Settings
4.1.1 Datasets. We collected 3 datasets covering diverse topics in
computer science, including machine learning (ML), fullstack (FS),
and database (DB), using the GitHub API (footnote 1) and the PyGithub
(footnote 2) package.
We retain repositories with ≥ 250 stars and ≥ 3 contributors to
exclude repositories intended for private usage [2]. We include
projects with a contribution history of at least 3 months according
to their commit history. To ensure that our model generalizes across a
wide range of topics, popularity, and project scales, we first select
3 subsets of repositories using their GitHub topics (footnote 3), which are
project labels created by the project owners. Then, we randomly
sample 300 repositories from each subset considering their numbers
of project files and stars. We use the Unix timestamps 1550000000
and 1602000000 to partition the datasets into train/val/test sets.
This way, all interactions before the timestamp are used as the
training data. We retain the users with at least 1 interaction in both
the train and test sets. More details about dataset construction are in
the appendix.
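The temporal split can be sketched as below. The two cut-off values are the ones quoted in the text, but the exact boundary handling is an assumption: this sketch puts interactions before the first timestamp in train, those between the two timestamps in validation, and the rest in test. The interaction log and function names are hypothetical.

```python
TRAIN_END = 1550000000  # first Unix timestamp quoted in the text
VAL_END = 1602000000    # second Unix timestamp quoted in the text


def temporal_split(interactions):
    """Split (user, file, timestamp) triples into train/val/test lists."""
    train, val, test = [], [], []
    for user, path, ts in interactions:
        if ts < TRAIN_END:
            train.append((user, path, ts))
        elif ts < VAL_END:   # assumed: validation lies between the two cut-offs
            val.append((user, path, ts))
        else:
            test.append((user, path, ts))
    return train, val, test


def retained_users(train, test):
    """Users with at least one interaction in both the train and test sets."""
    return {u for u, _, _ in train} & {u for u, _, _ in test}


log = [
    ("alice", "src/a.py", 1500000000),
    ("alice", "src/b.py", 1610000000),
    ("bob", "src/a.py", 1560000000),
    ("carol", "docs/c.md", 1540000000),
]
train, val, test = temporal_split(log)
kept = retained_users(train, test)
```

Splitting by time rather than at random keeps future contributions out of the training data, which matches the stated goal of predicting future behavior.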
4.1.2 Implementation Details. We implemented our CODER model
in PyTorch [38] and PyG [8]. For all models, we set the embedding
size to 32 and perform Xavier initialization [13] on the model
parameters. We use the Adam optimizer [25] with a batch size of 1024. For
Node Semantics Modeling (Sec. 3.1), we set $N_C = 8$ and $N_Q = 4$. The
code encoder we use is the pretrained CodeBERT [7] model with
6 layers, 12 attention heads, and 768-dimensional hidden states.
For Multi-Behavioral Modeling (Sec. 3.2), we set the number of
convolution layers $L = 4$ for both intra- and inter-level aggregation.
For prediction and optimization, we search the hyperparameter $\lambda_3$
in [1e-4, 1e-3, 1e-2], and $\lambda_1$ in [1e-2, 1e-1, 1]. For the structure
contrastive loss [28], we adopt the hyperparameter setting from the
original implementation and set $\lambda_2$ = 1e-6, $\eta = 2$ without further
tuning. For the baseline models, the hyperparameters are set to the
optimal settings as reported in their original papers. For all models,
we search the learning rate in [1e-4, 3e-4, 1e-3, 3e-3, 1e-2].
4.1.3 Baselines. We compare our method with 3 groups of methods:
(G1) factorization-based methods, including MF [39];
(G2) neural-network-based methods, including MLP [41] and NeuMF [18];
(G3) graph-based methods that model user-item interactions as graphs,
including NGCF [47], LightGCN [17], and NCL [28].
Footnote 1: https://docs.github.com/en/rest
Footnote 2: https://github.com/PyGithub/PyGithub.git
Footnote 3: https://github.com/topics
Table 2: The overall performance on 3 datasets. The best performance
is marked in bold. The second best is underlined.
Dataset  Metric   MF     MLP    NeuMF  NGCF   LightGCN  NCL    CODER  Impr.
ML       NDCG@5   0.065  0.073  0.076  0.091  0.106     0.119  0.132  11.2%
ML       Hit@5    0.162  0.189  0.189  0.237  0.291     0.276  0.351  20.5%
ML       MRR@5    0.098  0.113  0.114  0.137  0.164     0.201  0.211  5.0%
ML       NDCG@10  0.066  0.075  0.081  0.093  0.109     0.118  0.136  14.8%
ML       Hit@10   0.229  0.250  0.263  0.310  0.386     0.337  0.440  14.0%
ML       MRR@10   0.106  0.121  0.124  0.147  0.177     0.209  0.223  6.7%
ML       NDCG@20  0.072  0.081  0.084  0.100  0.116     0.120  0.141  17.9%
ML       Hit@20   0.324  0.343  0.346  0.407  0.457     0.466  0.540  15.8%
ML       MRR@20   0.113  0.127  0.130  0.154  0.185     0.213  0.230  8.2%
DB       NDCG@5   0.085  0.079  0.085  0.099  0.082     0.124  0.160  29.0%
DB       Hit@5    0.205  0.191  0.206  0.263  0.237     0.316  0.390  23.2%
DB       MRR@5    0.130  0.118  0.128  0.162  0.132     0.252  0.260  3.2%
DB       NDCG@10  0.086  0.079  0.085  0.100  0.084     0.123  0.159  29.4%
DB       Hit@10   0.267  0.251  0.276  0.361  0.324     0.380  0.488  28.4%
DB       MRR@10   0.138  0.126  0.137  0.175  0.144     0.260  0.273  4.9%
DB       NDCG@20  0.088  0.083  0.088  0.103  0.091     0.125  0.160  27.3%
DB       Hit@20   0.335  0.338  0.362  0.454  0.422     0.437  0.588  29.5%
DB       MRR@20   0.143  0.132  0.143  0.182  0.150     0.264  0.280  6.0%
FS       NDCG@5   0.063  0.063  0.067  0.082  0.089     0.106  0.146  37.1%
FS       Hit@5    0.168  0.178  0.179  0.231  0.245     0.283  0.374  31.9%
FS       MRR@5    0.100  0.100  0.107  0.132  0.146     0.170  0.226  33.0%
FS       NDCG@10  0.063  0.065  0.068  0.085  0.092     0.106  0.144  35.6%
FS       Hit@10   0.231  0.244  0.249  0.319  0.332     0.361  0.467  29.3%
FS       MRR@10   0.109  0.110  0.117  0.144  0.157     0.180  0.239  32.3%
FS       NDCG@20  0.067  0.070  0.073  0.090  0.095     0.110  0.146  32.7%
FS       Hit@20   0.307  0.321  0.335  0.406  0.414     0.451  0.559  23.9%
FS       MRR@20   0.114  0.115  0.122  0.150  0.163     0.187  0.245  31.4%
As the code recommendation task is to predict users’ file-level
contribution, file-level behavior modeling is the most critical
component. Thus, we use file-level contribution behaviors as the
supervision signals as in Eq. 16. For brevity, we use repository
identity
to refer to the information of which repository a file belongs to.
As the baselines do not explicitly leverage the repository identities
of files, we encode their repository identities as a categorical
feature through one-hot encoding during embedding construction.
To ensure fairness of comparison, we incorporate the project-level
interaction signals into the user representations by applying multi-hot
encoding on the repositories each user has interacted with. All
the baseline models use the same pretrained CodeBERT embeddings as
CODER to leverage the rich semantic features in the source
code.
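The baseline feature construction described above (one-hot repository identity for files, multi-hot interacted repositories for users) can be illustrated with a minimal sketch. The repository names and helper functions are hypothetical, and how these vectors are combined with the CodeBERT embeddings is not specified in this excerpt.

```python
def one_hot(index, size):
    """Vector with a single 1 at the given position."""
    vec = [0] * size
    vec[index] = 1
    return vec


def multi_hot(indices, size):
    """Vector with a 1 at every given position."""
    vec = [0] * size
    for idx in indices:
        vec[idx] = 1
    return vec


repos = ["repo_a", "repo_b", "repo_c"]            # hypothetical repositories
repo_index = {name: k for k, name in enumerate(repos)}

# A file that belongs to repo_b.
file_feature = one_hot(repo_index["repo_b"], len(repos))
# A user who has interacted with repo_a and repo_c.
user_feature = multi_hot([repo_index[r] for r in ("repo_a", "repo_c")], len(repos))
```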
4.1.4 Evaluation Metrics. Following previous works [17, 18, 47,
68], we choose Mean Reciprocal Rank (MRR@K), Normalized Discounted
Cumulative Gain (NDCG@K), Recall@K (Rec@K), and
Hit@K as the evaluation metrics.
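For readers unfamiliar with these metrics, the sketch below gives common per-user definitions with binary relevance. These are the standard textbook forms and are an assumption here; the exact variants used in the paper's evaluation code are not given in this excerpt.

```python
import math


def hit_at_k(ranked, relevant, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant item in the top k, else 0.0."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """DCG of the top k divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(ranked[:k], start=1)
              if item in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]     # hypothetical ranking, best first
relevant = {"b", "d"}             # hypothetical held-out contributions
```

Reported numbers are then averages of these per-user values over all evaluated users.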
4.2 Performance
4.2.1 Intra-Project Recommendation. In this setting, we evaluate
the model’s ability to recommend development tasks under the
repositories a user has already interacted with. For each user $u_i$, we rank the interactions
under repositories the user has interacted with in the training set.
This setting corresponds to the scenario in which project maintainers
recommend new development tasks to existing contributors
based on their previous contributions. As shown in Tab. 2, CODER
consistently outperforms the baselines by a large margin. On the
ML dataset, CODER outperforms the best baseline by 17.9% on
NDCG@20, 15.8% on Hit@20, and 8.2% on MRR@20. On the DB
dataset, CODER achieves performance improvements of 27.3% on
NDCG@20, 29.5% on Hit@20, and 6.0% on MRR@20. Notably, the
greatest performance improvement is achieved on the FS dataset,
which has the greatest sparsity. CODER achieves a maximum performance
improvement of 37.1% on NDCG@5 and 35.6% on NDCG@10.
The results show that CODER achieves significant performance
improvement over the baselines, and is especially useful when the
observed interactions are scarce.
Among the baselines, graph-based methods (G3) achieve better
performance than (G1) and (G2), as they can model the high-order
relations between users and items through the interaction graph and
the embedding function. LightGCN [17] underperforms NGCF [47]
on the DB dataset, whose training set has the greatest density, and
outperforms NGCF on the ML and FS datasets. This implies that
the message passing scheme of NGCF, which involves multiple
linear transformations and non-linear activations, is more effective
for denser interactions. Since contribution data are generally sparse,
such results justify our design choice in
multi-behavioral modeling, which uses the LightGCN propagation
scheme. NCL exhibits the strongest performance, demonstrating the
importance of the contrastive learning loss in modeling differences
Code Recommendation for Open Source Software Developers
among homogeneous types of nodes, which is also included in our
model design. Neural-network-based methods (G2) generally outperform
matrix factorization (G1), as they leverage multiple feature
transformations to learn the rich semantics in the file embeddings
and the user-file interactions.
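To make the contrast between the two propagation schemes concrete,
here is a minimal pure-Python sketch of LightGCN-style propagation as
described in [17]: neighbor embeddings are averaged with symmetric
degree normalization, with no weight matrices and no non-linear
activation, and the final embedding is the mean over layers. This is
not the CODER implementation; the function name, the data layout, and
the toy graph are assumptions for illustration.

```python
import math


def lightgcn_propagate(edges, emb, num_layers=2):
    """Propagate embeddings over an undirected user-item graph.

    edges: list of (u, v) pairs; emb: dict mapping node id -> list of floats.
    Returns the layer-averaged embedding of every node.
    """
    neighbors = {node: [] for node in emb}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    layers = [emb]
    for _ in range(num_layers):
        prev = layers[-1]
        nxt = {}
        for node, vec in prev.items():
            agg = [0.0] * len(vec)
            for other in neighbors[node]:
                # Symmetric normalization 1 / sqrt(|N(node)| * |N(other)|).
                w = 1.0 / math.sqrt(len(neighbors[node]) * len(neighbors[other]))
                for d in range(len(vec)):
                    agg[d] += w * prev[other][d]
            nxt[node] = agg
        layers.append(nxt)

    # Final representation: uniform average over layer 0 .. num_layers.
    return {
        node: [
            sum(layer[node][d] for layer in layers) / len(layers)
            for d in range(len(emb[node]))
        ]
        for node in emb
    }


# Toy graph: one user connected to one file.
out = lightgcn_propagate(
    [("u1", "f1")], {"u1": [1.0, 0.0], "f1": [0.0, 1.0]}, num_layers=1
)
print(out["u1"])  # [0.5, 0.5]
```

NGCF would additionally apply learned linear transformations and a
non-linear activation at each layer, which is the difference the
paragraph above refers to.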
4.2.2 Cold-Start Recommendation. User contribution is usually
sparse due to the considerable workload and voluntary nature of
OSS development. It is therefore important to accurately capture
the users’ preferences from few observed interactions, and our
model is designed according to this principle. We define
cold-start users as users with ≤ 2 interactions in the training set. To
evaluate the model’s performance with fewer interactions, we
choose 𝑁𝐷𝐶𝐺@𝐾, 𝑅𝑒𝑐𝑎𝑙𝑙@𝐾, and 𝐻𝑖𝑡@𝐾, where 𝐾 ∈ {3, 5}. The
strongest 4 baselines in Tab. 2 are evaluated for comparison.
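The cold-start definition above can be sketched as a simple filter
over the training interactions. The (user, file) tuple layout and the
function name are assumptions for illustration; users with no training
interactions at all do not appear in the input and are therefore not
returned.

```python
from collections import Counter


def cold_start_users(train_interactions, max_interactions=2):
    """Return users with at most `max_interactions` training interactions.

    train_interactions: iterable of (user, file) pairs from the training set.
    """
    counts = Counter(user for user, _file in train_interactions)
    return {user for user, n in counts.items() if n <= max_interactions}


# Hypothetical training set: alice has 3 interactions, bob 2, carol 1.
train = [
    ("alice", "core/a.py"), ("alice", "core/b.py"), ("alice", "docs/c.md"),
    ("bob", "core/a.py"), ("bob", "tests/t.py"),
    ("carol", "docs/c.md"),
]
print(sorted(cold_start_users(train)))  # ['bob', 'carol']
```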
As observed from Tab. 3, performance for cold-start users is
worse than that for all users in Tab. 2. Notably, CODER is able to
achieve even greater performance improvement over the baseline
models. This can be attributed to the following aspects: 1) CODER
learns more accurate representations by fusing the fine-grained
semantics of project files with their interacted users, which
facilitates the learning of user preferences even in the absence of
dense user interactions. 2) By explicitly modeling multiple types of
project-level behaviors, CODER effectively models the users’ interests to
complement the sparse file-level contribution relations, which is
more effective than encoding the project-level interactions in the
embedding space.
4.2.3 Cross-Project Recommendation. Although 91% of developers
in our dataset focused on one project throughout their development,
active contributors can work on multiple projects. For these
contributors, the project core team can recommend development tasks
based on their development experiences in previous projects.
During evaluation, we rank the interactions in projects each
user has not yet interacted with in the training set. This setting is
considerably more challenging than intra-project recommendation
since the candidate item pool is significantly larger. According to
the results in Fig. 4, CODER consistently achieves superior
performance by a large margin with respect to the baselines,
especially for 𝐾 ≥ 20. The results show that CODER jointly learns
inter-project differences to choose the correct repositories and
characterizes intra-project distinctions to recommend the correct
files within the chosen repositories.
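The difference between the intra-project and cross-project candidate
pools can be sketched as follows. The data layout, the function name,
and the choice to exclude files already seen in training from the
intra-project pool are assumptions for illustration, not details
confirmed by the text.

```python
def candidate_files(user, train_interactions, file_to_repo, setting):
    """Build the set of files to rank for one user.

    train_interactions: iterable of (user, file) pairs from the training set.
    file_to_repo: dict mapping every file to the repository that contains it.
    setting: "intra" ranks files in repositories the user has interacted
             with; "cross" ranks files in all other repositories.
    """
    seen_files = {f for u, f in train_interactions if u == user}
    seen_repos = {file_to_repo[f] for f in seen_files}
    if setting == "intra":
        # Assumption: files already contributed to in training are excluded.
        return {
            f for f, repo in file_to_repo.items()
            if repo in seen_repos and f not in seen_files
        }
    if setting == "cross":
        return {f for f, repo in file_to_repo.items() if repo not in seen_repos}
    raise ValueError("setting must be 'intra' or 'cross'")


# Hypothetical catalogue with two repositories.
file_to_repo = {"a/x.py": "a", "a/y.py": "a", "b/z.py": "b"}
train = [("u", "a/x.py")]
print(candidate_files("u", train, file_to_repo, "intra"))  # {'a/y.py'}
print(candidate_files("u", train, file_to_repo, "cross"))  # {'b/z.py'}
```

Since most developers touch a single project, the cross-project pool
typically covers nearly the whole catalogue, which is why this setting
is the harder one.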
To further validate the above observation, we randomly sample
10 repositories in the ML dataset and visualize their file embeddings
in Fig. 5 using t-SNE [43]. A maximum of 300 files are displayed
per repository for clarity. The file embeddings are obtained using 4
models: (a) Matrix Factorization [39], (b) LightGCN [17], (c)
NCL [28], and (d) our CODER framework. We observe that models
with the contrastive learning objective (NCL and CODER) manifest
better clustering structures. In particular, file embeddings learned
by our CODER framework demonstrate the best cross-repository
differences. The results further suggest that CODER jointly models
the intra-project and inter-project differences among files,
effectively distinguishing the files under the same OSS project, which
is more efficient than directly encoding the repository identity in the
embedding space.
Table 3: File-level link prediction results for cold-start users.
“LGN” stands for the baseline “LightGCN”. The best performance is
marked in bold. The second best is underlined.
Dataset  Metric  NeuMF  NGCF   LGN    NCL    CODER  Impr.
ML       NDCG@3  0.059  0.067  0.068  0.090  0.126  40.9%
ML       Hit@3   0.106  0.123  0.161  0.211  0.224   5.9%
ML       MRR@3   0.081  0.087  0.089  0.119  0.165  38.3%
ML       NDCG@5  0.068  0.078  0.088  0.105  0.132  25.8%
ML       Hit@5   0.161  0.162  0.230  0.261  0.273   4.8%
ML       MRR@5   0.093  0.097  0.105  0.130  0.177  36.0%
DB       NDCG@3  0.078  0.063  0.055  0.075  0.119  53.0%
DB       Hit@3   0.152  0.114  0.128  0.165  0.238  44.4%
DB       MRR@3   0.102  0.089  0.070  0.095  0.157  54.0%
DB       NDCG@5  0.086  0.061  0.064  0.086  0.127  47.5%
DB       Hit@5   0.195  0.132  0.165  0.220  0.287  30.6%
DB       MRR@5   0.112  0.093  0.079  0.106  0.168  50.2%
FS       NDCG@3  0.079  0.075  0.085  0.092  0.128  38.7%
FS       Hit@3   0.171  0.165  0.179  0.179  0.242  35.6%
FS       MRR@3   0.110  0.095  0.104  0.116  0.171  48.0%
FS       NDCG@5  0.086  0.086  0.085  0.095  0.137  44.3%
FS       Hit@5   0.230  0.220  0.202  0.222  0.313  36.2%
FS       MRR@5   0.124  0.106  0.109  0.125  0.187  49.4%
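The "Impr." column appears to be the relative improvement of CODER
over the best baseline in each row. A minimal sketch of that
arithmetic, under this assumption, is given below; the function name
is hypothetical. Recomputing from the three-decimal entries gives
values close to, but not identical with, the reported ones (for
example about 40.0% versus the reported 40.9% for NDCG@3 on ML),
presumably because the reported figures were derived from unrounded
scores.

```python
def relative_improvement(model_score, baseline_scores):
    """Percentage gain of the model over the strongest baseline."""
    best = max(baseline_scores)
    return (model_score - best) / best * 100.0


# ML dataset, NDCG@3 row: NeuMF, NGCF, LGN, NCL versus CODER.
baselines = [0.059, 0.067, 0.068, 0.090]
print(round(relative_improvement(0.126, baselines), 1))  # 40.0
```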
4.2.4 Ablation Studies. In Fig. 3, we compare the performance of
our full model (abbreviated as CD) with its 5 variants on the ML
dataset. CD-F removes
the code-user modality fusion strategy in Eq. (1). CD-C excludes
the structural contrastive learning objective in Eqs. (20)-(21). CD-E
does not use the pretrained CodeBERT embeddings and instead
applies TF-IDF encoding on the source code, a common approach
in project recommendation models [55]. CD-P removes the project-level
aggregation in Sec. 3.2.2. CD-S disables the structural-level
aggregation in Sec. 3.1.2. We make the following observations.
All 6 versions of CODER (the full model and its 5 variants)
outperform NCL, among which
the full model (CD) performs the best, indicating the importance of
each component in our model design. The performance drops most
significantly when we disable project-level aggregation in CD-P,
indicating the importance of explicitly modeling user-project
interactions through graph structures. We also observe a considerable
decrease when we remove the structural-level aggregation (CD-S),
implying that the structural information of files contributes
significantly to the file representation. Replacing CodeBERT with
TF-IDF (CD-E) leads to a less pronounced decrease, but CD-E is still
outperformed by CD-F, in which fine-grained representations of the
source code are present. This suggests that user behaviors and project
structural clues are more important than semantic features in code
recommendation.
Figure 3: Results among variants of CODER and the best baseline
model NCL on the ML dataset. Panels: Rec@20, NDCG@20, Hit@20, and
MRR@20; x-axis: CD, -F, -C, -E, -S, -P, NCL.
Figure 4: Cross-Project Performance of CODER and the 2 strongest
baselines (LGN, NCL) under various 𝐾, 𝐾 ∈ [5, 100]. Panels: Recall,
NDCG, Hit, and MRR; x-axis: 𝐾 ∈ {5, 10, 15, 20, 50, 100}.
Figure 5: t-SNE Visualization of the file embeddings on the
ML dataset produced by (a) Matrix Factorization; (b) LightGCN; (c)
NCL; (d) CODER. Files in the same color fall under
the same repositories.
5 RELATED WORKS
5.1 Research in Open Source
Open source has grown into a standard practice for software
engineering [22] and has attracted researchers to study social
coding [62].
Analytical studies focus on the users’ motivation [9, 59], expertise
[45], and collaboration patterns [35], as well as factors that impact
the popularity [2] of projects. Methodological studies explore project
classification [66], code search [31], and connecting publications with
projects [40]. Although previous works have explored the
recommendation task in OSS development settings such as automatic
suggestions of API function calls [19, 37], Good First Issues [1, 53],
and data preparation steps [57], no previous works have studied
the challenging task of code recommendation, which requires
in-depth understanding of OSS projects written in multiple programming
languages and diverse user-item interactions.
5.2 Recommender Systems
The advances in deep learning have greatly facilitated the evolution
of recommender systems [3, 16, 49, 64, 65]. In particular, motivated
by the success of Graph Neural Networks (GNN) [15, 26, 70, 71],
a series of graph-based recommender systems [27, 49, 52] have been
proposed, which organize user behaviors into heterogeneous
interaction graphs.
These methods formulate item recommendation as
link prediction or representation learning tasks [48, 67], and utilize
high-order relationships to infer user preferences, item attributes,
and collaborative filtering signals [17, 40, 47, 51, 60]. Notably,
traditional recommendation models cannot be easily transferred to
code recommendation as they do not model unique signals in OSS
development, such as project hierarchies and code semantics.
6 CONCLUSION AND FUTURE WORKS
In this work, we are the first to formulate the task of code
recommendation for open source developers. We propose CODER,
a code recommendation model suitable for open source projects
written in diverse languages. Extensive experiments on 3 datasets
demonstrate the superior performance of our method. Currently,
our approach only considers recommending existing files to users.
As CODER harnesses the metadata and semantic features of files, it
cannot deal with users creating new files where such information of
the candidate files is absent. We plan to generalize our framework
by allowing users to initialize files under subdirectories of
interest. Meanwhile, our source code encoding scheme can be further
improved by harnessing knowledge about programming languages.
For example, previous works explored the use of Abstract Syntax
Trees (AST) [36] and data flow [14, 19] (graphs that represent
dependency relations between variables) on language-specific tasks. Our
current encoding scheme is a computationally efficient way to deal
with the diversity of programming languages. Moreover, the user
representations can be further enhanced by modeling users’ social
relations [6, 33] and behaviors [23, 56, 58]. Overall, future works
can incorporate domain knowledge about programming languages
and social information about the users to improve the item and
user representations at a finer granularity.
REFERENCES
[1] Jan Willem David Alderliesten and Andy Zaidman. 2021. An Initial Exploration
of the “Good First Issue” Label for Newcomer Developers. In 2021 IEEE/ACM 13th
International Workshop on Cooperative and Human Aspects of Software Engineering
(CHASE). IEEE, 117–118.
[2] Hudson Borges, Andre Hora, and Marco Tulio Valente. 2016. Understanding
the factors that impact the popularity of GitHub repositories. In ICSME. IEEE,
334–344.
[3] Jin Chen, Defu Lian, Binbin Jin, Kai Zheng, and Enhong Chen. 2022. Learning
Recommenders for Implicit Feedback with Importance Resampling. In WWW.
1997–2005.
[4] Jailton Coelho, Marco Tulio Valente, Luciano Milen, and Luciana L Silva.
2020. Is this GitHub project maintained? Measuring the level of maintenance
activity of open-source projects. Information and Software Technology 122
(2020), 106274.
[5] Roberto Di Cosmo and Stefano Zacchiroli. 2017. Software Heritage: Why and
How to Preserve Software Source Code. In iPRES 2017-14th International
Conference on Digital Preservation. 1–10.
[6] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin.
2019. Graph neural networks for social recommendation. In WWW. 417–426.
[7] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong,
Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al. 2020. CodeBERT: A
Pre-Trained Model for Programming and Natural Languages. In Findings of
EMNLP 2020. 1536–1547.
[8] Matthias Fey and Jan E. Lenssen. 2019. Fast Graph Representation Learning
with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs
and Manifolds.
[9] Marco Gerosa, Igor Wiese, Bianca Trinkenreich, Georg Link, Gregorio Robles,
Christoph Treude, Igor Steinmacher, and Anita Sarma. 2021. The shifting sands
of motivation: Revisiting what drives contributors in open source. In ICSE.
IEEE, 1046–1058.
[10] GitHub. 2016. The State of the Octoverse.
https://octoverse.github.com/2016/
[11] GitHub. 2022. Collection: Programming Languages.
https://github.com/collections/programming-languages
[12] GitHub. 2022. Github Number of Repositories. https://github.com/search.
[13] Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of
training deep feedforward neural networks. In AISTATS. JMLR Workshop and
Conference Proceedings, 249–256.
[14] Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long
Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, et al. 2021. GraphCodeBERT:
Pre-training Code Representations with Data Flow. In ICLR.
[15] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive
representation learning on large graphs. NIPS 30 (2017).
[16] Junheng Hao, Tong Zhao, Jin Li, Xin Luna Dong, Christos Faloutsos, Yizhou
Sun, and Wei Wang. 2020. P-companion: A principled framework for diversified
complementary product recommendation. In CIKM. 2517–2524.
[17] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng
Wang. 2020. Lightgcn: Simplifying and powering graph convolution network for
recommendation. In SIGIR. 639–648.
[18] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng
Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th
international conference on world wide web. 173–182.
[19] Xincheng He, Lei Xu, Xiangyu Zhang, Rui Hao, Yang Feng, and Baowen Xu.
2021. Pyart: Python api recommendation in real-time. In 2021 IEEE/ACM 43rd
International Conference on Software Engineering (ICSE). IEEE, 1634–1645.
[20] Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, and Zhi Jin. 2018. Summarizing
source code with transferred API knowledge. In IJCAI. 2269–2275.
[21] Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc
Brockschmidt. 2019. Codesearchnet challenge: Evaluating the state of semantic
code search. arXiv preprint arXiv:1909.09436 (2019).
[22] Jyun-Yu Jiang, Pu-Jen Cheng, and Wei Wang. 2017. Open source repository
recommendation in social coding. In SIGIR. 1173–1176.
[23] Yiqiao Jin, Xiting Wang, Ruichao Yang, Yizhou Sun, Wei Wang, Hao Liao, and
Xing Xie. 2022. Towards fine-grained reasoning for fake news detection. In AAAI,
Vol. 36. 5746–5754.
[24] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019.
BERT: Pre-training of Deep Bidirectional Transformers for Language
Understanding. In NAACL.
[25] Diederik P Kingma and Jimmy Ba. 2015. Adam: A Method for
Stochastic Optimization. In ICLR (Poster).
[26] Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with
graph convolutional networks. In ICLR.
[27] Anchen Li, Bo Yang, Huan Huo, and Farookh Hussain. 2022. Hypercomplex
Graph Collaborative Filtering. In WWW. 1914–1922.
[28] Zihan Lin, Changxin Tian, Yupeng Hou, and Wayne Xin Zhao. 2022. Improving
Graph Collaborative Filtering with Neighborhood-enriched Contrastive Learning.
In Proceedings of the ACM Web Conference 2022. 2320–2329.
[29] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen,
Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta:
A robustly optimized bert pretraining approach. arXiv preprint
arXiv:1907.11692 (2019).
[30] Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. 2016. Hierarchical
question-image co-attention for visual question answering. NIPS 29 (2016).
[31] Sifei Luan, Di Yang, Celeste Barnaby, Koushik Sen, and Satish Chandra.
2019. Aroma: Code recommendation via structural code search. Proceedings of
the ACM on Programming Languages 3, OOPSLA (2019), 1–28.
[32] Nora McDonald and Sean Goggins. 2013. Performance and participation in open
source software on github. In CHI’13 extended abstracts on human factors in
computing systems. 139–144.
[33] Xin Mei, Xiaoyan Cai, Sen Xu, Wenjie Li, Shirui Pan, and Libin Yang. 2022.
Mutually reinforced network embedding: An integrated approach to research
paper recommendation. Expert Systems with Applications (2022), 117616.
[34] Antonio Valerio Miceli-Barone and Rico Sennrich. 2017. A Parallel Corpus of
Python Functions and Documentation Strings for Automated Code
Documentation and Code Generation. In Proceedings of the Eighth International
Joint Conference on Natural Language Processing (Volume 2: Short Papers).
314–319.
[35] Nadia Nahar, Shurui Zhou, Grace Lewis, and Christian Kästner. 2022.
Collaboration Challenges in Building ML-Enabled Systems: Communication,
Documentation, Engineering, and Process. Organization 1, 2 (2022), 3.
[36] Anh Tuan Nguyen, Michael Hilton, Mihai Codoban, Hoan Anh Nguyen, Lily Mast,
Eli Rademacher, Tien N Nguyen, and Danny Dig. 2016. API code recommendation
using statistical learning from fine-grained changes. In SIGSOFT. 511–522.
[37] Phuong T Nguyen, Juri Di Rocco, Davide Di Ruscio, Lina Ochoa, Thomas
Degueule, and Massimiliano Di Penta. 2019. Focus: A recommender system
for mining api function calls and usage patterns. In ICSE. IEEE, 1050–1060.
[38] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury,
Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga,
et al. 2019. Pytorch: An imperative style, high-performance deep learning
library. NIPS 32.
[39] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars
Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit
feedback. In UAI. 452–461.
[40] Huajie Shao, Dachun Sun, Jiahao Wu, Zecheng Zhang, Aston Zhang, Shuochao
Yao, Shengzhong Liu, Tianshi Wang, Chao Zhang, and Tarek Abdelzaher. 2020.
paper2repo: GitHub repository recommendation for academic papers. In WWW.
629–639.
[41] Nitish Srivastava and Russ R Salakhutdinov. 2012. Multimodal learning with
deep boltzmann machines. NIPS.
[42] Igor Steinmacher, Ana Paula Chaves, Tayana Uchoa Conte, and Marco Aurélio
Gerosa. 2014. Preliminary empirical identification of barriers faced by
newcomers to Open Source Software projects. In 2014 Brazilian Symposium on
Software Engineering. IEEE, 51–60.
[43] Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using
t-SNE. JMLR 9, 11 (2008).
[44] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero,
Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In ICLR.
[45] Rahul Venkataramani, Atul Gupta, Allahbaksh Asadullah, Basavaraju Muddu,
and Vasudev Bhat. 2013. Discovery of technical expertise from open source code
repositories. In WWW. 97–98.
[46] Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, and
Philip S Yu. 2018. Improving automatic source code summarization via deep
reinforcement learning. In ASE. 397–407.
[47] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019.
Neural graph collaborative filtering. In SIGIR. 165–174.
[48] Xiang Wang, Tinglin Huang, Dingxian Wang, Yancheng Yuan, Zhenguang Liu,
Xiangnan He, and Tat-Seng Chua. 2021. Learning intents behind interactions
with knowledge graph for recommendation. In WWW. 878–887.
[49] Xiting Wang, Kunpeng Liu, Dongjie Wang, Le Wu, Yanjie Fu, and Xing Xie.
2022. Multi-level recommendation reasoning over knowledge graphs with
reinforcement learning. In WWW. 2098–2108.
[50] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement
Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz,
et al. 2020. Transformers: State-of-the-art natural language processing. In
EMNLP. 38–45.
[51] Le Wu, Peijie Sun, Yanjie Fu, Richang Hong, Xiting Wang, and Meng Wang.
2019. A neural influence diffusion model for social recommendation. In
Proceedings of the 42nd international ACM SIGIR conference on research and
development in information retrieval. 235–244.
[52] Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan.
2019. Session-based recommendation with graph neural networks. In AAAI,
Vol. 33. 346–353.
[53] Wenxin Xiao, Hao He, Weiwei Xu, Xin Tan, Jinhao Dong, and Minghui Zhou.
2022. Recommending good first issues in GitHub OSS projects. In ICSE.
1830–1842.
[54] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How
Powerful are Graph Neural Networks?. In ICLR.
[55] Wenyuan Xu, Xiaobing Sun, Xin Xia, and Xiang Chen. 2017. Scalable relevant
project recommendation on GitHub. In Proceedings of the 9th Asia-Pacific
Symposium on Internetware. 1–10.
[56] Weizhi Xu, Junfei Wu, Qiang Liu, Shu Wu, and Liang Wang. 2022. Mining
Fine-grained Semantics via Graph Neural Networks for Evidence-based Fake News
Detection. arXiv preprint arXiv:2201.06885 (2022).
[57] Cong Yan and Yeye He. 2020. Auto-suggest: Learning-to-recommend data
preparation steps using data science notebooks. In SIGMOD. 1539–1554.
[58] Ruichao Yang, Xiting Wang, Yiqiao Jin, Chaozhuo Li, Jianxun Lian, and Xing
Xie. 2022. Reinforcement Subgraph Reasoning for Fake News Detection. In KDD.
2253–2262.
[59] Yunwen Ye and Kouichi Kishida. 2003. Toward an understanding of the
motivation of open source software developers. In ICSE. IEEE, 419–429.
[60] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton,
and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale
recommender systems. In KDD. 974–983.
[61] Xueli Yu, Weizhi Xu, Zeyu Cui, Shu Wu, and Liang Wang. 2021. Graph-based
Hierarchical Relevance Matching Signals for Ad-hoc Retrieval. In WWW.
778–787.
[62] Yue Yu, Huaimin Wang, Gang Yin, and Tao Wang. 2016. Reviewer
recommendation for pull-requests in GitHub: What can we learn from code review
and bug assignment? Information and Software Technology 74 (2016), 204–218.
[63] Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, and Qi Tian. 2019. Deep modular
co-attention networks for visual question answering. In CVPR. 6281–6290.
[64] Fajie Yuan, Xiangnan He, Haochuan Jiang, Guibing Guo, Jian Xiong, Zhezhao
Xu, and Yilin Xiong. 2020. Future data helps training: Modeling future contexts
for session-based recommendation. In WWW. 303–313.
[65] Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, and Liang Wang.
2021. Mining Latent Structures for Multimedia Recommendation. In ACM MM.
3872–3880.
[66] Yu Zhang, Frank F Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, and Jiawei Han.
2019. Higitclass: Keyword-driven hierarchical classification of github
repositories. In ICDM. IEEE, 876–885.
[67] Yu Zheng, Chen Gao, Liang Chen, Depeng Jin, and Yong Li. 2021. DGCN:
Diversified Recommendation with Graph Convolutional Networks. In WWW. 401–412.
[68] Yu Zheng, Chen Gao, Xiang Li, Xiangnan He, Yong Li, and Depeng Jin. 2021.
Disentangling user interest and conformity for recommendation with causal
embedding. In WWW. 2980–2991.
[69] Jiaxin Zhu, Minghui Zhou, and Audris Mockus. 2014. Patterns of folder use
and project popularity: A case study of GitHub repositories. In Proceedings of
the 8th ACM/IEEE International Symposium on Empirical Software Engineering and
Measurement. 1–4.
[70] Yanqiao Zhu, Weizhi Xu, Jinghao Zhang, Qiang Liu, Shu Wu, and Liang Wang.
2021. Deep graph structure learning for robust representations: A survey. arXiv
preprint arXiv:2103.03036 (2021).
[71] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. 2021.
Graph contrastive learning with adaptive augmentation. In WWW. 2069–2080.