This is just a random paper I bumped into today. I'm not totally sure what it's about. The source is not released yet, but I believe it also links to prior work if the topic is interesting. Apologies for the barebones copypaste without formatting.

https://arxiv.org/abs/2210.08332

Code Recommendation for Open Source Software Developers

Yiqiao Jin (yjin328@gatech.edu), Georgia Institute of Technology, Atlanta, GA, USA
Yunsheng Bai (yba@cs.ucla.edu), University of California, Los Angeles, Los Angeles, CA, USA
Yanqiao Zhu (yzhu@cs.ucla.edu), University of California, Los Angeles, Los Angeles, CA, USA
Yizhou Sun (yzsun@cs.ucla.edu), University of California, Los Angeles, Los Angeles, CA, USA
Wei Wang (weiwang@cs.ucla.edu), University of California, Los Angeles, Los Angeles, CA, USA

ABSTRACT

Open Source Software (OSS) is forming the spines of technology infrastructures, attracting millions of talents to contribute. Notably, it is challenging and critical to consider both the developers’ interests and the semantic features of the project code to recommend appropriate development tasks to OSS developers. In this paper, we formulate the novel problem of code recommendation, whose purpose is to predict the future contribution behaviors of developers given their interaction history, the semantic features of source code, and the hierarchical file structures of projects. Considering the complex interactions among multiple parties within the system, we propose CODER, a novel graph-based code recommendation framework for open source software developers. CODER jointly models microscopic user-code interactions and macroscopic user-project interactions via a heterogeneous graph and further bridges the two levels of information through aggregation on file-structure graphs that reflect the project hierarchy. Moreover, due to the lack of reliable benchmarks, we construct three large-scale datasets to facilitate future research in this direction.
Extensive experiments show that our CODER framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation. We will release all the datasets, code, and utilities for data retrieval upon the acceptance of this work.

CCS CONCEPTS

• Information systems → Collaborative filtering; Web and social media search; Social recommendation; Personalization.

KEYWORDS

Code recommendation; recommender system; open source software development; multimodal recommendation; graph neural networks

1 INTRODUCTION

Open Source Software (OSS) is becoming increasingly popular in software engineering [22, 45]. As contribution to OSS projects is highly democratized [62], these projects attract millions of developers with diverse expertise and efficiently crowd-source the project development to a larger community of developers beyond the project’s major personnel [22, 32]. For instance, GitHub, one of the most successful platforms for developing and hosting OSS projects, has over 83 million users and 200 million repositories [12].

[Figure 1: directory tree of src/transformers, showing data/ and models/, with models/ containing bert/ (modeling_bert.py, tokenization_bert.py) and roberta/ (modeling_roberta.py, tokenization_roberta.py); code segments are labeled Param loading, Embeddings, and Models.]
Figure 1: An example of the transformers repository. OSS projects under similar topics usually adopt similar naming conventions and file structures, which can be seen as knowledge transferable across projects.

Community support and teamwork are major driving forces behind open source projects [32]. OSS projects are usually developed in a collaborative manner [2], yet collaboration in OSS is especially challenging. OSS projects are large in scale and usually contain numerous project files written in diverse programming languages [4]. According to statistics, the most popular 500 GitHub projects contain an average of 2,582 project files, 573 directories, and 360 contributors.
Meanwhile, there are more than 300 programming languages on GitHub, 67 of which are actively being used [10, 11]. For project maintainers, it is both difficult and time-consuming to find competent contributors within a potentially large candidate pool. For OSS developers, recommending personalized development tasks according to their project experience and expertise can significantly boost their motivation and reduce the cognitive load of manually checking project files. As contribution in OSS is voluntary, developers who fail to find meaningful tasks are likely to quit the project development [42]. Therefore, both the project core team and potential contributors need an efficient system that automatically matches source code with potential contributors to reduce their burden.

To solve the above issues, in this paper, we introduce, for the first time, the novel problem of code recommendation for OSS developers. As shown in Fig. 2, this task recommends code in the form of project files to potentially suitable contributors.

arXiv:2210.08332v2 [cs.SE] 20 Oct 2022, Jin et al.

It is noteworthy that code recommendation has several unique challenges such that traditional recommender models are not directly applicable. Firstly, OSS projects contain multimodal interactions among users, projects, and code files. For example, OSS development contains user-code interactions, such as commits that depict microscopic behaviors of users, and user-project interactions, such as forks and stars that exhibit users’ macroscopic preferences and interests in projects. Also, the contribution relationships are often extremely sparse, due to the significant effort required to make a single contribution to OSS projects. Therefore, directly modeling the contribution behavior as in traditional collaborative filtering approaches will inevitably lead to inaccurate user/item representations and suboptimal performance.
Secondly, in the software engineering domain, code files in a project are often organized in a hierarchical structure [61]. Fig. 1 shows an example of the famous huggingface/transformers repository [50]. The src directory usually contains the major source code for a project. The data and models subdirectories usually include functions for data generation and model implementations, respectively. Such a structural organization of the OSS project reveals semantic relations among code snippets, which are helpful for developers to transfer existing code from other projects to their development. Traditional methods usually ignore such item-wise hierarchical relationships and, as a result, are incapable of connecting rich semantic features in code files with their project-level structures, which is required for accurate code recommendation.

Thirdly, most existing benchmarks involving software recommendation only consider limited user-item behaviors [5, 20], are small in scale [36, 37], or contain only certain languages such as Python [19, 34, 46] or Java [5, 20, 37], which makes the evaluation of different recommendation models difficult or unrealistic.

To overcome the above challenges, we propose CODER, a CODE Recommendation framework for open source software developers that matches project files with potential contributors. As shown in Fig. 2, CODER treats users, code files, and projects as nodes and jointly models the microscopic user-code interactions and macroscopic user-project interactions in a heterogeneous graph. Furthermore, CODER bridges these two levels of information through message aggregation on the file structure graphs that reflect the hierarchical relationships among graph nodes. Additionally, since there is a lack of benchmark datasets for the code recommendation task, we build three large-scale datasets from open software development websites.
These datasets cover diverse subtopics in computer science and contain up to 2 million fine-grained user-file interactions. Overall, our contributions are summarized as follows:

• We introduce, for the first time, the problem of code recommendation, whose purpose is to recommend appropriate development tasks to developers, given the interaction history of developers, the semantic features of source code, and hierarchical structures of projects.
• We propose CODER, an end-to-end framework that jointly models structural and semantic features of source code as well as multiple types of user behaviors for improving the matching task.
• We construct three large-scale multi-modal datasets for code recommendation that cover different topics in computer science to facilitate research on code recommendation.
• We conduct extensive experiments on massive datasets to demonstrate the effectiveness of the proposed CODER framework and its design choices.

2 PROBLEM FORMULATION

Before delving into our proposed CODER framework, we first formalize our code recommendation task. We use the terms “repository” and “project” interchangeably to refer to an open source project. We define $\mathcal{U}$, $\mathcal{V}$, $\mathcal{R}$ as the set of users, files, and repositories, respectively. Each repository $r_k \in \mathcal{R}$ contains a subset of files $\mathcal{V}_k \subsetneq \mathcal{V}$. Both macroscopic project-level interactions and microscopic file-level interactions are present in OSS development.

File-level behaviors. We define $\mathbf{Y} \in \{0, 1\}^{|\mathcal{U}| \times |\mathcal{V}|}$ as the interaction matrix between $\mathcal{U}$ and $\mathcal{V}$ for the file-level contribution behavior, where each entry is denoted by $y_{ij}$. $y_{ij} = 1$ indicates that $u_i$ has contributed to $v_j$, and $y_{ij} = 0$ otherwise.

Project-level behaviors. Interactions at the project level are more diverse. For example, the popular code hosting platform GitHub allows users to star (publicly bookmark) interesting repositories and watch (subscribe to) repositories for updates. We thus define $\mathcal{T}$ as the set of user-project behaviors.
Similar to $\mathbf{Y}$, we define $\mathbf{S}_t \in \{0, 1\}^{|\mathcal{U}| \times |\mathcal{R}|}$ as the project-level interaction matrix for behavior of type $t$. Our goal is to predict the future file-level contribution behaviors of users based on their previous interactions. Formally, given the training data $\mathbf{Y}^{\mathrm{tr}}$, we try to predict the interactions in the test set $y_{ij} \in \mathbf{Y}^{\mathrm{ts}} = \mathbf{Y} \setminus \mathbf{Y}^{\mathrm{tr}}$.

3 METHODOLOGY

As shown in Fig. 2, we design CODER, a two-stage graph-based recommendation framework. CODER considers $u_i \in \mathcal{U}$, $v_j \in \mathcal{V}$, $r_k \in \mathcal{R}$ as graph nodes, and models the user-item interactions and the item-item relations as edges. We use two sets of graphs to characterize the heterogeneous information in code recommendation. One is the user-item interaction graphs that encompass the collaborative signals. The other is the file-structure graphs that reveal file-file and file-project relationships from the project hierarchy perspective. The code recommendation problem is then formulated as a user-file link prediction task.

CODER contains two major components: 1) Node Semantics Modeling, which learns the fine-grained representations of project files by fusing code semantics with their historical contributors, and then aggregates project hierarchical information on the file structure graph to learn the file and repository representations; 2) Multi-behavioral Modeling, which jointly models the microscopic user-file interactions and macroscopic user-project interactions. Finally, CODER fuses the representations from multiple behaviors for prediction. This way, node semantics modeling bridges the coarse-grained and fine-grained interaction signals on the item side. Therefore, CODER efficiently characterizes intra-project and inter-project differences, eventually uncovering latent user and item features that explain the interactions $\mathbf{Y}$.

3.1 Node Semantics Modeling

Node semantics modeling aims to learn file and repository representations.
The challenge is how to inherently combine the semantic features of each project file with its interacted users and the project hierarchy. To address this challenge, we first use a code-user modality fusion mechanism (Fig. 2a) to fuse the file content modality and the historical users at the code level. Then, we embed the fine-grained collaborative signals from user-file interactions into the file representations. Next, we employ structural-level aggregation (Fig. 2b), which explicitly models the project structures as hierarchical graphs to enrich the file/repository representation with structural information. This step produces representation for each file $v_j$ and repository $r_k$, which serve as the input for user behavior modeling in Sec. 3.2.

[Figure 2: framework diagram. Inputs: repository $r$, user $u$, file $v$, code segment $c$. Node Semantics Modeling: (a) Code-User Modality Fusion, (b) Structural-Level Aggregation on the File Structure Graph. Multi-Behavioral Modeling: (c) File-Level Aggregation on the User-File Interaction Graph, (d) Project-Level Aggregation on the User-Project Interaction Graphs. Prediction: scores $s_F(i, j)$ and $s_P(i, k, t)$.]
Figure 2: Our proposed CODER framework for code recommendation. CODER jointly considers project file structures, code semantics, and user behaviors. CODER models the microscopic file-level interactions and macroscopic project-level interactions through Multi-Behavioral Modeling, and bridges the micro/macro-scopic signals through Node Semantics Modeling.

3.1.1 Code-User Modality Fusion. A project file is characterized by diverse semantic features including multiple method declarations and invocations, which are useful for explaining why a contributor is interested in it.
We therefore use pretrained CodeBERT [7], a bimodal language model for programming languages, to encode the rich semantic features of each file. CodeBERT is shown to generalize well to programming languages not seen in the pretraining stage, making it suitable for our setting where project files are written in diverse programming languages. Here, a straightforward way is to directly encode each file into a per-file latent representation. Such an encoding scheme has two issues.

Firstly, a file may contain multiple classes and function declarations that are semantically distinct. Fig. 1 shows the file structure of the huggingface/transformers [50] repository as an example. The modeling_bert.py file contains not only various implementations of the BERT language model for different NLP tasks, but also utilities for parameter loading and word embeddings. These implementations are distributed among several code segments in the same project file, and file-level encoding can fail to encode such semantic correlations.

Secondly, the property of a project file is influenced by the historical contributors’ features. A user’s contribution can be viewed as injecting her/his own attributes, including programming style and domain knowledge, into the interacted file. Such contribution behaviors make the file more likely to be interacted with again by users with similar levels of expertise than by random contributors.

Therefore, we propose a code-user modality fusion strategy to embed both code semantics and user characteristics into the file representation. Specifically, for each file, we partition its source code into $N_C$ code segments and encode each of them into a code-segment-level representation $\mathbf{c}_i$. This produces a feature map $\mathbf{C} = [\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_{N_C}]$, $\mathbf{C} \in \mathbb{R}^{N_C \times d}$, where $d$ is the embedding size. Similarly, we sample $N_Q$ historical users of the file and encode them into a feature map $\mathbf{Q} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{N_Q}]$, $\mathbf{Q} \in \mathbb{R}^{N_Q \times d}$. Please refer to Appendix ?? for details in encoding $\mathbf{C}$, $\mathbf{Q}$.
Inspired by the success of co-attention [30, 63], we transform the user attention space to code attention space by calculating a code-user affinity matrix $\mathbf{L} \in \mathbb{R}^{N_C \times N_Q}$:

$$\mathbf{L} = \tanh\left(\mathbf{C} \mathbf{W}_O \mathbf{Q}^\top\right), \tag{1}$$

where $\mathbf{W}_O \in \mathbb{R}^{d \times d}$ is a trainable weight matrix. Next, we compute the attention weight $\mathbf{a} \in \mathbb{R}^{N_C}$ of the code segments to select salient features from $\mathbf{C}$. We treat the affinity matrix as a feature and learn an attention map $\mathbf{H}$ with $N_H$ representations:

$$\mathbf{H} = \tanh\left(\mathbf{W}_C \mathbf{C}^\top + \mathbf{W}_Q (\mathbf{L}\mathbf{Q})^\top\right), \tag{2}$$

$$\mathbf{a} = \operatorname{softmax}\left(\mathbf{w}_H^\top \mathbf{H}\right), \tag{3}$$

where $\mathbf{W}_C, \mathbf{W}_Q \in \mathbb{R}^{N_H \times d}$ and $\mathbf{w}_H \in \mathbb{R}^{N_H}$ are the weight parameters. Finally, the file attention representation $\mathbf{h}$ is calculated as the weighted sum of the code feature map:

$$\mathbf{h} = \mathbf{a}^\top \mathbf{C}. \tag{4}$$

The file attention representation serves as a starting point to further aggregate file structural features.

3.1.2 Structural-Level Aggregation. Projects are organized in a hierarchical way such that nodes located closer on the file structure graph are more closely related in terms of semantics and functionality. For example, in Fig. 1, both files under the bert/ directory contain source code for the BERT [24] language model, and files under roberta/ contain the implementation of the RoBERTa [29] model. The file modeling_bert.py is therefore more closely related to tokenization_bert.py in functionality than to tokenization_roberta.py.

To exploit such structural clues, we model each repository as a hierarchical heterogeneous graph $G_S$ consisting of file, directory, and repository nodes. Each node is connected to its parent node through an edge, and nodes at the first level are directly connected to the virtual root node representing the project. To encode the features of directory nodes, we partition the directory names into meaningful words according to underscores and letter capitalization, then encode the nodes by their TF-IDF features.
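The directory-name encoding just described can be illustrated with a small sketch. Everything named here is hypothetical: `split_dir_name` and `tfidf` are illustrative helpers, and the weighting (term frequency times log(N/df)) is an assumed plain TF-IDF variant, since the text does not say which variant is used.

```python
import math
import re
from collections import Counter


def split_dir_name(name):
    """Split a directory name on underscores and letter capitalization."""
    words = []
    for part in name.split("_"):
        # Keep acronyms such as "HTTP" together, then split camelCase runs.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def tfidf(docs):
    """docs: list of token lists. Returns one {token: weight} dict per doc.

    Assumed variant: (count / doc length) * log(num docs / doc frequency).
    """
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


if __name__ == "__main__":
    names = ["test", "test_models", "dataUtils"]
    tokens = [split_dir_name(n) for n in names]
    for name, vec in zip(names, tfidf(tokens)):
        print(name, vec)
```

Under this variant, a word that appears in every directory name gets weight zero, while rarer words dominate the feature vector.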
Our encoding scheme is motivated by the insight that the use of standard directory names (e.g., doc, test, models) is correlated with project popularity among certain groups of developers [2, 69]. Repository features are encoded by their project owners, creation timestamps, and their top-5 programming languages. The repository and directory representations are mapped to the same latent space as the file nodes. Then, we pass the representation $\mathbf{h}$ through multiple GNN layers to aggregate the features of each node from its neighbors on $G_S$:

$$\tilde{\mathbf{h}} = f_{\mathrm{GNN}}(\mathbf{h}, G_S), \tag{5}$$

where $\tilde{\mathbf{h}}$ is the structure-enhanced node representation. The aggregation function $f_{\mathrm{GNN}}(\cdot)$ can be chosen from a wide range of GNN operators, such as GCN [26], GraphSAGE [15], and GIN [54]. In practice, we employ a 3-layer Graph Attention Network (GAT) [44].

3.2 Multi-behavioral Modeling

Direct modeling of the sparse contribution behavior potentially leads to inaccurate user/item representations and aggravates the cold-start issue. Instead, we jointly model the microscopic user-file contribution in File-level Aggregation (Fig. 2c) and macroscopic user-project interactions in Project-level Aggregation (Fig. 2d) to learn user preferences and address the sparsity issue. Then, the representations learned from multi-level behaviors are combined to form the user and item representations for prediction.

3.2.1 File-level Aggregation. We model the project files and their contributors as an undirected user-file bipartite graph $\mathcal{G}_F$ consisting of users $u_i \in \mathcal{U}$, files $v_j \in \mathcal{V}$, and their interactions. The initial embedding matrix of users/items is denoted by $\mathbf{E}^{(0)}$, which serves as an initial state for end-to-end optimization:

$$\mathbf{E}^{(0)} = \big[\underbrace{\mathbf{u}_1^{(0)}, \cdots, \mathbf{u}_{|\mathcal{U}|}^{(0)}}_{\text{user embeddings}}, \underbrace{\mathbf{v}_1^{(0)}, \cdots, \mathbf{v}_{|\mathcal{V}|}^{(0)}}_{\text{item embeddings}}\big], \tag{6}$$

where $\mathbf{u}_i^{(0)}$ is the initial embedding for user $u_i$, and $\mathbf{v}_j^{(0)}$ is the initial embedding for file $v_j$, equivalent to its structure-enhanced representation $\tilde{\mathbf{h}}$ (Sec. 3.1.2).
We adopt the simple weighted sum aggregator of LightGCN [17] in the propagation rule:

$$\mathbf{u}_i^{(l)} = \sum_{v_j \in \mathcal{N}_i} \frac{1}{\sqrt{|\mathcal{N}_i||\mathcal{N}_j|}} \mathbf{v}_j^{(l-1)}, \tag{7}$$

$$\mathbf{v}_j^{(l)} = \sum_{u_i \in \mathcal{N}_j} \frac{1}{\sqrt{|\mathcal{N}_j||\mathcal{N}_i|}} \mathbf{u}_i^{(l-1)}, \tag{8}$$

where $\mathbf{u}_i^{(l)}$ and $\mathbf{v}_j^{(l)}$ are the embeddings for user $u_i$ and file $v_j$ at layer $l$. $\mathcal{N}_i$ and $\mathcal{N}_j$ indicate the neighbors of user $u_i$ and file $v_j$. $1/\sqrt{|\mathcal{N}_i||\mathcal{N}_j|}$ is the symmetric normalization term, set to the graph Laplacian norm to keep the scale of the embeddings from growing with repeated graph convolutions [17, 26]. In matrix form, the propagation rule of file-level aggregation is expressed as:

$$\mathbf{E}^{(l)} = \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2} \mathbf{E}^{(l-1)}, \quad \mathbf{A} = \begin{bmatrix} \mathbf{0} & \mathbf{Y}^{\mathrm{tr}} \\ {\mathbf{Y}^{\mathrm{tr}}}^\top & \mathbf{0} \end{bmatrix}, \tag{9}$$

where $\mathbf{A} \in \mathbb{R}^{(|\mathcal{U}|+|\mathcal{V}|) \times (|\mathcal{U}|+|\mathcal{V}|)}$ is the affinity matrix. $\mathbf{D}$ is the diagonal degree matrix in which each entry $\mathbf{D}_{ii}$ indicates the number of non-zero entries on the $i$-th row of $\mathbf{A}$. By stacking multiple layers, each user/item node aggregates information from its higher-order neighbors. Propagation through $L$ layers yields a set of representations $\{\mathbf{E}^{(l)}\}_{l=0}^{L}$. Each $\mathbf{E}^{(l)}$ emphasizes the messages from its $l$-hop neighbors. We apply mean-pooling over all $\mathbf{E}^{(l)}$ to derive the user and file representations $\mathbf{u}_i^*$ and $\mathbf{v}_j^*$ from different levels of user/item features:

$$\mathbf{u}_i^* = \frac{1}{L+1} \sum_{l=0}^{L} \mathbf{u}_i^{(l)}, \quad \mathbf{v}_j^* = \frac{1}{L+1} \sum_{l=0}^{L} \mathbf{v}_j^{(l)}. \tag{10}$$

3.2.2 Project-Level Aggregation. OSS development is characterized by both microscopic contribution behaviors and multiple types of macroscopic project-level behaviors. For example, developers usually find relevant projects, reuse their functions, and explore ideas for possible features [19, 21]. In particular, GitHub users can star (bookmark) interesting repositories and discover projects under similar topics. This way, developers can adapt code implementation of these interesting projects into their own development later. Hence, project-level macroscopic interactions are conducive to extracting users’ broad interests. For each behavior $t$, we propagate the user and repository embeddings on its project-level interaction graph $\mathcal{G}_P^t$.
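The normalized neighbor averaging of Eqs. 7-8 and the layer-wise mean pooling of Eq. 10 can be traced on a toy graph with a few lines of plain Python. This is only a sketch under stated assumptions: `propagate` is a hypothetical helper, embeddings are small lists rather than trainable tensors, and each (user, file) edge is assumed to appear once.

```python
import math


def propagate(user_emb, file_emb, edges, num_layers):
    """Symmetric-normalized propagation (Eqs. 7-8) plus mean pooling (Eq. 10).

    user_emb, file_emb: lists of equal-length float lists (layer-0 embeddings).
    edges: unique (user_index, file_index) pairs with y_ij = 1 in training.
    """
    dim = len(user_emb[0])
    user_nb = {i: [] for i in range(len(user_emb))}
    file_nb = {j: [] for j in range(len(file_emb))}
    for i, j in edges:
        user_nb[i].append(j)
        file_nb[j].append(i)

    def aggregate(target_nb, source, source_nb):
        # For each target node, sum neighbor embeddings scaled by
        # 1 / sqrt(|N_target| * |N_source|).
        out = []
        for t in range(len(target_nb)):
            vec = [0.0] * dim
            for s in target_nb[t]:
                w = 1.0 / math.sqrt(len(target_nb[t]) * len(source_nb[s]))
                for d in range(dim):
                    vec[d] += w * source[s][d]
            out.append(vec)
        return out

    u_layers, v_layers = [user_emb], [file_emb]
    for _ in range(num_layers):
        u_prev, v_prev = u_layers[-1], v_layers[-1]
        u_layers.append(aggregate(user_nb, v_prev, file_nb))
        v_layers.append(aggregate(file_nb, u_prev, user_nb))

    def mean_pool(layers):
        # Average each node's embedding over layers 0..L.
        return [
            [sum(layer[k][d] for layer in layers) / len(layers) for d in range(dim)]
            for k in range(len(layers[0]))
        ]

    return mean_pool(u_layers), mean_pool(v_layers)


if __name__ == "__main__":
    users = [[1.0, 0.0], [0.0, 1.0]]
    files = [[1.0, 1.0], [2.0, 0.0]]
    u_star, v_star = propagate(users, files, [(0, 0), (0, 1), (1, 1)], num_layers=1)
    print(u_star)
    print(v_star)
```

Note that the rule has no trainable weights or nonlinearities inside the layers; only the layer-0 embeddings would be learned.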
The initial embedding matrix $\mathbf{Z}^{(0)}$ is shared by all $t \in \mathcal{T}$ and is composed of the initial user representations identical to Eq. 6 and the repository embeddings from the structure-enhanced representation $\tilde{\mathbf{h}}$ in Eq. 5:

$$\mathbf{Z}^{(0)} = \big[\underbrace{\mathbf{z}_1^{(0)}, \mathbf{z}_2^{(0)}, \ldots, \mathbf{z}_{|\mathcal{U}|}^{(0)}}_{\text{user embeddings}}, \underbrace{\mathbf{r}_1^{(0)}, \mathbf{r}_2^{(0)}, \ldots, \mathbf{r}_{|\mathcal{R}|}^{(0)}}_{\text{repository embeddings}}\big], \tag{11}$$

$$\mathbf{Z}_t^{(l)} = \mathbf{D}_t^{-1/2} \mathbf{\Lambda}_t \mathbf{D}_t^{-1/2} \mathbf{Z}_t^{(l-1)}, \tag{12}$$

where $\mathbf{z}_i^{(0)} = \mathbf{u}_i^{(0)}$. $\mathbf{\Lambda}_t \in \mathbb{R}^{(|\mathcal{U}|+|\mathcal{R}|) \times (|\mathcal{U}|+|\mathcal{R}|)}$ is the affinity matrix for behavior $t$, constructed similarly to $\mathbf{A}$ in Eq. 9, and $\mathbf{D}_t$ is its diagonal degree matrix. With representations $\{\mathbf{Z}_t^{(l)}\}_{l=0}^{L}$ obtained from multiple layers, we derive the combined user and repository representations for behavior $t$ as

$$\mathbf{z}_{t,i}^* = \frac{1}{L+1} \sum_{l=0}^{L} \mathbf{z}_{t,i}^{(l)}, \quad \mathbf{r}_{t,k}^* = \frac{1}{L+1} \sum_{l=0}^{L} \mathbf{r}_{t,k}^{(l)}. \tag{13}$$

3.3 Prediction

For file-level prediction, we aggregate the macroscopic signals $\mathbf{z}_{t,i}^*$, $\mathbf{r}_{t,k}^*$ from each behavior $t$ into $\mathbf{u}_i$, $\mathbf{v}_j$:

$$\mathbf{z}_i^* = \operatorname{Agg}(\mathbf{z}_{t,i}^*, t \in \mathcal{T}), \quad \mathbf{r}_k^* = \operatorname{Agg}(\mathbf{r}_{t,k}^*, t \in \mathcal{T}), \tag{14}$$

$$\mathbf{u}_i = \operatorname{MLP}([\mathbf{u}_i^* \,\|\, \mathbf{z}_i^*]), \quad \mathbf{v}_j = \operatorname{MLP}([\mathbf{v}_j^* \,\|\, \mathbf{r}_{\phi(j)}^*]), \tag{15}$$

where $\operatorname{Agg}(\cdot)$ is an aggregation function and $\operatorname{MLP}(\cdot)$ is a multilayer perceptron. $\phi(\cdot): \mathcal{V} \to \mathcal{R}$ maps the index of each project file to its repository. $\|$ is the concatenation operator. On the user side, both macroscopic interests and micro-level interactions are injected into the user representations. On the item side, the semantics of each file is enriched by its interacted users and the repository structural information.

For computational efficiency, we employ the inner product to calculate user $u_i$’s preference towards each file $v_j$:

$$s_F(i, j) = \mathbf{u}_i^\top \mathbf{v}_j, \tag{16}$$

where $s_F$ is the scoring function for the file-level behavior. Similarly, for each user-project pair, we derive a project-level score for each behavior $t$ using the project-level scoring function $s_P$:

$$s_P(i, k, t) = {\mathbf{z}_{t,i}^*}^\top \mathbf{r}_{t,k}^*. \tag{17}$$

3.4 Optimization

We employ the Bayesian Personalized Ranking (BPR) [39] loss, which encourages the prediction of an observed user-item interaction to be greater than an unobserved one:

$$\mathcal{L}_F = \sum_{(i, j^+, j^-) \in \mathcal{O}} -\log\left(\operatorname{sigmoid}\left(s_F(i, j^+) - s_F(i, j^-)\right)\right), \tag{18}$$

$$\mathcal{L}_P^t = \sum_{(i, k^+, k^-) \in \mathcal{O}} -\log\left(\operatorname{sigmoid}\left(s_P(i, k^+, t) - s_P(i, k^-, t)\right)\right), \tag{19}$$

where $\mathcal{L}_F$ is the file-level BPR loss, and $\mathcal{L}_P^t$ is the project-level BPR loss for behavior $t$. $\mathcal{O}$ denotes the pairwise training data. $j^+$ indicates an observed interaction between user $u_i$ and item $v_{j^+}$, and $j^-$ indicates an unobserved one.

As high-order neighboring relations within contributors are also useful for recommendations, we enforce users to have similar representations as their structural neighbors through the structure-contrastive learning objective [28]:

$$\mathcal{L}_C^U = \sum_{u_i \in \mathcal{U}} -\log \frac{\exp\left(\mathbf{u}_i^{(\eta)} \cdot \mathbf{u}_i^{(0)} / \tau\right)}{\sum_{u_j \in \mathcal{U}} \exp\left(\mathbf{u}_i^{(\eta)} \cdot \mathbf{u}_j^{(0)} / \tau\right)}. \tag{20}$$

Here, $\eta$ is set to an even number so that each user node can aggregate signals from other user nodes. $\tau$ is the temperature hyperparameter. Similarly, the contrastive loss is applied to each $v_i$:

$$\mathcal{L}_C^V = \sum_{v_i \in \mathcal{V}} -\log \frac{\exp\left(\mathbf{v}_i^{(l)} \cdot \mathbf{v}_i^{(0)} / \tau\right)}{\sum_{v_j \in \mathcal{V}} \exp\left(\mathbf{v}_i^{(l)} \cdot \mathbf{v}_j^{(0)} / \tau\right)}. \tag{21}$$

The overall optimization objective is

$$\mathcal{L} = \mathcal{L}_F + \lambda_1 \sum_{t \in \mathcal{T}} \mathcal{L}_P^t + \lambda_2 \left(\mathcal{L}_C^U + \mathcal{L}_C^V\right) + \lambda_3 \|\Theta\|_2, \tag{22}$$

where $\Theta$ denotes all trainable model parameters. $\lambda_1$, $\lambda_2$, $\lambda_3$ are hyperparameters.

4 EXPERIMENTS

4.1 Experimental Settings

4.1.1 Datasets. We collected 3 datasets covering diverse topics in computer science, including machine learning (ML), fullstack (FS), and database (DB), using the GitHub API (footnote 1) and the PyGithub (footnote 2) package.

Table 1: Summary of the datasets. The “#Files” column shows the number of files with observed interactions instead of all existing files in the projects.

| Dataset | #Files | #Users | #Interactions | Density |
|---|---|---|---|---|
| ML | 239,232 | 21,913 | 663,046 | 1.26E-4 |
| DB | 415,154 | 30,185 | 1,935,155 | 1.54E-4 |
| FS | 568,972 | 51,664 | 1,512,809 | 5.14E-5 |
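As a quick sanity check on the Density column of Table 1, the figures are consistent with density being the number of observed interactions divided by the number of possible user-file pairs. That definition is an assumption (the text does not state it), and `density` is a hypothetical helper; the sketch below just redoes the arithmetic.

```python
def density(num_interactions, num_users, num_files):
    """Fraction of user-file pairs with an observed interaction (assumed definition)."""
    return num_interactions / (num_users * num_files)


# (files, users, interactions) copied from Table 1.
TABLE_1 = {
    "ML": (239_232, 21_913, 663_046),
    "DB": (415_154, 30_185, 1_935_155),
    "FS": (568_972, 51_664, 1_512_809),
}

if __name__ == "__main__":
    for name, (files, users, interactions) in TABLE_1.items():
        print(f"{name}: {density(interactions, users, files):.3e}")
```

Under this definition the computed values match the reported ones to within about one unit in the last reported digit, and FS comes out as the sparsest dataset while DB is the densest.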
We retain repositories with ≥ 250 stars and ≥ 3 contributors to exclude repositories intended for private use [2]. We include projects with a contribution history of at least 3 months according to their commit history. To ensure that our model generalizes across a wide range of topics, popularity, and project scales, we first select 3 subsets of repositories using their GitHub topics (footnote 3), which are project labels created by the project owners. Then, we randomly sample 300 repositories from each subset considering their numbers of project files and stars. We use the Unix timestamps 1550000000 and 1602000000 to partition the datasets into train/val/test sets. This way, all interactions before the first timestamp are used as the training data. We retain the users with at least 1 interaction in both the train and test sets. More details about dataset construction are in the appendix.

4.1.2 Implementation Details. We implemented our CODER model in PyTorch [38] and PyG [8]. For all models, we set the embedding size to 32 and perform Xavier initialization [13] on the model parameters. We use the Adam optimizer [25] with a batch size of 1024. For Node Semantics Modeling (Sec. 3.1), we set $N_C = 8$ and $N_Q = 4$. The code encoder we use is the pretrained CodeBERT [7] model with 6 layers, 12 attention heads, and 768-dimensional hidden states. For Multi-Behavioral Modeling (Sec. 3.2), we set the number of convolution layers $L = 4$ for both intra- and inter-level aggregation. For prediction and optimization, we search the hyperparameter $\lambda_3$ in [1e-4, 1e-3, 1e-2], and $\lambda_1$ in [1e-2, 1e-1, 1]. For the structure-contrastive loss [28], we adopt the hyperparameter setting from the original implementation and set $\lambda_2$ = 1e-6, $\eta = 2$ without further tuning. For the baseline models, the hyperparameters are set to the optimal settings as reported in their original papers. For all models, we search the learning rate in [1e-4, 3e-4, 1e-3, 3e-3, 1e-2].

4.1.3 Baselines.
We compare our method with 3 groups of methods: (G1) factorization-based methods, including MF [39]; (G2) neural-network-based methods, including MLP [41] and NeuMF [18]; (G3) graph-based methods that model user-item interactions as graphs, including NGCF [47], LightGCN [17], and NCL [28].

As the code recommendation task is to predict users’ file-level contribution, file-level behavior modeling is the most critical component. Thus, we use file-level contribution behaviors as the supervision signals as in Eq. 16. For brevity, we use repository identity to refer to the information of which repository a file belongs to. As the baselines do not explicitly leverage the repository identities of files, we encode their repository identities as a categorical feature through one-hot encoding during embedding construction. To ensure fairness of comparison, we incorporate the project-level interaction signals into the user representations by applying multi-hot encoding on the repositories each user has interacted with. All the baseline models use the same pretrained CodeBERT embeddings as CODER to leverage the rich semantic features in the source code.

Footnote 1: https://docs.github.com/en/rest
Footnote 2: https://github.com/PyGithub/PyGithub.git
Footnote 3: https://github.com/topics

4.1.4 Evaluation Metrics. Following previous works [17, 18, 47, 68], we choose Mean Reciprocal Rank (MRR@K), Normalized Discounted Cumulative Gain (NDCG@K), Recall@K (Rec@K), and Hit@K as the evaluation metrics.

Table 2: The overall performance on 3 datasets. The best performance is marked in bold. The second best is underlined.

| Dataset | Metric | MF | MLP | NeuMF | NGCF | LightGCN | NCL | CODER | Impr. |
|---|---|---|---|---|---|---|---|---|---|
| ML | NDCG@5 | 0.065 | 0.073 | 0.076 | 0.091 | 0.106 | 0.119 | 0.132 | 11.2% |
| ML | Hit@5 | 0.162 | 0.189 | 0.189 | 0.237 | 0.291 | 0.276 | 0.351 | 20.5% |
| ML | MRR@5 | 0.098 | 0.113 | 0.114 | 0.137 | 0.164 | 0.201 | 0.211 | 5.0% |
| ML | NDCG@10 | 0.066 | 0.075 | 0.081 | 0.093 | 0.109 | 0.118 | 0.136 | 14.8% |
| ML | Hit@10 | 0.229 | 0.250 | 0.263 | 0.310 | 0.386 | 0.337 | 0.440 | 14.0% |
| ML | MRR@10 | 0.106 | 0.121 | 0.124 | 0.147 | 0.177 | 0.209 | 0.223 | 6.7% |
| ML | NDCG@20 | 0.072 | 0.081 | 0.084 | 0.100 | 0.116 | 0.120 | 0.141 | 17.9% |
| ML | Hit@20 | 0.324 | 0.343 | 0.346 | 0.407 | 0.457 | 0.466 | 0.540 | 15.8% |
| ML | MRR@20 | 0.113 | 0.127 | 0.130 | 0.154 | 0.185 | 0.213 | 0.230 | 8.2% |
| DB | NDCG@5 | 0.085 | 0.079 | 0.085 | 0.099 | 0.082 | 0.124 | 0.160 | 29.0% |
| DB | Hit@5 | 0.205 | 0.191 | 0.206 | 0.263 | 0.237 | 0.316 | 0.390 | 23.2% |
| DB | MRR@5 | 0.130 | 0.118 | 0.128 | 0.162 | 0.132 | 0.252 | 0.260 | 3.2% |
| DB | NDCG@10 | 0.086 | 0.079 | 0.085 | 0.100 | 0.084 | 0.123 | 0.159 | 29.4% |
| DB | Hit@10 | 0.267 | 0.251 | 0.276 | 0.361 | 0.324 | 0.380 | 0.488 | 28.4% |
| DB | MRR@10 | 0.138 | 0.126 | 0.137 | 0.175 | 0.144 | 0.260 | 0.273 | 4.9% |
| DB | NDCG@20 | 0.088 | 0.083 | 0.088 | 0.103 | 0.091 | 0.125 | 0.160 | 27.3% |
| DB | Hit@20 | 0.335 | 0.338 | 0.362 | 0.454 | 0.422 | 0.437 | 0.588 | 29.5% |
| DB | MRR@20 | 0.143 | 0.132 | 0.143 | 0.182 | 0.150 | 0.264 | 0.280 | 6.0% |
| FS | NDCG@5 | 0.063 | 0.063 | 0.067 | 0.082 | 0.089 | 0.106 | 0.146 | 37.1% |
| FS | Hit@5 | 0.168 | 0.178 | 0.179 | 0.231 | 0.245 | 0.283 | 0.374 | 31.9% |
| FS | MRR@5 | 0.100 | 0.100 | 0.107 | 0.132 | 0.146 | 0.170 | 0.226 | 33.0% |
| FS | NDCG@10 | 0.063 | 0.065 | 0.068 | 0.085 | 0.092 | 0.106 | 0.144 | 35.6% |
| FS | Hit@10 | 0.231 | 0.244 | 0.249 | 0.319 | 0.332 | 0.361 | 0.467 | 29.3% |
| FS | MRR@10 | 0.109 | 0.110 | 0.117 | 0.144 | 0.157 | 0.180 | 0.239 | 32.3% |
| FS | NDCG@20 | 0.067 | 0.070 | 0.073 | 0.090 | 0.095 | 0.110 | 0.146 | 32.7% |
| FS | Hit@20 | 0.307 | 0.321 | 0.335 | 0.406 | 0.414 | 0.451 | 0.559 | 23.9% |
| FS | MRR@20 | 0.114 | 0.115 | 0.122 | 0.150 | 0.163 | 0.187 | 0.245 | 31.4% |

4.2 Performance

4.2.1 Intra-Project Recommendation. In this setting, we evaluate the model’s ability to recommend development tasks under a user’s interacted repositories. For each user $u_i$, we rank the interactions under repositories s/he has interacted with in the training set. This setting corresponds to the scenario in which project maintainers recommend new development tasks to existing contributors based on their previous contributions. As shown in Tab. 2, CODER consistently outperforms the baselines by a large margin. On the ML dataset, CODER outperforms the best baseline by 17.9% on NDCG@20, 15.8% on Hit@20, and 8.2% on MRR@20. On the DB dataset, CODER achieves performance improvements of 27.3% on NDCG@20, 29.5% on Hit@20, and 6.0% on MRR@20.
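For readers unfamiliar with the ranking metrics named in Sec. 4.1.4, the sketch below computes Hit@K, MRR@K, and NDCG@K for a single user's ranked list with binary relevance. These are common textbook definitions and are an assumption here: the text does not spell out the exact variants used for Tab. 2, and the function names and toy file IDs are hypothetical.

```python
import math


def hit_at_k(ranked, relevant, k):
    """1.0 if any relevant item appears in the top-k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant item within the top-k (0.0 if none)."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the top-k divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    ranked_files = ["f3", "f1", "f7", "f2"]   # model's ranking, best first
    contributed = {"f1", "f2"}                # held-out files the user contributed to
    for k in (1, 3):
        print(k, hit_at_k(ranked_files, contributed, k),
              mrr_at_k(ranked_files, contributed, k),
              ndcg_at_k(ranked_files, contributed, k))
```

A dataset-level score would then be the average of these per-user values over all evaluated users.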
Notably, the greatest performance improvement is achieved on the FS dataset, which has the greatest sparsity. CODER achieves a maximum performance improvement of 37.1% on NDCG@5 and 35.6% on NDCG@10. The results show that CODER achieves significant performance improvement over the baselines and is especially useful when the observed interactions are scarce.

Among the baselines, graph-based methods (G3) achieve better performance than (G1) and (G2), as they can model the high-order relations between users and items through the interaction graph and the embedding function. LightGCN [17] underperforms NGCF [47] on the DB dataset, whose training set has the greatest density, and outperforms NGCF on the ML and FS datasets. This implies that the message passing scheme of NGCF, which involves multiple linear transformations and non-linear activations, is more effective for denser interactions. Such results justify our design choice in multi-behavioral modeling, which uses the LightGCN propagation scheme. NCL exhibits the strongest performance among the baselines, demonstrating the importance of the contrastive learning loss in modeling differences among homogeneous types of nodes, which is also included in our model design. Neural-network-based methods (G2) generally outperform matrix factorization (G1), as they leverage multiple feature transformations to learn the rich semantics in the file embeddings and the user-file interactions.

4.2.2 Cold-Start Recommendation. User contribution is usually sparse due to the considerable workload and voluntary nature of OSS development. In this sense, it is important to accurately capture the users’ preferences with few observed interactions. Our model is thus designed according to this principle. We define cold-start users as users with ≤ 2 interactions in the training set. To evaluate the model’s performance with fewer interactions, we choose NDCG@K, Recall@K, and Hit@K, where $K \in \{3, 5\}$.
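The cold-start definition above (users with at most 2 training interactions) amounts to a simple count over the training pairs. A minimal sketch, assuming interactions are stored as (user, file) tuples; `cold_start_users` and the toy IDs are hypothetical:

```python
from collections import Counter


def cold_start_users(train_interactions, max_interactions=2):
    """Return users with at most `max_interactions` training interactions.

    train_interactions: iterable of (user_id, file_id) pairs from the training set.
    Users absent from the training set never appear in the result.
    """
    counts = Counter(user for user, _file in train_interactions)
    return {user for user, n in counts.items() if n <= max_interactions}


if __name__ == "__main__":
    train = [
        ("alice", "a.py"), ("alice", "b.py"), ("alice", "c.py"),
        ("bob", "a.py"),
        ("carol", "d.py"), ("carol", "e.py"),
    ]
    print(sorted(cold_start_users(train)))
```

Evaluation for this setting would then be restricted to test interactions of the returned users.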
The strongest 4 baselines in Tab. 2 are evaluated for comparison. As observed from Tab. 3, performance for cold-start users is worse than that for all users in Tab. 2. Notably, CODER achieves an even greater performance improvement over the baseline models in this setting. This can be attributed to the following aspects: 1) CODER learns more accurate representations by fusing the fine-grained semantics of project files with their interacted users, which facilitates the learning of user preferences even in the absence of dense user interactions. 2) By explicitly modeling multiple types of project-level behaviors, CODER effectively models the users’ interests to complement the sparse file-level contribution relations, which is more effective than encoding the project-level interactions in the embedding space.
4.2.3 Cross-Project Recommendation. Although 91% of the developers in our dataset focused on 1 project throughout their development, active contributors can work on multiple projects. For these contributors, the project core team can recommend development tasks based on their development experiences in previous projects. During evaluation, we rank the interactions in projects each user has not yet interacted with in the training set. This setting is considerably more challenging than intra-project recommendation since the candidate item pool is significantly larger. According to the results in Fig. 4, CODER consistently achieves superior performance by a large margin with respect to the baselines, especially for 𝐾 ≥ 20. The results show that CODER jointly learns inter-project differences to choose the correct repositories and characterizes intra-project distinctions to recommend the correct files within the chosen repositories. To further validate the above observation, we randomly sample 10 repositories in the ML dataset and visualize their file embeddings in Fig. 5 using t-SNE [43]. A maximum of 300 files are displayed per repository for clarity.
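The sampling step described above (10 random repositories, at most 300 files each) can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the `file_to_repo` mapping, and the fixed seed are hypothetical, and it assumes that repositories with more than 300 files are subsampled uniformly at random, which the paper does not specify.

```python
import random
from collections import defaultdict


def sample_files_for_plot(file_to_repo, num_repos=10, max_files_per_repo=300, seed=0):
    """Pick `num_repos` repositories at random and cap the files kept per repository.

    `file_to_repo` maps a file identifier to its repository identifier.
    Returns a dict: repository identifier -> list of selected file identifiers.
    """
    rng = random.Random(seed)

    # Group the files by the repository they belong to.
    files_by_repo = defaultdict(list)
    for file_id, repo_id in file_to_repo.items():
        files_by_repo[repo_id].append(file_id)

    # Sample repositories (sorted first so the result depends only on the seed).
    repos = sorted(files_by_repo)
    chosen = rng.sample(repos, min(num_repos, len(repos)))

    selected = {}
    for repo_id in chosen:
        files = sorted(files_by_repo[repo_id])
        if len(files) > max_files_per_repo:
            # Assumption: uniform subsampling when a repository is too large.
            files = rng.sample(files, max_files_per_repo)
        selected[repo_id] = files
    return selected
```

The embeddings of the selected files could then be projected to two dimensions with any t-SNE implementation, for example scikit-learn's `sklearn.manifold.TSNE`, and colored by repository.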
The file embeddings are obtained using 4 models: (a) Matrix Factorization [39], (b) LightGCN [17], (c) NCL [28], and (d) our CODER framework. We observe that models with the contrastive learning objective (NCL and CODER) manifest better clustering structures. In particular, file embeddings learned by our CODER framework demonstrate the best cross-repository differences. The results further indicate that CODER jointly models the intra-project and inter-project differences among files, effectively distinguishing the files under the same OSS project, which is more effective than directly encoding the repository identity in the embedding space.

Table 3: File-level link prediction results for cold-start users. “LGN” stands for the baseline “LightGCN”. The best performance is marked in bold. The second best is underlined.

Dataset  Metric   NeuMF  NGCF   LGN    NCL    CODER  Impr.
ML       NDCG@3   0.059  0.067  0.068  0.090  0.126  40.9%
ML       Hit@3    0.106  0.123  0.161  0.211  0.224  5.9%
ML       MRR@3    0.081  0.087  0.089  0.119  0.165  38.3%
ML       NDCG@5   0.068  0.078  0.088  0.105  0.132  25.8%
ML       Hit@5    0.161  0.162  0.230  0.261  0.273  4.8%
ML       MRR@5    0.093  0.097  0.105  0.130  0.177  36.0%
DB       NDCG@3   0.078  0.063  0.055  0.075  0.119  53.0%
DB       Hit@3    0.152  0.114  0.128  0.165  0.238  44.4%
DB       MRR@3    0.102  0.089  0.070  0.095  0.157  54.0%
DB       NDCG@5   0.086  0.061  0.064  0.086  0.127  47.5%
DB       Hit@5    0.195  0.132  0.165  0.220  0.287  30.6%
DB       MRR@5    0.112  0.093  0.079  0.106  0.168  50.2%
FS       NDCG@3   0.079  0.075  0.085  0.092  0.128  38.7%
FS       Hit@3    0.171  0.165  0.179  0.179  0.242  35.6%
FS       MRR@3    0.110  0.095  0.104  0.116  0.171  48.0%
FS       NDCG@5   0.086  0.086  0.085  0.095  0.137  44.3%
FS       Hit@5    0.230  0.220  0.202  0.222  0.313  36.2%
FS       MRR@5    0.124  0.106  0.109  0.125  0.187  49.4%

4.2.4 Ablation Studies. In Fig. 3, we compare the performance of our full model (abbreviated as CD) with that of its 5 variants. CD-F removes the code-user modality fusion strategy in Eq. 1. CD-C excludes the structural contrastive learning objective in Eq. 20 and Eq. 21.
CD-E does not use the pretrained CodeBERT embeddings and instead applies TF-IDF encoding on the source code, a common approach in project recommendation models [55]. CD-P removes the project-level aggregation in Sec. 3.2.2. CD-S disables the structural-level aggregation in Sec. 3.1.2. The results on the ML dataset are shown in Fig. 3. We have the following observations: First, the full model and all 5 of its variants outperform NCL, among which the full model (CD) performs the best, indicating the importance of each component in our model design. The performance drops most significantly when we disable project-level aggregation in CD-P, indicating the importance of explicitly modeling user-project interactions through graph structures. We also observe a considerable decrease when we remove the structural-level aggregation (CD-S), implying that the structural information of files contributes significantly to the file representation. Replacing CodeBERT with TF-IDF (CD-E) leads to a smaller performance decrease, but CD-E is still outperformed by CD-F, in which fine-grained representations of the source code are present. Thus, user behaviors and project structural clues are more important than semantic features in code recommendation.

Figure 3: Results among variants of CODER and the best baseline model NCL on the ML dataset. (Panels: Rec@20, NDCG@20, Hit@20, MRR@20; bars: CD, -F, -C, -E, -S, -P, NCL.)
Figure 4: Cross-Project Performance of CODER and the 2 strongest baselines under various 𝐾, 𝐾 ∈ [5, 100]. (Panels: Recall, NDCG, Hit, MRR; x-axis: 𝐾 ∈ {5, 10, 15, 20, 50, 100}; curves: LGN, NCL, CODER.)
Figure 5: t-SNE Visualization of the file embeddings on the ML dataset produced by (a) Matrix Factorization; (b) LightGCN; (c) NCL; (d) CODER. Files in the same color fall under the same repositories.

5 RELATED WORKS
5.1 Research in Open Source
Open source has grown into a standard practice for software engineering [22] and attracts researchers to study social coding [62]. Analytical studies focus on the users’ motivation [9, 59], expertise [45], and collaboration patterns [35], as well as factors that impact the popularity [2] of projects. Methodological studies explore project classification [66], code search [31], and connecting publications with projects [40]. Although previous works have explored recommendation tasks in OSS development settings such as automatic suggestions of API function calls [19, 37], Good First Issues [1, 53], and data preparation steps [57], no previous work has studied the challenging task of code recommendation, which requires an in-depth understanding of OSS projects written in multiple programming languages and of diverse user-item interactions.
5.2 Recommender Systems
The advances in deep learning have greatly facilitated the evolution of recommender systems [3, 16, 49, 64, 65]. In particular, motivated by the success of Graph Neural Networks (GNN) [15, 26, 70, 71], a series of graph-based recommender systems [27, 49, 52] have been proposed, which organize user behaviors into heterogeneous interaction graphs. These methods formulate item recommendation as link prediction or representation learning tasks [48, 67], and utilize high-order relationships to infer user preferences, item attributes, and collaborative filtering signals [17, 40, 47, 51, 60]. Notably, traditional recommendation models cannot be easily transferred to code recommendation as they do not model unique signals in OSS development, such as project hierarchies and code semantics.
6 CONCLUSION AND FUTURE WORKS
In this work, we are the first to formulate the task of code recommendation for open source developers. We propose CODER, a code recommendation model suitable for open source projects written in diverse languages. Extensive experiments on 3 datasets demonstrate the superior performance of our method. Currently, our approach only considers recommending existing files to users. As CODER harnesses the metadata and semantic features of files, it cannot deal with users creating new files, where such information about the candidate files is absent. We plan to generalize our framework by allowing users to initialize files under the subdirectories they are interested in. Meanwhile, our source code encoding scheme can be further improved by harnessing knowledge about programming languages. For example, previous works explored the use of Abstract Syntax Tree (AST) [36] and data flow [14, 19] (graphs that represent dependency relations between variables) on language-specific tasks. Our current encoding scheme is a computationally efficient way to deal with the diversity of programming languages. Moreover, the user representations can be further enhanced by modeling users’ social relations [6, 33] and behaviors [23, 56, 58]. Overall, future works can incorporate domain knowledge about programming languages and social information about the users to improve the item and user representations at a finer granularity.

REFERENCES
[1] Jan Willem David Alderliesten and Andy Zaidman. 2021. An Initial Exploration of the “Good First Issue” Label for Newcomer Developers. In 2021 IEEE/ACM 13th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE). IEEE, 117–118.
[2] Hudson Borges, Andre Hora, and Marco Tulio Valente. 2016. Understanding the factors that impact the popularity of GitHub repositories. In ICSME. IEEE, 334–344.
[3] Jin Chen, Defu Lian, Binbin Jin, Kai Zheng, and Enhong Chen.
2022. Learning Recommenders for Implicit Feedback with Importance Resampling. In WWW. 1997–2005.
[4] Jailton Coelho, Marco Tulio Valente, Luciano Milen, and Luciana L Silva. 2020. Is this GitHub project maintained? Measuring the level of maintenance activity of open-source projects. Information and Software Technology 122 (2020), 106274.
[5] Roberto Di Cosmo and Stefano Zacchiroli. 2017. Software Heritage: Why and How to Preserve Software Source Code. In iPRES 2017-14th International Conference on Digital Preservation. 1–10.
[6] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph neural networks for social recommendation. In WWW. 417–426.
[7] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of EMNLP 2020. 1536–1547.
[8] Matthias Fey and Jan E. Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.
[9] Marco Gerosa, Igor Wiese, Bianca Trinkenreich, Georg Link, Gregorio Robles, Christoph Treude, Igor Steinmacher, and Anita Sarma. 2021. The shifting sands of motivation: Revisiting what drives contributors in open source. In ICSE. IEEE, 1046–1058.
[10] GitHub. 2016. The State of the Octoverse. https://octoverse.github.com/2016/
[11] GitHub. 2022. Collection: Programming Languages. https://github.com/collections/programming-languages
[12] GitHub. 2022. Github Number of Repositories. https://github.com/search.
[13] Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In AISTATS. JMLR Workshop and Conference Proceedings, 249–256.
[14] Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, et al. 2021.
GraphCodeBERT: Pre-training Code Representations with Data Flow. In ICLR.
[15] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. NIPS 30 (2017).
[16] Junheng Hao, Tong Zhao, Jin Li, Xin Luna Dong, Christos Faloutsos, Yizhou Sun, and Wei Wang. 2020. P-companion: A principled framework for diversified complementary product recommendation. In CIKM. 2517–2524.
[17] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. Lightgcn: Simplifying and powering graph convolution network for recommendation. In SIGIR. 639–648.
[18] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web. 173–182.
[19] Xincheng He, Lei Xu, Xiangyu Zhang, Rui Hao, Yang Feng, and Baowen Xu. 2021. Pyart: Python api recommendation in real-time. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 1634–1645.
[20] Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, and Zhi Jin. 2018. Summarizing source code with transferred API knowledge. In IJCAI. 2269–2275.
[21] Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. Codesearchnet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436 (2019).
[22] Jyun-Yu Jiang, Pu-Jen Cheng, and Wei Wang. 2017. Open source repository recommendation in social coding. In SIGIR. 1173–1176.
[23] Yiqiao Jin, Xiting Wang, Ruichao Yang, Yizhou Sun, Wei Wang, Hao Liao, and Xing Xie. 2022. Towards fine-grained reasoning for fake news detection. In AAAI, Vol. 36. 5746–5754.
[24] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL.
[25] Diederik P Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR (Poster).
[26] Thomas N Kipf and Max Welling.
2016. Semi-supervised classification with graph convolutional networks. In ICLR.
[27] Anchen Li, Bo Yang, Huan Huo, and Farookh Hussain. 2022. Hypercomplex Graph Collaborative Filtering. In WWW. 1914–1922.
[28] Zihan Lin, Changxin Tian, Yupeng Hou, and Wayne Xin Zhao. 2022. Improving Graph Collaborative Filtering with Neighborhood-enriched Contrastive Learning. In Proceedings of the ACM Web Conference 2022. 2320–2329.
[29] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
[30] Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. 2016. Hierarchical question-image co-attention for visual question answering. NIPS 29 (2016).
[31] Sifei Luan, Di Yang, Celeste Barnaby, Koushik Sen, and Satish Chandra. 2019. Aroma: Code recommendation via structural code search. Proceedings of the ACM on Programming Languages 3, OOPSLA (2019), 1–28.
[32] Nora McDonald and Sean Goggins. 2013. Performance and participation in open source software on github. In CHI’13 extended abstracts on human factors in computing systems. 139–144.
[33] Xin Mei, Xiaoyan Cai, Sen Xu, Wenjie Li, Shirui Pan, and Libin Yang. 2022. Mutually reinforced network embedding: An integrated approach to research paper recommendation. Expert Systems with Applications (2022), 117616.
[34] Antonio Valerio Miceli-Barone and Rico Sennrich. 2017. A Parallel Corpus of Python Functions and Documentation Strings for Automated Code Documentation and Code Generation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 314–319.
[35] Nadia Nahar, Shurui Zhou, Grace Lewis, and Christian Kästner. 2022. Collaboration Challenges in Building ML-Enabled Systems: Communication, Documentation, Engineering, and Process. Organization 1, 2 (2022), 3.
[36] Anh Tuan Nguyen, Michael Hilton, Mihai Codoban, Hoan Anh Nguyen, Lily Mast, Eli Rademacher, Tien N Nguyen, and Danny Dig. 2016. API code recommendation using statistical learning from fine-grained changes. In SIGSOFT. 511–522.
[37] Phuong T Nguyen, Juri Di Rocco, Davide Di Ruscio, Lina Ochoa, Thomas Degueule, and Massimiliano Di Penta. 2019. Focus: A recommender system for mining api function calls and usage patterns. In ICSE. IEEE, 1050–1060.
[38] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. NIPS 32.
[39] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI. 452–461.
[40] Huajie Shao, Dachun Sun, Jiahao Wu, Zecheng Zhang, Aston Zhang, Shuochao Yao, Shengzhong Liu, Tianshi Wang, Chao Zhang, and Tarek Abdelzaher. 2020. paper2repo: GitHub repository recommendation for academic papers. In WWW. 629–639.
[41] Nitish Srivastava and Russ R Salakhutdinov. 2012. Multimodal learning with deep boltzmann machines. NIPS.
[42] Igor Steinmacher, Ana Paula Chaves, Tayana Uchoa Conte, and Marco Aurélio Gerosa. 2014. Preliminary empirical identification of barriers faced by newcomers to Open Source Software projects. In 2014 Brazilian Symposium on Software Engineering. IEEE, 51–60.
[43] Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. JMLR 9, 11 (2008).
[44] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In ICLR.
[45] Rahul Venkataramani, Atul Gupta, Allahbaksh Asadullah, Basavaraju Muddu, and Vasudev Bhat. 2013. Discovery of technical expertise from open source code repositories. In WWW. 97–98.
[46] Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, and Philip S Yu.
2018. Improving automatic source code summarization via deep reinforcement learning. In ASE. 397–407.
[47] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural graph collaborative filtering. In SIGIR. 165–174.
[48] Xiang Wang, Tinglin Huang, Dingxian Wang, Yancheng Yuan, Zhenguang Liu, Xiangnan He, and Tat-Seng Chua. 2021. Learning intents behind interactions with knowledge graph for recommendation. In WWW. 878–887.
[49] Xiting Wang, Kunpeng Liu, Dongjie Wang, Le Wu, Yanjie Fu, and Xing Xie. 2022. Multi-level recommendation reasoning over knowledge graphs with reinforcement learning. In WWW. 2098–2108.
[50] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. 2020. Transformers: State-of-the-art natural language processing. In EMNLP. 38–45.
[51] Le Wu, Peijie Sun, Yanjie Fu, Richang Hong, Xiting Wang, and Meng Wang. 2019. A neural influence diffusion model for social recommendation. In Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval. 235–244.
[52] Shu Wu, Yuyuan Tang, Yanqiao Zhu, Liang Wang, Xing Xie, and Tieniu Tan. 2019. Session-based recommendation with graph neural networks. In AAAI, Vol. 33. 346–353.
[53] Wenxin Xiao, Hao He, Weiwei Xu, Xin Tan, Jinhao Dong, and Minghui Zhou. 2022. Recommending good first issues in GitHub OSS projects. In ICSE. 1830–1842.
[54] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How Powerful are Graph Neural Networks? In ICLR.
[55] Wenyuan Xu, Xiaobing Sun, Xin Xia, and Xiang Chen. 2017. Scalable relevant project recommendation on GitHub. In Proceedings of the 9th Asia-Pacific Symposium on Internetware. 1–10.
[56] Weizhi Xu, Junfei Wu, Qiang Liu, Shu Wu, and Liang Wang. 2022. Mining Fine-grained Semantics via Graph Neural Networks for Evidence-based Fake News Detection. arXiv preprint arXiv:2201.06885 (2022).
[57] Cong Yan and Yeye He. 2020. Auto-suggest: Learning-to-recommend data preparation steps using data science notebooks. In SIGMOD. 1539–1554.
[58] Ruichao Yang, Xiting Wang, Yiqiao Jin, Chaozhuo Li, Jianxun Lian, and Xing Xie. 2022. Reinforcement Subgraph Reasoning for Fake News Detection. In KDD. 2253–2262.
[59] Yunwen Ye and Kouichi Kishida. 2003. Toward an understanding of the motivation of open source software developers. In ICSE. IEEE, 419–429.
[60] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In KDD. 974–983.
[61] Xueli Yu, Weizhi Xu, Zeyu Cui, Shu Wu, and Liang Wang. 2021. Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval. In WWW. 778–787.
[62] Yue Yu, Huaimin Wang, Gang Yin, and Tao Wang. 2016. Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment? Information and Software Technology 74 (2016), 204–218.
[63] Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, and Qi Tian. 2019. Deep modular co-attention networks for visual question answering. In CVPR. 6281–6290.
[64] Fajie Yuan, Xiangnan He, Haochuan Jiang, Guibing Guo, Jian Xiong, Zhezhao Xu, and Yilin Xiong. 2020. Future data helps training: Modeling future contexts for session-based recommendation. In WWW. 303–313.
[65] Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, and Liang Wang. 2021. Mining Latent Structures for Multimedia Recommendation. In ACM MM. 3872–3880.
[66] Yu Zhang, Frank F Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, and Jiawei Han. 2019. Higitclass: Keyword-driven hierarchical classification of github repositories. In ICDM. IEEE, 876–885.
[67] Yu Zheng, Chen Gao, Liang Chen, Depeng Jin, and Yong Li. 2021. DGCN: Diversified Recommendation with Graph Convolutional Networks. In WWW. 401–412.
[68] Yu Zheng, Chen Gao, Xiang Li, Xiangnan He, Yong Li, and Depeng Jin. 2021.
Disentangling user interest and conformity for recommendation with causal embedding. In WWW. 2980–2991.
[69] Jiaxin Zhu, Minghui Zhou, and Audris Mockus. 2014. Patterns of folder use and project popularity: A case study of GitHub repositories. In Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 1–4.
[70] Yanqiao Zhu, Weizhi Xu, Jinghao Zhang, Qiang Liu, Shu Wu, and Liang Wang. 2021. Deep graph structure learning for robust representations: A survey. arXiv preprint arXiv:2103.03036 (2021).
[71] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang. 2021. Graph contrastive learning with adaptive augmentation. In WWW. 2069–2080.