[ot][spam][crazy][wrong] Reading a Book from Library Genesis

Karl gmkarl at gmail.com
Sun Dec 5 11:43:43 PST 2021


So!  Recap:

An adjacency matrix is an n-dimensional boolean array, where each
coordinate is a vertex, and each cell is whether or not there is an
edge connecting those vertices in coordinate order.

Then there is the unnormalized and normalized Laplacian matrix:
----

Defintion 2.28 (Laplacian Matrix)   For a given graph G = {V,E} with A
as its adjacency matrix, its Laplacian matrix is defined as
    L = D - A, (2.18)
where D is a diagonal degree matrix D = diag(d(v_1),...,d(v_|V|))

Another definition of the Laplacian matrix is a normalized version of
Eq. (2.18).

Definition 2.29 (Normalized Laplacian Matrix)    For a given graph G=
{V,E} with A as its adjacency matrix, its normalized Laplacian matrix
is defined as
    L = D^(-1/2)(D - A)D^(-1/2) = I - D^(-1/2)AD^(-1/2). (2.19)
----

The degree matrix wasn't pasted in.  d(v_1) appears to be the 'degree'
of node or vertex 1, which is the number of connections it has to an
edge. More at https://en.wikipedia.org/wiki/Degree_matrix .  A
diagonal matrix is a matrix that holds a single vector, with zeros
everywhere with inequal coordinates, and vector values where the
coordinates are equal.

The Laplacian matrix is simply the difference between the degree
matrix and the adjacency matrix.  Intuitively, I have no idea how that
makes sense, except for including the information from both, fully, if
there are no self-connections in the graph.

Noting, also, that if edges are multiple, you could use values >1 in
the adjacency matrix, and if edges are multiply propertied, you could
use multiple adjacency matrices or use an (n+1)-dimensioned
adjacency/Laplacian matrix for the additional properties.

I usually store graphs as an actual in-memory node graph, which I
imagine as taking far, far less memory, and this can be compressed to
a matrix of link indices which is only n * max_degree elements large.
I suppose with sparse matrices, the difference may matter little.

-

After the definition of adjacency matrices, the book continues with
Properties and Measures starting book page 19, pdf page 38.  A little
into this section interesting things are discussed, such as
eigenvector and betweenness centrality, but it seems like getting into
the model regressions faster could be interesting.  Anything that
shows current uses of some of these graph metrics.

Here's the start of section 2.5 Graph Signal Processing, book page 29
pdf page 48.
----
2.5 Graph Signal Processing

In many real-world graphs, there are often features or attributes
associated with the nodes.  This kind of graph structured data can be
viewed as graph signals, which capture both the structure information
(or connectivity between nodes) and data (or attributes at nodes).  A
graph signal consists of a graph G = {V, E} and a mapping function f
defined on the node domain, which maps the nodes to real values.
Mathematically, the mapping function can be represented as

f : V -> R^d, (2.24)

where d is the dimension of the value (vector) associated with each
node.  Without loss of generality, in this section we set d = 1 and
denote the mapped vaues for all nodes as f with [i] corresponding to
node v_i.
----

Okay, so graph signal processing sounds like just a way for
researchers to obscurify the fact that nodes hold data.  Flipping
forward ...

This looks like a helpful enumeration, book page 39, pdf page 58.
----
2.7 Computational Tasks on Graphs

There are a variety o computational tasks proposed for graphs.  These
tasks can be mainly divided into two categories: (1) node-focused
tasks where the entire data are usually represented as one graph with
nodes as the data samples; and (2) graph-focused tasks, where data
often consist of a set of graphs and each data sample is a graph.  In
this section, e briefly introduce representative tasks for each group.

2.7.1 Node-Focused Tasks

Numerous node-focused tasks have been extensively studied, such as
node classification, node ranking, link prediction, and community
detection.  Next we discuss two representative tasks including node
classification and link prediction.

Node classification
In many real-world graphs, nodes are associated with useful
information, which is often treated as labels of these nodes.  For
example, in social networks, such information can be demographic
properties of users such as age, sex, and occupation and users'
interests and hobbies.  These labels usually help characterize the
nodes and can be leveraged for many important applications.  For
example, on social media such as Facebook, labels related to interests
and hobbies can be utilized to recommend relevant items (i.e., news
and events) to their users.  However, in reality, it is often
difficult to get a full set of labels for all nodes.  For example,
less than 1% of Facebook users provide hteir complete demographic
profiles.  Hence, we are likely given a graph where only a part of the
nodes are associated with labels and we want to infer the labels for
those nodes without labels.  This motivates the problem of node
clsasification on graphs.

Definition 2.42 (Node classification)    Let G = {V, E} denote a graph
with V the set of nodes and E the set of edges.  Some of nodes in V
are associated with labels, and the set of these labeled nodes is
represented as V_l of V.  The remaining nodes do not have labels, and
this set of unlabeled nodes is denoted as V_u.  Specifically we have
V_l intersect V_u = 0.  The goal of the node classification task is to
learn a mapping Phi by leveraging G and labels of V_l, which can
predict the labels of unlabeled nodes (or v among V_u).

The above definition is for simple graphs and can be easily extended
to graphs with attributes and complex graphs introduced in Section
2.6.

Example 2.43 (Node Classification in Flickr)   Flickr is an
image-hosting platform that allows users to host their personal
photos.  It also serves as an online social community where users can
follow each other.  Hence, users on Flickr and their connections form
a graph.  Furthermore, users on Flickr can subscribe to interest
groups such as Black and White, The Fog and The Rain, and Dog World.
These subscriptions indicate the interest of users and can be used as
their labels.  A user can subscribe to multiple groups.  Hence, each
user can be associated with multiple labels.  A multilabel node
classification problem on graphs can help predict the potential groups
that users are interested in but have not yet subscribed to.  Such
data sets for Flickr can be found in Tang and Lui (2009).

Link Prediction
In many real-world applications, graphs are not complete with missing
edges.  On the one hand, some of the connections exist but they are
not observed or recorded, which leads to missing edges in the observed
graphs.  On the other hand, many graphs are naturally evolving.  On
social media such as Facebook, users can always become friends with
other users.  In academic collaboration graphs, a given author can
always build new collaboration relations with other authors.
Inferring or predicting these missing edges can benefit many
applications such as friend recommandations (Adamic and Adar, 2003),
knowledge graph completion (Nickel et al., 2015), and criminal
intelligence analysis (Berlusconi et al., 2016).  Next we give the
formal definition of the link prediction problem.

Definition 2.44 (Link Prediction)    Let G = {V, E} denote a graph
with V its set of nodes and E its set of edges.  Let M denote all
possible edges between the nodes in V.  Then, we denote the
complementary set of E with respect to M as E' = M - E.  The set E'
contains the unobserved edges between the nodes.  The goal of the link
prediction task is to predict the most likely edges.  Specifically, a
score can be assigned to each of the edges in E', which indicates how
likely it exists or will emerge in the future.

Note that the definition is stated for simple graphs and can be easily
extended to complex graphs introduced in Section 2.6.  For example,
for signed graphs, in addition to the existence of an edge, we want to
predict its sign.  For hypergraphs, we want to infer hyperedges that
describe the relations between multiple nodes.

Example 2.45 (Predicting Emerging Collaboration in the DBLP website)
DBLP is an online computer science bibliography website that hosts a
comprehensive list o research papers in computer science.  A
co-authorship graph can be constructed from the papers in DBLP where
the authors are the nodes and authors can be considered as connected
if they have co-authored at least one paper as recorded in DBLP.
Predicting what new collaborations between authors who have never
co-authored before is an interesting link prediction problem.  A large
DBLP collaboration data set for link prediction research can be found
inYang and Leskovec (2015).

2.7.2 Graph-Focused Tasks

There are numerous graph-focused tasks such as graph classification,
graph matching, and graph generation.  Next we discuss the most
representative graph-focused task; i.e., graph classification.

Graph Classification
Node classification treats each node in a graph as a data sample and
aims to assign labels to these unlabeled nodes.  In some applications,
each sample can be represented as a graph.  In chemoinformatics,
chemical molecules can be denoted as graphs where atoms are the nodes
and chemical bonds between them are the edges.  These chemical
molecules may have different properties such as solubility and
toxicity, which can be treated as their labels.  In reality, we may
want to automatically predict these properties for newly discovered
chemical molecules.  This goal can be achieved by the task of graph
classification, which aims to predict the labels for unlabeled graphs.
Due to the complexity of graph structures, graph classification cannot
be carried out by traditional classification, which calls for
dedicated efforts.  Next, we provide a formal definition of graph
classification.

Definition 2.46 (Graph Classification)    Given a set of labeled
graphs D = {(G_i, y_i)} with y_i as the label of the graph G_i, the
goal of the graph classification task is to learn a mapping function
Phi with D, which can predict the labels of unlabeled graphs.

In the definition above, we did not specify additional information
potentially associated with the graphs.  For example, in some
scenarios, each node in a graph is associated with certain features
that can be utilized for graph classification.

Example 2.47 (Classifying Protein Structure into Enzymes or Nonenzymes)
Proteins can be represented as graphs, where amino acids are the nodes
and edges between two nodes can be formed if they are less than 6
Angstroms apart.  Enzymes are a type of proteins that serve as
biological catalysts to catalyze biochemical reactions.  Given a
protein, predicting whether it is an enzyme or not can be treated as a
graph classification task where the label for each protein is either
enzyme or nonenzyme.
----

That's so cool!  It talked about social network prediction and
everything!  That was unfortunately a lot of hand-typing for me
(hand-typing helps me learn the material).  I want to do more but am a
little typed up right now.


More information about the cypherpunks mailing list