Understanding Large Social Networks

Base paper: LINE: LARGE-SCALE INFORMATION NETWORK EMBEDDING



Introduction

Information networks are ubiquitous in the real world with examples such as airline networks, publication networks, social and communication networks, and the World Wide Web. The size of these information networks ranges from hundreds of nodes to millions and billions of nodes.

We will be working on the problem of embedding information networks into low-dimensional spaces, in which every vertex is represented as a low-dimensional vector. Such a low-dimensional embedding is very useful in a variety of applications such as visualization, node classification, link prediction and recommendation.

Edges can be undirected, directed, and/or weighted. Vertices 6 and 7 should be placed close together in the low-dimensional space because they are connected by a strong tie. Vertices 5 and 6 should also be placed close together because they share similar neighbours.
Social network data is increasing exponentially!

LINE: LARGE-SCALE INFORMATION NETWORK EMBEDDING

First-order proximity

First-order proximity refers to the local pairwise proximity between two vertices in the network, i.e., the weight of the edge that directly connects them (immediate neighbours only).
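
To make this concrete, here is a minimal sketch in Torch/Lua (the project's stack); the toy sizes and function name are illustrative, not the project's actual code. LINE models the probability of an edge (i, j) as a sigmoid of the dot product of the two vertex embeddings, p1(vi, vj) = 1 / (1 + exp(-ui . uj)):

    require 'torch'
    require 'nn'

    local numVertices, dim = 7, 4                    -- toy sizes, chosen arbitrarily
    local lookup = nn.LookupTable(numVertices, dim)  -- one dim-dimensional vector per vertex

    -- p1(v_i, v_j) = 1 / (1 + exp(-u_i . u_j))
    local function firstOrderProximity(i, j)
      local ui = lookup:forward(torch.LongTensor{i}):clone()  -- clone: forward reuses its output buffer
      local uj = lookup:forward(torch.LongTensor{j})
      return 1 / (1 + math.exp(-ui[1]:dot(uj[1])))
    end

    print(firstOrderProximity(6, 7))  -- high for strong ties once the table is trained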

Second-order proximity

Second-order proximity compares the neighbourhoods of two vertices rather than only their direct connection: vertices that share many common neighbours are considered similar, even if they are not directly connected themselves.
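
As an illustration (again a Torch/Lua sketch with made-up sizes, not our exact code): each vertex gets a representation u and a context representation u', and the probability of observing vj in the context of vi is a softmax over all context vectors:

    require 'torch'
    require 'nn'

    local numVertices, dim = 7, 4
    local vertexEmb  = torch.randn(numVertices, dim)  -- u  : vertex representations
    local contextEmb = torch.randn(numVertices, dim)  -- u' : context representations

    -- p2(v_j | v_i) = exp(u'_j . u_i) / sum_k exp(u'_k . u_i)
    local function secondOrderProximity(i, j)
      local scores = contextEmb * vertexEmb[i]        -- u'_k . u_i for every vertex k
      return nn.SoftMax():forward(scores)[j]
    end

    print(secondOrderProximity(5, 6))  -- vertices with similar neighbourhoods score high after training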

Model Optimization

The model is optimized with Negative Sampling, which, for each observed edge (i, j), samples multiple negative edges according to a noise distribution.
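
A minimal sketch of such a sampler in Torch/Lua (the degree vector here is a made-up toy example): LINE draws negative vertices from the noise distribution Pn(v) proportional to dv^(3/4), where dv is the out-degree of vertex v:

    require 'torch'

    local degrees = torch.Tensor{3, 1, 2, 2, 3, 4, 1}  -- toy out-degrees d_v
    local noise = degrees:clone():pow(0.75)            -- P_n(v) proportional to d_v^(3/4)
    noise:div(noise:sum())

    local K = 5                                        -- negative edges per observed edge (i, j)
    local function sampleNegatives(i, j)
      -- draw K vertex ids with replacement; a full implementation would also reject i and j
      return torch.multinomial(noise, K, true)
    end

    print(sampleNegatives(6, 7))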

Computation needs to be performed faster as the number of nodes and edges keeps increasing

Novelty

As described in the LINE paper, separate neural networks are trained for the First-Order Proximity and the Second-Order Proximity independently, and the embeddings learned from both models are concatenated to obtain the final embedding of a node. In our project, we update this part of the model.
We use the same lookup table for the node representations in both neural networks, i.e., the node embeddings learned from the First-Order Proximity model are carried over into the neural network for the Second-Order Proximity, instead of being initialized again to random values. The embeddings learned by the Second-Order Proximity network are then taken as the final embeddings of a node.
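
A sketch of this change in Torch/Lua (the module layout is illustrative, not our exact code): since the two models are trained one after the other, the second-order network can be built around the same nn.LookupTable instance that the first-order phase has already trained:

    require 'torch'
    require 'nn'

    local numVertices, dim = 7, 4
    local sharedLookup = nn.LookupTable(numVertices, dim)

    -- phase 1: train a first-order model that reads its vectors from
    -- sharedLookup (training loop omitted); afterwards sharedLookup.weight
    -- holds the embeddings learned from the First-Order Proximity.

    -- phase 2: build the second-order network around the SAME module, so it
    -- starts from those embeddings instead of fresh random values
    local secondOrder = nn.Sequential()
      :add(sharedLookup)                 -- shared vertex vectors u_i
      :add(nn.Linear(dim, numVertices))  -- context scores u'_k . u_i
      :add(nn.LogSoftMax())              -- log p2(. | v_i)

    -- after phase 2, the lookup table is taken as the final node embeddings
    local finalEmbeddings = sharedLookup.weight
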
Low-dimensional node representations help to solve multiple tasks with high efficiency

Results

Representing nodes in low-dimensional spaces makes tasks such as visualization, node classification, and link prediction faster to perform

Team

Madan Jhawar

Raj Patel

Kishan Vadalia


Project Tags

Information Retrieval and Extraction
Major Project
Social Networks
Deep Learning
Torch
LINE
DeepWalk
Lua
IIIT Hyderabad