# Improve Network Embeddings with Regularization

[Yi Zhang](mailto:zhang18f@uwindsor.ca), Jianguo Lu, Ofer Shai

## Dataset

We use seven datasets in our experiments.

| Dataset | # Nodes | # Edges | # Labels | Multi-label |
|------------------------------------:|----------:|----------:|---------:|------------:|
| [WebKB](./WebKB.tar.gz) | 877 | 1,608 | 5 | no |
| [Cora](./Cora.tar.gz) | 2,708 | 5,429 | 7 | no |
| [CiteSeer](./CiteSeer.tar.gz) | 3,319 | 4,722 | 6 | no |
| [BlogCatalog](./BlogCatalog.tar.gz) | 10,312 | 333,983 | 39 | yes |
| [PubMed](./PubMed.tar.gz) | 19,717 | 44,338 | 3 | no |
| [Flickr](./Flickr.tar.gz) | 80,513 | 5,899,992 | 195 | yes |
| [YouTube](./YouTube.tar.gz) | 1,138,499 | 2,990,443 | 47 | yes |

## Implementation

You can download our implementation [here](./n2v_r.tar.gz). The code is written in Cython; we recommend the [Intel Python distribution](https://software.intel.com/en-us/distribution-for-python) for better performance. You can compile the code with the following command:

`python3 setup.py build_ext --inplace`

### How-to

We implement two iterators, one for DeepWalk and one for node2vec, that perform the random walks. For example,

```
import deepwalkDataIterator
dataIterator = deepwalkDataIterator.DataIterator('Cora/edges.csv', 'cora.deepwalk')
```

`deepwalkDataIterator` loads the graph from `Cora/edges.csv`, automatically performs the walks, and saves the paths to `cora.deepwalk`. Next, you can train DeepWalk with

```
from n2v_r import run
model = run(
    dataIterator,
    output='deepwalk.emb',
    iteration=5,
)
```

The `run` function takes the data iterator and trains the model for `5` iterations. Once it finishes, the embeddings are saved to `deepwalk.emb`. [Here](./jupyter.html) is a Jupyter notebook sample showing how to use the code.
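If you want to inspect the result outside the notebook, here is a minimal sketch that reads the saved embeddings back into a dictionary. It assumes `deepwalk.emb` uses the common word2vec text format (a header line with the vocabulary size and dimension, then one node id and its vector per line); adjust the parsing if the released code writes a different layout.

```
# Minimal sketch for reading the saved embeddings, assuming the common
# word2vec text format: a "<num_nodes> <dim>" header, then one
# "<node_id> <dim floats>" line per node. This is an assumption about
# the output layout, not a documented format of n2v_r.
import numpy as np

def load_embeddings(path):
    embeddings = {}
    with open(path) as f:
        num_nodes, dim = map(int, f.readline().split())
        for line in f:
            parts = line.rstrip().split()
            node = parts[0]
            vector = np.asarray(parts[1:], dtype=np.float32)
            assert vector.shape == (dim,)
            embeddings[node] = vector
    return embeddings

emb = load_embeddings('deepwalk.emb')  # e.g. after the run() call above
```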
## Objective function

Given a network $G=(V, E)$, where $V$ is the set of nodes and $E$ is the set of edges, an embedding is a dense $d$-dimensional vector $v_i$ for a node $n_i \in V$. The embedding $v_i$ should retain the information of node $n_i$ in the network, such as similarity and structure. To learn the embeddings, for each node $n$ in the network, existing works train the SGNS model with a specific sampling strategy $N_{+}(n)$ that captures the node-neighborhood information. More specifically, DeepWalk and node2vec apply a dynamic window to uniform and biased random walk paths, respectively, and LINE uses random edge sampling. In our work, we add L2 regularization to the model to improve the embeddings. The objective function is:

\begin{align}
O = &\frac{1}{S}\sum_{n_i \in V} \sum_{n_j \in N_{+}(n_i)} \left[\log \sigma(u_j \cdot v_{i}) + \sum_{k=1}^{K} \mathbb{E}_{u_{k} \sim P_{n}(u)} \log \sigma(-u_{k} \cdot v_{i})\right]\\
&- \lambda\sum_{i=1}^{|V|} \Vert v_{i}\Vert _2^2 - \lambda \sum_{i=1}^{|V|} \Vert u_{i}\Vert _2^2
\end{align}

where $S$ is the number of observed training pairs, $v_i$ is the embedding vector for node $n_i$, $u_j$ is the output vector for node $n_j$, $\lambda$ is the regularization weight, and $|V|$ is the number of nodes in the network. $P_n$ is a noise distribution given by the frequency of nodes raised to the power of 0.75. Note that for LINE this frequency follows the degree distribution, while in DeepWalk and node2vec it is the frequency of a node in the training samples.

These works use SGD to update their models, so the model is updated immediately when a training sample arrives. To ensure a smooth update, we evenly distribute the regularization weight $\lambda$ into each local objective function by frequency. More specifically, for each training sample $(n_i, n_j)$, the local objective is:

\begin{align}
\begin{split}
\log \sigma(u_j \cdot v_{i}) + \sum_{k=1}^{K} \mathbb{E}_{u_{k} \sim P_{n}(u)} \log \sigma(-u_{k} \cdot v_{i}) \\
- \lambda_1 \Vert v_{i} \Vert _2^2 - \lambda_2 \Vert u_{j} \Vert _2^2 - \sum_{k=1}^{K} \mathbb{E}_{u_{k} \sim P_{n}} \lambda_3 \Vert u_{k}\Vert _2^2 .
\end{split}\label{eq:network_embeddings-negtive_sampling}
\end{align}

Here $\sigma(x) = \frac{1}{1+e^{-x}}$ is the sigmoid function. $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the regularization weights, divided from $\lambda$, for the embedding vector $v_i$, the output vector $u_j$, and the negative sample $u_k$, defined as below:

\begin{align}
\begin{split}
\lambda_1 & = \frac{\lambda}{Freq_{i}(n_i)}, \\
\lambda_2 & = \frac{\lambda}{Freq_{o}(n_j)+Freq_{n}(n_j)}, \\
\lambda_3 & = \frac{\lambda}{Freq_{o}(n_k)+Freq_{n}(n_k)}.
\end{split}
\end{align}

Here $Freq_{i}(\cdot)$, $Freq_{o}(\cdot)$, and $Freq_{n}(\cdot)$ denote the frequency with which a node is trained as an input, an output, and a negative sample per training iteration, respectively.
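To make the update concrete, below is a minimal NumPy sketch of one SGD ascent step on this local objective. It assumes the per-node counts `freq_i`, `freq_o`, and `freq_n` have been precomputed from the training stream (and are positive for every node that appears in the corresponding role); the function name, signature, and learning rate are illustrative, not the released Cython API.

```
# One regularized SGNS update for a training pair (n_i, n_j).
# Illustrative sketch only; names and defaults are not from n2v_r.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(V, U, i, j, negatives, lam, freq_i, freq_o, freq_n, lr=0.025):
    """V, U      : input and output vector matrices, shape (|V|, d)
    negatives : K node ids drawn from the noise distribution P_n
    lam       : global regularization weight lambda
    freq_*    : per-node counts of appearing as input/output/negative"""
    v_i = V[i]
    # Positive pair: gradient of log sigma(u_j . v_i).
    g = 1.0 - sigmoid(U[j] @ v_i)
    grad_v = g * U[j]
    # lambda_2 = lambda / (Freq_o(n_j) + Freq_n(n_j))
    lam2 = lam / (freq_o[j] + freq_n[j])
    U[j] += lr * (g * v_i - 2.0 * lam2 * U[j])
    # Negative samples: gradient of log sigma(-u_k . v_i).
    for k in negatives:
        g = -sigmoid(U[k] @ v_i)
        grad_v += g * U[k]
        # lambda_3 = lambda / (Freq_o(n_k) + Freq_n(n_k))
        lam3 = lam / (freq_o[k] + freq_n[k])
        U[k] += lr * (g * v_i - 2.0 * lam3 * U[k])
    # lambda_1 = lambda / Freq_i(n_i); update the input embedding last.
    lam1 = lam / freq_i[i]
    V[i] += lr * (grad_v - 2.0 * lam1 * v_i)
```

Because each $\lambda_1$, $\lambda_2$, and $\lambda_3$ is the global weight divided by how often the node occurs in that role, every embedding receives roughly the same total regularization per iteration regardless of how frequently it is sampled, which is what makes the per-sample updates smooth.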