Machine Learning on Graphs using Scikit-Network Library


Anyone doing machine learning is familiar with the open-source scikit-learn library, which offers an intuitive way to develop a variety of machine learning models. A similar library, scikit-network, is now available for machine learning on graphs. It offers a familiar API, an efficient representation of graphs, and a collection of fast algorithms. Given that graph-based machine learning has been getting a lot of attention recently, I thought it would be a good idea to introduce the scikit-network library to my readers.

The first step is to install the library, for example with pip install scikit-network. Having done that, let’s look at how graphs are created and displayed in the scikit-network (sk-network) library. We can create graphs in several ways:

  • 1. By defining an adjacency matrix
  • 2. Using an edge list
  • 3. Loading an existing graph from

Scikit-network represents a graph by its adjacency matrix, stored in SciPy’s Compressed Sparse Row (CSR) format. Graphs are drawn as SVG (scalable vector graphics) images. A simple example of creating and displaying a graph is shown below.

# import all necessary libraries
import sknetwork as skn
import numpy as np
import pandas as pd
from IPython.display import SVG
from scipy import sparse
# Define a dense adjacency matrix, then convert it to CSR format to create a graph
adjacency = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]])
adjacency = sparse.csr_matrix(adjacency)
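
To make the CSR representation less abstract, here is a minimal sketch that inspects the three arrays SciPy stores for the same 4-node matrix. It uses only NumPy and SciPy; the variable names (dense, csr, neighbors_of_1, row_counts) are mine and are not part of scikit-network.

```python
import numpy as np
from scipy import sparse

# Same 4-node adjacency matrix as above (note the self-loop on node 3)
dense = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]])
csr = sparse.csr_matrix(dense)

# CSR keeps three flat arrays:
#   indptr  - where each row starts and ends inside `indices` and `data`
#   indices - column index of every stored (non-zero) entry
#   data    - value of every stored entry
print(csr.indptr)
print(csr.indices)
print(csr.data)

# Neighbors of node 1 are the column indices stored for row 1
neighbors_of_1 = csr.indices[csr.indptr[1]:csr.indptr[2]]
print(neighbors_of_1)

# Number of stored entries in each row, i.e. the neighbor count of each node
row_counts = np.diff(csr.indptr)
print(row_counts)
```

Because a row's neighbors sit in one contiguous slice, walking over the neighbors of a node is cheap, which is one reason this format suits graph algorithms.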

Let’s display the graph that we have created.

from sknetwork.visualization import svg_graph
image = svg_graph(adjacency)
SVG(image)
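
A graph can also be built from an edge list, the second option mentioned earlier. The sketch below rebuilds the same 4-node graph using only NumPy and SciPy. It is one possible way to do it, not necessarily how scikit-network does it internally, and the names edges, upper, and adjacency_from_edges are my own. The library ships its own loaders for edge lists, so check the documentation of your installed version before rolling your own.

```python
import numpy as np
from scipy import sparse

# Undirected edges of the small graph defined above; (3, 3) is a self-loop
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 3)]
n_nodes = 4

rows, cols = zip(*edges)
weights = np.ones(len(edges), dtype=int)
upper = sparse.coo_matrix((weights, (rows, cols)), shape=(n_nodes, n_nodes))

# Symmetrize so that each undirected edge appears in both directions
adjacency_from_edges = (upper + upper.T).tocsr()
# The self-loop was added to itself by the symmetrization, so clip entries back to 1
adjacency_from_edges.data = np.minimum(adjacency_from_edges.data, 1)

print(adjacency_from_edges.toarray())
```

The resulting matrix can be passed to svg_graph in the same way as the one built from the dense array.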

The sk-network library comes with many well-known graph datasets. You can also load data from sites such as NetSet and Konect. Let’s load and display one such well-known dataset, the Karate club graph.

from sknetwork.data import karate_club, miserables, movie_actor
graph = karate_club(metadata=True)
adjacency = graph.adjacency
position = graph.position
labels = graph.labels
image = svg_graph(adjacency, position, labels=labels)
SVG(image)

Karate club graph

The library has several built-in utility functions to gather basic properties of graphs. An example is shown below.

from sknetwork import utils as ut
deg = ut.get_degrees(adjacency)
print(deg)

[16 9 10 6 3 4 4 4 5 2 3 1 2 5 2 2 2 2 2 3 2 2 2 5 3 3 2 4 3 4 4 6 12 17]

Let’s look at functions related to graph topology. I will show the use of two functions: one for the connected components and one for the clustering coefficient of a graph. Recall that a connected component of a graph is a subgraph in which every node is reachable from every other node in the subgraph. The clustering coefficient captures the degree to which the neighbors of a given node link to each other.

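from sknetwork.topology import get_connected_components, get_clustering_coefficient
# Component label of each node (the Karate club graph is connected, so all labels are equal)
print(get_connected_components(adjacency))
# Clustering coefficient of the graph, rounded to two decimals
print(np.round(get_clustering_coefficient(adjacency), 2))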
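
To see what these two quantities measure, here is a minimal sketch on a toy graph, written with NumPy and SciPy only. For the clustering coefficient it uses one common definition, the global clustering coefficient (three times the number of triangles divided by the number of connected triples). Whether this matches the library's definition exactly is an assumption worth checking against its documentation. The toy graph and all variable names are mine.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import connected_components

# Toy graph: a triangle 0-1-2 with a pendant node 3, plus a separate edge 4-5
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (4, 5)]
n_nodes = 6
rows, cols = zip(*edges)
upper = sparse.coo_matrix((np.ones(len(edges), dtype=int), (rows, cols)),
                          shape=(n_nodes, n_nodes))
toy = (upper + upper.T).tocsr()  # symmetric adjacency, no self-loops

# Connected components: nodes with the same label can reach each other
n_components, component_labels = connected_components(toy, directed=False)
print(n_components, component_labels)

# Global clustering coefficient = 3 * triangles / connected triples
degrees = np.diff(toy.indptr)
n_wedges = (degrees * (degrees - 1) // 2).sum()      # paths of length 2
n_triangles = (toy @ toy @ toy).diagonal().sum() // 6  # each triangle is counted 6 times
transitivity = 3 * n_triangles / n_wedges
print(transitivity)
```

Here the graph has two components, one triangle, and five connected triples, so the coefficient is 3 × 1 / 5 = 0.6.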

Scikit-network offers several algorithms to perform machine learning on graphs. Each algorithm is available as an object with methods similar to those found in the scikit-learn library. I show one example below to illustrate the usage of the scikit-network library: clustering, also known as community detection, using the Louvain method. The graph used here is the Karate club graph.

from sknetwork.clustering import Louvain
louvain = Louvain(random_state=13)
labels = louvain.fit_predict(adjacency)  # labels reflect community ids
image = svg_graph(adjacency, labels=labels)
SVG(image)

Discovered Karate club communities
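
The Louvain method works by greedily maximizing a score called modularity, which compares the number of edges inside communities with the number expected if edges were placed at random while keeping node degrees fixed. The sketch below computes modularity for a given partition with plain NumPy. The function name modularity and the toy graph are mine, and this is an illustration of the score, not the library's implementation.

```python
import numpy as np

def modularity(adjacency, labels):
    """Modularity of a partition of an undirected graph (dense adjacency matrix)."""
    adjacency = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    degrees = adjacency.sum(axis=1)
    total = degrees.sum()  # twice the number of edges
    same_community = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / total  # expected edge weight under random placement
    return ((adjacency - expected) * same_community).sum() / total

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3
toy = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    toy[i, j] = toy[j, i] = 1

q_two_triangles = modularity(toy, [0, 0, 0, 1, 1, 1])
q_single_block = modularity(toy, [0, 0, 0, 0, 0, 0])
print(q_two_triangles, q_single_block)
```

Splitting the graph into its two triangles gives a modularity of 5/14 (about 0.357), while putting every node in one community gives 0, so a modularity-maximizing method prefers the split.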

The library not only offers traditional graph machine learning objects and methods but also includes deep learning modules for graphs, including a graph convolutional classifier and GraphSAGE. I hope you will explore this library for your own use. The only shortcoming I found with scikit-network is its documentation. Improved documentation and a tutorial would go a long way toward making it popular.