Share @ LinkedIn Facebook  network-analysis, networkx
Networkx - Network Analysis in Python : Node Importance & Paths

Network Analysis in Python: Node Importance & Paths [Networkx]

Table of Contents

Introduction

Graphs or commonly referred to as networks are interesting data structure which is used to show relationships between various entities. Graphs are used to represent many real-life scenarios like social networks, airports, and flights between them, recipe and ingredients, and many more. The analysis of the graph data structure requires a different set of algorithms some of which we'll be discussing in this tutorial. We'll also be explaining various properties of graphs which will be useful when doing analysis to find out meaningful insights.

Graphs are data structure which has two main entities:

  • Nodes/Vertices: It's used to represent entities like airports, people, recipe ingredients, etc
  • Edges: It's used to represent a relationship between nodes like the distance between airports, the relation between people, whether an ingredient is part of the recipe, etc. Edges are the most important properties of graphs.

Graphs are generally represented as G(V, E) where V represents a list of vertices/nodes and E represents a list of edges between those nodes. When representing graphs as visually each node is represented as a circle and each edge is shown as a line connecting nodes labeling relation between that nodes.

Graphs are generally divided into 2 categories:

  • 1. Directed Graphs: Directed graphs have edges with direction from one node to another hence highlighting that data can be flown in direction of arrow only and not reverse. It represents that the relationship between two nodes has direction.

    • Train Routes Graph
    • Air Traffic Graph
    • Twitter Social Network
    • Recipe-Ingredients Graph
  • 2. Undirected Graphs: Undirected graphs do not have any kind of direction information associated with it meaning that data can be flown in both directions. It represents that the relationship exists between two nodes mutually.

    • Facebook Social Network
    • Internet

Python provides a library called networkx for managing and manipulating graph data structure as well as various methods to analyze the properties of networks. We have already covered small tutorials explaining the usage of networkx which we suggest that you go through as it'll give you a basic idea about network creation using its API. We'll further explain graph creation and networkx API as a part of this tutorial as well. Networkx lets us create both directed and undirected graphs. We'll be using the nxviz library for visualizing various network plots. We'll be using network datasets available from Konect when explaining various graph concepts using networkx.

So without further delay, let’s get started with the tutorial.

We'll start by importing all the necessary libraries.

In [1]:
# https://coderzcolumn.com/tutorials/python/networkx-basic
# https://snap.stanford.edu/data/
import networkx as nx
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import sys
import warnings

warnings.filterwarnings("ignore")

print('Python Version : '+sys.version)
print('NetworkX version : '+nx.__version__)

%matplotlib inline
Python Version : 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0]
NetworkX version : 2.3

1. Load Dataset

The first dataset that we'll load is seventh grader dataset of students from Victoria. It has information about 29 seventh grade students who were asked about their favorite partner for three activities. It's a directed graph dataset. Each student will be represented as node and edge between two students represents that left student preferred the right student for some activity. The edge between students also has a weight between 1-3 which represents how often left students picked the right student as his partner. The graph created out of this dataset will be directed graph due to edge directions from student to his/her preferred partner.

We'll start loading the dataset and create a graph from it. We'll use this dataset for explaining the graph creation process as well as various properties of graphs available through networkx API.

In [3]:
df = pd.read_csv('datasets/moreno_seventh/out.moreno_seventh_seventh',
                     skiprows=2, header=None, sep=' ')
df.columns = ['student1', 'student2', 'count']

df.head()
Out[3]:
student1 student2 count
0 1 2 1
1 1 3 1
2 1 4 2
3 1 5 2
4 1 6 3
In [4]:
metadata = pd.read_csv('datasets/moreno_seventh/ent.moreno_seventh_seventh.student.gender',header=None)
metadata.columns=["Gender"]
metadata.head()
Out[4]:
Gender
0 male
1 male
2 male
3 male
4 male

2. Create a Graph

We can create a directed graph by using DiGraph() method of networkx. We can then loop through rows of our dataset and add edges to the graph. Directed graph object has method named add_edge() and add_node() which can be used to add edge and node respectively to graph. We can also add metadata about each edge and node using these methods. Whatever parameters that we pass after node name and edge names are added as metadata of that node and edge. Below we are adding weight of each edge as metadata which represents how often left students selected the right student for activities.

In [5]:
directed_G = nx.DiGraph()

for s1, s2, cnt in df.values:
    directed_G.add_edge(s1, s2, weight=cnt)

Below we'll now add metadata for each node which will have information about whether the student is male or female.

We can access list nodes and edges of the graph by calling methods nodes() and edges() respectively on graph objects.

We can access individual node and edge by attribute nodes and edges available in the graph.

In [6]:
for node in directed_G.nodes():
    directed_G.node[node]["Gender"] = metadata.loc[node-1]["Gender"]
In [7]:
directed_G.nodes[1], directed_G.edges[1,5]
Out[7]:
({'Gender': 'male'}, {'weight': 2})

We can even access node and edge list with their metadata by passing data=True to nodes() and edges() methods.

In [8]:
list(directed_G.nodes())[:5], list(directed_G.nodes(data=True))[:5]
Out[8]:
([1, 2, 3, 4, 5],
 [(1, {'Gender': 'male'}),
  (2, {'Gender': 'male'}),
  (3, {'Gender': 'male'}),
  (4, {'Gender': 'male'}),
  (5, {'Gender': 'male'})])
In [9]:
list(directed_G.edges(data=True))[:5]
Out[9]:
[(1, 2, {'weight': 1}),
 (1, 3, {'weight': 1}),
 (1, 4, {'weight': 2}),
 (1, 5, {'weight': 2}),
 (1, 6, {'weight': 3})]

3. Modify Graph

We can modify existing metadata of nodes and edges of graphs as well as we can add more nodes and edges to the graph. We'll below explain how to add new nodes and edges to the existing graph. We have already explained above how to add metadata to nodes and edges.

In [10]:
directed_G.add_node(30, Gender="male")
directed_G.add_node(31, Gender="female")
In [11]:
directed_G.add_edge(30,31, weight=3)
directed_G.add_edge(31,1, weight=2)
directed_G.add_edge(31,2, weight=1)
In [12]:
directed_G.nodes[30], directed_G.nodes[31]
Out[12]:
({'Gender': 'male'}, {'Gender': 'female'})
In [13]:
directed_G.edges[30,31], directed_G.edges[31,1]
Out[13]:
({'weight': 3}, {'weight': 2})

4. Visualize Graph

We'll now try various visualizations which will help us with looking at our graph from a different perspective. Networkx has an in-built method to visualize data called draw() which can be used to visualize networks. Apart from that, we'll be using nxviz library to visualize other network plots like arc plot, circus plot, and matrix plots. We'll also use the hiveplot library to visualize hive plots as well.

In [ ]:
nx.draw(directed_G, with_labels=True, font_weight='bold', node_color="lime", font_color="red")

Network Analysis in Python: Node Importance & Paths [Networkx]

The draw() method available with netowrkx is non-deterministic and gives different representations each time method is called. If there are too many points then plotting them through draw() does not give good representation as well as insight.

It has other methods like draw_circular() and draw_shell() which represents nodes in a circle.

In [ ]:
nx.draw_circular(directed_G, with_labels=True, font_weight='bold', node_color="lime", font_color="red")

Network Analysis in Python: Node Importance & Paths [Networkx]

In [ ]:
nx.draw_shell(directed_G, with_labels=True, font_weight='bold', node_color="lime", font_color="red")

Network Analysis in Python: Node Importance & Paths [Networkx]

4.1 Matrix Plot [Adjacency Matrix]

We can plot matrix plot available from nxviz which draws a matrix where each row and column is nodes and entry of the matrix is whether there is an edge between them or not.

In [ ]:
from nxviz import MatrixPlot

m = MatrixPlot(directed_G)
m.draw()

Network Analysis in Python: Node Importance & Paths [Networkx]

4.2 Arc Plot

Arc plot lays down each node of the graph according to groups and then draws arc between nodes if there exists an edge between them. It can be useful to analyze the relationship between nodes. It can let us highlight if some nodes are highly related to other nodes. We'll be using the method available from nxviz for plotting arc plot.

In [ ]:
from nxviz import ArcPlot

node_labels = [node[1]["Gender"] for node in directed_G.nodes(data=True)]

a = ArcPlot(directed_G, node_color='Gender', node_grouping='Gender')
a.draw()
plt.tight_layout()
plt. autoscale()

Network Analysis in Python: Node Importance & Paths [Networkx]

4.3 Circos Plot

Circos plot is exactly like arc plot but nodes are drawn in a circle instead of on a single line. We'll be using the method available from nxviz to plot the circos plot.

In [ ]:
from nxviz import CircosPlot

c = CircosPlot(directed_G, node_color='Gender', node_grouping='Gender')
c.draw()

Network Analysis in Python: Node Importance & Paths [Networkx]

4.4 Hive Plot

Hive plots are used to highlight groups within a network and their relationships. It can help understand the relationship between groups as well as within-group itself. Below we have divided nodes into two groups (male and female). We also have created color encodings for nodes of each group.

In [ ]:
from hiveplot import HivePlot

nodes = {}
nodes['male'] = [n for n,d in directed_G.nodes(data=True) if d['Gender'] == 'male']
nodes['female'] = [n for n,d in directed_G.nodes(data=True) if d['Gender'] == 'female']

edges = {}
edges['group'] = directed_G.edges(data=True)

nodes_cmap = {}
nodes_cmap['male'] = 'dodgerblue'
nodes_cmap['female'] = 'tomato'

edges_cmap = {}
edges_cmap['group'] = 'lime'

h = HivePlot(nodes=nodes, edges=edges, node_colormap=nodes_cmap, edge_colormap=edges_cmap)
h.draw()

Network Analysis in Python: Node Importance & Paths [Networkx]

5. Node Importance

Node importance represents how connected a node is to other networks. A node that is connected to many other nodes is generally considered an important node. Let’s take an example of air flights where nodes with many flights will be considered very busy airports and a company can think about moving some flights from there to lessen some burden of these airports.

We can interpret the importance of node by counting its neighbors or using degree centrality. We'll try to explain these concepts with examples below.

5.1 Load Dataset & Create Graph

We'll be using Social Patterns dataset available from konect. This dataset has information about the interaction of people during the exhibition INFECTIOUS: STAY AWAY in 2009 organized at the Science Gallery in Dublin. Here nodes represent individual visitors and edges represents if two individual were facing each other for more than 20 seconds. There can be more than one edges between individuals if there exists contact more than once. The graph created out of this dataset will be undirected graph as when one person is facing another then another one will be facing first as well.

In [21]:
df = pd.read_csv('datasets/sociopatterns-infectious/out.sociopatterns-infectious', sep=' ', skiprows=2, header=None)
df = df[[0, 1, 2]]
df.columns = ['person1', 'person2', 'weight']
df.head()
Out[21]:
person1 person2 weight
0 100 101 1
1 100 101 1
2 100 102 1
3 100 103 1
4 100 103 1
In [22]:
undirected_G = nx.Graph()

## Adding edges to graph as well as updating its weight if interaction is more than once
for idx in df.index:
    p1 = df.loc[idx]['person1']
    p2 = df.loc[idx]['person2']
    if undirected_G.has_edge(p1, p2):
        undirected_G.edges[p1, p2]['weight'] += 1
    else:
        undirected_G.add_edge(p1, p2, weight=1)
In [23]:
## Adding node number as attribute to node by sorting them.
for n in sorted(undirected_G.nodes()):
    undirected_G.node[n]['order'] = float(n)
In [24]:
list(undirected_G.nodes(data=True))[:5]
Out[24]:
[(100, {'order': 100.0}),
 (101, {'order': 101.0}),
 (102, {'order': 102.0}),
 (103, {'order': 103.0}),
 (104, {'order': 104.0})]
In [25]:
list(undirected_G.edges(data=True))[:5]
Out[25]:
[(100, 101, {'weight': 2}),
 (100, 102, {'weight': 1}),
 (100, 103, {'weight': 4}),
 (100, 104, {'weight': 3}),
 (100, 105, {'weight': 10})]

5.2 Circos Plot

In [ ]:
c = CircosPlot(undirected_G, node_order='order', node_color='order', figsize=(7,7))
c.draw()

Network Analysis in Python: Node Importance & Paths [Networkx]

5.3 Counting Neighbors

The first approach to finding out the importance of any node in a network is by counting its number of nodes. For the directed graph, it'll be a number of nodes pointing to that node whereas, for undirected graph, it'll be a number of edges connecting to it.

Let's count the number of neighbors of an above undirected graph. We'll be making use of a method named neighbors() available through graph object. We'll pass it node and it'll return an iterator with neighbors of that node. We'll then convert mapping from node to a number of its neighbor to pandas series. We'll then sort it to get the node with the highest neighbors.

In [27]:
node_to_neighbors_mapping = [(node, len(list(undirected_G.neighbors(node)))) for node in undirected_G.nodes()]
node_to_neighbors_mapping[:5]
Out[27]:
[(100, 29), (101, 13), (102, 16), (103, 26), (104, 17)]
In [28]:
node_to_neighbors_ser = pd.Series(data=dict(node_to_neighbors_mapping))

node_to_neighbors_ser.sort_values(ascending=False).head()
Out[28]:
51     50
272    47
195    43
235    43
161    34
dtype: int64

We can even use degree() method available with networkx. We can pass it graph object and it'll return an iterator with node and count of its neighbors. We can then sort it according to a number of neighbors to get nodes with the highest neighbors.

In [29]:
sorted(nx.degree(undirected_G), key = lambda x: x[1], reverse=True)[:5]
Out[29]:
[(51, 50), (272, 47), (235, 43), (195, 43), (161, 34)]

5.4 Degree Centrality

The degree centrality is another measure for finding the importance of a node in a network. Networkx provides a method named degree_centrality() which can be used to find out-degree centrality for each node. It returns a dictionary of the node to its degree centrality mapping. We can then sort it to find out nodes with high centrality.

In [30]:
sorted(nx.degree_centrality(undirected_G).items(), key=lambda x : x[1], reverse=True)[:5]
Out[30]:
[(51, 0.12224938875305623),
 (272, 0.11491442542787286),
 (235, 0.10513447432762836),
 (195, 0.10513447432762836),
 (161, 0.08312958435207823)]

5.5 Betweenness Centrality

The betweenness centrality refers to a number of shortest paths that pass through that node. This node as important as removing such nodes can sometimes break a network into more networks as well.

In [31]:
sorted(nx.betweenness_centrality(undirected_G, normalized=False).items(), key=lambda x : x[1], reverse=True)[:5]
Out[31]:
[(188, 22133.942335244796),
 (272, 16837.89054201336),
 (51, 16631.579891025405),
 (230, 9587.804811350245),
 (50, 6263.4480561729515)]
In [32]:
sorted(nx.betweenness_centrality(undirected_G).items(), key=lambda x : x[1], reverse=True)[:5]
Out[32]:
[(188, 0.2652804824685363),
 (272, 0.2018060614364706),
 (51, 0.19933337996818407),
 (230, 0.11491208604619403),
 (50, 0.07506889179937859)]

5.6 Load Centrality

The load centrality refers to a fraction of the shortest paths that pass through that node.

In [33]:
sorted(nx.load_centrality(undirected_G).items(), key=lambda x : x[1], reverse=True)[:5]
Out[33]:
[(188, 0.2494139630005818),
 (272, 0.1900537289851826),
 (51, 0.18935645762892783),
 (230, 0.1181998650885591),
 (50, 0.07399184253600075)]

There are many other types of centrality related to a node as well which can be explored further from this link.

6. Paths in a Graph

Traversing through a network and finding a path between nodes is an important exercise. We need to find the shortest path between two nodes of a graph in many situations. The graph traversal helps in understanding the structure of the graph and helps to find a route between nodes of the graph.

We can use graph traversal algorithms like breadth-first search and depth-first search to find paths between all nodes of the network. It can let us know whether there exists a path between two nodes of a graph.

Networkx has a method named has_path() which can be used to check whether there exists as a path between two nodes or not.

In [34]:
nx.has_path(directed_G, 1, 10)
Out[34]:
True
In [35]:
nx.has_path(directed_G, 1, 31)
Out[35]:
False
In [36]:
nx.has_path(undirected_G, 1, 10)
Out[36]:
True

We can even use depth-first search and breadth-first search modules available in networkx to create graphs from the output of that algorithms and then plot it. It'll help us understand how both algorithms traversed our graph.

In [ ]:
nx.draw_circular(nx.breadth_first_search.bfs_tree(directed_G, 1), with_labels=True, node_color="lime", font_color="red")

Network Analysis in Python: Node Importance & Paths [Networkx]

In [ ]:
nx.draw_circular(nx.depth_first_search.dfs_tree(directed_G, 1), with_labels=True, node_color="lime", font_color="red")

Network Analysis in Python: Node Importance & Paths [Networkx]

6.1 Shortest Path

Finding the shortest path requires many applications of network like finding the shortest route to a particular destination in car, the shortest path for a packet to travel in network, etc. Networkx provides a list of methods to find the shortest path between nodes of the graph. It also lets us choose between Dijkstra and bellman-ford algorithms when finding the shortest path.

Below we are trying various methods available in networkx to find the shortest paths between nodes.

In [39]:
nx.shortest_path(directed_G, source=1, target=22)
Out[39]:
[1, 2, 16, 22]
In [40]:
nx.shortest_path(directed_G, source=1, target=22, method="bellman-ford")
Out[40]:
[1, 2, 16, 22]
In [42]:
nx.bellman_ford_path(directed_G, source=1, target=22)
Out[42]:
[1, 2, 16, 22]
In [43]:
nx.dijkstra_path(directed_G, source=1, target=22)
Out[43]:
[1, 2, 16, 22]
In [193]:
nx.shortest_path_length(directed_G, source=1, target=22)
Out[193]:
3
In [205]:
print(list(nx.shortest_paths.all_pairs_dijkstra(directed_G))[0])
(1, ({1: 0, 2: 1, 3: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 4: 2, 5: 2, 8: 2, 9: 2, 16: 2, 17: 2, 18: 2, 21: 2, 23: 2, 25: 2, 6: 2, 7: 2, 27: 2, 20: 2, 24: 2, 28: 2, 19: 3, 29: 3, 26: 3, 22: 3}, {1: [1], 2: [1, 2], 3: [1, 3], 4: [1, 4], 5: [1, 5], 6: [1, 10, 6], 7: [1, 10, 7], 8: [1, 8], 9: [1, 9], 10: [1, 10], 11: [1, 11], 12: [1, 12], 13: [1, 13], 14: [1, 14], 15: [1, 15], 16: [1, 2, 16], 17: [1, 2, 17], 18: [1, 2, 18], 19: [1, 3, 19], 20: [1, 10, 20], 21: [1, 2, 21], 23: [1, 3, 23], 24: [1, 10, 24], 25: [1, 3, 25], 27: [1, 10, 27], 26: [1, 9, 26], 28: [1, 11, 28], 29: [1, 5, 29], 22: [1, 2, 16, 22]}))
In [210]:
print(list(nx.shortest_paths.all_pairs_dijkstra_path_length(directed_G))[0])
(1, {1: 0, 2: 1, 3: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 4: 2, 5: 2, 8: 2, 9: 2, 16: 2, 17: 2, 18: 2, 21: 2, 23: 2, 25: 2, 6: 2, 7: 2, 27: 2, 20: 2, 24: 2, 28: 2, 19: 3, 29: 3, 26: 3, 22: 3})
In [206]:
print(list(nx.shortest_paths.all_pairs_bellman_ford_path(directed_G))[0])
(1, {1: [1], 2: [1, 2], 3: [1, 3], 4: [1, 4], 5: [1, 5], 6: [1, 10, 6], 7: [1, 10, 7], 8: [1, 8], 9: [1, 9], 10: [1, 10], 11: [1, 11], 12: [1, 12], 13: [1, 13], 14: [1, 14], 15: [1, 15], 16: [1, 2, 16], 17: [1, 2, 17], 18: [1, 2, 18], 19: [1, 3, 19], 20: [1, 10, 20], 21: [1, 2, 21], 23: [1, 3, 23], 24: [1, 10, 24], 25: [1, 3, 25], 26: [1, 9, 26], 27: [1, 10, 27], 28: [1, 11, 28], 29: [1, 5, 29], 22: [1, 2, 16, 22]})
In [211]:
print(list(nx.shortest_paths.all_pairs_bellman_ford_path_length(directed_G))[0])
(1, {1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 2, 9: 2, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 2, 17: 2, 18: 2, 19: 3, 20: 2, 21: 2, 23: 2, 24: 2, 25: 2, 26: 3, 27: 2, 28: 2, 29: 3, 22: 3})
In [212]:
print(list(nx.shortest_paths.all_pairs_shortest_path(directed_G))[0])
(1, {1: [1], 2: [1, 2], 3: [1, 3], 4: [1, 4], 5: [1, 5], 6: [1, 6], 7: [1, 7], 8: [1, 8], 9: [1, 9], 10: [1, 10], 11: [1, 11], 12: [1, 12], 13: [1, 13], 14: [1, 14], 15: [1, 15], 16: [1, 2, 16], 17: [1, 2, 17], 18: [1, 2, 18], 19: [1, 2, 19], 20: [1, 2, 20], 21: [1, 2, 21], 23: [1, 3, 23], 24: [1, 3, 24], 25: [1, 3, 25], 26: [1, 4, 26], 27: [1, 5, 27], 28: [1, 5, 28], 29: [1, 5, 29], 22: [1, 2, 16, 22]})
In [213]:
print(list(nx.shortest_paths.all_pairs_shortest_path_length(directed_G))[0])
(1, {1: 0, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 2, 17: 2, 18: 2, 19: 2, 20: 2, 21: 2, 23: 2, 24: 2, 25: 2, 26: 2, 27: 2, 28: 2, 29: 2, 22: 3})

Networkx provides another method named subgraph() which can be called on graph object by passing a list of nodes. It'll then return another graph which consists of that nodes and edges between those nodes. We'll use this method to create a graph of the shortest path and visualize it.

In [240]:
shortest_path = nx.shortest_path(undirected_G, 1, 400)
print(shortest_path)

subgraph = undirected_G.subgraph(shortest_path)

nx.draw(subgraph, with_labels=True, node_color="lime", font_color="red")
[1, 50, 188, 230, 335, 400]
In [245]:
shortest_path = nx.shortest_path(directed_G, 1, 25)

print(shortest_path)

subgraph = directed_G.subgraph(shortest_path)

nx.draw(subgraph, with_labels=True, node_color="lime", font_color="red")
[1, 3, 25]

This ends our small tutorial on basic graph analysis. We tried to cover below-mentioned points:

  • Graph Definition.
  • Centralities.
  • Shortest Paths.
  • Network Traversal.
  • Graph creation & modification through networkx.
  • Various visualizations to understand graph structure.

Please feel free to let us know your views in the comments section.

References



Sunny Solanki  Sunny Solanki