- Introduction
- 1. Load Dataset
- 2. Create a Graph
- 3. Modify Graph
- 4. Visualize Graph
- 5. Node Importance
- 6. Paths in a Graph
- References

Graphs or commonly referred to as networks are interesting data structure which is used to show relationships between various entities. Graphs are used to represent many real-life scenarios like social networks, airports, and flights between them, recipe and ingredients, and many more. The analysis of the graph data structure requires a different set of algorithms some of which we'll be discussing in this tutorial. We'll also be explaining various properties of graphs which will be useful when doing analysis to find out meaningful insights.

Graphs are data structure which has two main entities:

**Nodes/Vertices:**It's used to represent entities like airports, people, recipe ingredients, etc**Edges:**It's used to represent a relationship between nodes like the distance between airports, the relation between people, whether an ingredient is part of the recipe, etc. Edges are the most important properties of graphs.

Graphs are generally represented as G(V, E) where V represents a list of vertices/nodes and E represents a list of edges between those nodes. When representing graphs as visually each node is represented as a circle and each edge is shown as a line connecting nodes labeling relation between that nodes.

Graphs are generally divided into 2 categories:

**1. Directed Graphs:**Directed graphs have edges with direction from one node to another hence highlighting that data can be flown in direction of arrow only and not reverse. It represents that the relationship between two nodes has direction.- Train Routes Graph
- Air Traffic Graph
- Twitter Social Network
- Recipe-Ingredients Graph

**2. Undirected Graphs:**Undirected graphs do not have any kind of direction information associated with it meaning that data can be flown in both directions. It represents that the relationship exists between two nodes mutually.- Facebook Social Network
- Internet

Python provides a library called `networkx`

for managing and manipulating graph data structure as well as various methods to analyze the properties of networks. We have already covered small tutorials explaining the usage of networkx which we suggest that you go through as it'll give you a basic idea about network creation using its API. We'll further explain graph creation and networkx API as a part of this tutorial as well. Networkx lets us create both directed and undirected graphs. We'll be using the `nxviz`

library for visualizing various network plots. We'll be using network datasets available from Konect when explaining various graph concepts using networkx.

So without further delay, let’s get started with the tutorial.

We'll start by importing all the necessary libraries.

In [1]:

```
# https://coderzcolumn.com/tutorials/python/networkx-basic
# https://snap.stanford.edu/data/
import networkx as nx
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sys
import warnings
warnings.filterwarnings("ignore")
print('Python Version : '+sys.version)
print('NetworkX version : '+nx.__version__)
%matplotlib inline
```

The first dataset that we'll load is seventh grader dataset of students from Victoria. It has information about 29 seventh grade students who were asked about their favorite partner for three activities. It's a directed graph dataset. Each student will be represented as node and edge between two students represents that left student preferred the right student for some activity. The edge between students also has a weight between 1-3 which represents how often left students picked the right student as his partner. The graph created out of this dataset will be `directed graph`

due to edge directions from student to his/her preferred partner.

We'll start loading the dataset and create a graph from it. We'll use this dataset for explaining the graph creation process as well as various properties of graphs available through networkx API.

In [3]:

```
df = pd.read_csv('datasets/moreno_seventh/out.moreno_seventh_seventh',
skiprows=2, header=None, sep=' ')
df.columns = ['student1', 'student2', 'count']
df.head()
```

Out[3]:

In [4]:

```
metadata = pd.read_csv('datasets/moreno_seventh/ent.moreno_seventh_seventh.student.gender',header=None)
metadata.columns=["Gender"]
metadata.head()
```

Out[4]:

We can create a directed graph by using `DiGraph()`

method of networkx. We can then loop through rows of our dataset and add edges to the graph. Directed graph object has method named `add_edge()`

and `add_node()`

which can be used to add edge and node respectively to graph. We can also add metadata about each edge and node using these methods. Whatever parameters that we pass after node name and edge names are added as metadata of that node and edge. Below we are adding `weight`

of each edge as metadata which represents how often left students selected the right student for activities.

In [5]:

```
directed_G = nx.DiGraph()
for s1, s2, cnt in df.values:
directed_G.add_edge(s1, s2, weight=cnt)
```

Below we'll now add metadata for each node which will have information about whether the student is male or female.

We can access list nodes and edges of the graph by calling methods `nodes()`

and `edges()`

respectively on graph objects.

We can access individual node and edge by attribute `nodes`

and `edges`

available in the graph.

In [6]:

```
for node in directed_G.nodes():
directed_G.node[node]["Gender"] = metadata.loc[node-1]["Gender"]
```

In [7]:

```
directed_G.nodes[1], directed_G.edges[1,5]
```

Out[7]:

We can even access node and edge list with their metadata by passing `data=True`

to `nodes()`

and `edges()`

methods.

In [8]:

```
list(directed_G.nodes())[:5], list(directed_G.nodes(data=True))[:5]
```

Out[8]:

In [9]:

```
list(directed_G.edges(data=True))[:5]
```

Out[9]:

We can modify existing metadata of nodes and edges of graphs as well as we can add more nodes and edges to the graph. We'll below explain how to add new nodes and edges to the existing graph. We have already explained above how to add metadata to nodes and edges.

In [10]:

```
directed_G.add_node(30, Gender="male")
directed_G.add_node(31, Gender="female")
```

In [11]:

```
directed_G.add_edge(30,31, weight=3)
directed_G.add_edge(31,1, weight=2)
directed_G.add_edge(31,2, weight=1)
```

In [12]:

```
directed_G.nodes[30], directed_G.nodes[31]
```

Out[12]:

In [13]:

```
directed_G.edges[30,31], directed_G.edges[31,1]
```

Out[13]:

We'll now try various visualizations which will help us with looking at our graph from a different perspective. `Networkx`

has an in-built method to visualize data called `draw()`

which can be used to visualize networks. Apart from that, we'll be using `nxviz`

library to visualize other network plots like arc plot, circus plot, and matrix plots. We'll also use the `hiveplot`

library to visualize hive plots as well.

In [ ]:

```
nx.draw(directed_G, with_labels=True, font_weight='bold', node_color="lime", font_color="red")
```

The `draw()`

method available with netowrkx is non-deterministic and gives different representations each time method is called. If there are too many points then plotting them through `draw()`

does not give good representation as well as insight.

It has other methods like `draw_circular()`

and `draw_shell()`

which represents nodes in a circle.

In [ ]:

```
nx.draw_circular(directed_G, with_labels=True, font_weight='bold', node_color="lime", font_color="red")
```

In [ ]:

```
nx.draw_shell(directed_G, with_labels=True, font_weight='bold', node_color="lime", font_color="red")
```

We can plot matrix plot available from nxviz which draws a matrix where each row and column is nodes and entry of the matrix is whether there is an edge between them or not.

In [ ]:

```
from nxviz import MatrixPlot
m = MatrixPlot(directed_G)
m.draw()
```

Arc plot lays down each node of the graph according to groups and then draws arc between nodes if there exists an edge between them. It can be useful to analyze the relationship between nodes. It can let us highlight if some nodes are highly related to other nodes. We'll be using the method available from `nxviz`

for plotting arc plot.

In [ ]:

```
from nxviz import ArcPlot
node_labels = [node[1]["Gender"] for node in directed_G.nodes(data=True)]
a = ArcPlot(directed_G, node_color='Gender', node_grouping='Gender')
a.draw()
plt.tight_layout()
plt. autoscale()
```

Circos plot is exactly like arc plot but nodes are drawn in a circle instead of on a single line. We'll be using the method available from `nxviz`

to plot the circos plot.

In [ ]:

```
from nxviz import CircosPlot
c = CircosPlot(directed_G, node_color='Gender', node_grouping='Gender')
c.draw()
```

Hive plots are used to highlight groups within a network and their relationships. It can help understand the relationship between groups as well as within-group itself. Below we have divided nodes into two groups (male and female). We also have created color encodings for nodes of each group.

In [ ]:

```
from hiveplot import HivePlot
nodes = {}
nodes['male'] = [n for n,d in directed_G.nodes(data=True) if d['Gender'] == 'male']
nodes['female'] = [n for n,d in directed_G.nodes(data=True) if d['Gender'] == 'female']
edges = {}
edges['group'] = directed_G.edges(data=True)
nodes_cmap = {}
nodes_cmap['male'] = 'dodgerblue'
nodes_cmap['female'] = 'tomato'
edges_cmap = {}
edges_cmap['group'] = 'lime'
h = HivePlot(nodes=nodes, edges=edges, node_colormap=nodes_cmap, edge_colormap=edges_cmap)
h.draw()
```

Node importance represents how connected a node is to other networks. A node that is connected to many other nodes is generally considered an important node. Let’s take an example of air flights where nodes with many flights will be considered very busy airports and a company can think about moving some flights from there to lessen some burden of these airports.

We can interpret the importance of node by counting its `neighbors`

or using `degree centrality`

. We'll try to explain these concepts with examples below.

We'll be using Social Patterns dataset available from konect. This dataset has information about the interaction of people during the exhibition INFECTIOUS: STAY AWAY in 2009 organized at the Science Gallery in Dublin. Here nodes represent individual visitors and edges represents if two individual were facing each other for more than 20 seconds. There can be more than one edges between individuals if there exists contact more than once. The graph created out of this dataset will be `undirected graph`

as when one person is facing another then another one will be facing first as well.

In [21]:

```
df = pd.read_csv('datasets/sociopatterns-infectious/out.sociopatterns-infectious', sep=' ', skiprows=2, header=None)
df = df[[0, 1, 2]]
df.columns = ['person1', 'person2', 'weight']
df.head()
```

Out[21]:

In [22]:

```
undirected_G = nx.Graph()
## Adding edges to graph as well as updating its weight if interaction is more than once
for idx in df.index:
p1 = df.loc[idx]['person1']
p2 = df.loc[idx]['person2']
if undirected_G.has_edge(p1, p2):
undirected_G.edges[p1, p2]['weight'] += 1
else:
undirected_G.add_edge(p1, p2, weight=1)
```

In [23]:

```
## Adding node number as attribute to node by sorting them.
for n in sorted(undirected_G.nodes()):
undirected_G.node[n]['order'] = float(n)
```

In [24]:

```
list(undirected_G.nodes(data=True))[:5]
```

Out[24]:

In [25]:

```
list(undirected_G.edges(data=True))[:5]
```

Out[25]:

In [ ]:

```
c = CircosPlot(undirected_G, node_order='order', node_color='order', figsize=(7,7))
c.draw()
```

The first approach to finding out the importance of any node in a network is by counting its number of nodes. For the directed graph, it'll be a number of nodes pointing to that node whereas, for undirected graph, it'll be a number of edges connecting to it.

Let's count the number of neighbors of an above undirected graph. We'll be making use of a method named `neighbors()`

available through graph object. We'll pass it node and it'll return an iterator with neighbors of that node. We'll then convert mapping from node to a number of its neighbor to pandas series. We'll then sort it to get the node with the highest neighbors.

In [27]:

```
node_to_neighbors_mapping = [(node, len(list(undirected_G.neighbors(node)))) for node in undirected_G.nodes()]
node_to_neighbors_mapping[:5]
```

Out[27]:

In [28]:

```
node_to_neighbors_ser = pd.Series(data=dict(node_to_neighbors_mapping))
node_to_neighbors_ser.sort_values(ascending=False).head()
```

Out[28]:

We can even use `degree()`

method available with networkx. We can pass it graph object and it'll return an iterator with node and count of its neighbors. We can then sort it according to a number of neighbors to get nodes with the highest neighbors.

In [29]:

```
sorted(nx.degree(undirected_G), key = lambda x: x[1], reverse=True)[:5]
```

Out[29]:

The degree centrality is another measure for finding the importance of a node in a network. Networkx provides a method named `degree_centrality()`

which can be used to find out-degree centrality for each node. It returns a dictionary of the node to its degree centrality mapping. We can then sort it to find out nodes with high centrality.

In [30]:

```
sorted(nx.degree_centrality(undirected_G).items(), key=lambda x : x[1], reverse=True)[:5]
```

Out[30]:

The betweenness centrality refers to a number of shortest paths that pass through that node. This node as important as removing such nodes can sometimes break a network into more networks as well.

In [31]:

```
sorted(nx.betweenness_centrality(undirected_G, normalized=False).items(), key=lambda x : x[1], reverse=True)[:5]
```

Out[31]:

In [32]:

```
sorted(nx.betweenness_centrality(undirected_G).items(), key=lambda x : x[1], reverse=True)[:5]
```

Out[32]:

The load centrality refers to a fraction of the shortest paths that pass through that node.

In [33]:

```
sorted(nx.load_centrality(undirected_G).items(), key=lambda x : x[1], reverse=True)[:5]
```

Out[33]:

There are many other types of centrality related to a node as well which can be explored further from this link.

Traversing through a network and finding a path between nodes is an important exercise. We need to find the shortest path between two nodes of a graph in many situations. The graph traversal helps in understanding the structure of the graph and helps to find a route between nodes of the graph.

We can use graph traversal algorithms like `breadth-first search`

and `depth-first search`

to find paths between all nodes of the network. It can let us know whether there exists a path between two nodes of a graph.

`Networkx`

has a method named `has_path()`

which can be used to check whether there exists as a path between two nodes or not.

In [34]:

```
nx.has_path(directed_G, 1, 10)
```

Out[34]:

In [35]:

```
nx.has_path(directed_G, 1, 31)
```

Out[35]:

In [36]:

```
nx.has_path(undirected_G, 1, 10)
```

Out[36]:

We can even use `depth-first search`

and `breadth-first search modules`

available in networkx to create graphs from the output of that algorithms and then plot it. It'll help us understand how both algorithms traversed our graph.

In [ ]:

```
nx.draw_circular(nx.breadth_first_search.bfs_tree(directed_G, 1), with_labels=True, node_color="lime", font_color="red")
```

In [ ]:

```
nx.draw_circular(nx.depth_first_search.dfs_tree(directed_G, 1), with_labels=True, node_color="lime", font_color="red")
```

Finding the shortest path requires many applications of network like finding the shortest route to a particular destination in car, the shortest path for a packet to travel in network, etc. Networkx provides a list of methods to find the shortest path between nodes of the graph. It also lets us choose between `Dijkstra`

and `bellman-ford`

algorithms when finding the shortest path.

Below we are trying various methods available in networkx to find the shortest paths between nodes.

In [39]:

```
nx.shortest_path(directed_G, source=1, target=22)
```

Out[39]:

In [40]:

```
nx.shortest_path(directed_G, source=1, target=22, method="bellman-ford")
```

Out[40]:

In [42]:

```
nx.bellman_ford_path(directed_G, source=1, target=22)
```

Out[42]:

In [43]:

```
nx.dijkstra_path(directed_G, source=1, target=22)
```

Out[43]:

In [193]:

```
nx.shortest_path_length(directed_G, source=1, target=22)
```

Out[193]:

In [205]:

```
print(list(nx.shortest_paths.all_pairs_dijkstra(directed_G))[0])
```

In [210]:

```
print(list(nx.shortest_paths.all_pairs_dijkstra_path_length(directed_G))[0])
```

In [206]:

```
print(list(nx.shortest_paths.all_pairs_bellman_ford_path(directed_G))[0])
```

In [211]:

```
print(list(nx.shortest_paths.all_pairs_bellman_ford_path_length(directed_G))[0])
```

In [212]:

```
print(list(nx.shortest_paths.all_pairs_shortest_path(directed_G))[0])
```

In [213]:

```
print(list(nx.shortest_paths.all_pairs_shortest_path_length(directed_G))[0])
```

Networkx provides another method named `subgraph()`

which can be called on graph object by passing a list of nodes. It'll then return another graph which consists of that nodes and edges between those nodes. We'll use this method to create a graph of the shortest path and visualize it.

In [240]:

```
shortest_path = nx.shortest_path(undirected_G, 1, 400)
print(shortest_path)
subgraph = undirected_G.subgraph(shortest_path)
nx.draw(subgraph, with_labels=True, node_color="lime", font_color="red")
```

In [245]:

```
shortest_path = nx.shortest_path(directed_G, 1, 25)
print(shortest_path)
subgraph = directed_G.subgraph(shortest_path)
nx.draw(subgraph, with_labels=True, node_color="lime", font_color="red")
```

This ends our small tutorial on basic graph analysis. We tried to cover below-mentioned points:

- Graph Definition.
- Centralities.
- Shortest Paths.
- Network Traversal.
- Graph creation & modification through
`networkx`

. - Various visualizations to understand graph structure.

Please feel free to let us know your views in the comments section.

Sunny Solanki

Digital Marketing (7)

Machine Learning (21)

Python (65)