Network charts are a way to visualize graph data structures. A network chart generally consists of nodes, represented by dots, circles, or icons, and edges, represented by plain lines for undirected graphs or arrows for directed graphs.
Holoviews is a Python library that provides an easy-to-use API for plotting network charts. It is a wrapper around bokeh and matplotlib, which it uses for the actual plotting behind the scenes. We'll use both the bokeh and matplotlib backends to explain network chart plotting with holoviews. If you do not have a background in holoviews and are interested in learning it, we have a tutorial on holoviews basics that you can explore first.
We have already covered a few tutorials about working with graph data structures using the networkx library. Networkx can also visualize networks using matplotlib as a backend, but its plots are quite basic and can get messy with too many nodes. We'll hence concentrate on holoviews for network charts in this tutorial. If you are interested in learning about networkx and network handling with it, please refer to our networkx tutorials.
We'll start by importing necessary libraries.
import pandas as pd
import numpy as np
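# Use the fully qualified option name; the short form "max_columns" is ambiguous in recent pandas versions.
pd.set_option("display.max_columns", 50)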
import holoviews as hv
import networkx as nx
hv.extension("bokeh")
We'll be using Brazil flight data for visualization purposes. The dataset is available at kaggle.
We suggest that you download the dataset and follow along with us using the same dataset to better understand the material and get an in-depth idea about the whole process. We'll first load the dataset as a pandas dataframe and aggregate it in various ways to create different network charts.
Please make a note that the original Brazil flights dataset is quite big, at 600+ MB and nearly 2.5 million rows. If you are low on RAM, you can use the nrows parameter of the read_csv() method to load only the first few thousand entries and follow along without getting stuck. We have loaded only 10k entries to keep the explanation simple.
brazil_flights = pd.read_csv("brazil_flights_data.csv", nrows=10000, encoding="latin")
brazil_flights = brazil_flights.rename(columns={"Cidade.Origem":"City_Orig", "Cidade.Destino":"City_Dest",
"Estado.Origem":"State_Orig", "Estado.Destino":"State_Dest",
"Pais.Origem":"Country_Orig", "Pais.Destino":"Country_Dest",
"Aeroporto.Origem":"Airport_Orig", "Aeroporto.Destino":"Airport_Dest"})
brazil_flights.head()
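The loading step above can be tried without downloading the full file. The following is a minimal sketch that reads a small in-memory CSV; the rows and the flight codes are made up for illustration, and only the dotted Portuguese header style is borrowed from the real dataset.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV. The rows are invented; only the
# header naming style mimics the original dataset.
csv_text = """Voos,Cidade.Origem,Cidade.Destino
F-1,Recife,Campinas
F-2,Campinas,Confins
F-3,Confins,Recife
F-4,Recife,Confins
"""

# nrows stops parsing after the requested number of data rows,
# so the remainder of the file is never loaded into memory.
sample = pd.read_csv(io.StringIO(csv_text), nrows=2)

# Rename the dotted headers to friendlier identifiers.
sample = sample.rename(columns={"Cidade.Origem": "City_Orig",
                                "Cidade.Destino": "City_Dest"})
print(sample)
```

Only the first two data rows end up in the dataframe, which is the same effect nrows=10000 has on the real file.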
Below we filter the original dataset to keep only domestic flights, i.e. entries where both the origin and the destination country are Brazil. We'll reuse this dataset, aggregating it in various ways, to generate different network charts.
domestic_flights = brazil_flights[(brazil_flights["Country_Orig"] == "Brasil") & (brazil_flights["Country_Dest"] == "Brasil")]
#domestic_flights = domestic_flights.groupby(["City_Orig", "City_Dest"]).count()[["Voos"]].rename(columns={"Voos":"Count"}).reset_index()
domestic_flights.head()
We'll first plot a network chart showing flights between various cities in Brazil. We have already set bokeh as the holoviews backend. We'll first create an aggregated dataset from the domestic flights dataset by grouping flights according to source and destination cities to get a count for each combination.
flights_bet_cities = domestic_flights.groupby(["City_Orig", "City_Dest"]).count()[["Voos"]]\
.rename(columns={"Voos":"Count"}).reset_index()
flights_bet_cities.head()
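To make the aggregation concrete, here is a small sketch on a toy dataframe with invented rows. It uses groupby(...).size() as an alternative to count(): size() counts rows per group, whereas count() on the Voos column counts only non-null values, so the two agree when Voos has no missing entries.

```python
import pandas as pd

# Invented flights; each row is one flight between two cities.
toy_flights = pd.DataFrame({
    "Voos": ["F1", "F2", "F3", "F4", "F5"],
    "City_Orig": ["Recife", "Recife", "Campinas", "Recife", "Campinas"],
    "City_Dest": ["Campinas", "Campinas", "Confins", "Confins", "Confins"],
})

# One row per (origin, destination) pair with the number of flights,
# which is exactly the edge list shape a network chart needs.
toy_edges = (
    toy_flights.groupby(["City_Orig", "City_Dest"])
    .size()
    .reset_index(name="Count")
)
print(toy_edges)
```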
Holoviews provides an easy-to-use method named Graph() to create a network chart. It expects 3 arguments:
- data - The first argument is the data describing the edges of the graph, such as a pandas dataframe.
- kdims - The second argument is a list of 2 strings naming the dataframe columns that hold the source and destination node of each edge.
- vdims - The third argument is a list of strings naming the dataframe columns that hold attributes of each edge.
If we provide only a dataframe as input, the method takes the first 2 columns as the source and destination nodes of the edges and the third column as an attribute of each edge. We have already created our dataframe with the source node in the first column, the destination node in the second column, and the edge weight in the third column, hence we'll not be providing the kdims and vdims arguments. If a dataframe has many columns and we need particular columns to represent the source node, destination node, and attributes of an edge, then we can pass them as lists to kdims and vdims.
We are also printing the output of the Graph() method. We can see that it has taken City_Orig and City_Dest as kdims (source and destination nodes of an edge) and the Count column as vdims (attribute of an edge).
%%opts Graph [height=700 width=700 title="Domestic Flights" directed=False] ()
cities_graph = hv.Graph(flights_bet_cities)
print(cities_graph)
cities_graph
Below we are again creating the same network chart from the previous step, but this time we have also added labels to the nodes of the chart. We have also specified the kdims argument as a list of column names representing the cities between which there are flights.
In order to add labels to the chart, we first need to create a holoviews dataset in which each node appears exactly once. We pass this dataset along with the dataframe to the Graph() method. We can then call the Labels() method of holoviews, passing it the nodes of the graph and the column to use for labels. We can then merge the labels with the graph using the * operator. We can also set attributes of the labels by calling the opts() method on them.
Please make a note that there are two ways to set attributes of a graph in holoviews: 1. the %%opts cell magic command, and 2. the opts() method. We'll use both of them alternately in our explanations.
cities_list = list(set(flights_bet_cities["City_Orig"].unique().tolist() + flights_bet_cities["City_Dest"].unique().tolist()))
cities_dataset = hv.Dataset(pd.DataFrame(cities_list, columns=["City"]))
%%opts Graph [height=700 width=700 title="Domestic Flights" directed=False] (node_size=9)
cities_graph_with_labels = hv.Graph((flights_bet_cities, cities_dataset), ["City_Orig", "City_Dest"])
labels = hv.Labels(cities_graph_with_labels.nodes, ['x', 'y'], ['City'])
labels.opts(xoffset=-0.05, yoffset=0.04, text_font_size='8pt',)
cities_graph_with_labels * labels
Below we are creating the same chart as the previous step but using matplotlib. We first need to set matplotlib as the backend. We can then use almost the same code as the previous step to create the network chart; only backend-specific options differ (for example, fontsize instead of text_font_size for labels).
hv.extension("matplotlib")
hv.output(fig='svg', size=250)
%%opts Graph [title="Domestic Flights" directed=False] (node_size=9)
cities_graph_mat = hv.Graph((flights_bet_cities, cities_dataset))
labels = hv.Labels(cities_graph_mat.nodes, ['x', 'y'], ['City'])
labels.opts(xoffset=-0.05, yoffset=0.04, fontsize=6,)
cities_graph_mat * labels
We'll now create a network chart showing flights between states of Brazil. We'll first create an aggregated dataset from the domestic flights dataset created earlier. It has three columns: the source state, the destination state, and the count of flights between them.
hv.extension("bokeh")
flights_bet_states = domestic_flights.groupby(["State_Orig", "State_Dest"]).count()[["Voos"]].rename(columns={"Voos":"Count"}).reset_index()
flights_bet_states.head()
We are also creating a holoviews dataset of all unique states, which will be used for labeling the graph.
states = list(set(flights_bet_states["State_Orig"].unique().tolist() + flights_bet_states["State_Dest"].unique().tolist()))
states_dataset = hv.Dataset(pd.DataFrame(states, columns=["State"]))
We have also modified the node size, node color, edge color, and edge line width of the chart, and switched off the x and y axes.
If you don't remember the configuration options when using the %%opts jupyter magic command, you can press tab inside the brackets or parentheses. It'll display a list of available options, and you can then try various values for them.
%%opts Graph [height=600 width=600 title="Domestic Flights Between States" xaxis=None yaxis=None show_frame=False]
%%opts Graph (edge_alpha=0.6 node_color="tomato" node_size=15 edge_line_width=1 edge_color="dodgerblue")
states = hv.Graph((flights_bet_states, states_dataset))
labels = hv.Labels(states.nodes, ['x', 'y'], ['State'])
labels.opts(xoffset=-0.05, yoffset=0.05, text_font_size='8pt',)
states * labels
We'll now create a network chart displaying flights between various countries. We first create an aggregated dataset from our original Brazil flights dataset that holds each combination of countries and the flight count between them.
flights_bet_countries = brazil_flights.groupby(["Country_Orig", "Country_Dest"]).count()[["Voos"]].rename(columns={"Voos":"Count"}).reset_index()
flights_bet_countries.head()
We have also created a holoviews dataset of unique country names in order to label the chart. The code for creating the network chart is almost the same as in the previous steps.
# Combine origin and destination countries so that countries appearing only as a destination are included.
countries = list(set(brazil_flights["Country_Orig"].unique().tolist() + brazil_flights["Country_Dest"].unique().tolist()))
countries_dataset = hv.Dataset(pd.DataFrame(countries, columns=["Country"]))
%%opts Graph [height=600 width=600 title="International Flights" xaxis=None yaxis=None show_frame=False]
%%opts Graph (node_color="Country" node_size=15)
international = hv.Graph((flights_bet_countries, countries_dataset))
labels = hv.Labels(international.nodes, ['x', 'y'], ['Country'])
labels.opts(xoffset=-0.15, yoffset=0.05, text_font_size='12pt',)
international * labels
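The unique node list is built the same way for cities, states, and countries, so it can be factored into a small helper. The function name unique_nodes below is hypothetical and the data is invented; the sketch also assumes the columns have no missing values, since sorting a mix of strings and NaN would fail.

```python
import pandas as pd


def unique_nodes(edges, source_col, target_col):
    """Return the sorted union of values in the source and target columns."""
    return sorted(set(edges[source_col]) | set(edges[target_col]))


# Invented edge list; Chile appears only as a destination.
toy_edges = pd.DataFrame({
    "Country_Orig": ["Brasil", "Brasil", "Argentina"],
    "Country_Dest": ["Argentina", "Chile", "Brasil"],
    "Count": [3, 1, 2],
})

toy_countries = unique_nodes(toy_edges, "Country_Orig", "Country_Dest")
print(toy_countries)
```

Taking the union of both columns matters: a node that only ever appears as a destination would otherwise be missing from the label dataset.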
Below we are again creating a network chart of domestic flights between cities, but this time we lay out the nodes differently. By default, holoviews lays out the nodes of a network chart in a circle. We have used the Fruchterman Reingold layout available from the networkx library. If you are not familiar with networkx and are interested in learning it, we recommend going through our tutorials on it from the reference section at the end.
Holoviews provides a method named layout_nodes as a part of the holoviews.element.graphs module, which can be used to lay out the nodes of a graph according to a different layout. It accepts an existing graph object as the first parameter and the networkx layout function to use as the layout parameter.
We can make the graph directed or undirected by setting the directed attribute to True or False, as shown in the code below. We have set it to True this time to show directions. All our previous plots were undirected because we had set directed to False.
cities = list(set(flights_bet_cities["City_Orig"].unique().tolist() + flights_bet_cities["City_Dest"].unique().tolist()))
cities_dataset = hv.Dataset(pd.DataFrame(cities, columns=["City"]))
%%opts Graph [height=700 width=700 directed=True bgcolor="snow" title="Domestic Flights with Fruchterman Reingold Layout"]
%%opts Graph (edge_alpha=0.6 edge_color="black" node_color="tomato")
from holoviews.element.graphs import layout_nodes
cities_graph_fruchterman = layout_nodes(cities_graph, layout=nx.layout.fruchterman_reingold_layout)
labels = hv.Labels(cities_graph_fruchterman.nodes, ['x', 'y'], ["index"])
labels.opts(xoffset=-0.05, yoffset=0.04, text_font_size='8pt',)
cities_graph_fruchterman * labels
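To see what a networkx layout function produces on its own, the sketch below runs the Fruchterman Reingold layout directly on a toy edge list with invented values. The seed argument is passed only to make the positions reproducible between runs.

```python
import networkx as nx
import pandas as pd

# Invented edge list in the same shape as flights_bet_cities.
toy_edges = pd.DataFrame({
    "City_Orig": ["Recife", "Recife", "Campinas"],
    "City_Dest": ["Campinas", "Confins", "Confins"],
    "Count": [2, 1, 2],
})

# Build a networkx graph from the dataframe's edge list.
toy_nx_graph = nx.from_pandas_edgelist(
    toy_edges, source="City_Orig", target="City_Dest", edge_attr="Count"
)

# A layout function maps every node to an (x, y) coordinate pair.
positions = nx.fruchterman_reingold_layout(toy_nx_graph, seed=42)
for city, (x, y) in positions.items():
    print(f"{city}: ({x:.2f}, {y:.2f})")
```

The result is a dictionary from node name to coordinates, which is the information needed to reposition nodes instead of using the default circle.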
Below we are creating a network chart of flight counts between the states of Brazil using the Fruchterman Reingold layout. We have used the same code as the previous step to generate the network chart.
hv.extension("bokeh")
%%opts Graph [directed=True bgcolor="snow" title="Domestic Flights with Fruchterman Reingold Layout"]
%%opts Graph (edge_alpha=0.6 edge_color="black" node_color="tomato")
states_graph = layout_nodes(states, layout=nx.layout.fruchterman_reingold_layout)
labels = hv.Labels(states_graph.nodes, ['x', 'y'], ["index"])
labels.opts(xoffset=-0.05, yoffset=0.04, text_font_size='8pt',)
states_graph * labels
Below we have again created a network chart of flights between cities, but this time using the Kamada Kawai layout. We have also modified other chart attributes such as the background and text colors.
%%opts Graph [height=700 width=700 directed=False bgcolor="black" title="Domestic Flight with Kamada Kawai Layout"]
%%opts Graph (edge_color="red")
cities_graph_kamada = layout_nodes(cities_graph, layout=nx.layout.kamada_kawai_layout)
labels = hv.Labels(cities_graph_kamada.nodes, ['x', 'y'], ["index"])
labels.opts(xoffset=-0.05, yoffset=0.04, text_font_size='8pt', text_color="white")
cities_graph_kamada * labels
Below we have created the same network chart as the previous step, but using matplotlib as the backend. Please make a note that the code is almost the same as in the previous step; we just need to set matplotlib as the backend first and use its option names for the labels.
hv.extension("matplotlib")
hv.output(fig='svg', size=250)
%%opts Graph [directed=False bgcolor="black" title="Domestic Flight with Kamada Kawai Layout"]
%%opts Graph (edge_color="red")
cities_graph_kamada_mat = layout_nodes(cities_graph, layout=nx.layout.kamada_kawai_layout)
labels = hv.Labels(cities_graph_kamada_mat.nodes, ['x', 'y'], ["index"])
labels.opts(xoffset=-0.05, yoffset=0.04, color="white")
cities_graph_kamada_mat * labels
Below we are creating another network chart that keeps only the cities which had the most traffic in the previous charts. We can see that the cities Campinas, Confins, Guarulhos, Rio De Janeiro, and Recife have more traffic than other cities. We can filter the aggregated city-level dataset to keep only entries where the source city is one of these cities, in order to see traffic originating from them. We have also divided the labels into two categories: one holds the labels of these main cities and the other holds the labels of all other cities.
hv.extension("bokeh")
%%opts Graph [height=700 width=700 directed=False bgcolor="black" title="Domestic Flight with Kamada Kawai Layout"]
%%opts Graph (edge_color="red")
main_cities = flights_bet_cities[flights_bet_cities.City_Orig.isin(["Campinas", "Confins", "Guarulhos", "Rio De Janeiro", "Recife"])]
main_cities_list = list(set(main_cities["City_Orig"].unique().tolist() + main_cities["City_Dest"].unique().tolist()))
main_cities_dataset = hv.Dataset(pd.DataFrame(main_cities_list, columns=["City"]))
main_cities_graph = hv.Graph((main_cities, main_cities_dataset))
main_cities_graph = layout_nodes(main_cities_graph, layout=nx.layout.kamada_kawai_layout)
labels = hv.Labels(main_cities_graph.nodes, ['x', 'y'], ["index"])
main_labels = labels.select(index=["Campinas", "Confins", "Guarulhos", "Rio De Janeiro", "Recife"])
other_cities = set(main_cities.City_Dest.values.tolist()).difference(["Campinas", "Confins", "Guarulhos", "Rio De Janeiro", "Recife"])
other_labels = labels.select(index=other_cities)
other_labels.opts(xoffset=-0.05, yoffset=0.04, text_font_size='8pt', text_color="white")
main_labels.opts(xoffset=-0.05, yoffset=0.04, text_font_size='12pt', text_color="yellow")
main_cities_graph * other_labels * main_labels
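The filtering and label-splitting logic can be checked in isolation with plain pandas. The sketch below uses invented cities and counts, and the name hubs is a hypothetical stand-in for the list of main cities.

```python
import pandas as pd

# Invented aggregated edge list.
toy_edges = pd.DataFrame({
    "City_Orig": ["Recife", "Recife", "Campinas", "Natal"],
    "City_Dest": ["Campinas", "Confins", "Natal", "Recife"],
    "Count": [2, 1, 2, 4],
})

hubs = ["Recife", "Campinas"]

# Keep only edges that start at one of the hub cities.
hub_edges = toy_edges[toy_edges["City_Orig"].isin(hubs)]

# Destinations that are not hubs themselves get the "other" label style.
other_cities = sorted(set(hub_edges["City_Dest"]) - set(hubs))

print(hub_edges)
print(other_cities)
```

The edge starting at Natal is dropped by the filter, and Campinas is excluded from the second group because it is itself a hub.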
Below we are creating the same chart as the previous step but using matplotlib as the backend. Please make a note that the code is almost the same as in the previous step; we just need to set matplotlib as the backend first and use its option names for the labels.
hv.extension("matplotlib")
hv.output(fig='svg', size=250)
%%opts Graph [directed=False bgcolor="black" title="Domestic Flight with Kamada Kawai Layout"]
%%opts Graph (edge_color="red")
main_cities = flights_bet_cities[flights_bet_cities.City_Orig.isin(["Campinas", "Confins", "Guarulhos", "Rio De Janeiro", "Recife"])]
main_cities_list = list(set(main_cities["City_Orig"].unique().tolist() + main_cities["City_Dest"].unique().tolist()))
main_cities_dataset = hv.Dataset(pd.DataFrame(main_cities_list, columns=["City"]))
main_cities_graph = hv.Graph((main_cities, main_cities_dataset))
main_cities_graph = layout_nodes(main_cities_graph, layout=nx.layout.kamada_kawai_layout)
labels = hv.Labels(main_cities_graph.nodes, ['x', 'y'], ["index"])
main_labels = labels.select(index=["Campinas", "Confins", "Guarulhos", "Rio De Janeiro", "Recife"])
other_cities = set(main_cities.City_Dest.values.tolist()).difference(["Campinas", "Confins", "Guarulhos", "Rio De Janeiro", "Recife"])
other_labels = labels.select(index=other_cities)
other_labels.opts(xoffset=-0.05, yoffset=0.04, fontsize='x-small', color="white")
main_labels.opts(xoffset=-0.05, yoffset=0.04, fontsize='x-large', color="yellow")
main_cities_graph * other_labels * main_labels
Below we are creating a network chart showing flight traffic between three important states and all other states. From our earlier network chart of flights between states, we can notice that three states (MG, SP, and PA) have the most flights. We have filtered the aggregated state-level dataset to keep only entries where the source state is one of these three, in order to analyze traffic originating from them.
hv.extension("bokeh")
%%opts Graph [height=700 width=700 directed=False bgcolor="black" title="Domestic Flights Between States with Kamada Kawai Layout"]
%%opts Graph (edge_color="cyan" node_color="tomato")
main_states = flights_bet_states[flights_bet_states.State_Orig.isin(["MG", "SP", "PA"])]
main_states_list = list(set(main_states["State_Orig"].unique().tolist() + main_states["State_Dest"].unique().tolist()))
main_states_dataset = hv.Dataset(pd.DataFrame(main_states_list, columns=["State"]))
main_states_graph = hv.Graph((main_states, main_states_dataset))
main_states_graph = layout_nodes(main_states_graph, layout=nx.layout.kamada_kawai_layout)
labels = hv.Labels(main_states_graph.nodes, ['x', 'y'], ["index"])
main_labels = labels.select(index=["MG", "SP", "PA"])
other_states = set(main_states.State_Dest.values.tolist()).difference(["MG", "SP", "PA"])
other_labels = labels.select(index=other_states)
other_labels.opts(xoffset=-0.05, yoffset=0.04, text_font_size='8pt', text_color="white")
main_labels.opts(xoffset=-0.05, yoffset=0.04, text_font_size='15pt', text_color="lawngreen")
main_states_graph * other_labels * main_labels
This ends our small tutorial on network chart plotting using holoviews. Please feel free to let us know your views in the comments section.