How to Plot Chord Diagram in Python [holoviews]?

A chord diagram is a data visualization technique that shows the relationships between data attributes. It arranges the attributes radially around a circle and draws arcs between them to represent their relationships. When a graph has many arcs between points, the visualization can look messy. A chord diagram can bundle these arcs using a technique called hierarchical edge bundling, which creates a single arc between two attributes whose size varies with the number of connections between them.

Chord diagrams are commonly used to show the strength of a relationship between data attributes through the size of the arc. They are also used to show flows or connections between attributes, for example in population migration studies, airport routes, economic flows, and genome studies.

We'll explain how to plot chord diagrams in Python using holoviews. Holoviews is a wrapper library around bokeh and matplotlib, so it uses them for plotting behind the scenes. We'll use both the bokeh and matplotlib backends to plot chord diagrams. If you do not have a background in holoviews and are interested in learning it, we have a tutorial on holoviews basics. Please feel free to explore that tutorial to learn more about the library.

We'll now start by importing necessary libraries.

In [1]:
import pandas as pd
import numpy as np

import warnings

warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", 30)  # full option name; the bare "max_columns" can be ambiguous in newer pandas

import holoviews as hv

We'll be using New Zealand migration data and Brazil flight data for visualization purposes. The datasets are available on Kaggle.

  • New Zealand Migration Data - It has information about the number of people who departed from and arrived in New Zealand, by continent and country, from 1979 to 2016.
  • Flights in Brazil - It has information about various flights from and to Brazil.

We suggest that you download the datasets and follow along using the same data to better understand the material. We'll first load the datasets as pandas dataframes and then aggregate them in various ways to create different chord diagrams.

NOTE

Please note that the original Brazil flights dataset is quite big, at 600+ MB and nearly 2.5 million rows. If you are low on RAM, you can use the nrows parameter of read_csv() to load only the first few thousand entries and follow along with the tutorial without getting stuck. We have loaded only 10k entries to keep the explanation simple.
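
As a small illustration of the nrows parameter mentioned above, the sketch below reads only the first 100 rows of an in-memory CSV. The CSV text and its column values are made up for the example and stand in for the real file on disk.

```python
import io
import pandas as pd

# Stand-in for a large CSV file (hypothetical data, not the real flights file).
csv_text = "City_Orig,City_Dest\n" + "\n".join(f"A{i},B{i}" for i in range(1000))

# nrows stops parsing after the requested number of data rows,
# so the rest of the file is never loaded into memory.
sample = pd.read_csv(io.StringIO(csv_text), nrows=100)
print(sample.shape)
```

Keep in mind that nrows takes the first rows of the file, so the sample may not be representative of the full dataset.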

In [2]:
nz_migration = pd.read_csv("datasets/migration_nz.csv")
nz_migration.head()
Out[2]:
Measure Country Citizenship Year Value
0 Arrivals Oceania New Zealand Citizen 1979 11817.0
1 Arrivals Oceania Australian Citizen 1979 4436.0
2 Arrivals Oceania Total All Citizenships 1979 19965.0
3 Arrivals Antarctica New Zealand Citizen 1979 10.0
4 Arrivals Antarctica Australian Citizen 1979 0.0
In [3]:
brazil_flights = pd.read_csv("brazil_flights_data.csv", nrows=10000, encoding="latin")
brazil_flights = brazil_flights.rename(columns={"Cidade.Origem":"City_Orig", "Cidade.Destino":"City_Dest",
                               "Estado.Origem":"State_Orig", "Estado.Destino":"State_Dest",
                               "Pais.Origem":"Country_Orig", "Pais.Destino":"Country_Dest",
                               "Aeroporto.Origem":"Airport_Orig", "Aeroporto.Destino":"Airport_Dest"})
brazil_flights.head()
Out[3]:
Voos Companhia.Aerea Codigo.Tipo.Linha Partida.Prevista Partida.Real Chegada.Prevista Chegada.Real Situacao.Voo Codigo.Justificativa Airport_Orig City_Orig State_Orig Country_Orig Airport_Dest City_Dest State_Dest Country_Dest LongDest LatDest LongOrig LatOrig
0 AAL - 203 AMERICAN AIRLINES INC Internacional 2016-01-30T08:58:00Z 2016-01-30T08:58:00Z 2016-01-30T10:35:00Z 2016-01-30T10:35:00Z Realizado NaN Afonso Pena Sao Jose Dos Pinhais PR Brasil Salgado Filho Porto Alegre RS Brasil -51.175381 -29.993473 -49.172481 -25.532713
1 AAL - 203 AMERICAN AIRLINES INC Internacional 2016-01-13T12:13:00Z 2016-01-13T12:13:00Z 2016-01-13T21:30:00Z 2016-01-13T21:30:00Z Realizado NaN Salgado Filho Porto Alegre RS Brasil Miami Miami N/I Estados Unidos -80.287046 25.795865 -51.175381 -29.993473
2 AAL - 203 AMERICAN AIRLINES INC Internacional 2016-01-29T12:13:00Z 2016-01-29T12:13:00Z 2016-01-29T21:30:00Z 2016-01-29T21:30:00Z Realizado NaN Salgado Filho Porto Alegre RS Brasil Miami Miami N/I Estados Unidos -80.287046 25.795865 -51.175381 -29.993473
3 AAL - 203 AMERICAN AIRLINES INC Internacional 2016-01-19T12:13:00Z 2016-01-18T12:03:00Z 2016-01-19T21:30:00Z 2016-01-18T20:41:00Z Realizado LIBERACAO SERV. TRAFEGO AEREO/ANTECIPACAO Salgado Filho Porto Alegre RS Brasil Miami Miami N/I Estados Unidos -80.287046 25.795865 -51.175381 -29.993473
4 AAL - 203 AMERICAN AIRLINES INC Internacional 2016-01-30T12:13:00Z 2016-01-30T12:13:00Z 2016-01-30T21:30:00Z 2016-01-30T21:30:00Z Realizado NaN Salgado Filho Porto Alegre RS Brasil Miami Miami N/I Estados Unidos -80.287046 25.795865 -51.175381 -29.993473

We'll start by setting bokeh as the plotting backend. We'll explicitly tell holoviews which backend to use each time we switch.

In [ ]:
hv.extension("bokeh")

We'll first group the Brazil flights dataset by origin city and destination city and then count the flights for each combination. We'll use this aggregated dataset to plot the chord chart.

In [5]:
flight_counts_bet_cities = brazil_flights.groupby(by=["City_Orig", "City_Dest"]).count()[["Voos"]].rename(columns={"Voos":"Count"}).reset_index()
flight_counts_bet_cities = flight_counts_bet_cities.sort_values(by="Count", ascending=False)
flight_counts_bet_cities.head()
Out[5]:
City_Orig City_Dest Count
121 Guarulhos Confins 135
70 Confins Guarulhos 126
88 Confins Sao Paulo 124
120 Guarulhos Buenos Aires/Aeroparque 120
36 Buenos Aires/Aeroparque Guarulhos 120
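
The aggregation above can be tried on a toy frame first. The sketch below uses made-up rows and `groupby(...).size()`, which is an alternative to the `count()` call used in this tutorial: `size()` counts every row in a group, while `count()` counts non-null values per column, so the two agree only when the counted column has no missing values.

```python
import pandas as pd

# Hypothetical flights, one row per flight.
flights = pd.DataFrame({
    "City_Orig": ["Guarulhos", "Guarulhos", "Confins", "Guarulhos", "Confins"],
    "City_Dest": ["Confins", "Confins", "Guarulhos", "Sao Paulo", "Guarulhos"],
})

# One row per (origin, destination) pair with the number of flights.
counts = (
    flights.groupby(["City_Orig", "City_Dest"])
    .size()
    .reset_index(name="Count")
    .sort_values("Count", ascending=False, ignore_index=True)
)
print(counts)
```

The result has the source, destination, and value columns in that order, which is the layout Chord expects by default.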

Chord Diagram Showing Traffic Movement Between Cities [Bokeh]

Holoviews provides Chord for creating chord diagrams. We need to pass it a dataframe containing the source of each flow, its destination, and its value. We can explicitly specify which columns to use as the source, destination, and value. If we don't specify column names, it takes the first column as the source, the second column as the destination, and the third column as the value flowing from source to destination.

We can set options for holoviews chord charts using the %%opts jupyter notebook cell magic command. Each option and its value goes in either brackets or parentheses. Plot options such as figure dimensions are specified in brackets, and style options such as colors are specified in parentheses.

Below we are creating a chord chart showing traffic movement between cities of Brazil.

In [6]:
%%opts Chord [height=600 width=600 title="Traffic Movement Between Cities" ]

chord = hv.Chord(flight_counts_bet_cities)
print(chord)
:Chord   [City_Orig,City_Dest]   (Count)
In [ ]:
chord

We can also label each node on the circle. To do so, we need to create a holoviews Dataset object from a dataframe that has a column containing all the city names, as shown below.

In [8]:
cities = list(set(flight_counts_bet_cities["City_Orig"].unique().tolist() + flight_counts_bet_cities["City_Dest"].unique().tolist()))
cities_dataset = hv.Dataset(pd.DataFrame(cities, columns=["City"]))
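
The node list above is built with set(), whose iteration order for strings can change between Python runs. If a reproducible node order is wanted, one option is to sort the unique names, as in this sketch on a made-up edge list.

```python
import pandas as pd

# Hypothetical edge list in the same shape as flight_counts_bet_cities.
edges = pd.DataFrame({
    "City_Orig": ["Guarulhos", "Confins", "Confins"],
    "City_Dest": ["Confins", "Guarulhos", "Sao Paulo"],
    "Count": [135, 126, 124],
})

# Stack both endpoint columns, drop duplicates, and sort for a stable order.
cities = sorted(pd.concat([edges["City_Orig"], edges["City_Dest"]]).unique())
nodes = pd.DataFrame({"City": cities})
print(nodes)
```

The resulting dataframe can be wrapped in hv.Dataset in the same way as the cell above.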

Below we are again creating a chord chart of traffic movement between the cities of Brazil. This time we pass both the flight dataset and the cities Dataset object created above to Chord. We have also set the labels option to City, the column name we used when creating the cities dataset.

In [ ]:
%%opts Chord [height=700 width=700 title="Traffic Movement Between Cities" labels="City"]

hv.Chord((flight_counts_bet_cities, cities_dataset))

Chord Diagram Showing Traffic Movement Between Cities [Matplotlib]

We'll now set backend as matplotlib in order to plot the same chart using matplotlib.

In [ ]:
hv.extension("matplotlib")
hv.output(fig='svg', size=250)

Below we are plotting the same chart as above but using matplotlib.

In [ ]:
%%opts Chord [height=700 width=700 title="Traffic Movement Between Cities" labels="City"]

hv.Chord((flight_counts_bet_cities, cities_dataset))

Chord Diagram Showing Traffic Movement Between States [Bokeh]

We'll now create a chord chart showing traffic movement between states of the dataset.

In [ ]:
hv.extension("bokeh")

Below we are creating another dataset that has information about flight movement between states. We first group the original flight dataset by origin state and destination state and then count the flights for each combination of states.

In [13]:
flight_counts_bet_states = brazil_flights.groupby(by=["State_Orig", "State_Dest"]).count()[["Voos"]].rename(columns={"Voos":"Count"}).reset_index()
flight_counts_bet_states = flight_counts_bet_states.sort_values(by="Count", ascending=False)
flight_counts_bet_states.head()
Out[13]:
State_Orig State_Dest Count
136 SP N/I 554
75 N/I SP 549
55 MG SP 438
47 MG MG 389
81 PA PA 375
In [14]:
states = list(set(flight_counts_bet_states["State_Orig"].unique().tolist() + flight_counts_bet_states["State_Dest"].unique().tolist()))
states_dataset = hv.Dataset(pd.DataFrame(states, columns=["State"]))
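
The aggregated output above includes rows where the origin and destination state are the same, such as MG to MG. Whether to keep them is a judgment call that this tutorial does not make; the sketch below only shows one way to separate them, using four rows copied from the output above.

```python
import pandas as pd

# Rows copied from the state-level output shown above.
state_flows = pd.DataFrame({
    "State_Orig": ["SP", "MG", "MG", "PA"],
    "State_Dest": ["N/I", "SP", "MG", "PA"],
    "Count": [554, 438, 389, 375],
})

# A self-loop is a flow whose origin and destination are the same state.
is_self_loop = state_flows["State_Orig"] == state_flows["State_Dest"]
within_state = state_flows[is_self_loop]
between_states = state_flows[~is_self_loop]
print(within_state)
print(between_states)
```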

Below we have created a chord chart depicting traffic movement between the states in the dataset. We have also modified the chart by setting the colors of nodes and edges.

Important Information

If you don't remember the configuration options when using the %%opts jupyter magic command, you can press tab inside the brackets or parentheses. It will display a list of available options. You can then try various values for those options.

In [ ]:
%%opts Chord [height=700 width=700 title="Traffic Movement Between States" labels="State"]
%%opts Chord (node_color="State" node_cmap="Category20" edge_color="State_Orig" edge_cmap='Category20')

hv.Chord((flight_counts_bet_states, states_dataset))

Chord Diagram Showing Traffic Movement Between States [Matplotlib]

In [ ]:
hv.extension("matplotlib")
hv.output(fig='svg', size=200)

In [ ]:
%%opts Chord [title="Traffic Movement Between States" labels="State"]
%%opts Chord (node_color="State" node_cmap="Category20" edge_color="State_Orig" edge_cmap='Category20')

hv.Chord((flight_counts_bet_states, states_dataset))

Chord Diagram Showing Traffic Movement Between Airports [Bokeh]

We'll now create a chord diagram showing traffic movement between the airports in the dataset.

In [ ]:
hv.extension("bokeh")

We'll first create an aggregated dataset with the flight count between each pair of source and destination airports. We have also filtered the dataset to keep only airport pairs with more than 75 flights between them, to prevent the chord chart from getting crowded.

In [19]:
flight_counts_bet_airports = brazil_flights.groupby(by=["Airport_Orig", "Airport_Dest"]).count()[["Voos"]].rename(columns={"Voos":"Count"}).reset_index()
flight_counts_bet_airports = flight_counts_bet_airports.sort_values(by="Count", ascending=False)
flight_counts_bet_airports = flight_counts_bet_airports[flight_counts_bet_airports["Count"] > 75]
flight_counts_bet_airports.head()
Out[19]:
Airport_Orig Airport_Dest Count
122 Guarulhos - Governador Andre Franco Montoro Tancredo Neves 135
260 Tancredo Neves Guarulhos - Governador Andre Franco Montoro 126
253 Tancredo Neves Congonhas 124
104 Guarulhos - Governador Andre Franco Montoro Buenos Aires/Aeroparque 120
35 Buenos Aires/Aeroparque Guarulhos - Governador Andre Franco Montoro 120
In [20]:
airports = list(set(flight_counts_bet_airports["Airport_Orig"].unique().tolist() + flight_counts_bet_airports["Airport_Dest"].unique().tolist()))
airports_dataset = hv.Dataset(pd.DataFrame(airports, columns=["Airport"]))
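
Note that the threshold above is applied to routes (airport pairs), so an airport disappears from the chart only when none of its routes pass the filter. A small sketch with made-up counts shows the effect.

```python
import pandas as pd

# Hypothetical route counts between airports.
routes = pd.DataFrame({
    "Airport_Orig": ["Guarulhos", "Tancredo Neves", "Congonhas", "Afonso Pena"],
    "Airport_Dest": ["Tancredo Neves", "Guarulhos", "Afonso Pena", "Congonhas"],
    "Count": [135, 126, 60, 12],
})

# Keep only routes with more than 75 flights.
busy_routes = routes[routes["Count"] > 75]

# Airports that still appear on at least one kept route, and those that do not.
kept = set(busy_routes["Airport_Orig"]) | set(busy_routes["Airport_Dest"])
all_airports = set(routes["Airport_Orig"]) | set(routes["Airport_Dest"])
dropped = all_airports - kept
print(sorted(kept))
print(sorted(dropped))
```

Building the node list from the filtered dataframe, as done above, keeps the labels limited to airports that still have at least one chord.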

We have modified many chart attributes below when plotting a chord chart explaining flight movement between airports.

In [ ]:
%%opts Chord [height=800 width=800 title="Traffic Movement Between Airports" labels="Airport" bgcolor="black"]
%%opts Chord (node_color="Airport" node_cmap="Category20" edge_color="Airport_Orig" edge_cmap='Category20' edge_alpha=0.8)
%%opts Chord (edge_line_width=2 node_size=25 label_text_color="white")

hv.Chord((flight_counts_bet_airports, airports_dataset))

Chord Diagram Showing Traffic Movement Between Airports [Matplotlib]

In [ ]:
hv.extension("matplotlib")
hv.output(fig='svg', size=250)

In [ ]:
%%opts Chord [labels="Airport" title="Traffic Movement Between Airports"]
%%opts Chord (node_color="Airport" node_cmap="Category20" edge_color="Airport_Orig" edge_cmap='Category20')
%%opts Chord (node_size=15 edge_alpha=0.8 edge_linewidth=1.0)

hv.Chord((flight_counts_bet_airports, airports_dataset))

Chord Diagram Showing Immigration to New Zealand [Bokeh]

We'll now be using a chord chart to show population immigration to New Zealand in 2016. We'll start by setting backend as bokeh.

In [ ]:
hv.extension("bokeh")

We first need a dataset with columns for the source country, the destination country, and the number of people moving between them. We'll filter the original New Zealand migration dataset to keep only arrivals in 2016. We'll then remove the aggregate entries 'All countries', 'Not stated', 'Asia', and 'Europe'. Other regional aggregates such as Oceania and Americas are not in the exclusion list, so they remain in the result. We'll then group by source country and sum up the values, and add a new column named DestCountry with the value New Zealand. Finally, we'll remove entries with a value of 1000 or less to prevent the chart from getting cluttered.
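
The steps described above can be sketched on a toy frame. One caveat is an assumption on our part: the Citizenship column contains a 'Total All Citizenships' category, and if those rows already include the other categories (the first rows of the dataset suggest this, but it is not confirmed here), summing every row per country counts some people more than once. The sketch compares both approaches on made-up values.

```python
import pandas as pd

# Hypothetical rows in the same layout as the migration dataset.
toy = pd.DataFrame({
    "Measure": ["Arrivals"] * 6 + ["Departures"],
    "Country": ["Australia", "Australia", "Australia", "Asia", "UK", "UK", "Australia"],
    "Citizenship": ["New Zealand Citizen", "Australian Citizen", "Total All Citizenships",
                    "Total All Citizenships", "New Zealand Citizen", "Total All Citizenships",
                    "Total All Citizenships"],
    "Year": [2016] * 7,
    "Value": [10.0, 20.0, 35.0, 100.0, 5.0, 12.0, 7.0],
})

arrivals = toy[(toy["Measure"] == "Arrivals") & (toy["Year"] == 2016)]
arrivals = arrivals[~arrivals["Country"].isin(["Asia"])]

# Summing every citizenship row, as the tutorial's pipeline does.
summed_all = arrivals.groupby("Country")[["Value"]].sum()

# Assumed alternative: keep only the total rows before summing.
totals_only = arrivals[arrivals["Citizenship"] == "Total All Citizenships"]
edges = totals_only.groupby("Country")[["Value"]].sum().reset_index()
edges = edges.rename(columns={"Country": "SourceCountry"})
edges["DestCountry"] = "New Zealand"
edges = edges[["SourceCountry", "DestCountry", "Value"]]
print(summed_all)
print(edges)
```

Either version produces the source, destination, and value layout needed for the chord chart; only the magnitudes differ.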

In [25]:
immigration_to_nz = nz_migration[nz_migration["Measure"] == "Arrivals"]
immigration_to_nz = immigration_to_nz[immigration_to_nz["Year"]==2016]
immigration_to_nz = immigration_to_nz[~immigration_to_nz["Country"].isin(['All countries', 'Not stated', 'Asia', 'Europe',])]
immigration_to_nz = immigration_to_nz.groupby(by="Country")[["Value"]].sum()  # select Value first so only the numeric column is summed
immigration_to_nz["DestCountry"] = "New Zealand"
immigration_to_nz = immigration_to_nz.reset_index().rename(columns={"Country":"SourceCountry"})
immigration_to_nz = immigration_to_nz[["SourceCountry", "DestCountry", "Value"]].sort_values(by="Value", ascending=False)
immigration_to_nz = immigration_to_nz[immigration_to_nz.Value > 1000]
immigration_to_nz.head()
Out[25]:
SourceCountry DestCountry Value
164 Oceania New Zealand 54490.0
14 Australia New Zealand 47513.0
230 UK New Zealand 19466.0
45 China New Zealand 13176.0
5 Americas New Zealand 12084.0
In [26]:
immigrate_countries = list(set(immigration_to_nz["SourceCountry"].unique().tolist() + ["New Zealand"]))
immigrate_countries_dataset = hv.Dataset(pd.DataFrame(immigrate_countries, columns=["Country"]))
In [ ]:
%%opts Chord [height=800 width=800 title="Immigration to New Zealand [2016]" labels="Country" bgcolor="black"]
%%opts Chord (node_color="Country" node_cmap="Category20" edge_color="SourceCountry" edge_cmap='Category20' edge_alpha=0.8)
%%opts Chord (edge_line_width=3 node_size=25 label_text_color="lime")

hv.Chord((immigration_to_nz, immigrate_countries_dataset))

Chord Diagram Showing Immigration to New Zealand [Matplotlib]

In [ ]:
hv.extension("matplotlib")
hv.output(fig='svg', size=250)

In [ ]:
%%opts Chord [height=800 width=800 title="Immigration to New Zealand [2016]" labels="Country"]
%%opts Chord (node_color="Country" node_cmap="Category20" edge_color="SourceCountry" edge_cmap='Category20' edge_alpha=0.8)
%%opts Chord (node_size=15 edge_alpha=0.8 edge_linewidth=1.0)

hv.Chord((immigration_to_nz, immigrate_countries_dataset))

Chord Diagram Showing Emigration from New Zealand [Bokeh]

We'll now create a chord diagram showing population emigration from New Zealand.

In [ ]:
hv.extension("bokeh")

Below we are creating the emigration dataset in the same way as the immigration dataset, with a few minor changes: we filter on departures instead of arrivals, and New Zealand becomes the source country rather than the destination.
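
Since the two pipelines differ only in the direction of travel, they could be folded into one helper. The function below is a hypothetical sketch (build_flows and its parameters are our own names, not part of the tutorial or of any library), run on made-up values.

```python
import pandas as pd

def build_flows(df, measure, year, exclude, min_value):
    """Return a SourceCountry/DestCountry/Value edge list for one direction."""
    rows = df[(df["Measure"] == measure) & (df["Year"] == year)]
    rows = rows[~rows["Country"].isin(exclude)]
    totals = rows.groupby("Country")[["Value"]].sum().reset_index()
    if measure == "Arrivals":
        # People arrive in New Zealand from the listed country.
        totals = totals.rename(columns={"Country": "SourceCountry"})
        totals["DestCountry"] = "New Zealand"
    else:
        # People depart from New Zealand to the listed country.
        totals = totals.rename(columns={"Country": "DestCountry"})
        totals["SourceCountry"] = "New Zealand"
    totals = totals[["SourceCountry", "DestCountry", "Value"]]
    totals = totals[totals["Value"] > min_value]
    return totals.sort_values("Value", ascending=False, ignore_index=True)

# Hypothetical rows in the same layout as the migration dataset.
toy = pd.DataFrame({
    "Measure": ["Arrivals", "Arrivals", "Departures", "Departures", "Departures"],
    "Country": ["Australia", "UK", "Australia", "UK", "Asia"],
    "Citizenship": ["Total All Citizenships"] * 5,
    "Year": [2016] * 5,
    "Value": [4000.0, 1500.0, 3000.0, 900.0, 9000.0],
})

departures = build_flows(toy, "Departures", 2016, exclude=["Asia"], min_value=1000)
arrivals = build_flows(toy, "Arrivals", 2016, exclude=["Asia"], min_value=1000)
print(departures)
print(arrivals)
```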

In [31]:
migration_from_nz = nz_migration[nz_migration["Measure"] == "Departures"]
migration_from_nz = migration_from_nz[migration_from_nz["Year"]==2016]
migration_from_nz = migration_from_nz[~migration_from_nz["Country"].isin(['All countries', 'Not stated', 'Asia', 'Europe',])]
migration_from_nz = migration_from_nz.groupby(by="Country")[["Value"]].sum()  # select Value first so only the numeric column is summed
migration_from_nz["SourceCountry"] = "New Zealand"
migration_from_nz = migration_from_nz.reset_index().rename(columns={"Country":"DestCountry"})
migration_from_nz = migration_from_nz[["SourceCountry", "DestCountry", "Value"]].sort_values(by="Value", ascending=False)
migration_from_nz = migration_from_nz[migration_from_nz.Value>1000]
migration_from_nz.head()
Out[31]:
SourceCountry DestCountry Value
164 New Zealand Oceania 49314.0
14 New Zealand Australia 46758.0
230 New Zealand UK 13719.0
5 New Zealand Americas 8688.0
233 New Zealand USA 4442.0
In [32]:
emigrate_countries = list(set(migration_from_nz["DestCountry"].unique().tolist() + ["New Zealand"]))
emigrate_countries_dataset = hv.Dataset(pd.DataFrame(emigrate_countries, columns=["Country"]))
In [ ]:
%%opts Chord [height=800 width=800 title="Emigration from New Zealand [2016]" labels="Country" bgcolor="black"]
%%opts Chord (node_color="Country" node_cmap="Category20" edge_color="DestCountry" edge_cmap='Category20' edge_alpha=0.8)
%%opts Chord (edge_line_width=3 node_size=25 label_text_color="cyan")

hv.Chord((migration_from_nz, emigrate_countries_dataset))

Chord Diagram Showing Emigration from New Zealand [Matplotlib]

In [ ]:
hv.extension("matplotlib")
hv.output(fig='svg', size=250)

In [ ]:
%%opts Chord [height=800 width=800 title="Emigration from New Zealand [2016]" labels="Country"]
%%opts Chord (node_color="Country" node_cmap="Category20" edge_color="DestCountry" edge_cmap='Category20' edge_alpha=0.8)
%%opts Chord (node_size=15 edge_alpha=0.8 edge_linewidth=1.0)

hv.Chord((migration_from_nz, emigrate_countries_dataset))

This ends our small tutorial explaining how to plot a chord diagram using holoviews. Please feel free to let us know your views in the comments section.

Sunny Solanki