Treemap in Python (plotly)

Treemaps are a common way to show how a quantity is distributed across a hierarchical structure. Many datasets are hierarchical: population by continent and country, market capitalization by sector and company, area by continent and country, and so on. We need an efficient way to represent such data so that we can analyze the distribution a few levels down the hierarchy. Below, we explain how to draw treemaps through a few examples, using plotly as our primary library.

In [1]:
import pandas as pd
import numpy as np

import plotly.express as px

Loading Datasets

We'll start by loading the datasets used to plot the treemaps below. For the census data we also add a constant Country column, which serves as the root of the hierarchy.

In [2]:
district_wise_population = pd.read_csv("datasets/indian-census-data-with-geospatial-indexing/district wise population and centroids.csv")
district_wise_population["Country"] = "India"
district_wise_population.head()
Out[2]:
State District Latitude Longitude Population in 2001 Population in 2011 Country
0 Andhra Pradesh Anantapur 14.312066 77.460158 3640478 4081148 India
1 Andhra Pradesh Chittoor 13.331093 78.927639 3745875 4174064 India
2 Andhra Pradesh East Godavari 16.782718 82.243207 4901420 5154296 India
3 Andhra Pradesh Guntur 15.884926 80.586576 4465144 4887813 India
4 Andhra Pradesh Krishna 16.143873 81.148051 4187841 4517398 India
In [3]:
world_data = pd.read_csv("datasets/countries of the world.csv")
world_data.head()
Out[3]:
Country Region Population Area (sq. mi.) Pop. Density (per sq. mi.) Coastline (coast/area ratio) Net migration Infant mortality (per 1000 births) GDP ($ per capita) Literacy (%) Phones (per 1000) Arable (%) Crops (%) Other (%) Climate Birthrate Deathrate Agriculture Industry Service
0 Afghanistan ASIA (EX. NEAR EAST) 31056997 647500 48,0 0,00 23,06 163,07 700.0 36,0 3,2 12,13 0,22 87,65 1 46,6 20,34 0,38 0,24 0,38
1 Albania EASTERN EUROPE 3581655 28748 124,6 1,26 -4,93 21,52 4500.0 86,5 71,2 21,09 4,42 74,49 3 15,11 5,22 0,232 0,188 0,579
2 Algeria NORTHERN AFRICA 32930091 2381740 13,8 0,04 -0,39 31 6000.0 70,0 78,1 3,22 0,25 96,53 1 17,14 4,61 0,101 0,6 0,298
3 American Samoa OCEANIA 57794 199 290,4 58,29 -20,71 9,27 8000.0 97,0 259,5 10 15 75 2 22,46 3,27 NaN NaN NaN
4 Andorra WESTERN EUROPE 71201 468 152,1 0,00 6,6 4,05 19000.0 100,0 497,2 2,22 0 97,78 3 8,71 6,25 NaN NaN NaN
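
The Out[3] preview prints several columns, such as Pop. Density (per sq. mi.) and Literacy (%), with a comma as the decimal separator (48,0), which suggests pandas loaded them as strings rather than numbers. The sketch below shows one way such columns could be converted before they are used for values or color. It works on a small hypothetical sample rather than the real file, and the helper name to_float is our own, not part of pandas.

```python
import pandas as pd

# Hypothetical two-row sample mimicking the comma-decimal strings seen in Out[3].
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania"],
    "Pop. Density (per sq. mi.)": ["48,0", "124,6"],
    "Literacy (%)": ["36,0", "86,5"],
})


def to_float(series):
    """Turn strings such as '48,0' into floats; unparseable entries become NaN."""
    return pd.to_numeric(series.str.replace(",", ".", regex=False), errors="coerce")


# Convert only the columns that hold comma-decimal strings.
for col in ["Pop. Density (per sq. mi.)", "Literacy (%)"]:
    sample[col] = to_float(sample[col])

print(sample.dtypes)
```

pd.read_csv also accepts a decimal="," argument, which may handle the conversion at load time for files formatted this way.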
In [4]:
starbucks_stores = pd.read_csv("datasets/starbucks_store_locations.csv")
starbucks_stores = starbucks_stores.groupby(["Country","State/Province","City"]).count()[["Store Number"]].rename(columns={"Store Number":"Count"})
starbucks_stores = starbucks_stores.reset_index()

starbucks_stores.head()
Out[4]:
Country State/Province City Count
0 AD 7 Andorra la Vella 1
1 AE AJ Ajman 2
2 AE AZ Abu Dhabi 40
3 AE AZ Al Ain 8
4 AE DU Abu Dhabi 3

Indian State/District Population Distribution 2001 Treemap

Our first treemap shows the population distribution of India per state and district for the year 2001. It uses three levels of hierarchy: ['Country', 'State', 'District']. We pass the categorical columns, ordered from root to leaf, to the path parameter and the numerical column to the values parameter, so that each rectangle is sized by its value within the path hierarchy.

In [ ]:
fig = px.treemap(district_wise_population,
                 path=['Country', 'State', 'District'],
                 values='Population in 2001')

fig.update_layout(title="Indian State/District Population Distribution 2001",
                  width=1000, height=700,)

fig.show()

Indian State/District Population Distribution 2001 Treemap Plotly

Indian State/District Population Distribution 2011 Treemap

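Our second treemap shows the same population distribution of India per state and district, this time for the year 2011. Here we color the sectors by district and pass width, height, and title directly to px.treemap instead of calling update_layout.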

In [ ]:
fig = px.treemap(district_wise_population,
                 path=['Country', 'State', 'District'],
                 values='Population in 2011',
                 color="District",
                 width=1000, height=700,
                 title="Indian State/District Population Distribution 2011",
                 )

fig.show()

Indian State/District Population Distribution 2011 Treemap Plotly

World Population Distribution Treemap

Our third treemap shows the world population distribution per region and country. We also add area and population density to the hover tooltip through hover_data.

In [ ]:
fig = px.treemap(world_data,
                 path=['Region', 'Country'],
                 values='Population',
                 color='Country',
                 hover_data=['Area (sq. mi.)','Pop. Density (per sq. mi.)'],
                 width=1000, height=700,
                 title="World Population Distribution",)


fig.show()

World Population Distribution Treemap Plotly

Starbucks Store Counts Per City, State, Country Treemap

Our fourth treemap shows Starbucks store counts per country, state, and city for the whole world. We have color-encoded it by country.

In [ ]:
fig = px.treemap(starbucks_stores,
                 path=["Country","State/Province","City"],
                 values='Count',
                 color='Country',
                 width=1000, height=700,
                 title="Starbucks Store Counts Per City, State, Country",)

fig.show()

Starbucks Store Counts Per City, State, Country Treemap Plotly

World Area Distribution Color-encoded by GDP Treemap

Our fifth treemap shows the area distribution per region and country for the whole world. We have also color-encoded it by GDP per capita for each country so that we can see how area and GDP are related.

We can see from the graph below that countries like Russia, China, Canada, the US, Brazil, and Australia cover large areas, while GDP per capita is high for Canada, the US, and Australia.

In [ ]:
fig = px.treemap(world_data,
                 path=['Region', 'Country'],
                 values='Area (sq. mi.)',
                 color='GDP ($ per capita)',
                 color_continuous_scale='RdYlGn',
                  )

fig.update_layout(title="World Area Distribution Color-encoded by GDP",
                  width=1000, height=600,)

fig.show()

World Area Distribution Color-encoded by GDP Treemap Plotly

World Population Distribution Color-encoded by GDP Treemap

Our sixth treemap shows the population distribution per region and country for the whole world. We have also color-encoded it by GDP per capita for each country so that we can see how population and GDP are related.

We can see from the graph below that countries like China, India, the US, Brazil, Pakistan, and Indonesia have large populations, while GDP per capita is high for the US, Australia, and most European countries.

In [ ]:
fig = px.treemap(world_data,
                 path=['Region', 'Country'],
                 values='Population',
                 color='GDP ($ per capita)',
                 color_continuous_scale='RdBu',
                 width=1000, height=600,
                 title="World Population Distribution Color-encoded by GDP",)

fig.show()

World Population Distribution Color-encoded by GDP Treemap Plotly

This ends our short tutorial on generating treemaps in Python using plotly. Please feel free to share your views in the comments section.

Sunny Solanki