The sunburst diagram can be used to visualize the distribution of hierarchical data. It represents the distribution as a series of rings around a central circle. The central circle represents the total quantity of a particular attribute, and each ring around it shows how that quantity is distributed at the next level of the hierarchy, relative to the parent ring inside it. A common example is the population distribution of the world: the central circle represents the total world population, the first ring represents the distribution per continent, the next ring represents the distribution per country within each continent, and a further ring can be used for the distribution per state within each country.
The sunburst chart is very similar to the treemap chart, with the only difference being that the data is laid out radially. If you are interested in learning treemap plotting using Python, feel free to go through our tutorial on treemaps, which explains various ways to draw a treemap in Python.
We'll start by importing necessary libraries.
import pandas as pd
import numpy as np
pd.set_option("display.max_columns", 30)
import plotly.express as px
import plotly.graph_objects as go
We'll also be using three datasets available from Kaggle for analysis and plotting.
We suggest that you download all the datasets to follow along with us through the tutorial.
starbucks_locations = pd.read_csv("datasets/starbucks_store_locations.csv")
starbucks_locations.head()
world_countries_data = pd.read_csv("datasets/countries of the world.csv")
world_countries_data["World"] = "World"
world_countries_data.head()
indian_district_population = pd.read_csv("datasets/indian-census-data-with-geospatial-indexing/district wise population for year 2001 and 2011.csv")
indian_district_population["Country"] = "India"
indian_district_population.head()
There are two ways to generate a sunburst chart using plotly, which provides two APIs for this purpose.
- plotly.express - It provides a method named sunburst() to create sunburst charts.
- plotly.graph_objects - It provides a method named Sunburst() to create sunburst charts.
We'll be explaining both ways one by one below.
plotly has a module named express which provides an easy-to-use method named sunburst() for creating sunburst charts. It accepts a dataframe containing the data, the columns that define the hierarchy, and the column that holds the actual values of the distribution. We pass the hierarchy columns, ordered from root to leaf, as a list to the path attribute of the method. The column whose values decide the sizes of the sectors is given by name to the values attribute. We can also provide the title, width, and height attributes of the figure. The sunburst() method returns a figure object, which displays the chart when we call the show() method on it.
We'll need to prepare the dataset first in order to show the distribution of Starbucks store counts per country, state/province, and city worldwide. We'll be grouping the original Starbucks dataset by Country, State/Province, and City. Then we'll call count()
on it, which counts the entries for each combination of the three columns. We have also introduced a new column named World in which every value is the string World. We have created this column to get a circle in the center that shows the total worldwide count.
starbucks_dist = starbucks_locations.groupby(by=["Country", "State/Province", "City"]).count()[["Store Number"]].rename(columns={"Store Number":"Count"})
starbucks_dist["World"] = "World"
starbucks_dist = starbucks_dist.reset_index()
starbucks_dist.head()
fig = px.sunburst(starbucks_dist,
path=["World", "Country", "State/Province", "City"],
values='Count',
title="Starbucks Store Count Distribution World Wide [Country, State, City]",
width=750, height=750)
fig.show()
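To see what the grouping step used above produces, here is a minimal sketch on a hand-made table. The store rows are invented for illustration; only the column names mirror the Starbucks dataset.

```python
import pandas as pd

# Hypothetical store rows; two of them share the same country, state, and city.
stores = pd.DataFrame({
    "Country": ["US", "US", "US", "IN"],
    "State/Province": ["CA", "CA", "WA", "MH"],
    "City": ["Los Angeles", "Los Angeles", "Seattle", "Mumbai"],
    "Store Number": ["s1", "s2", "s3", "s4"],
})

# Count the rows in each (Country, State/Province, City) group.
counts = (
    stores.groupby(["Country", "State/Province", "City"])[["Store Number"]]
    .count()
    .rename(columns={"Store Number": "Count"})
    .reset_index()
)

# Constant column that becomes the root (center circle) of the sunburst.
counts["World"] = "World"
print(counts)
```

Note that count() only counts non-null entries, and with pandas' default groupby settings rows whose grouping keys are missing are left out of the result, so the totals can be smaller than the number of rows in the raw file.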
Below we are creating a sunburst chart depicting the population distribution per district per state of India in 2011. We have passed the path parameter the list of columns necessary to create the hierarchy. We have covered this in our tutorial on treemaps as well.
fig = px.sunburst(indian_district_population,
path=["Country", "State", "District",],
values='Population in 2011',
width=750, height=750,
title="Indian District Population Per State",
)
fig.show()
Below we have created a sunburst chart showing population count per country per region of the world. We have provided necessary columns having a hierarchical relationship to the path parameter of the method.
fig = px.sunburst(world_countries_data,
path=["World", "Region", "Country"],
values='Population',
width=750, height=750,
title="World Population Per Country Per Region",
)
fig.show()
The sunburst chart below shows the area distribution per country per region worldwide.
fig = px.sunburst(world_countries_data,
path=["World", "Region", "Country"],
values='Area (sq. mi.)',
width=750, height=750,
title="World Area Per Country Per Region",
)
fig.show()
Below we have again plotted a sunburst chart of the population distribution per country per region, but this time we have also color encoded each sector according to the GDP per capita of that country/region. We can compare the population and GDP per capita of each country based on this chart. We can notice that countries like India and China have a lower GDP per capita despite their large populations, whereas countries like the US, Japan, Germany, the UK, France, Australia, and Hong Kong have smaller populations but a higher GDP per capita.
fig = px.sunburst(world_countries_data,
path=["World", "Region", "Country"],
values='Population',
width=750, height=750,
color_continuous_scale="BrBG",
color='GDP ($ per capita)',
title="World Population Per Country Per Region Color-Encoded By GDP"
)
fig.show()
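Only countries have a GDP per capita value in the data, so the color of a region or of the world sector has to be derived. As far as we understand plotly express's documented behavior, when color is used together with path, the color of a parent sector is the average of its children's color values weighted by the values column. The sketch below reproduces that arithmetic in plain Python; the regions and numbers are hypothetical and chosen only to keep the calculation easy to follow.

```python
# Hypothetical leaf rows: (region, population, GDP per capita).
countries = [
    {"region": "A", "population": 100, "gdp_per_capita": 10.0},
    {"region": "A", "population": 300, "gdp_per_capita": 30.0},
    {"region": "B", "population": 50, "gdp_per_capita": 40.0},
]


def weighted_mean_by_region(rows):
    """Average gdp_per_capita per region, weighted by population."""
    totals = {}
    for row in rows:
        weight_sum, weighted_sum = totals.get(row["region"], (0.0, 0.0))
        totals[row["region"]] = (
            weight_sum + row["population"],
            weighted_sum + row["population"] * row["gdp_per_capita"],
        )
    return {
        region: weighted_sum / weight_sum
        for region, (weight_sum, weighted_sum) in totals.items()
    }


region_colors = weighted_mean_by_region(countries)
print(region_colors)
```

In this toy case region A ends up closer to the value of its more populous country, which is why a large country tends to dominate the color of its region in the chart.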
Below we have again plotted the population distribution per country per region of the world, but this time we have color encoded each sector according to the area of the country or region. This helps us compare population with area. We can notice that countries like India have a large population but less area compared to countries like Russia, the United States, and Brazil, which have visibly more area with a smaller population.
fig = px.sunburst(world_countries_data,
path=["World", "Region", "Country"],
values='Population',
width=750, height=750,
color_continuous_scale="RdYlGn",
color='Area (sq. mi.)',
title="World Population Per Country Per Region Color-Encoded By Area"
)
fig.show()
The second way of creating a sunburst chart using plotly is the Sunburst() method of the graph_objects module. We need to provide it with a list of all parent-child combinations and their values in order to create a chart using this method.
To create a sunburst chart using the graph_objects.Sunburst() method, we have done a little preprocessing of the data. The Sunburst() method expects all parent-child relationship labels and their values. The region-country labels and values are already in the dataset, but for the world-region labels and values we group the dataframe by region to get region-wise population counts. We then concatenate the pieces so that every node of the hierarchy has one entry in each of the parents, labels, and values lists.
# Select the column before summing so that non-numeric columns are not aggregated.
region_wise_pop = world_countries_data.groupby(by="Region")[["Population"]].sum().reset_index()
parents = [""] + ["World"] *region_wise_pop.shape[0] + world_countries_data["Region"].values.tolist()
labels = ["World"] + region_wise_pop["Region"].values.tolist() + world_countries_data["Country"].values.tolist()
values = [world_countries_data["Population"].sum()] + region_wise_pop["Population"].values.tolist() + world_countries_data["Population"].values.tolist()
fig =go.Figure(go.Sunburst(
parents=parents,
labels= labels,
values= values,
))
fig.update_layout(title="World Population Per Country Per Region",
width=700, height=700)
fig.show()
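The three parallel lists can be easier to follow on a small example. The sketch below builds them from a nested dictionary in the same order as the code above (root first, then regions, then countries). The helper name build_sunburst_lists and the numbers are our own illustration, not part of plotly.

```python
# Toy hierarchy: region -> {country: population}. Numbers are made up.
hierarchy = {
    "Asia": {"India": 40, "Japan": 10},
    "Europe": {"France": 30, "Spain": 20},
}


def build_sunburst_lists(tree, root="World"):
    """Return (labels, parents, values) with one entry per node of the tree."""
    region_totals = {region: sum(c.values()) for region, c in tree.items()}

    # Root node: its parent is the empty string.
    labels = [root]
    parents = [""]
    values = [sum(region_totals.values())]

    # Region nodes: each one hangs off the root.
    for region, total in region_totals.items():
        labels.append(region)
        parents.append(root)
        values.append(total)

    # Country nodes: each one hangs off its region.
    for region, countries in tree.items():
        for country, value in countries.items():
            labels.append(country)
            parents.append(region)
            values.append(value)

    return labels, parents, values


labels, parents, values = build_sunburst_lists(hierarchy)
for label, parent, value in zip(labels, parents, values):
    print(f"{label!r:10} parent={parent!r:10} value={value}")
```

The position in the lists is what ties a label to its parent and its value, so the three lists must always have the same length and the same ordering.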
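Below we have again created a sunburst chart of the population distribution, but this time it looks just like the chart produced by the plotly.express module. We have set the branchvalues parameter to the string value total, which tells plotly that each parent's value already includes its children, so the children fill the whole parent sector. By default, branchvalues is remainder, which treats a parent's value as an amount in addition to its children. Because our parent values are already sums of their children, the default leaves part of each parent sector uncovered and the Sunburst() method does not create a full-circle chart.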
# Select the column before summing so that non-numeric columns are not aggregated.
region_wise_pop = world_countries_data.groupby(by="Region")[["Population"]].sum().reset_index()
parents = [""] + ["World"] *region_wise_pop.shape[0] + world_countries_data["Region"].values.tolist()
labels = ["World"] + region_wise_pop["Region"].values.tolist() + world_countries_data["Country"].values.tolist()
values = [world_countries_data["Population"].sum()] + region_wise_pop["Population"].values.tolist() + world_countries_data["Population"].values.tolist()
fig =go.Figure(go.Sunburst(
parents=parents,
labels= labels,
values= values,
branchvalues="total",
))
fig.update_layout(title="World Population Per Country Per Region",
width=700, height=700)
fig.show()
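With branchvalues set to total, each parent's value is expected to be at least as large as the sum of its children, and to our knowledge plotly may fail to draw the chart when that does not hold. The following sketch is a small pre-flight check written in plain Python; the function name check_total_consistency and the sample numbers are hypothetical, and the check assumes labels are unique.

```python
def check_total_consistency(labels, parents, values):
    """Return the labels of parents whose value is smaller than the sum of their children."""
    value_of = dict(zip(labels, values))

    # Add up the values of the direct children of every parent.
    children_sum = {}
    for parent, value in zip(parents, values):
        if parent:  # the root has an empty parent and is skipped
            children_sum[parent] = children_sum.get(parent, 0) + value

    return [
        label for label, total in children_sum.items()
        if value_of[label] < total
    ]


labels = ["World", "Asia", "Europe", "India", "Japan", "France"]
parents = ["", "World", "World", "Asia", "Asia", "Europe"]

good_values = [90, 50, 30, 40, 10, 30]
bad_values = [90, 45, 30, 40, 10, 30]  # Asia (45) < India + Japan (50)

print(check_total_consistency(labels, parents, good_values))
print(check_total_consistency(labels, parents, bad_values))
```

When the values are floating-point sums, a small tolerance in the comparison may be needed, since rounding can make a parent marginally smaller than the sum of its children.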
Below we have combined two sunburst charts into a single figure. One chart shows the world population distribution per country per region and the other shows the area distribution per country per region. We can combine many related sunburst charts this way to show possible relationships. Each trace is placed in its own grid column through its domain parameter, and the grid itself is declared in update_layout(). Please go through the code to understand the small amount of preprocessing needed to create the charts.
fig = go.Figure()
parents = [""] + ["World"] *region_wise_pop.shape[0] + world_countries_data["Region"].values.tolist()
labels = ["World"] + region_wise_pop["Region"].values.tolist() + world_countries_data["Country"].values.tolist()
values = [world_countries_data["Population"].sum()] + region_wise_pop["Population"].values.tolist() + world_countries_data["Population"].values.tolist()
fig.add_trace(go.Sunburst(
parents=parents,
labels= labels,
values= values,
domain=dict(column=0),
name="Population Distribution"
))
# Select the column before summing so that non-numeric columns are not aggregated.
region_wise_area = world_countries_data.groupby(by="Region")[["Area (sq. mi.)"]].sum().reset_index()
parents = [""] + ["World"] *region_wise_area.shape[0] + world_countries_data["Region"].values.tolist()
labels = ["World"] + region_wise_area["Region"].values.tolist() + world_countries_data["Country"].values.tolist()
values = [world_countries_data["Area (sq. mi.)"].sum()] + region_wise_area["Area (sq. mi.)"].values.tolist() + world_countries_data["Area (sq. mi.)"].values.tolist()
fig.add_trace(go.Sunburst(
parents=parents,
labels= labels,
values= values,
domain=dict(column=1)
))
fig.update_layout(
grid= dict(columns=2, rows=1),
margin = dict(t=0, l=0, r=0, b=0),
width=900, height=700
)
fig.show()
This ends our small tutorial explaining how to plot a sunburst chart in Python using plotly. Please feel free to let us know your views in the comments section.