Updated On : Sep-23,2021 Tags maps, plotnine, choropleth
Maps using Plotnine (Choropleth, Scatter, and Bubble Maps)


Datasets nowadays often contain location-related information. Accurately plotting that information on maps can give useful insights that help make better decisions during data analysis. Location information can be present in the form of an exact location name (country, city, etc.), a location id, or longitude and latitude. Other information can be presented on a map by merging it with geospatial data. Map charts like choropleth maps, scatter maps, bubble maps, and connection maps are commonly used to represent information on maps. Python provides a number of libraries (geopandas, bokeh, plotly, folium, ipyleaflet, etc.) to deal with geospatial data and present information on a map. Some of them produce static maps whereas others produce interactive maps.

As a part of this tutorial, we'll concentrate on the library named Plotnine to create maps in Python. We'll create choropleth, scatter, and bubble maps with simple examples. Plotnine is a Python library based on the grammar of graphics, and it has almost the same interface as the ggplot2 library of the R programming language. The grammar of graphics breaks a chart into a list of layers, creates the layers individually, and then combines them into a full chart. We have covered in detail how Plotnine works and how to get started with it in a separate tutorial. If you are interested in learning about it, please feel free to check it out.

We won't cover the inner workings of plotnine here, as they have already been covered in that tutorial. In this tutorial, we'll start directly with chart creation, as we expect the reader to have a little background in plotnine. If you are just starting out with plotnine, we recommend that you take some time to read our tutorial about it mentioned above, as it'll help you get started with the library.

We'll start by importing the necessary libraries for our tutorial. We import geopandas because it ships sample datasets, including one with geospatial data for the whole world. We'll use the dataframes loaded from it throughout the tutorial.

In [1]:
import plotnine
In [2]:
import geopandas as gpd
import pandas as pd

print("Plotnine Version : {}".format(plotnine.__version__))
print("Geopandas Version : {}".format(gpd.__version__))

gpd.datasets.available
Plotnine Version : 0.8.0
Geopandas Version : 0.9.0
Out[2]:
['naturalearth_cities', 'naturalearth_lowres', 'nybb']

Below we have loaded naturalearth_lowres dataset which has geospatial data about the whole world. We'll be using it to plot a world map and other information on it. We have loaded it as a geopandas dataframe and printed the first few rows to give an idea about the contents of the dataset.

In [3]:
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
print("Geometry Column Name : ", world.geometry.name)
print("Dataset Size : ", world.shape)
world.head()
Geometry Column Name :  geometry
Dataset Size :  (177, 6)
Out[3]:
pop_est continent name iso_a3 gdp_md_est geometry
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 603253 Africa W. Sahara ESH 906.5 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 35623680 North America Canada CAN 1674000.0 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 326625791 North America United States of America USA 18560000.0 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

Below we have loaded another GeoJSON dataset that has geospatial information about US states. It has geometry information (a polygon representing the state) for each state of the US. We'll use this dataset to draw maps of the US.

The dataset can be downloaded from the below link.

In [4]:
us_states_geo = gpd.read_file("datasets/us-states.json")

us_states_geo.head()
Out[4]:
id name geometry
0 AL Alabama POLYGON ((-87.35930 35.00118, -85.60667 34.984...
1 AK Alaska MULTIPOLYGON (((-131.60202 55.11798, -131.5691...
2 AZ Arizona POLYGON ((-109.04250 37.00026, -109.04798 31.3...
3 AR Arkansas POLYGON ((-94.47384 36.50186, -90.15254 36.496...
4 CA California POLYGON ((-123.23326 42.00619, -122.37885 42.0...

The third dataset that we have loaded is world happiness data which has information for each country about attributes like happiness, GDP, social support, healthy life expectancy, etc. We'll be merging this dataset with the geopandas world dataset loaded earlier to create maps with information present from this dataset.

In [5]:
world_happiness = pd.read_csv("datasets/world_happiness_2019.csv")
print("Dataset Size : ",world_happiness.shape)
world_happiness.head()
Dataset Size :  (156, 9)
Out[5]:
Overall rank Country or region Score GDP per capita Social support Healthy life expectancy Freedom to make life choices Generosity Perceptions of corruption
0 1 Finland 7.769 1.340 1.587 0.986 0.596 0.153 0.393
1 2 Denmark 7.600 1.383 1.573 0.996 0.592 0.252 0.410
2 3 Norway 7.554 1.488 1.582 1.028 0.603 0.271 0.341
3 4 Iceland 7.494 1.380 1.624 1.026 0.591 0.354 0.118
4 5 Netherlands 7.488 1.396 1.522 0.999 0.557 0.322 0.298

Below we have merged the world happiness dataset loaded in the previous cell with the geopandas world dataset based on the country name. We have printed the first few rows of the dataset to show data present in the dataframe.

In [6]:
world_total_data = world.merge(world_happiness, left_on="name", right_on="Country or region")

world_total_data.head()
Out[6]:
pop_est continent name iso_a3 gdp_md_est geometry Overall rank Country or region Score GDP per capita Social support Healthy life expectancy Freedom to make life choices Generosity Perceptions of corruption
0 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 153 Tanzania 3.231 0.476 0.885 0.499 0.417 0.276 0.147
1 35623680 North America Canada CAN 1674000.0 MULTIPOLYGON (((-122.84000 49.00000, -122.9742... 9 Canada 7.278 1.365 1.505 1.039 0.584 0.285 0.308
2 326625791 North America United States of America USA 18560000.0 MULTIPOLYGON (((-122.84000 49.00000, -120.0000... 19 United States of America 6.892 1.433 1.457 0.874 0.454 0.280 0.128
3 18556698 Asia Kazakhstan KAZ 460700.0 POLYGON ((87.35997 49.21498, 86.59878 48.54918... 60 Kazakhstan 5.809 1.173 1.508 0.729 0.410 0.146 0.096
4 29748859 Asia Uzbekistan UZB 202300.0 POLYGON ((55.96819 41.30864, 55.92892 44.99586... 41 Uzbekistan 6.174 0.745 1.529 0.756 0.631 0.322 0.240
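
A merge on country names only keeps rows whose names match exactly in both datasets, so a country spelled differently in the two sources silently disappears from the map. The sketch below shows one way to spot such mismatches with the indicator option of pandas merge. The two small dataframes, their scores, and the name_fixes mapping are toy values made up for illustration; they are not taken from the datasets used in this tutorial.

```python
import pandas as pd

# Toy stand-ins for the geo dataframe and the happiness dataframe (made-up values).
geo = pd.DataFrame({"name": ["Canada", "Tanzania", "Czechia"]})
happiness = pd.DataFrame({
    "Country or region": ["Canada", "Tanzania", "Czech Republic"],
    "Score": [7.0, 3.0, 6.0],
})

# An outer merge with indicator=True adds a "_merge" column that records
# whether each row was found in the left frame, the right frame, or both.
checked = geo.merge(happiness, left_on="name", right_on="Country or region",
                    how="outer", indicator=True)

left_only = sorted(checked.loc[checked["_merge"] == "left_only", "name"])
right_only = sorted(checked.loc[checked["_merge"] == "right_only", "Country or region"])
print("Only in geo data       :", left_only)
print("Only in happiness data :", right_only)

# Hypothetical fix: rename the mismatched entries, then merge as usual.
name_fixes = {"Czech Republic": "Czechia"}
happiness["Country or region"] = happiness["Country or region"].replace(name_fixes)
merged = geo.merge(happiness, left_on="name", right_on="Country or region")
print("Rows after fixing names:", len(merged))
```

Running a check like this on the real dataframes before plotting makes it easier to tell whether a blank country on the map is missing from the data or just named differently.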

World Happiness Choropleth Map

As a part of this section, we have created a choropleth map of the world happiness score. Each country is colored based on its happiness score. The score ranges from 0 to 10.

Our code for this example creates the individual layers of the map and then adds them together to create the final map. We first create a chart with the data and mapping information. The mapping information (the aes() call) states that the fill color should be based on the Score column. We then create the map layer using geom_map(), followed by a title for the chart. We also create the theme details and the colormap as separate objects. At last, we add all the individual layers to create the final choropleth map. We'll follow this same format to create all the maps.

NOTE

Please note that the data provided to plotnine for plotting maps requires a column named geometry that holds the map objects (polygons representing countries, states, cities, etc.).

In [ ]:
from plotnine import ggplot, geom_map, aes, scale_fill_cmap, theme, labs

chart = ggplot(data=world_total_data, mapping=aes(fill="Score"))
map_proj = geom_map()
labels = labs(title="World Happiness Score Choropleth Map")
theme_details = theme(figure_size=(12,6))
colormap = scale_fill_cmap(cmap_name="Blues")

world_happiness_choropleth = chart + map_proj + labels + theme_details + colormap

world_happiness_choropleth


World Healthy Life Expectancy Choropleth Map

Below we have created another choropleth map which shows healthy life expectancy for countries worldwide. We have used the same approach to create the chart as in our previous example. The code is almost the same as before, with a few minor changes.

We have added the color attribute to the mapping to specify the color of each country's border line. We have set it to the same column as the fill attribute so that the border blends in with the fill color, unlike in the previous map. We have also used a different colormap in this example.

In [ ]:
from plotnine import scale_color_cmap

chart = ggplot(data=world_total_data, mapping=aes(fill="Healthy life expectancy", color="Healthy life expectancy"))
map_proj = geom_map()
labels = labs(title="World Healthy Life Expectancy Choropleth Map")
theme_details = theme(figure_size=(12,6))
fill_colormap = scale_fill_cmap(cmap_name="RdYlGn")
color_colormap = scale_color_cmap(cmap_name="RdYlGn")

world_life_expectancy_choropleth = chart + map_proj + labels + theme_details + fill_colormap + color_colormap

world_life_expectancy_choropleth


Freedom to Make Life Choices in Asian Countries

As a part of this section, we have created a choropleth map representing freedom to make life choices in Asian countries.

To create this map, we have filtered our world dataset to keep only entries where continent is Asia. We have then followed the same steps as the previous example to create a choropleth map.

In [ ]:
asia_data = world_total_data[world_total_data["continent"] == 'Asia']

chart = ggplot(data=asia_data, mapping=aes(fill="Freedom to make life choices", color="Freedom to make life choices"))
map_proj = geom_map()
labels = labs(title="Asia freedom to make life choices Choropleth Map")
theme_details = theme(figure_size=(10,7))
fill_colormap = scale_fill_cmap(cmap_name="PiYG")
color_colormap = scale_color_cmap(cmap_name="PiYG")

asia_happiness_choropleth = chart + map_proj + labels + theme_details + fill_colormap + color_colormap

asia_happiness_choropleth


US States Population 2018 Choropleth Map

As a part of this example, we'll create a choropleth map showing the population of US states in 2018. We have first loaded the dataset as a pandas dataframe.

In [11]:
us_state_pop = pd.read_csv("datasets/State Populations.csv")
us_state_pop.head()
Out[11]:
State 2018 Population
0 California 39776830
1 Texas 28704330
2 Florida 21312211
3 New York 19862512
4 Pennsylvania 12823989

Below we have created a dataset by merging the US states geo dataframe and the US states population dataset. We'll be using this merged dataset which has both geospatial and population details for each state to plot a choropleth map.

In [12]:
us_states_pop = us_states_geo.merge(us_state_pop, left_on="name", right_on="State")

us_states_pop.head()
Out[12]:
id name geometry State 2018 Population
0 AL Alabama POLYGON ((-87.35930 35.00118, -85.60667 34.984... Alabama 4888949
1 AK Alaska MULTIPOLYGON (((-131.60202 55.11798, -131.5691... Alaska 738068
2 AZ Arizona POLYGON ((-109.04250 37.00026, -109.04798 31.3... Arizona 7123898
3 AR Arkansas POLYGON ((-94.47384 36.50186, -90.15254 36.496... Arkansas 3020327
4 CA California POLYGON ((-123.23326 42.00619, -122.37885 42.0... California 39776830

Below we have created a choropleth map of the US state populations using the dataframe from the previous step. Our code follows the same approach as our previous examples. We have added a few extra lines of code to improve the aesthetics of the chart: we have set x/y axis limits and modified the chart background color.

In [ ]:
from plotnine import scale_color_cmap, xlim, ylim, element_rect

chart = ggplot()
map_proj = geom_map(data=us_states_pop, mapping=aes(fill="2018 Population", color="2018 Population"))
labels = labs(title="US 2018 Population Choropleth Map")
theme_details = theme(figure_size=(10,6), panel_background=element_rect(fill="snow"))
fill_colormap = scale_fill_cmap(cmap_name="RdYlBu")
color_colormap = scale_color_cmap(cmap_name="RdYlBu")
xlimit = xlim(-170,-60)
ylimit = ylim(25, 72)

us_pop_choropleth = chart + map_proj + labels + theme_details + fill_colormap + color_colormap + xlimit + ylimit

us_pop_choropleth


Starbucks Store Count Per US States

In this example, we'll be plotting the Starbucks store count for each state of the US. We have downloaded the Starbucks stores information dataset from the below link and loaded it as a pandas dataframe. We have printed the first few lines to show the contents of the dataset.

In [14]:
starbucks_stores = pd.read_csv("datasets/starbucks_store_locations.csv")

starbucks_stores.head()
Out[14]:
Brand Store Number Store Name Ownership Type Street Address City State/Province Country Postcode Phone Number Timezone Longitude Latitude
0 Starbucks 47370-257954 Meritxell, 96 Licensed Av. Meritxell, 96 Andorra la Vella 7 AD AD500 376818720 GMT+1:00 Europe/Andorra 1.53 42.51
1 Starbucks 22331-212325 Ajman Drive Thru Licensed 1 Street 69, Al Jarf Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.42
2 Starbucks 47089-256771 Dana Mall Licensed Sheikh Khalifa Bin Zayed St. Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.39
3 Starbucks 22126-218024 Twofour 54 Licensed Al Salam Street Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.38 24.48
4 Starbucks 17127-178586 Al Ain Tower Licensed Khaldiya Area, Abu Dhabi Island Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.54 24.51

Below we have filtered the Starbucks dataset to keep only rows where the country is the US. We have then grouped the filtered dataset by state and counted the entries per state. The final dataset has the count of Starbucks stores for each state.

In [15]:
us_stores = starbucks_stores[starbucks_stores.Country=="US"]
us_stores_statewise_cnt = us_stores.groupby("State/Province").count()[["Store Name"]].rename(columns={"Store Name":"Count"})
us_stores_statewise_cnt = us_stores_statewise_cnt.reset_index()
us_stores_statewise_cnt.head()
Out[15]:
State/Province Count
0 AK 49
1 AL 85
2 AR 55
3 AZ 488
4 CA 2821
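
One detail worth knowing about the grouping step: count() counts the non-missing values of the chosen column, so a store row with a missing Store Name would not be counted, whereas size() counts rows regardless of missing values. The sketch below illustrates the difference on a tiny made-up dataframe; the data and variable names are illustrative assumptions, not the real Starbucks dataset.

```python
import pandas as pd

# Toy data (made up): one CA store has a missing name.
stores = pd.DataFrame({
    "State/Province": ["CA", "CA", "TX", "TX", "TX"],
    "Store Name": ["Store 1", None, "Store 3", "Store 4", "Store 5"],
})

# count() ignores missing values in the selected column.
counts_non_null = stores.groupby("State/Province")["Store Name"].count()

# size() counts every row in each group.
counts_rows = stores.groupby("State/Province").size()

# reset_index(name=...) turns the result into a two-column dataframe.
statewise = counts_rows.reset_index(name="Count")

print(counts_non_null.to_dict())
print(counts_rows.to_dict())
print(statewise)
```

If the column being counted has no missing values, both approaches give the same numbers.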

Below we have merged the Starbucks US stores count dataset from the previous cell with the US states geo dataset from earlier. The final merged dataset will have information about geographical data of US states and Starbucks stores count per state.

In [16]:
us_stores_statewise = us_states_geo.merge(us_stores_statewise_cnt, left_on="id", right_on="State/Province")

us_stores_statewise.head()
Out[16]:
id name geometry State/Province Count
0 AL Alabama POLYGON ((-87.35930 35.00118, -85.60667 34.984... AL 85
1 AK Alaska MULTIPOLYGON (((-131.60202 55.11798, -131.5691... AK 49
2 AZ Arizona POLYGON ((-109.04250 37.00026, -109.04798 31.3... AZ 488
3 AR Arkansas POLYGON ((-94.47384 36.50186, -90.15254 36.496... AR 55
4 CA California POLYGON ((-123.23326 42.00619, -122.37885 42.0... CA 2821

Below we have created a choropleth map that shows the Starbucks store count for each state of the United States. We can use it to analyze where Starbucks stores are more concentrated and where they are less so.

In [ ]:
from plotnine import scale_color_cmap, xlim, ylim, element_rect

chart = ggplot()
map_proj = geom_map(data=us_stores_statewise, mapping=aes(fill="Count", color="Count"))
labels = labs(title="Starbucks US Stores Choropleth Map")
theme_details = theme(figure_size=(10,6), panel_background=element_rect(fill="#a3ccff"))
fill_colormap = scale_fill_cmap(cmap_name="RdBu")
color_colormap = scale_color_cmap(cmap_name="RdBu")
xlimit = xlim(-170,-60)
ylimit = ylim(25, 72)

us_stores_choropleth = chart + map_proj + labels + theme_details + fill_colormap + color_colormap + xlimit + ylimit

us_stores_choropleth


Starbucks Store Locations Across World

In this example, we'll explain how we can put points on a map to create a scatter map. We'll show locations of stores worldwide using a scatter map. This can be used to analyze the concentration of stores.

Our code for this example starts by creating a chart with the world dataset that we loaded earlier from geopandas. We then add the map layer using geom_map(). This time we have not mapped a dataset column to the colors. Instead, we have passed constant values so that all countries are filled with one color and their borders are drawn with another.

We have added points to the chart using geom_point(). We have provided it with the Starbucks stores dataset that we loaded earlier along with mapping details. The mapping instructs it to use the Longitude column for the X-axis and the Latitude column for the Y-axis. We have colored the points tomato.

At last, we have created the final scatter map by adding up all individual layers.

In [ ]:
from plotnine import geom_point

chart = ggplot(data=world)
map_proj = geom_map(fill="white", color="lightgrey")
labels = labs(title="World Starbucks Stores Scatter Map")
theme_details = theme(figure_size=(12,6.5))

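# Drop only rows missing coordinates; a bare dropna() would also drop stores missing a postcode or phone number
scatter_points = geom_point(data=starbucks_stores.dropna(subset=["Longitude", "Latitude"]),
                            mapping=aes(x="Longitude", y="Latitude"),
                            color="tomato", alpha=0.3, size=1)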

world_starbucks_stores = chart + map_proj + labels + theme_details + scatter_points

world_starbucks_stores
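
When preparing point data for a scatter map, it helps to be aware that dropna() without arguments removes a row if any of its columns is missing, not just the coordinates. The first rows of the Starbucks dataset shown earlier already have NaN in Postcode and Phone Number, so those stores would be removed even though their coordinates are present. The sketch below shows the difference on a small made-up dataframe; the rows are toy values for illustration only.

```python
import pandas as pd

# Toy data (made up): store B lacks a postcode, store C lacks coordinates.
stores = pd.DataFrame({
    "Store Name": ["A", "B", "C"],
    "Postcode": ["AD500", None, "12345"],
    "Longitude": [1.53, 55.47, None],
    "Latitude": [42.51, 25.42, None],
})

# Drops every row with a missing value in any column.
all_complete = stores.dropna()

# Drops only rows that cannot be placed on the map.
with_coords = stores.dropna(subset=["Longitude", "Latitude"])

print("dropna()              keeps:", list(all_complete["Store Name"]))
print("dropna(subset=coords) keeps:", list(with_coords["Store Name"]))
```

Restricting the check to the Longitude and Latitude columns keeps every store that can actually be plotted.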


Starbucks Store Locations Across US

In this example, we have created a scatter map showing store locations across the US. The code is almost exactly the same as our previous example, with a few minor changes. It uses the US states geo dataset to draw the US map and the US stores dataset to plot points showing store locations on the map.

In [ ]:
chart = ggplot(data=us_states_geo)
map_proj = geom_map(fill="white", color="lightgrey")
labels = labs(title="US Starbucks Stores Map")
theme_details = theme(figure_size=(12,6.5))
xlimit = xlim(-170,-60)
ylimit = ylim(25, 72)

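# Drop only rows missing coordinates; a bare dropna() would also drop stores missing a postcode or phone number
scatter_points = geom_point(data=us_stores.dropna(subset=["Longitude", "Latitude"]),
                            mapping=aes(x="Longitude", y="Latitude"),
                            color="tomato", alpha=0.3, size=1)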

us_starbucks_stores = chart + map_proj + labels + theme_details + xlimit + ylimit + scatter_points

us_starbucks_stores


Store Count per US States Bubble Map

As a part of this example, we'll create a bubble map that shows a bubble on the US map for each state, where the bubble size is based on the count of Starbucks stores in that state.

We have created a helper function named calculate_center() which takes as input a geopandas dataframe with a geometry column and returns the center of each geographic region. We'll use it to find the center of each region, which is an individual US state in our case, and plot bubbles on the map at those center locations.

We have added the extra columns center, x, x2, and y to our US Starbucks store count dataset. The x2 column is the x column shifted by 2.2. This is done to prevent the labels from overlapping the bubbles. We'll use this final modified dataset for plotting the bubble map.

In [21]:
def calculate_center(df):
    """
    Calculate the centre of each geometry.

    This function first converts to a planar CRS, gets the centroid,
    then converts back to the original CRS. This gives a more
    accurate centroid than computing it directly on
    longitude/latitude coordinates.
    """
    original_crs = df.crs
    planar_crs = 'EPSG:3857'
    return df['geometry'].to_crs(planar_crs).centroid.to_crs(original_crs)

us_stores_statewise["center"] = calculate_center(us_stores_statewise)
us_stores_statewise["x"] = [val.x for val in us_stores_statewise.center]
us_stores_statewise["x2"] = [val.x+2.2 for val in us_stores_statewise.center]
us_stores_statewise["y"] = [val.y for val in us_stores_statewise.center]

us_stores_statewise.head()
Out[21]:
id name geometry State/Province Count center x x2 y
0 AL Alabama POLYGON ((-87.35930 35.00118, -85.60667 34.984... AL 85 POINT (-86.82705 32.81439) -86.827048 -84.627048 32.814386
1 AK Alaska MULTIPOLYGON (((-131.60202 55.11798, -131.5691... AK 49 POINT (-152.52500 65.00297) -152.525004 -150.325004 65.002968
2 AZ Arizona POLYGON ((-109.04250 37.00026, -109.04798 31.3... AZ 488 POINT (-111.66516 34.33632) -111.665157 -109.465157 34.336315
3 AR Arkansas POLYGON ((-94.47384 36.50186, -90.15254 36.496... AR 55 POINT (-92.43914 34.91573) -92.439137 -90.239137 34.915733
4 CA California POLYGON ((-123.23326 42.00619, -122.37885 42.0... CA 2821 POINT (-119.68388 37.38770) -119.683878 -117.483878 37.387697
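
To make it concrete what "center" means here: a centroid is the area-weighted center of a polygon, which can differ from the plain average of its vertices. The block below is a pure-Python sketch of the standard shoelace-based centroid formula for a simple (non-self-intersecting) polygon on planar coordinates. The function polygon_centroid and the sample polygon are hypothetical and written only for illustration; in the tutorial code, geopandas computes the centroids for us.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0       # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # The centroid divides by 6 * area, which equals 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# A 2 x 2 square with one extra vertex in the middle of its bottom edge.
square = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2)]

centroid = polygon_centroid(square)
vertex_mean = (
    sum(x for x, _ in square) / len(square),
    sum(y for _, y in square) / len(square),
)

print("Centroid    :", centroid)
print("Vertex mean :", vertex_mean)
```

The extra vertex pulls the vertex average towards the bottom edge, while the centroid stays in the middle of the square. This is why a centroid is the better anchor for bubbles and labels on shapes with unevenly spaced boundary points, such as state outlines.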

Below we have created a bubble map showing Starbucks stores count for each state. The code starts by creating a map of US states.

Points are added to the chart using geom_point(). We have provided it with the dataset created in the previous cell. The mapping provided to geom_point() instructs it to use the x column for the X-axis, the y column for the Y-axis, and the Count column for the size of the points/bubbles.

Text annotations of the US state abbreviations are added to the map using geom_text(). The mapping provided to geom_text() instructs it to use the x2 column for the X-axis, the y column for the Y-axis, and the State/Province column for each label.

At last, we have added up the individual layers to create the final bubble map.

In [ ]:
from plotnine import geom_text

chart = ggplot(data=us_states_geo)
map_proj = geom_map(fill="white", color="lightgrey")
labels = labs(x="Longitude", y="Latitude", title="US Starbucks Stores Count Bubble Map", size="Store Count")
theme_details = theme(figure_size=(12,6.5))
xlimit = xlim(-170,-60)
ylimit = ylim(25, 72)

scatter_points = geom_point(data=us_stores_statewise.dropna(),
                            mapping=aes(x="x", y="y", size="Count"),
                            color="tomato", alpha=0.7)

texts = geom_text(data=us_stores_statewise.dropna(),
                            mapping=aes(x="x2", y="y", label="State/Province"),
                            color="black", size=8)

us_starbucks_stores = chart + map_proj + labels + theme_details + xlimit + ylimit + scatter_points + texts

us_starbucks_stores




Sunny Solanki