Choropleth Maps & Scatter Maps using cufflinks

The cufflinks library provides a wrapper around pandas so that we can create an interactive plotly chart directly from a dataframe by calling its iplot() or figure() method. The iplot() API closely mirrors the pandas plot() API, which generates charts based on matplotlib. We have covered how to generate various charts using cufflinks in a separate tutorial, and we recommend going through it first if you do not have a background in cufflinks.

As a part of this tutorial, we'll use the same API to generate scatter maps and choropleth maps, each with a single call on a pandas dataframe.

We'll start by loading the necessary libraries.

In [1]:
import pandas as pd
import numpy as np

import cufflinks as cf

print("List of Cufflinks Themes : ", cf.getThemes())

cf.set_config_file(theme='ggplot',sharing='public',offline=True)
List of Cufflinks Themes :  ['ggplot', 'pearl', 'solar', 'space', 'white', 'polar', 'henanigans']

Load Datasets

We'll be using the two datasets described below for plotting various maps. Both are available from Kaggle. We suggest that you download both datasets to follow along with the tutorial.

  • World Happiness Report Dataset - It has information about attributes like happiness score, GDP per capita, social support, healthy life expectancy, generosity, corruption, and freedom to make life choices for each country of the world.

  • Starbucks Store Locations Dataset - It has information about Starbucks store locations worldwide. It has information about each store's name, address, city, state, country, latitude, and longitude.

We have loaded both datasets as pandas dataframes.

In [2]:
starbucks_stores = pd.read_csv("datasets/starbucks_store_locations.csv")

starbucks_stores.head()
Out[2]:
Brand Store Number Store Name Ownership Type Street Address City State/Province Country Postcode Phone Number Timezone Longitude Latitude
0 Starbucks 47370-257954 Meritxell, 96 Licensed Av. Meritxell, 96 Andorra la Vella 7 AD AD500 376818720 GMT+1:00 Europe/Andorra 1.53 42.51
1 Starbucks 22331-212325 Ajman Drive Thru Licensed 1 Street 69, Al Jarf Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.42
2 Starbucks 47089-256771 Dana Mall Licensed Sheikh Khalifa Bin Zayed St. Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.39
3 Starbucks 22126-218024 Twofour 54 Licensed Al Salam Street Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.38 24.48
4 Starbucks 17127-178586 Al Ain Tower Licensed Khaldiya Area, Abu Dhabi Island Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.54 24.51
In [3]:
world_happiness = pd.read_csv("datasets/world_happiness_2019.csv")
world_happiness.head()
Out[3]:
Overall rank Country or region Score GDP per capita Social support Healthy life expectancy Freedom to make life choices Generosity Perceptions of corruption
0 1 Finland 7.769 1.340 1.587 0.986 0.596 0.153 0.393
1 2 Denmark 7.600 1.383 1.573 0.996 0.592 0.252 0.410
2 3 Norway 7.554 1.488 1.582 1.028 0.603 0.271 0.341
3 4 Iceland 7.494 1.380 1.624 1.026 0.591 0.354 0.118
4 5 Netherlands 7.488 1.396 1.522 0.999 0.557 0.322 0.298

Scatter Maps

We can plot a scatter map from a pandas dataframe by calling the figure() method on it and passing scattergeo as the kind parameter. We also need to pass the latitude and longitude column names to the lat and lon parameters of figure(). We have also passed the Store Name column to the text parameter so that when the mouse hovers over any point in the chart, the name of that store is displayed in a tooltip.

Below we have plotted a scatter map of Starbucks store locations worldwide. We can clearly see a high concentration of stores in the US, Europe, and China.

In [ ]:
starbucks_stores.figure(kind="scattergeo",
                        size=0.05,
                        margin=(0,0,0,0),
                        colors=["tomato"],
                        lat="Latitude", lon="Longitude", text="Store Name")
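
Since the map relies entirely on the lat and lon columns, it can help to check the coordinates before plotting. The snippet below is a minimal sketch using pandas only; the sample rows and the helper name valid_coordinates are hypothetical and are not part of cufflinks or of the original dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the Starbucks CSV.
stores = pd.DataFrame({
    "Store Name": ["Meritxell, 96", "Ajman Drive Thru", "Missing Coords", "Bad Latitude"],
    "Latitude": [42.51, 25.42, None, 123.0],
    "Longitude": [1.53, 55.47, 10.0, 20.0],
})

def valid_coordinates(df, lat="Latitude", lon="Longitude"):
    """Return only the rows whose coordinates are present and within range."""
    # between() evaluates to False for NaN, so missing values are dropped too.
    in_range = df[lat].between(-90, 90) & df[lon].between(-180, 180)
    return df[in_range].reset_index(drop=True)

clean_stores = valid_coordinates(stores)
print(clean_stores)
```

The cleaned dataframe can then be passed to figure() in the same way as the original one.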

Below we have created another scatter map in exactly the same way as the previous one, this time only for stores located in the US. We have added one more parameter, projection, to override the default projection, which plots points on a world map. Setting it to albers usa shows only the US map.

We can see a high concentration of Starbucks stores on the east and west coasts of the US.

In [ ]:
us_stores = starbucks_stores[starbucks_stores.Country=="US"]

us_stores.figure(kind="scattergeo",
                 size=0.05,
                 margin=(0,0,0,0),
                 colors="tomato",
                 projection={"type":"albers usa"},
                 lat="Latitude", lon="Longitude", text="Store Name")

Choropleth Maps

The second chart type that we'll introduce is the choropleth map. We'll plot population, happiness score, and GDP per capita as choropleth maps. By default, choropleth maps in plotly identify countries by three-letter ISO codes rather than full names. Our happiness dataset has only full country names, so we'll use a geopandas dataframe to look up the ISO code for each country name. The population figures also come from this geopandas dataframe.

Below we have loaded the geopandas library and a dataframe that has information about each country of the world, including its ISO code. Note that the gpd.datasets module used here was deprecated in geopandas 0.14 and removed in 1.0, so this code needs an older geopandas release.

In [6]:
import geopandas as gpd

gpd.datasets.available
Out[6]:
['naturalearth_cities', 'naturalearth_lowres', 'nybb']
In [7]:
world_geo_df = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world_geo_df.head()
Out[7]:
pop_est continent name iso_a3 gdp_md_est geometry
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 603253 Africa W. Sahara ESH 906.5 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 35623680 North America Canada CAN 1674000.0 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 326625791 North America United States of America USA 18560000.0 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

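We merge the geopandas dataframe with the world happiness dataframe so that the final dataframe has an ISO code alongside the happiness data for each country. Because this is a left join on the country name, countries that are absent from the happiness dataset, or whose names do not match exactly, get NaN in the happiness columns, as rows 0 and 2 of the output show.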

In [8]:
world_geo_df = world_geo_df.merge(world_happiness, how="left", left_on="name", right_on="Country or region")
world_geo_df.head()
Out[8]:
pop_est continent name iso_a3 gdp_md_est geometry Overall rank Country or region Score GDP per capita Social support Healthy life expectancy Freedom to make life choices Generosity Perceptions of corruption
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000... NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 153.0 Tanzania 3.231 0.476 0.885 0.499 0.417 0.276 0.147
2 603253 Africa W. Sahara ESH 906.5 POLYGON ((-8.66559 27.65643, -8.66512 27.58948... NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 35623680 North America Canada CAN 1674000.0 MULTIPOLYGON (((-122.84000 49.00000, -122.9742... 9.0 Canada 7.278 1.365 1.505 1.039 0.584 0.285 0.308
4 326625791 North America United States of America USA 18560000.0 MULTIPOLYGON (((-122.84000 49.00000, -120.0000... 19.0 United States of America 6.892 1.433 1.457 0.874 0.454 0.280 0.128
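
The NaN rows above come from country names that found no partner in the join. One way to list them is the indicator parameter of pandas merge(), sketched below on small hypothetical frames (the values are copied from the outputs above, but the frames themselves are illustrative).

```python
import pandas as pd

# Hypothetical miniature versions of the two dataframes.
geo = pd.DataFrame({
    "name": ["Fiji", "Tanzania", "W. Sahara", "Canada"],
    "iso_a3": ["FJI", "TZA", "ESH", "CAN"],
})
happiness = pd.DataFrame({
    "Country or region": ["Tanzania", "Canada", "Finland"],
    "Score": [3.231, 7.278, 7.769],
})

# indicator=True adds a "_merge" column saying where each row came from.
merged = geo.merge(happiness, how="left",
                   left_on="name", right_on="Country or region",
                   indicator=True)

# Countries in the geo frame with no happiness data.
unmatched_geo = merged.loc[merged["_merge"] == "left_only", "name"].tolist()

# Countries in the happiness frame that never made it into the merge.
unmatched_happiness = sorted(set(happiness["Country or region"]) - set(geo["name"]))

print(unmatched_geo)
print(unmatched_happiness)
```

Names that appear in both lists under different spellings can be reconciled with a rename before merging.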

We can create a choropleth map from the world dataframe by calling the iplot() method on it and passing choropleth as the kind parameter. Apart from the chart kind, we also need to pass two other important parameters: locations and z. The locations parameter names the column holding the ISO codes, and the z parameter names the column holding the value to plot for each code. We have used the iso_a3 column as locations because it has the ISO code for each country, and pop_est as z because it has the population estimate for each country.

We have first created a choropleth map of the world population, using Reds as the color palette. The map shows that China and India have the largest populations.

In [ ]:
world_geo_df.iplot(kind="choropleth",
                   locations="iso_a3", z="pop_est",
                   colorscale="Reds",
                   margin=(0,0,0,0), title="World Population Choropleth Map")
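
Rows with an unusable ISO code or a missing z value cannot be colored on the map, so it may be worth filtering them out first. The sketch below uses pandas only. The "-99" placeholder code and the "Placeholder Land" row are assumptions for illustration; some copies of the naturalearth_lowres data are reported to use such a placeholder, so check your own copy.

```python
import pandas as pd

# Hypothetical miniature of the merged dataframe.
world = pd.DataFrame({
    "name": ["Fiji", "Canada", "Placeholder Land", "Tanzania"],
    "iso_a3": ["FJI", "CAN", "-99", "TZA"],
    "Score": [None, 7.278, 5.0, 3.231],
})

# Keep rows whose code looks like a three-letter ISO code
# (na=False treats missing codes as non-matching).
has_iso = world["iso_a3"].str.fullmatch(r"[A-Z]{3}", na=False)

# Keep rows that actually have a value to plot.
has_value = world["Score"].notna()

plottable = world[has_iso & has_value].reset_index(drop=True)
print(plottable)
```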

Below we have created another choropleth map with exactly the same code as the previous chart; the only differences are the column used for the z parameter and the color palette. It shows the happiness score for each country of the world.

In [ ]:
world_geo_df.iplot(kind="choropleth",
                   locations="iso_a3", z="Score",
                   colorscale="PiYG",
                   margin=(0,0,0,0), title="World Happiness Choropleth Map")

The third choropleth map is created the same way as the previous two. It shows GDP per capita for each country of the world.

In [ ]:
world_geo_df.iplot(kind="choropleth",
                   locations="iso_a3", z="GDP per capita",
                   colorscale="RdBu",
                   margin=(0,0,0,0), title="World GDP Per Capita Choropleth Map")

The fourth choropleth map will show the number of Starbucks stores in each US state. We have hence created a new dataframe below that holds the count of Starbucks stores per US state.

In [12]:
us_stores = starbucks_stores[starbucks_stores.Country == "US"]
us_stores = us_stores.groupby(by=['State/Province']).count()[["Store Name"]].rename(columns={"Store Name":"Count"}).reset_index()
us_stores.head()
Out[12]:
State/Province Count
0 AK 49
1 AL 85
2 AR 55
3 AZ 488
4 CA 2821
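
Note that count() counts non-null values of Store Name, so a store with a missing name would not be counted. If the goal is the number of rows per state, groupby(...).size() is an alternative. The sketch below contrasts the two on hypothetical sample rows.

```python
import pandas as pd

# Hypothetical sample: one CA store has no name, one store is outside the US.
stores = pd.DataFrame({
    "Store Name": ["A", "B", None, "D", "E"],
    "State/Province": ["CA", "CA", "CA", "AK", "ON"],
    "Country": ["US", "US", "US", "US", "CA"],
})
us = stores[stores["Country"] == "US"]

# size() counts rows per group, regardless of missing values.
by_rows = us.groupby("State/Province").size().reset_index(name="Count")

# count() on a column counts only its non-null entries.
by_names = us.groupby("State/Province")["Store Name"].count().reset_index(name="Count")

print(by_rows)
print(by_names)
```

On data with no missing store names the two approaches give the same counts.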

We can create a choropleth map from the us_stores dataframe by calling the iplot() method on it. We have used the State/Province column as locations and the Count column as z. We have also introduced two more parameters that are needed for the US: locationmode, which tells plotly to interpret the locations as US state abbreviations, and projection, which restricts the view to the US. Without these parameters, the whole world map would be shown, which we don't need here.

In [ ]:
us_stores.iplot(kind="choropleth",
                locations="State/Province", z="Count",
                colorscale="YlOrRd",
                margin=(0,0,0,0), locationmode="USA-states",
                projection={"type":"albers usa"},
                title="Starbucks Stores Count Per US State")
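
With locationmode set to USA-states, the locations are expected to be two-letter state abbreviations, so a quick check for unexpected codes can be useful. The sketch below is a pandas-only illustration; the US_STATE_CODES set (50 states plus DC), the sample counts for DC and XX, and the code "XX" itself are assumptions made for the example.

```python
import pandas as pd

# Two-letter postal abbreviations for the 50 states, plus DC (an assumption:
# whether DC is drawn depends on the map data plotly uses).
US_STATE_CODES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY DC".split()
)

# Hypothetical counts; "XX" stands in for a malformed state code.
state_counts = pd.DataFrame({
    "State/Province": ["AK", "CA", "DC", "XX"],
    "Count": [49, 2821, 91, 3],
})

is_known = state_counts["State/Province"].isin(US_STATE_CODES)
unknown_codes = state_counts.loc[~is_known, "State/Province"].tolist()
clean_counts = state_counts[is_known].reset_index(drop=True)

print(unknown_codes)
print(clean_counts)
```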

Sunny Solanki