Updated On : Oct-08,2021 Tags geoplot, scatter-maps, bubble-maps
Geoplot - Scatter & Bubble Maps [Python]

Geoplot is an easy-to-use Python library that lets us create maps, usually with a single method call. It lets us create different kinds of maps such as choropleth maps, scatter maps, bubble maps, cartograms, KDE maps, connection/Sankey maps, etc. Geoplot is built on top of cartopy and geopandas to make working with maps easier. We have already covered geoplot in another tutorial, where we explained how to create choropleth maps with it. Please feel free to check it out if you are interested in choropleth maps.

As a part of this tutorial, we'll cover how to create scatter and bubble maps using geoplot. We'll explain the API with simple, easy-to-follow examples.

We'll start by importing the necessary libraries. We have also imported geopandas as we'll be using geospatial datasets available from it by merging them with other datasets.

In [1]:
import geoplot

print("Geoplot Version : {}".format(geoplot.__version__))

import matplotlib.pyplot as plt
Geoplot Version : 0.4.4
In [2]:
import geopandas as gpd

import pandas as pd

print("Geopandas Version : {}".format(gpd.__version__))

print("Available Datasets : {}".format(gpd.datasets.available))
Geopandas Version : 0.9.0
Available Datasets : ['naturalearth_cities', 'naturalearth_lowres', 'nybb']

Below we have loaded the world geometries dataset available from geopandas. It has a column named geometry which holds instances of polygons or multi-polygons representing individual countries of the world.

In [3]:
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
#world = gpd.read_file(geoplot.datasets.get_path("world"))

world.head()
Out[3]:
pop_est continent name iso_a3 gdp_md_est geometry
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 603253 Africa W. Sahara ESH 906.5 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 35623680 North America Canada CAN 1674000.0 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 326625791 North America United States of America USA 18560000.0 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

Below we have loaded another dataset that has location information for Starbucks stores worldwide. The dataset can be downloaded from the below link.

The dataset has geospatial information for each store in the form of longitude and latitude. Geoplot requires geospatial information to be represented as shapely objects stored in the geometry column of the dataset. To satisfy this requirement, we have created shapely Point objects by looping through the longitude and latitude of each store, and then stored them in the geometry column of the dataset. Each Point object represents one location on the map.

Please make a NOTE that we have also converted normal pandas dataframe to GeoDataFrame.

In [5]:
from shapely.geometry import Point

starbucks = pd.read_csv("datasets/starbucks_store_locations.csv")

points = []
for lon, lat in zip(starbucks["Longitude"], starbucks["Latitude"]):
    points.append(Point(lon, lat))

starbucks["geometry"] = points

starbucks = gpd.GeoDataFrame(starbucks)

starbucks.head()
Out[5]:
Brand Store Number Store Name Ownership Type Street Address City State/Province Country Postcode Phone Number Timezone Longitude Latitude geometry
0 Starbucks 47370-257954 Meritxell, 96 Licensed Av. Meritxell, 96 Andorra la Vella 7 AD AD500 376818720 GMT+1:00 Europe/Andorra 1.53 42.51 POINT (1.53000 42.51000)
1 Starbucks 22331-212325 Ajman Drive Thru Licensed 1 Street 69, Al Jarf Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.42 POINT (55.47000 25.42000)
2 Starbucks 47089-256771 Dana Mall Licensed Sheikh Khalifa Bin Zayed St. Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.39 POINT (55.47000 25.39000)
3 Starbucks 22126-218024 Twofour 54 Licensed Al Salam Street Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.38 24.48 POINT (54.38000 24.48000)
4 Starbucks 17127-178586 Al Ain Tower Licensed Khaldiya Area, Abu Dhabi Island Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.54 24.51 POINT (54.54000 24.51000)
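
The same conversion can also be written without an explicit loop. The sketch below is an alternative, not the code used in this tutorial. It works on a small made-up sample frame (toy_stores, a hypothetical name) with the same Longitude and Latitude column names, and uses geopandas.points_from_xy() to build the geometry column in one call. Setting crs="EPSG:4326" is an assumption that the coordinates are plain WGS84 longitude/latitude.

```python
import geopandas as gpd
import pandas as pd

# Made-up sample rows standing in for the Starbucks CSV (same column names).
toy_stores = pd.DataFrame({
    "Store Name": ["Meritxell, 96", "Ajman Drive Thru"],
    "Longitude": [1.53, 55.47],
    "Latitude": [42.51, 25.42],
})

# points_from_xy builds one Point per row from the x (longitude)
# and y (latitude) columns.
toy_stores_gdf = gpd.GeoDataFrame(
    toy_stores,
    geometry=gpd.points_from_xy(toy_stores["Longitude"], toy_stores["Latitude"]),
    crs="EPSG:4326",  # assumption: coordinates are WGS84 longitude/latitude
)

print(toy_stores_gdf.geometry.head())
```

Declaring the coordinate reference system up front can help later if the data needs to be reprojected with to_crs().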

Starbucks Store Locations Worldwide Scatter Map

Our first scatter map represents Starbucks store locations on the world map. We have first created a world map using polyplot() method and then plotted points on it showing locations using pointplot() method of geoplot. The definition of pointplot() is below for reference.


  • pointplot(df, projection=None, hue=None, cmap=None, norm=None, scheme=None, scale=None, limits=(1,5), legend=False, legend_var=None, legend_values=None, legend_labels=None, legend_kwargs=None, figsize=(8,6), extent=None, ax=None, **kwargs) - This method takes as input a GeoDataFrame whose geometry column holds locations on the map. It plots a point at each of those locations.
    • The projection parameter takes as input instance of any projection available from geoplot.crs module. It can be used to change the projection of the map.
    • The hue parameter takes as input string column name from GeoDataFrame or geo series, iterable specifying values to be used to color points of the map. We can give a column name from GeoDataFrame which has continuous data or categorical data. It'll create a color bar for a continuous column and a categorical legend for the categorical column.
    • The cmap parameter accepts matplotlib colormap name.
    • The scheme parameter accepts an instance of a mapclassify classifier specifying the scheme used to group the data into classes. We can also give this parameter a string with the name of any class from mapclassify, in which case it'll create an instance of that class with default parameters. Below are some of the available classification schemes.
      • 'BoxPlot', 'EqualInterval', 'FisherJenks', 'FisherJenksSampled', 'HeadTailBreaks', 'JenksCaspall', 'JenksCaspallForced', 'JenksCaspallSampled', 'MaxP', 'MaximumBreaks', 'NaturalBreaks', 'Quantiles', 'Percentiles', 'StdMean'
    • The scale parameter accepts string column name from data frame or iterable which will be used to decide the size of points on a map.
    • The limits parameter accepts tuple of (min, max) specifying minimum and maximum sizes of bubbles on the map. The value of scale will be mapped to bubble size in the range specified by this parameter.
    • The legend parameter accepts boolean value specifying whether to include a legend or not.
    • The legend_var accepts one of the below strings as input specifying which variable to use for legends.
      • 'hue' - It creates legend based on hue parameter value.
      • 'scale' - It creates legend based on scale parameter value.
    • The legend_kwargs parameter accepts dictionary specifying options to modify the legend of the map.
    • The legend_labels accepts list of string values for categorical legends. We can group map values using scheme parameter into groups and using this parameter we can give labels for each group that will be included in the map.
    • The legend_values parameter accepts a list of values to be used for categorical legends.
    • The extent parameter takes as input tuple of 4 values (min_longitude, min_latitude, max_longitude, max_latitude) specifying bounding box of map.
    • The figsize parameter accepts tuple specifying figure size.
    • The ax parameter accepts matplotlib Axes object.
    • All other extra parameters provided will be passed on to the underlying matplotlib scatter call that draws the points. These can be the fill color of points, edge color, edge width, etc.
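
To make the scale and limits parameters more concrete, here is a small pure-Python sketch of a linear min-max mapping from data values to marker sizes. The helper name linear_marker_sizes is hypothetical and this is not geoplot's internal code. It only illustrates the idea that the smallest value gets the lower limit and the largest value gets the upper limit, under the assumption that the default scaling is linear.

```python
def linear_marker_sizes(values, limits=(1, 5)):
    """Map each value linearly onto the [limits[0], limits[1]] size range."""
    lo, hi = min(values), max(values)
    min_size, max_size = limits
    if hi == lo:
        # All values are equal: there is no spread to map,
        # so use the lower limit everywhere.
        return [float(min_size) for _ in values]
    slope = (max_size - min_size) / (hi - lo)
    return [min_size + slope * (v - lo) for v in values]


# Made-up values standing in for a column passed to the scale parameter.
populations = [1_000_000, 5_500_000, 10_000_000]
sizes = linear_marker_sizes(populations, limits=(5, 50))
print(sizes)
```

With limits=(5, 50), the smallest value maps to a size of about 5, the largest to about 50, and a value halfway between them lands about halfway between the two sizes.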

We have first created a world map using polyplot() method and stored the Axes object returned by it in a variable. We have then plotted points on the world map using pointplot() method and the Starbucks dataset we created earlier. We have given the Axes returned by polyplot() to pointplot() so that it draws the points on the same map. We have also given parameters like color, edgecolor and alpha, which control how the points are drawn.

In [ ]:
world_map_axes = geoplot.polyplot(world, facecolor="lightgrey", alpha=0.5, figsize=(15, 10));

geoplot.pointplot(starbucks,
                  ax=world_map_axes,
                  color="tomato",
                  edgecolor="black",
                  alpha=0.5
                 );

Starbucks Store Locations Across US Scatter Map

In this section, we have created another scatter map that shows the locations of Starbucks stores on the US map. We have first loaded a GeoJSON dataset that has a polygon for each US state, which together make up the whole US map. We have loaded the dataset using geopandas. The dataset can be downloaded from the below link.

In [11]:
us_states = gpd.read_file("datasets/us-states.json")

us_states.head()
Out[11]:
   id        name  geometry
0  AL     Alabama  POLYGON ((-87.35930 35.00118, -85.60667 34.984...
1  AK      Alaska  MULTIPOLYGON (((-131.60202 55.11798, -131.5691...
2  AZ     Arizona  POLYGON ((-109.04250 37.00026, -109.04798 31.3...
3  AR    Arkansas  POLYGON ((-94.47384 36.50186, -90.15254 36.496...
4  CA  California  POLYGON ((-123.23326 42.00619, -122.37885 42.0...

Below we have first drawn the US map using polyplot() method and stored the Axes reference returned by it in a variable. We have then plotted points on the map using pointplot() method. We have filtered our Starbucks GeoDataFrame to keep only US entries and given the result to pointplot(). We have also given the Axes reference returned by polyplot() to the method.

In [ ]:
us_map_axes = geoplot.polyplot(us_states, facecolor="dodgerblue", alpha=0.1, figsize=(15, 15));

geoplot.pointplot(starbucks[starbucks["Country"] == "US"],
                  ax=us_map_axes,
                  color="dodgerblue",
                  edgecolor="black",
                  alpha=0.1
                 );
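
The extent parameter described earlier can be used to zoom the map to a region of interest. As a hedged sketch of one way to get such a bounding box, the code below computes it from the total_bounds attribute of a GeoDataFrame, which returns (min x, min y, max x, max y), the same order as the extent tuple. The toy_states frame and its rectangles are made up for illustration and only stand in for the us_states dataset.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for the us_states GeoDataFrame: three rectangles in lon/lat.
toy_states = gpd.GeoDataFrame(
    {"id": ["AA", "BB", "FAR"]},
    geometry=[
        box(-100, 30, -90, 40),
        box(-90, 30, -80, 45),
        box(-160, 55, -150, 65),  # a far-away shape we want to leave out
    ],
    crs="EPSG:4326",
)

# Keep only the "mainland" shapes and read their combined bounding box.
mainland = toy_states[toy_states["id"] != "FAR"]
extent = tuple(float(v) for v in mainland.total_bounds)
print(extent)
```

A tuple like this could then be passed as extent=extent to both polyplot() and pointplot() so that the two layers share the same bounding box.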

US States Population Bubble Map

In this section, we have plotted a bubble map showing the population of each US state. To plot a bubble for each state on the US map, we need the center of each state, which is where the bubble will be drawn. The size of each bubble will be based on the population of that state.

We have created a simple function named calculate_center() which takes as input a GeoDataFrame, calculates the center of each polygon/multi-polygon geometry object, and returns them. We have added a center for each state to our dataframe using this function.

In [13]:
def calculate_center(df):
    """
    Calculate the center of each geometry.

    This function first converts to a planar CRS, gets the centroid,
    then converts back to the original CRS. This gives a more
    accurate centroid.
    """
    original_crs = df.crs
    planar_crs = 'EPSG:3857'
    return df['geometry'].to_crs(planar_crs).centroid.to_crs(original_crs)

us_states["center"] = calculate_center(us_states)
us_states["Longitude"] = [val.x for val in us_states.center]
us_states["Latitude"] = [val.y for val in us_states.center]

us_states.head()
Out[13]:
id name geometry center Longitude Latitude
0 AL Alabama POLYGON ((-87.35930 35.00118, -85.60667 34.984... POINT (-86.82705 32.81439) -86.827048 32.814386
1 AK Alaska MULTIPOLYGON (((-131.60202 55.11798, -131.5691... POINT (-152.52500 65.00297) -152.525004 65.002968
2 AZ Arizona POLYGON ((-109.04250 37.00026, -109.04798 31.3... POINT (-111.66516 34.33632) -111.665157 34.336315
3 AR Arkansas POLYGON ((-94.47384 36.50186, -90.15254 36.496... POINT (-92.43914 34.91573) -92.439137 34.915733
4 CA California POLYGON ((-123.23326 42.00619, -122.37885 42.0... POINT (-119.68388 37.38770) -119.683878 37.387697
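
One thing to keep in mind when placing bubbles is that a centroid is a center of mass, so for irregular shapes it can fall outside the shape itself. The sketch below illustrates this with a made-up U-shaped polygon and shows shapely's representative_point() as a possible alternative. It is only an illustration and is not part of the original tutorial code.

```python
from shapely.geometry import Polygon

# A U-shaped polygon: a 3x3 square with a notch cut out of the top middle.
u_shape = Polygon(
    [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
)

centroid = u_shape.centroid
inside_point = u_shape.representative_point()

# The centroid is the center of mass; here it falls inside the notch,
# which is outside the shape.
print("centroid inside polygon:", u_shape.contains(centroid))

# representative_point() returns a point that lies within the geometry.
print("representative point inside polygon:", u_shape.contains(inside_point))
```

A geopandas GeoSeries offers the same representative_point() method, so it could be swapped in for centroid if a bubble must sit inside its polygon.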

Below we have loaded the dataset which has information about the population of each US state. The dataset can be downloaded from the below link.

In [14]:
us_pop = pd.read_csv("datasets/State Populations.csv")

us_pop.head()
Out[14]:
          State  2018 Population
0    California         39776830
1         Texas         28704330
2       Florida         21312211
3      New York         19862512
4  Pennsylvania         12823989

Now we have merged the US states geo dataset with the US state population dataset. We have removed the geometry column, which held polygons/multi-polygons, and renamed the center column to geometry so that it becomes the new geometry column. We'll be using this modified dataframe to create bubbles on the US map.

In [15]:
us_states_pop = us_states.merge(us_pop, left_on="name", right_on="State").drop(columns=["geometry"]).rename(columns={"center": "geometry"})

us_states_pop.head()
Out[15]:
id name geometry Longitude Latitude State 2018 Population
0 AL Alabama POINT (-86.82705 32.81439) -86.827048 32.814386 Alabama 4888949
1 AK Alaska POINT (-152.52500 65.00297) -152.525004 65.002968 Alaska 738068
2 AZ Arizona POINT (-111.66516 34.33632) -111.665157 34.336315 Arizona 7123898
3 AR Arkansas POINT (-92.43914 34.91573) -92.439137 34.915733 Arkansas 3020327
4 CA California POINT (-119.68388 37.38770) -119.683878 37.387697 California 39776830
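
Because the merge matches rows on state names, any name that is spelled differently in the two tables is silently dropped by the default inner join. The sketch below shows one way to check for this with pandas, using how="outer" and indicator=True. The two miniature tables are made up for illustration, and the population value for the unmatched row is a placeholder.

```python
import pandas as pd

# Made-up miniature versions of the two tables being merged.
toy_states = pd.DataFrame({"name": ["Alabama", "Alaska", "District of Columbia"]})
toy_pop = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Puerto Rico"],
    "2018 Population": [4888949, 738068, 3000000],  # last value is a placeholder
})

# how="outer" keeps unmatched rows; indicator=True adds a "_merge" column
# saying whether each row came from the left table, the right table, or both.
check = toy_states.merge(
    toy_pop, left_on="name", right_on="State", how="outer", indicator=True
)

left_only = check.loc[check["_merge"] == "left_only", "name"].tolist()
right_only = check.loc[check["_merge"] == "right_only", "State"].tolist()
print("Only in geo table:", left_only)
print("Only in population table:", right_only)
```

If both lists come back empty for the real datasets, every state received a population value and no bubble will be missing from the map.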

Below we have first created a US map using polyplot() method giving it original US states GeoDataFrame which we had loaded earlier. We have stored Axes reference returned by it in a variable.

We have then created bubbles on the map using the merged geo dataframe which we created in the previous cell. We have instructed pointplot() method to use the 2018 Population column for both hue and scale, so the values of this column decide both the color and the size of the points on the map. We have given the Axes reference generated by polyplot() to it so that bubbles are drawn on the same map. We have instructed the method to use the value of scale parameter to decide legend values.

We have asked the method to use the Quantiles scheme to group the column data into 5 classes. We can see from the legend that it has 5 entries. The limits parameter is set to (5,50), telling the method to create bubbles in this size range.

In [ ]:
us_map_axes = geoplot.polyplot(us_states, facecolor="white", figsize=(15, 10));

geoplot.pointplot(us_states_pop,
                  hue="2018 Population",
                  scale="2018 Population",
                  scheme="Quantiles", ## mapclassify.Quantiles(us_states_pop["2018 Population"], k=5) will give same results.
                  ax=us_map_axes,
                  edgecolor="black",
                  legend=True,
                  legend_var="scale",
                  legend_kwargs={"loc":"best",
                                  "fontsize": "large",
                                  "title":"Population",
                                  "title_fontsize":"large"},
                  limits=(5, 50),

                 );

plt.title("US State's 2018 Population", fontdict={"fontsize": 15}, pad=15);
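
To give a feel for what a quantile-based scheme does, the sketch below bins ten made-up values into five equal-count groups with pandas.qcut(). This is only an illustration of the idea behind quantile classification. The break points chosen by mapclassify's Quantiles class may differ slightly from the ones pandas picks.

```python
import pandas as pd

# Ten made-up values standing in for a numeric column such as "2018 Population".
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# q=5 asks for five quantile bins; labels=False returns the bin index (0 to 4).
bins = pd.qcut(values, q=5, labels=False)

# Bin index assigned to each value, followed by the number of values per bin.
print(bins.tolist())
print(bins.value_counts().sort_index().tolist())
```

Each of the five bins ends up with the same number of values, which is why a quantile scheme spreads colors evenly even when the underlying data is heavily skewed.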

GDP of Countries Bubble Map

Our second bubble map shows the GDP of countries on the world map. We have first loaded a dataset which has happiness scores and a few other important parameters for each country of the world. The dataset can be downloaded from the below link.

We have then merged the world geo dataset with the happiness dataset to create a merged geo dataset. We have then calculated the center of each country using calculate_center() function. Finally, we have removed the original geometry column, which held polygons/multi-polygons, and renamed the center column so that it becomes the new geometry column.

In [28]:
happiness = pd.read_csv("datasets/world_happiness_2019.csv")

world_happiness = world.merge(happiness, left_on="name", right_on="Country or region")

world_happiness["center"] = calculate_center(world_happiness)

world_happiness = world_happiness.drop(columns=["geometry"]).rename(columns={"center": "geometry"})

world_happiness.head()
Out[28]:
pop_est continent name iso_a3 gdp_md_est Overall rank Country or region Score GDP per capita Social support Healthy life expectancy Freedom to make life choices Generosity Perceptions of corruption geometry
0 53950935 Africa Tanzania TZA 150600.0 153 Tanzania 3.231 0.476 0.885 0.499 0.417 0.276 0.147 POINT (34.75848 -6.27836)
1 35623680 North America Canada CAN 1674000.0 9 Canada 7.278 1.365 1.505 1.039 0.584 0.285 0.308 POINT (-96.99822 67.99064)
2 326625791 North America United States of America USA 18560000.0 19 United States of America 6.892 1.433 1.457 0.874 0.454 0.280 0.128 POINT (-119.45018 51.26001)
3 18556698 Asia Kazakhstan KAZ 460700.0 60 Kazakhstan 5.809 1.173 1.508 0.729 0.410 0.146 0.096 POINT (67.31752 48.46973)
4 29748859 Asia Uzbekistan UZB 202300.0 41 Uzbekistan 6.174 0.745 1.529 0.756 0.631 0.322 0.240 POINT (63.12119 41.82781)

Below we have first created a world map using polyplot() method and the world geo dataset. We have stored the Axes reference returned by it in a variable.

We have then created bubbles showing the GDP of each country on the world map using pointplot() method. We have given it our merged world geo dataset from the previous cell and asked it to use the gdp_md_est column for both hue and scale. The legend is decided based on the value of scale parameter.

In [ ]:
world_map_axes = geoplot.polyplot(world, facecolor="white", figsize=(15, 10));

geoplot.pointplot(world_happiness,
                  hue="gdp_md_est",
                  scale="gdp_md_est",
                  cmap="RdBu",
                  ax=world_map_axes,
                  edgecolor="black",
                  legend=True,
                  legend_var="scale",
                  legend_kwargs={"loc":"best",
                                  "fontsize": "large",
                                  "title":"GDP",
                                  "title_fontsize":"large"},
                  limits=(5, 50),

                 );

plt.title("GDP of Countries", fontdict={"fontsize": 15}, pad=15);

Starbucks US Store per State Bubble Map

In this section, we'll be creating a bubble map showing the count of Starbucks stores for each US state.

We have first created an intermediate dataframe where we have filtered entries of only US Starbucks. We have then created a new dataframe that has a count of stores per US state by using the grouping functionality of pandas.

We have then merged this intermediate dataset with the geo dataset from the US states population bubble map section, which has the center of each US state. The final geo dataframe has both the center and the Starbucks store count for each state. We'll be using this dataset for creating our bubble map.

In [15]:
starbucks_us = starbucks[starbucks["Country"] == "US"]

starbucks_us_state_cnt = starbucks_us.groupby("State/Province").count()[["Store Number"]].reset_index().rename(columns={"Store Number":"Store Count"})

starbucks_us_state_cnt.head()
Out[15]:
  State/Province  Store Count
0             AK           49
1             AL           85
2             AR           55
3             AZ          488
4             CA         2821
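
The per-state count can also be computed a little more directly. The sketch below is an alternative and not the tutorial's own code: it uses groupby(...).size(), which counts rows per group, on a made-up stand-in for the filtered US dataframe. Note that count() on a column skips missing values while size() counts every row, so the two can differ if the counted column has gaps.

```python
import pandas as pd

# Made-up rows standing in for the filtered US Starbucks dataframe.
toy_us_stores = pd.DataFrame({
    "Store Number": ["s1", "s2", "s3", "s4", "s5"],
    "State/Province": ["CA", "CA", "AZ", "CA", "AK"],
})

# size() counts rows per group; reset_index(name=...) names the count column.
toy_state_cnt = (
    toy_us_stores.groupby("State/Province")
    .size()
    .reset_index(name="Store Count")
)
print(toy_state_cnt)
```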
In [16]:
starbucks_us_statewise = us_states_pop.merge(starbucks_us_state_cnt, left_on="id", right_on="State/Province")

starbucks_us_statewise.head()
Out[16]:
id name geometry Longitude Latitude State 2018 Population State/Province Store Count
0 AL Alabama POINT (-86.82705 32.81439) -86.827048 32.814386 Alabama 4888949 AL 85
1 AK Alaska POINT (-152.52500 65.00297) -152.525004 65.002968 Alaska 738068 AK 49
2 AZ Arizona POINT (-111.66516 34.33632) -111.665157 34.336315 Arizona 7123898 AZ 488
3 AR Arkansas POINT (-92.43914 34.91573) -92.439137 34.915733 Arkansas 3020327 AR 55
4 CA California POINT (-119.68388 37.38770) -119.683878 37.387697 California 39776830 CA 2821

Below we have first created a US map using polyplot() method. We have stored the Axes reference returned by the method in a variable which we'll pass on to pointplot() method.

We have then added bubbles on the US map using pointplot() method. We have asked it to use the Store Count column for both hue and scale parameters. We have added a title to the chart as well.

In [ ]:
us_map_axes = geoplot.polyplot(us_states, facecolor="white", figsize=(15, 15));

geoplot.pointplot(starbucks_us_statewise,
                  hue="Store Count",
                  scale="Store Count",
                  cmap="BrBG",
                  ax=us_map_axes,
                  edgecolor="black",
                  legend=True,
                  legend_var="scale",
                  limits=(5, 50),

                 );

plt.title("Starbucks US Stores Count Statewise", fontdict={"fontsize": 15}, pad=15);

This ends our small tutorial explaining how we can use geoplot to create scatter and bubble maps in Python. Please feel free to let us know your views in the comments section. Please feel free to go through the tutorials in the references section if you want to learn about other Python libraries which can be used to plot maps.

References



Sunny Solanki