Share @ LinkedIn Facebook  maps, choropleth-maps, geopandas
Plotting Static Maps using geopandas (Working with Geospatial data)

Plotting Static Maps with geopandas [Working with Geospatial data]

Table of Contents

Introduction

Geopandas provides easy to use interface which lets us work with geospatial data and visualize it. It lets us create high-quality static map plots. Geopandas is built on top of matplotlib, descartes, fiona and shapely libraries. It extends pandas and maintains geospatial data as data frames. It allows us to do operations using python which would otherwise require a spatial database like PostGIS. We'll explore geopandas in this tutorial and will plot various map plots like choropleth map, scatter plots on maps, etc.

Installation

In [ ]:
!pip install geopandas

1. GeoSeries and GeoDataFrame Data Structures

Geopandas has two main data structures which are GeoSeries and GeDataFrame which are a subclass of pandas Series and DataFrame.We'll be mostly using GeoDataFrame for most of our work but will explain both in short.

GeoSeries

It's a vector where each entry represents one observation which constitutes one or more shapes. There are basically three shapes that form one observation representing country, city, list of islands, etc.

  • Points
  • Lines
  • Polygons

GeoDataFrame

It represents tabular data which consists of a list of GeoSeries. There is one column that holds geometric data containing shapes (shapely objects) of that observation. This column needs to be present to identify the dataframe as GeoDataFrame. This column can be accessed using the geometry attribute of the dataframe.

Let's load some datasets which are available with geopandas itself. It uses fiona for reading and writing geospatial data.

In [1]:
import geopandas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import warnings

warnings.filterwarnings('ignore')

%matplotlib inline

geopandas.datasets.available
Out[1]:
['naturalearth_cities', 'naturalearth_lowres', 'nybb']

geopandas has 3 datasets available. naturalearth_lowres and nybb dataset consist of Polygon shapes whereas naturalearth_cities consist of Points shape. We'll try to load the naturalearth_lowres dataset which has information about each country’s shapes. It also holds information about the estimated country population and continent.

In [2]:
world = geopandas.read_file(geopandas.datasets.get_path("naturalearth_lowres"))
print("Geometry Column Name : ", world.geometry.name)
print("Dataset Size : ", world.shape)
world.head()
Geometry Column Name :  geometry
Dataset Size :  (177, 6)
Out[2]:
pop_est continent name iso_a3 gdp_md_est geometry
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 603253 Africa W. Sahara ESH 906.5 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 35623680 North America Canada CAN 1674000.0 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 326625791 North America United States of America USA 18560000.0 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...

Below we are loading another dataset that holds information about new york and its districts.

In [3]:
nybb = geopandas.read_file(geopandas.datasets.get_path("nybb"))
nybb.head()
Out[3]:
BoroCode BoroName Shape_Leng Shape_Area geometry
0 5 Staten Island 330470.010332 1.623820e+09 MULTIPOLYGON (((970217.022 145643.332, 970227....
1 4 Queens 896344.047763 3.045213e+09 MULTIPOLYGON (((1029606.077 156073.814, 102957...
2 3 Brooklyn 741080.523166 1.937479e+09 MULTIPOLYGON (((1021176.479 151374.797, 102100...
3 1 Manhattan 359299.096471 6.364715e+08 MULTIPOLYGON (((981219.056 188655.316, 980940....
4 2 Bronx 464392.991824 1.186925e+09 MULTIPOLYGON (((1012821.806 229228.265, 101278...

2. Plotting With GeoPandas

We'll now explain plotting various map plots with GeoPandas. We'll also be using world happiness report dataset available from kaggle to include further data for analysis and plotting.

Geopandas uses matplotlib behind the scenes hence little background of matplotlib will be helpful with it as well. We'll start by just plotting the world dataframe which we loaded above to see results. We can simply call plot() on GeoDataFrame and it'll plot world countries on map. By default, it'll just print all countries with their borders. Below we are printing both world and new york maps.

In [ ]:
world.plot(figsize=(12,8));

World Map

In [ ]:
with plt.style.context(("seaborn", "ggplot")):
    nybb.plot(figsize=(12,8), color="white", edgecolor="grey");

New York Map

We'll now load our world happiness data. Please feel free to download a dataset from kaggle.

Dataset URL: World Happiness Report

Dataset also holds information about GDP per capita, social support, life expectancy, generosity, and corruption. We'll be merging this dataframe with geopandas world GeoDataFrame which we loaded above to combine data and then use combined data for plotting. Please make a note that the happiness report dataset does not have a happiness report for all countries present on earth. It has data for around 156 countries.

NOTE: Please make a note that we have manually made changes to the happiness dataset for around 10-15 countries where the country name was mismatching with name present in geopandas geodataframe. We have modified happiness report CSV to have the same country name as that of the geodataframe.

In [6]:
world_happiness = pd.read_csv("world_happiness_2019.csv")
print("Dataset Size : ",world_happiness.shape)
world_happiness.head()
Dataset Size :  (156, 9)
Out[6]:
Overall rank Country or region Score GDP per capita Social support Healthy life expectancy Freedom to make life choices Generosity Perceptions of corruption
0 1 Finland 7.769 1.340 1.587 0.986 0.596 0.153 0.393
1 2 Denmark 7.600 1.383 1.573 0.996 0.592 0.252 0.410
2 3 Norway 7.554 1.488 1.582 1.028 0.603 0.271 0.341
3 4 Iceland 7.494 1.380 1.624 1.026 0.591 0.354 0.118
4 5 Netherlands 7.488 1.396 1.522 0.999 0.557 0.322 0.298

NOTE: Please make a note that while combining normal dataframe to geodataframe we have used geodataframe first in merge operation. The main reason for doing so is that it'll output GeoDataFrame as output else it'll output normal pandas dataframe as output without geo functionalities. Please make a note that there will be few NANs present in the dataframe because we don't have happiness data for all countries of the world.

In [7]:
world_happiness_final = world.merge(world_happiness, how="left", left_on=['name'], right_on=['Country or region'])
print("Type of DataFrame : ", type(world_happiness_final))
world_happiness_final.head()
Type of DataFrame :  <class 'geopandas.geodataframe.GeoDataFrame'>
Out[7]:
pop_est continent name iso_a3 gdp_md_est geometry Overall rank Country or region Score GDP per capita Social support Healthy life expectancy Freedom to make life choices Generosity Perceptions of corruption
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000... NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... 153.0 Tanzania 3.231 0.476 0.885 0.499 0.417 0.276 0.147
2 603253 Africa W. Sahara ESH 906.5 POLYGON ((-8.66559 27.65643, -8.66512 27.58948... NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 35623680 North America Canada CAN 1674000.0 MULTIPOLYGON (((-122.84000 49.00000, -122.9742... 9.0 Canada 7.278 1.365 1.505 1.039 0.584 0.285 0.308
4 326625791 North America United States of America USA 18560000.0 MULTIPOLYGON (((-122.84000 49.00000, -120.0000... 19.0 United States of America 6.892 1.433 1.457 0.874 0.454 0.280 0.128

3. Choropleth Maps

We'll now explain a few choropleth maps using the happiness dataset which we created above by combining geopandas world geodataframe with the world happiness report dataset.

3.1 Happiness Choropleth Map

Below we are plotting our first choropleth map by simply calling the plot() method on geopandas GeoDataFrame by passing a column name as the first parameter to use for the map. We can pass other arguments like figsize, edgecolor, edgesize, etc for map. As it is a static matplotlib plot, we can call other matplotlib methods like title(), xlabel(), ylabel(), etc will work on it.

In [ ]:
world_happiness_final.plot("Score", figsize=(15,10))
plt.title("World Happiness Report");

World Happiness Choropleth Map

3.2 Generosity Choropleth Map

Below we are generating another choropleth map using the Generosity column of the dataset. Please make a note that we are using the styling of seaborn and ggplot combined for this map as well.

In [ ]:
with plt.style.context(("seaborn", "ggplot")):
    world_happiness_final.plot("Generosity",
                               figsize=(15,10),
                               edgecolor="black",)
    plt.title("World Generosity Report")

World Generosity Choropleth Map

3.3 Happiness Choropleth Map with Legend

We can also add legend and colormap to our choropleth map using the legend argument passed as True. We can also label our legend by passing arguments to the legend_kwds parameter.

In [ ]:
world_happiness_final.plot("Score",
                           figsize=(18,10),
                           legend=True,
                           legend_kwds={"label":"Happiness by Country"})
plt.title("World Happiness Report");

World Happiness Choropleth Map

Below we are adding border as well to each country in choropleth map using edge color attribute. We are also moving colormap below by passing value orientation: horizontal to legend_kwds parameter.

In [ ]:
world_happiness_final.plot("Score",
                           figsize=(18,12),
                           legend=True,
                           edgecolor="black",
                           legend_kwds={"label":"Happiness by Country", "orientation":"horizontal"})
plt.title("World Happiness Report");

World Happiness Choropleth Map

3.4 Happiness Choropleth Map with Legend & Different Colormaps

We can change the default colormap of the choropleth map by passing different values for parameter cmap. It accepts a valid value from a list of available cmap values.

In [ ]:
world_happiness_final.plot("Score",
                           figsize=(18,10),
                           legend=True,
                           legend_kwds={"label":"Happiness by Country"},
                           cmap=plt.cm.Greens,
                          )
plt.title("World Happiness Report");

World Happiness Choropleth Map

3.5 Multiple Choropleth Maps

Below we are explaining another example where we can use a matplotlib axes system and can lay down more than one map plot in one image. We are plotting the choropleth map for Happiness and GDP Per Capita columns of data.

In [ ]:
plt.figure(figsize=(30,15))
ax1 = plt.subplot(211)

world_happiness_final.plot("Score",
                           legend=True,
                           legend_kwds={"label":"Happiness by Country"},
                           cmap=plt.cm.Greens,
                           ax=ax1
                          )
plt.title("World Happiness Report");

ax2 = plt.subplot(212)
world_happiness_final.plot("GDP per capita",
                           legend=True,
                           legend_kwds={"label":"GDP per capita by Country"},
                           cmap=plt.cm.OrRd,
                           ax=ax2
                          )
plt.title("GDP per capita Report");

World Happiness and GDP Choropleth Map

3.6 Happiness Choropleth Map with boundaries

We can only highlight the boundaries of map plots without filling in any color. For this to work, we need to pass a special value none to parameter facecolor. Please make a note that we need to pass edgecolor as well which will be overridden with the color value of colormaps.

Below we are using healthy life expectancy as a column for plotting map.

In [ ]:
world_happiness_final.plot("Healthy life expectancy",
                           facecolor="none",
                           edgecolor="black",
                           figsize=(18,10),
                           legend=True,
                           legend_kwds={"label":"Healthy life expectancy by Country"},
                            cmap=plt.cm.autumn)
plt.title("World Healthy life expectancy Report");

World Healthy Life Expectancy Choropleth Map

3.7 Choropleths Map with Missing Values handled

It'll happen many times that we won't have data for all observations and we might want to highlight those observations for which we do not have values with different representations. We can use a parameter called missing_kwds passing it dictionary of a parameter on handling plotting of missing observation. Please make a note that this future is available from geopandas version 0.7 onwards only.

In [ ]:
plt.figure(figsize=(50,25))
ax1 = plt.subplot(311)
world_happiness_final.plot("Social support",
                           legend=True,
                           legend_kwds={"label":"Social support by Country"},
                           cmap=plt.cm.Greens,
                           ax=ax1,
                           missing_kwds={
                               "color": "grey",
                               "hatch":"."
                           }
                          )
plt.title("World Social support Report");

ax2 = plt.subplot(312)
world_happiness_final.plot("Freedom to make life choices",
                           legend=True,
                           legend_kwds={"label":"Freedom to make life choices by Country"},
                           cmap=plt.cm.OrRd,
                           ax=ax2,
                           missing_kwds={
                               "color":"grey",
                               "edgecolor":"black",
                               "hatch":"///",
                               "label":"Missing Values"
                           }
                          )
plt.title("Freedom to make life choices Report");

ax3 = plt.subplot(313)
world_happiness_final.plot("Perceptions of corruption",
                           legend=True,
                           legend_kwds={"label":"Perceptions of corruption by Country"},
                           cmap=plt.cm.BrBG,
                           ax=ax3,
                           missing_kwds={
                               "color":"grey",
                               "edgecolor":"black",
                               "hatch":"---",
                               "label":"Missing Values"
                           }
                          )
plt.title("Perceptions of corruption Report");

World Social Support, Freedom to make life choices and corruption Choropleth Map

4. Scatter Plots on Maps

We'll now explain scatter plots on maps with few examples. We'll use starbucks store location data available from kaggle for plotting these graphs. It has information about each Starbucks store locations as well as their address, city, country, phone number, latitude and longitude data.

In [16]:
starbucks_locations = pd.read_csv("stracbucks_store_locations.csv")
starbucks_locations.head()
Out[16]:
Brand Store Number Store Name Ownership Type Street Address City State/Province Country Postcode Phone Number Timezone Longitude Latitude
0 Starbucks 47370-257954 Meritxell, 96 Licensed Av. Meritxell, 96 Andorra la Vella 7 AD AD500 376818720 GMT+1:00 Europe/Andorra 1.53 42.51
1 Starbucks 22331-212325 Ajman Drive Thru Licensed 1 Street 69, Al Jarf Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.42
2 Starbucks 47089-256771 Dana Mall Licensed Sheikh Khalifa Bin Zayed St. Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.39
3 Starbucks 22126-218024 Twofour 54 Licensed Al Salam Street Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.38 24.48
4 Starbucks 17127-178586 Al Ain Tower Licensed Khaldiya Area, Abu Dhabi Island Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.54 24.51

Below we are plotting a simple map plot first and then using a scatter plot available in matplotlib to plot all Starbucks locations on it. Please make a note that we have longitude and latitude data for each store available in the dataset which we are utilizing in scatter plot.

In [ ]:
with plt.style.context(("seaborn", "ggplot")):
    world.plot(figsize=(18,10),
               color="white",
               edgecolor = "grey");

    plt.scatter(starbucks_locations.Longitude, starbucks_locations.Latitude, s=15, color="red", alpha=0.3)
    plt.xlabel("Longitude")
    plt.ylabel("Latitude")
    plt.title("Strabucks Store Locations On Earth");

Strabucks Store Locations On Earth Scatter Map

Below we are trying to get a count of Starbucks stores in each city of the world and then keeping only cities where more than 10 Starbucks are present per city. We are the first grouping data by the city to get average longitude and latitude. Then we generate another dataframe that has store count per city. We then combine dataframe with latitude & longitude with counts data. We only keep entries where more than 10 Starbucks is present.

In [19]:
citywise_geo_data = starbucks_locations.groupby("City").mean()[["Longitude","Latitude"]]
citywise_store_cnts = starbucks_locations.groupby("City").count()[["Store Number"]].rename(columns={"Store Number":"Count"})
citywise_store_cnts = citywise_geo_data.join(citywise_store_cnts).sort_values(by=["Count"], ascending=False)
citywise_store_cnts = citywise_store_cnts[citywise_store_cnts["Count"]>10]
citywise_store_cnts.head()
Out[19]:
Longitude Latitude Count
City
上海市 121.457232 31.211181 542
Seoul 127.011818 37.514215 243
北京市 116.422350 39.941453 234
New York -73.982759 40.753276 232
London -8.049259 50.676065 216

Below we are plotting world map by default first and then overlapping scatter plot of cities where more than 10 Starbucks are present. We are using a number of Starbucks as size to represent Starbucks counts per city.

In [ ]:
with plt.style.context(("seaborn", "ggplot")):
    world.plot(figsize=(18,10),
               color="white",
               edgecolor = "grey");

    plt.scatter(citywise_store_cnts.Longitude, citywise_store_cnts.Latitude, s=citywise_store_cnts.Count, color="green", alpha=0.5)
    plt.xlabel("Longitude")
    plt.ylabel("Latitude")
    plt.title("Strabucks Store Locations On Earth");

Strabucks Store Locations On Earth Scatter Map

We'll be publishing another tutorial on working with geopandas where we'll be explaining how to use geopandas with other data file types like shape files and geojson files.

References



Sunny Solanki  Sunny Solanki