Altair - Basic Interactive Plotting in Python

Introduction

Altair is a Python visualization library built on Vega and Vega-Lite. Vega and Vega-Lite are declarative visualization grammars: you describe the properties of a chart as JSON, and the chart is rendered from that description using Canvas or SVG. Because Altair is built on top of these libraries, it offers almost the same functionality from Python. Altair's API is simple and easy to use, which lets the developer spend more time on data analysis than on getting visualizations right. This tutorial explains basic plotting with Altair.

We'll first import all the necessary libraries to get started.

In [1]:
import altair as alt

import pandas as pd
import numpy as np

from sklearn.datasets import load_wine

Load Datasets

We'll use three datasets to demonstrate the charts: the wine dataset bundled with scikit-learn, Apple OHLC stock prices, and Starbucks store locations.

We suggest downloading the Apple OHLC dataset from Yahoo Finance and the Starbucks store locations dataset from Kaggle to follow along.

Wine Dataset

In [2]:
wine = load_wine()

wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)

wine_df["Category"] = ["Category_%d"%(cat+1) for cat in wine.target]

wine_df.head()
Out[2]:
alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue od280/od315_of_diluted_wines proline Category
0 14.23 1.71 2.43 15.6 127.0 2.80 3.06 0.28 2.29 5.64 1.04 3.92 1065.0 Category_1
1 13.20 1.78 2.14 11.2 100.0 2.65 2.76 0.26 1.28 4.38 1.05 3.40 1050.0 Category_1
2 13.16 2.36 2.67 18.6 101.0 2.80 3.24 0.30 2.81 5.68 1.03 3.17 1185.0 Category_1
3 14.37 1.95 2.50 16.8 113.0 3.85 3.49 0.24 2.18 7.80 0.86 3.45 1480.0 Category_1
4 13.24 2.59 2.87 21.0 118.0 2.80 2.69 0.39 1.82 4.32 1.04 2.93 735.0 Category_1

Apple OHLC Dataset

In [3]:
apple_df = pd.read_csv("datasets/AAPL.csv")
apple_df["Date"] = pd.to_datetime(apple_df["Date"])
apple_df = apple_df.set_index("Date")
apple_df.head()
Out[3]:
Open High Low Close Adj Close Volume
Date
2019-04-05 196.449997 197.100006 195.929993 197.000000 194.454758 18526600
2019-04-08 196.419998 200.229996 196.339996 200.100006 197.514709 25881700
2019-04-09 200.320007 202.850006 199.229996 199.500000 196.922470 35768200
2019-04-10 198.679993 200.740005 198.179993 200.619995 198.027985 21695300
2019-04-11 200.850006 201.000000 198.440002 198.949997 196.379578 20900800

Starbucks Store Locations Dataset

In [4]:
starbucks_locations = pd.read_csv("datasets/starbucks_store_locations.csv")
starbucks_locations.head()
Out[4]:
Brand Store Number Store Name Ownership Type Street Address City State/Province Country Postcode Phone Number Timezone Longitude Latitude
0 Starbucks 47370-257954 Meritxell, 96 Licensed Av. Meritxell, 96 Andorra la Vella 7 AD AD500 376818720 GMT+1:00 Europe/Andorra 1.53 42.51
1 Starbucks 22331-212325 Ajman Drive Thru Licensed 1 Street 69, Al Jarf Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.42
2 Starbucks 47089-256771 Dana Mall Licensed Sheikh Khalifa Bin Zayed St. Ajman AJ AE NaN NaN GMT+04:00 Asia/Dubai 55.47 25.39
3 Starbucks 22126-218024 Twofour 54 Licensed Al Salam Street Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.38 24.48
4 Starbucks 17127-178586 Al Ain Tower Licensed Khaldiya Area, Abu Dhabi Island Abu Dhabi AZ AE NaN NaN GMT+04:00 Asia/Dubai 54.54 24.51

Common Steps to Generate Charts using Altair

Generating a chart with Altair usually follows the same three steps:

  1. Create a Chart object, passing the dataframe to it.
  2. Call a mark method (mark_point(), mark_bar(), etc.) on the chart object to select the type of chart to plot.
  3. Call the encode() method on the result of step 2, passing it the plot properties. This step maps dataset columns to visual roles, e.g. x='alcohol' puts the alcohol values on the x-axis.

1. Scatter Plot

The first chart type that we'll plot using Altair is a scatter plot. Below we plot the relation between the alcohol and malic_acid columns of the wine dataset. This is the simplest way to generate a plot using Altair.

In [ ]:
alt.Chart(wine_df).mark_point().encode(x="alcohol", y="malic_acid")

Below we plot alcohol against malic_acid again, but this time the points are also color-encoded by wine category.

This time we define the axes with alt.X and alt.Y objects, which let us modify axis properties; here we override the default axis titles. Altair includes zero in quantitative axes by default, and we turn that off by passing Scale(zero=False). We also introduce the tooltip property, which accepts a list of dataset columns whose values are displayed when the mouse hovers over a point.

We also use the properties() method, which sets the plot size (height and width) and title.

Finally, we call the interactive() method, which makes the static plot interactive by enabling panning and zooming.

In [ ]:
alt.Chart(wine_df).mark_circle(
    size=100
).encode(
    alt.X("alcohol", title="Alcohol", scale=alt.Scale(zero=False)),
    alt.Y("malic_acid", title="Malic Acid", scale=alt.Scale(zero=False)),
    color="Category",
    tooltip=["alcohol", "malic_acid"]
).properties(
    height=300,
    width=300,
    title="Alcohol vs Malic Acid Color-encoded by Wine Category").interactive()

The next scatter plot is almost the same as the previous one, but it uses a different marker shape for each wine category. The shape encoding does this; it accepts the name of a dataframe column with categorical data.

In [ ]:
alt.Chart(wine_df).mark_point(
    size=50
).encode(
    alt.X("alcohol", title="Alcohol", scale=alt.Scale(zero=False)),
    alt.Y("malic_acid", title="Malic Acid", scale=alt.Scale(zero=False)),
    color="Category",
    shape="Category",
    tooltip=["alcohol", "malic_acid"]
).properties(
    height=300,
    width=300,
    title="Alcohol vs Malic Acid Color-encoded by Wine Category").interactive()

2. Bar Chart

The second chart type that we'll introduce is the bar chart.

We first create a dataframe holding the average of each wine column per wine category, as several of the following charts use it.

In [8]:
avg_wine_df = wine_df.groupby(by="Category").mean().reset_index()
avg_wine_df
Out[8]:
Category alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue od280/od315_of_diluted_wines proline
0 Category_1 13.744746 2.010678 2.455593 17.037288 106.338983 2.840169 2.982373 0.290000 1.899322 5.528305 1.062034 3.157797 1115.711864
1 Category_2 12.278732 1.932676 2.244789 20.238028 94.549296 2.258873 2.080845 0.363662 1.630282 3.086620 1.056282 2.785352 519.507042
2 Category_3 13.153750 3.333750 2.437083 21.416667 99.312500 1.678750 0.781458 0.447500 1.153542 7.396250 0.682708 1.683542 629.895833

Below we create our first bar chart with the mark_bar() method, encoding the wine category on the x-axis and the average malic acid on the y-axis. We also set the chart width, height, and title as before.

In [ ]:
alt.Chart(avg_wine_df).mark_bar(
    color='tomato'
).encode(
    x = 'Category', y = 'malic_acid'
).properties(
    width=300, height=300,
    title="Avg Malic Acid per Wine Category"
)

Below is another bar chart, showing the average proline per wine category. Here we swap the x and y encodings to make the bars horizontal.

In [ ]:
alt.Chart(avg_wine_df).mark_bar(
    color='dodgerblue'
).encode(
    x = 'proline', y = 'Category'
).properties(
    width=300, height=300,
    title="Avg Proline per Wine Category"
)

3. Histogram

The third chart type is the histogram. We use the mark_bar() method, which draws bar charts. We encode the proline column on the x-axis with bin=True, telling Altair to bin the values of this column, and pass count() as the y-axis value, which counts the number of rows that fall into each bin.

In [ ]:
alt.Chart(wine_df).mark_bar(
    color='lawngreen'
).encode(
    x =alt.X('proline', bin=True, title="Proline"),
    y="count()"
).properties(
    width=300,
    height=300,
    title="Proline Histogram")

4. Line Chart

The fourth chart type that we would like to introduce is a line chart.

We are using mark_line() to plot a line chart showing the close price of Apple stock from April-2019 to March-2020.

In [ ]:
alt.Chart(apple_df.reset_index()).mark_line(
    color='red'
).encode(
    x = 'Date:T', y = alt.Y('Close:Q', scale=alt.Scale(zero=False))
).properties(
    width=500,
    height=300,
    title="Apple Close Price from Apr-2019 to Mar-2020")

When creating the line chart above, we specified the data type of each column with a one-character code after the column name, separated by a colon (Date:T, Close:Q). This tells Altair to treat the Date column as temporal and the Close column as quantitative. Specifying the type explicitly like this is useful when Altair fails to infer the right type.

Below we have listed the commonly used data type codes in Altair:

  • T: Temporal (date-time)
  • Q: Quantitative
  • O: Ordinal
  • N: Nominal

5. Area Chart

The fifth chart type is the area chart, which we plot with the mark_area() method. We shade the area below the close price of Apple stock from April 2019 to March 2020.

In [ ]:
alt.Chart(apple_df.reset_index()).mark_area(
    color='green'
).encode(
    x = 'Date:T', y = alt.Y('Close:Q', scale=alt.Scale(zero=False))
).properties(
    width=300,
    height=300,
    title="Apple Close Price from Apr-2019 to Mar-2020")

6. Box Plot

The sixth chart type is the box plot. Using the mark_boxplot() method, we plot the distribution of alcohol per wine category.

In [ ]:
alt.Chart(wine_df).mark_boxplot(color="tomato").encode(
    x=alt.X('Category:N'),
    y=alt.Y('alcohol:Q', scale=alt.Scale(zero=False))
).properties(
    width=300,
    height=300,
    title="Distribution of Alcohol per Wine Category")

7. Scatter Matrix

The seventh chart type is the scatter matrix. We explore the relationship between three columns (alcohol, malic_acid, and proline).

We use the repeat() method, which accepts lists of column names to repeat across the rows and columns of the grid. It works like a loop inside a loop, plotting every possible pairing of the listed columns. We also color-encode the points by wine category.

In [ ]:
alt.Chart(wine_df).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative', scale=alt.Scale(zero=False)),
    alt.Y(alt.repeat("row"), type='quantitative', scale=alt.Scale(zero=False)),
    color='Category:N'
).properties(
    width=150,
    height=150,
).repeat(
    row=['alcohol', 'malic_acid', 'proline'],
    column=['alcohol', 'malic_acid', 'proline']
).properties(
    title="ScatterMatrix of 'alcohol', 'malic_acid', 'proline'"
).interactive()
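
The "loop inside a loop" behaviour of repeat() can be pictured in plain Python. This sketch does not call Altair; it only enumerates which column ends up on which axis in each panel, assuming the same row and column lists as above.

```python
from itertools import product

rows = ["alcohol", "malic_acid", "proline"]
columns = ["alcohol", "malic_acid", "proline"]

# The outer loop walks the rows (y-axis field) and the inner loop walks
# the columns (x-axis field), giving one (x, y) pair per panel.
panels = [(x_field, y_field) for y_field, x_field in product(rows, columns)]

for x_field, y_field in panels:
    print(f"x={x_field}, y={y_field}")
```

Three rows and three columns give nine panels, including the three diagonal panels where a column is plotted against itself.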

8. CandleStick Chart

The eighth chart type is the candlestick chart. We plot a candlestick chart of Apple stock prices for March 2020.

The candlestick chart is built in four steps. First, we create a base chart with the shared x-axis and color encoding. Second, we extend the base chart into a rule chart spanning the Low and High columns. Third, we extend the base chart into a bar chart spanning the Open and Close columns. Finally, we layer the rule and bar charts to form the candlestick chart.

In [ ]:
apple_mar_2020 = apple_df.loc["2020-3"].reset_index()

open_close_color = alt.condition("datum.Open <= datum.Close",
                                 alt.value("lawngreen"),
                                 alt.value("tomato"))

base = alt.Chart(apple_mar_2020).encode(
    alt.X('Date:T',
          axis=alt.Axis(
              format='%m/%d',
              labelAngle=-45,
              title='Date in 2020'
          )
    ),
    color=open_close_color,
)

rule = base.mark_rule().encode(
    alt.Y('Low:Q', title='Price',scale=alt.Scale(zero=False)),
    alt.Y2('High:Q')
)

bar = base.mark_bar().encode(
    alt.Y('Open:Q'),
    alt.Y2('Close:Q')
).properties(
    width=500,
    height=300,
    title="Apple Candlestick Chart for Mar-2020")

rule + bar

9. Scatter Map

The last chart type is the scatter map. We'll use the Starbucks store locations dataset for it. We'll also need the vega_datasets library installed, as it provides the world map data.

Below we create a world map without any markers on top of it. We first create a data source with the topo_feature() method, passing it the URL from which the world map data is downloaded, and select the countries feature so that we get country borders.

We then use this data source to plot the world map with the mark_geoshape() method. The stroke property of mark_geoshape() sets the color of the country borders.

In [ ]:
from vega_datasets import data


source = alt.topo_feature(data.world_110m.url, 'countries')

background = alt.Chart(source).mark_geoshape(
    fill='lightgray',
    stroke='white'
).properties(
    width=500,
    height=300
).project('naturalEarth1')

background

Below we first prepare the dataset for the scatter map. We group the original dataset by state to get the number of stores per state, and create a second dataframe holding the average latitude and longitude of the stores in each state. We then join the two dataframes to get the final dataframe, with the store count as well as a latitude and longitude for each state.

In [18]:
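# Select the numeric columns before averaging; recent pandas versions raise an error when mean() meets text columns.
mean_long_lat = starbucks_locations.groupby(by="State/Province")[["Longitude", "Latitude"]].mean()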
count_per_state  = starbucks_locations.groupby(by="State/Province").count()[["Store Number"]].rename(columns={"Store Number":"Count"})

count_per_state = count_per_state.join(mean_long_lat).reset_index()
count_per_state.head()
Out[18]:
State/Province Count Longitude Latitude
0 0 89 121.035618 14.572697
1 1 193 90.336788 17.152539
2 10 275 101.766000 12.460582
3 11 706 121.629702 37.964255
4 12 145 123.439448 33.032483
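
The same per-state table can also be built in one step with pandas named aggregation, which avoids the separate join. This is a sketch on a made-up three-row dataframe (stores) that only mimics the relevant columns of the Starbucks dataset.

```python
import pandas as pd

# Made-up rows mimicking the relevant Starbucks columns.
stores = pd.DataFrame({
    "State/Province": ["CA", "CA", "NY"],
    "Store Number": ["s1", "s2", "s3"],
    "Longitude": [-118.0, -122.0, -74.0],
    "Latitude": [34.0, 38.0, 40.0],
})

# Each keyword names an output column and gives (input column, function).
per_state = (
    stores.groupby("State/Province")
    .agg(
        Count=("Store Number", "count"),
        Longitude=("Longitude", "mean"),
        Latitude=("Latitude", "mean"),
    )
    .reset_index()
)

print(per_state)
```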

Below we plot one circle per state, positioned by longitude and latitude, with the store count setting the size of the marker. We then layer this scatter plot on the world map created earlier to get the scatter map.

The scatter map makes it easy to see that California has the highest number of Starbucks stores of any state, more than 2,500.

In [19]:
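# The longitude/latitude channels position the points through the map
# projection, so they line up with the background when layered.
points  = alt.Chart(count_per_state).mark_circle(
    color="tomato"
).encode(
    longitude="Longitude:Q", latitude="Latitude:Q", size="Count:Q",
    tooltip = ["State/Province", "Count"]
)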
In [ ]:
background + points

This ends our short tutorial introducing the basic Altair API for plotting common charts. Please feel free to let us know your views in the comments section.


Sunny Solanki