bqplot
bqplot is an interactive data visualization library developed by Bloomberg. It's built on top of d3.js (a JavaScript data visualization library) and ipywidgets (the widgets library for Python Jupyter notebooks). The main aim of bqplot is to bring the benefits of d3.js to Python while using the widget facility of ipywidgets: every individual component of a bqplot graph is an interactive widget, so changing a widget's value is reflected in the plot. This gives a lot of flexibility when creating interactive visualizations and makes integration with other notebook widgets easy.
bqplot provides 2 kinds of APIs for creating plots:
Matplotlib pyplot-like API: It provides nearly the same set of functions as the matplotlib.pyplot module. We can easily create graphs by calling methods like scatter(), bar(), pie(), heatmap(), etc.
bqplot internal object model API: It lets us create an object for each individual graph component like figure, axis, scales, etc. Each of these objects behaves as a widget and can be linked to other widgets. We then need to combine all of them to create a plot. This API gives more flexibility.
We'll be covering bqplot's matplotlib-like pyplot API in this tutorial. We'll also give various examples explaining the individual components of a graph and how to modify them to create aesthetically pleasing graphs. We'll be using several datasets to explain the chart types available with bqplot.
If you are interested in learning about plotting with internal object model API then please feel free to visit our tutorial on it:
We'll start by importing necessary libraries.
import bqplot
from bqplot import pyplot as plt
import pandas as pd
import numpy as np
import sklearn
import warnings
warnings.filterwarnings("ignore")
We'll load all the datasets up front and keep them as pandas dataframes to make plotting easy.
The first dataset that we'll be loading is the wine dataset available with scikit-learn. It has information about wine ingredients and their presence in three different wine categories.
from sklearn.datasets import load_wine
wine = load_wine()
print("Dataset Features : ", wine.feature_names)
print("Dataset Size : ", wine.data.shape)
wine_df = pd.DataFrame(data=wine.data, columns=wine.feature_names)
wine_df["Category"] = wine.target
wine_df.head()
Another dataset that we'll be using is Apple OHLC (open, high, low, close) price data downloaded from Yahoo Finance as a CSV. We'll be loading it as a pandas dataframe.
apple_df = pd.read_csv("datasets/AAPL.csv", index_col=0, parse_dates=True)
apple_df.head()
The third dataset, which we'll be using to explain map charts, is the world happiness dataset available on Kaggle. It has information about attributes like happiness score, perceptions of corruption, healthy life expectancy, social support, freedom to make life choices, generosity and GDP per capita for various countries. We'll be loading it as a pandas dataframe.
happiness_df = pd.read_csv("datasets/world_happiness_2019.csv")
happiness_df.head()
We suggest that you download all the datasets beforehand and keep them in a datasets folder next to your jupyter notebook, matching the paths used above, to follow along with the tutorial. We'll now start plotting various charts to explain the usage of bqplot's pyplot API.
The first plot type that we'll introduce is a scatter plot. We'll plot the alcohol vs malic acid relationship using a scatter plot.
fig = plt.figure(title="Alcohol vs Malic Acid Relation")
scat = plt.scatter(x=wine_df["alcohol"], y=wine_df["malic_acid"])
plt.xlabel("Alcohol")
plt.ylabel("Malic Acid")
plt.show()
Please make a note that the charts won't be interactive on this web page, but they'll be interactive when you run the code in a jupyter notebook.
Below we modify the scatter plot by passing arguments for color, edge color, edge width, marker size, marker type, opacity, etc. We also show another way of setting axis attributes: passing them as a dictionary to the axes_options parameter. We need to use the stroke and stroke_width parameters to modify the outline of the markers. We have used square markers for this scatter plot and 2 different colors to color individual markers.
fig = plt.figure(title="Alcohol vs Malic Acid Relation", )
options = {'x':{'label':"Alcohol"}, 'y':{'label':'Malic Acid'}}
scat = plt.scatter(wine_df["alcohol"], wine_df["malic_acid"],
                   colors=["lime", "tomato"],
                   axes_options = options,
                   stroke="black", stroke_width=2.0,
                   default_size=150,
                   default_opacities=[0.7],
                   marker="square",
                   )
plt.show()
We can even access the layout object from the figure object and then modify the plot width and height by setting their values in pixels. We are also setting the x-axis label, y-axis label and x-axis limit to further enhance the graph. We are color-encoding the points of the scatter plot according to the wine category, and we have changed the color bar location through the axes_options parameter.
fig = plt.figure(title="Alcohol vs Malic Acid Relation")
fig.layout.height = "500px"
fig.layout.width = "600px"
options = {'color': dict(label='Category', orientation='vertical', side='right')}
scat = plt.scatter(x = wine_df["alcohol"], y = wine_df["malic_acid"],
                   color=wine_df["Category"],
                   axes_options = options,
                   stroke="black", stroke_width=2.0,
                   default_size=200,
                   default_opacities=[0.9],
                   marker="circle",
                   )
plt.xlabel("Alcohol")
plt.ylabel("Malic Acid")
plt.xlim(10.7, 15.3)
plt.show()
Below we are introducing a tooltip which will display the Wine Category, Alcohol and Malic Acid values of a point when the mouse hovers over it. We need to pass the mark attributes that will be used to generate the tooltip contents. We are using the x, y and color (wine category) attributes for the tooltip.
from bqplot import Tooltip
scat.tooltip = Tooltip(fields=["color", 'x', 'y'], labels=["Wine Category", "Alcohol", "Malic Acid"])
We can also enable movement of points on the graph by setting the enable_move attribute to True.
scat.enable_move = True
Please make a note that the majority of methods available through the pyplot module of bqplot are almost the same as those of the pyplot module of matplotlib. If you have a background in matplotlib, it'll help with learning bqplot.
The second type of chart we'll introduce is the bar chart and its variants: side-by-side (grouped) and stacked bar charts.
Below we are plotting our first bar chart depicting the average magnesium per wine category. We first group the entries of the wine dataframe by wine category and then take the mean, which gives a dataframe with the average value of every column per wine category. We'll reuse this dataframe of per-category averages with other charts as well.
fig = plt.figure(title="Average Magnesium Per Wine Category")
fig.layout.height = "400px"
fig.layout.width = "600px"
avg_wine_df = wine_df.groupby(by="Category").mean()
bar_chart = plt.bar(x = avg_wine_df.index, y= avg_wine_df["magnesium"])
bar_chart.colors = ["tomato"]
bar_chart.tooltip = Tooltip(fields=["x", "y"], labels=["Wine Category", "Avg Magnesium"])
plt.xlabel("Wine Category")
plt.ylabel("Average Magnesium")
plt.show()
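The grouping and averaging step can be sketched on a tiny made-up dataframe. The column names mirror the wine data, but the numbers are illustrative only and toy_wine_df is a hypothetical stand-in for wine_df.

```python
import pandas as pd

# Made-up miniature stand-in for wine_df; the values are illustrative only.
toy_wine_df = pd.DataFrame({
    "magnesium": [100.0, 110.0, 90.0, 94.0, 101.0],
    "ash": [2.4, 2.6, 2.0, 2.2, 2.5],
    "Category": [0, 0, 1, 1, 2],
})

# One row per category; every other column is replaced by its per-category mean.
toy_avg_df = toy_wine_df.groupby(by="Category").mean()

print(toy_avg_df)
```

The index of the result holds the category values, which is why the index of the averaged dataframe is passed as x to bar() and the averaged column is passed as y.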
The below example demonstrates how to generate a side-by-side (grouped) bar chart. We are plotting average ash and average flavonoids per wine category as a bar chart.
fig = plt.figure(title="Average Ash and Flavanoids Per Wine Category",
                 fig_margin={'top':50, 'bottom':20, 'left':150, 'right':150},
                 legend_location="top-left")
avg_wine_df = wine_df.groupby(by="Category").mean()
bar_chart = plt.bar(x = avg_wine_df.index,
                    y = [avg_wine_df["ash"], avg_wine_df["flavanoids"]],
                    labels = ["Ash", "Flavanoids"],
                    display_legend=True)
bar_chart.type = "grouped"
bar_chart.colors = ["tomato", "lime"]
bar_chart.tooltip = Tooltip(fields=["x", "y"], labels=["Wine Category", "Avg Ash/Flavanoids"])
plt.xlabel("Wine Category")
plt.ylabel("Average Value")
plt.show()
Below is a stacked bar chart example. We are plotting average ash and flavonoids per wine category stacked on top of one another.
fig = plt.figure(title="Average Ash and Flavanoids Per Wine Category",
                 fig_margin={'top':50, 'bottom':20, 'left':150, 'right':150},)
avg_wine_df = wine_df.groupby(by="Category").mean()
bar_chart = plt.bar(x = avg_wine_df.index,
                    y = [avg_wine_df["ash"], avg_wine_df["flavanoids"]],
                    labels=["Ash", "Flavanoids"],
                    display_legend=True)
bar_chart.type = "stacked"
bar_chart.colors = bqplot.CATEGORY10
bar_chart.tooltip = Tooltip(fields=["x", "y"], labels=["Wine Category", "Avg Ash/Flavanoids"])
plt.xlabel("Wine Category")
plt.ylabel("Average Value")
plt.show()
The third chart type that we would like to introduce is the line chart. We'll be plotting a simple line chart as well as a chart with more than one line.
Below we are plotting the Apple stock close price for the whole period from May 2019 to April 2020. We'll be using the plot() method, passing it the date range and closing prices to generate a line chart.
fig = plt.figure(title="Apple Stock Close Price")
line_chart = plt.plot(x=apple_df.index, y=apple_df.Close)
plt.xlabel("Date")
plt.ylabel("Close Price")
plt.show()
Below we are generating another line chart where we plot the open, high, low and close prices of Apple for the period from May 2019 to April 2020. We have combined all the lines in a single figure and are also displaying a legend so that each line can be told apart by its color.
fig = plt.figure(title="Apple Stock OHLC Prices", legend_location="top-left")
line_chart = plt.plot(x=apple_df.index,
                      y=[apple_df.Open, apple_df.High, apple_df.Low, apple_df.Close],
                      labels=["Open","High", "Low", "Close"],
                      display_legend=True)
plt.xlabel("Date")
plt.ylabel("Price")
line_chart.tooltip = Tooltip(fields=["x", "y"], labels=["Date", "OHLC Price"])
plt.show()
The fourth chart type that we'll be introducing is the histogram. Histograms are commonly used to see the distribution of values of a particular column of data. Below we are plotting the alcohol distribution as a histogram with 20 bins.
fig = plt.figure(title="Alcohol Distribution")
fig.layout.width = "600px"
fig.layout.height = "500px"
histogram = plt.hist(sample = wine_df["alcohol"], bins=20)
histogram.colors = ["orangered"]
histogram.stroke="blue"
histogram.stroke_width = 2.0
plt.grids(value="none")
plt.xlim(10.5,15.5)
plt.show()
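To illustrate what the bins argument controls, here is a small sketch of equal-width binning using numpy.histogram on made-up values. This is only an illustration of the idea; bqplot computes its bins on its own and its bin edges may not match numpy's exactly.

```python
import numpy as np

# Made-up sample values, used only for illustration.
toy_sample = np.array([1, 2, 2, 3, 3, 3, 4])

# Split the range [min, max] into 3 equal-width bins and count values per bin.
counts, bin_edges = np.histogram(toy_sample, bins=3)

print("Bin edges :", bin_edges)
print("Counts    :", counts)
```

A larger number of bins gives narrower bars and a more detailed, but noisier, picture of the distribution.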
The fifth chart type is the pie chart. Pie charts are commonly used to see the share of each value of a categorical variable. We'll be checking the distribution of wine categories. We'll also modify various styling attributes of the pie chart.
from collections import Counter
wine_cat = Counter(wine_df["Category"])
fig = plt.figure(title="Wine Category Distribution", animation_duration=1000)
pie = plt.pie(sizes = list(wine_cat.values()),
              labels =["Category %d"%val for val in list(wine_cat.keys())],
              display_values = True,
              values_format=".0f",
              display_labels='outside')
pie.stroke="black"
pie.colors = ["tomato","lawngreen", "skyblue"]
pie.opacities = [0.7,0.8,0.9]
pie.radius = 150
pie.inner_radius = 60
pie.label_color = 'orangered'
pie.font_size = '20px'
pie.font_weight = 'bold'
plt.show()
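The way the slice sizes and labels are derived with Counter can be sketched on a short made-up list of category values (toy_categories is a hypothetical stand-in for the Category column).

```python
from collections import Counter

# Made-up category values, used only for illustration.
toy_categories = [0, 0, 0, 1, 1, 2]

# Counter maps each distinct value to the number of times it occurs.
toy_counts = Counter(toy_categories)

# keys() and values() iterate in the same order, so sizes and labels stay aligned.
sizes = list(toy_counts.values())
labels = ["Category %d" % val for val in toy_counts.keys()]

print(sizes)
print(labels)
```

Because both lists are built from the same Counter, each slice of the pie is paired with the label of its own category.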
Our sixth chart type is the box plot. Box plots are commonly used to see where the majority of values of a particular quantity are concentrated. We'll be plotting a box plot for various columns of the wine data.
fig = plt.figure(title="Box Plots")
mini_df = wine_df[["alcohol","malic_acid","ash","total_phenols", "flavanoids", "nonflavanoid_phenols", "proanthocyanins", "color_intensity", "hue"]]
boxes = plt.boxplot(x=range(mini_df.shape[1]), y=mini_df.values.T)
boxes.box_fill_color = 'lawngreen'
boxes.opacity = 0.6
boxes.box_width = 50
plt.grids(value="none")
plt.show()
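As a reminder of what a box summarizes, the quartiles of a column can be computed directly. The sketch below uses numpy.percentile on made-up values; bqplot may use a slightly different quartile convention, so treat it as an illustration rather than a reproduction of the chart.

```python
import numpy as np

# Made-up values, used only for illustration.
toy_values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# The box spans the first to third quartile; the line inside it is the median.
q1, median, q3 = np.percentile(toy_values, [25, 50, 75])

# The interquartile range is the height of the box and covers the middle 50% of values.
iqr = q3 - q1

print(q1, median, q3, iqr)
```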
The seventh chart type that we'll be introducing is a heatmap. We are using the heatmap below to depict the correlation between various columns of wine data.
fig = plt.figure(title="Correlation Heatmap",padding_y=0)
fig.layout.width = "700px"
fig.layout.height = "700px"
axes_options = {'color': {'orientation': "vertical","side":"right"}}
plt.heatmap(color=wine_df.corr().values, axes_options=axes_options)
plt.show()
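The matrix passed to heatmap() comes from the dataframe's corr() method. The sketch below shows what that matrix looks like on a tiny made-up dataframe; corr_demo_df and its columns are hypothetical.

```python
import pandas as pd

# Made-up columns, used only for illustration.
corr_demo_df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # moves exactly with "a"
    "c": [4.0, 3.0, 2.0, 1.0],   # moves exactly against "a"
})

# Square matrix of pairwise correlations; rows and columns are the column names.
corr_demo = corr_demo_df.corr()

print(corr_demo)
```

Values close to 1 mean two columns rise together, values close to -1 mean one falls as the other rises, and the diagonal is always 1 because every column is perfectly correlated with itself.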
Candlestick charts are very common in the finance industry and are the eighth chart type that we would like to introduce. A candlestick chart represents the change in the value of a stock on each day over a period of time.
We are plotting a candlestick chart for Apple stock for January 2020. We need the open, high, low and close prices of the stock to generate candlestick charts.
fig = plt.figure(title="Apple CandleStick Chart")
fig.layout.width="800px"
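# Select the January 2020 rows with .loc; recent pandas versions treat apple_df["2020-1"] as a column lookup
apple_df_jan_2020 = apple_df.loc["2020-1"]
ohlc = plt.ohlc(x=apple_df_jan_2020.index, y=apple_df_jan_2020[["Open","High","Low","Close"]],
                marker="candle", stroke="blue")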
ohlc.colors=["lime", "tomato"]
plt.xlabel("Date")
plt.show()
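Selecting one month of rows relies on pandas partial string indexing on a DatetimeIndex. Here is a minimal sketch on a made-up daily price series; toy_prices is hypothetical and the values are illustrative only.

```python
import pandas as pd

# Made-up daily data spanning the end of 2019 and the start of 2020.
dates = pd.date_range("2019-12-30", "2020-02-02", freq="D")
toy_prices = pd.DataFrame({"Close": range(len(dates))}, index=dates)

# A partial date string passed to .loc selects every row that falls in that month.
jan_2020 = toy_prices.loc["2020-01"]

print(jan_2020.index.min(), jan_2020.index.max(), len(jan_2020))
```

The same pattern works with a year ("2020") or a full date, as long as the dataframe is indexed by dates, which is why the CSV was loaded with index_col=0 and parse_dates=True.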
Below we have introduced another variation of the candlestick chart which draws only lines instead of a filled candle for each day's change in stock value.
fig = plt.figure(title="Apple CandleStick Chart")
fig.layout.width="800px"
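# Select the January 2020 rows with .loc; recent pandas versions treat apple_df["2020-1"] as a column lookup
apple_df_jan_2020 = apple_df.loc["2020-1"]
ohlc = plt.ohlc(x=apple_df_jan_2020.index, y=apple_df_jan_2020[["Open","High","Low","Close"]],
                marker="bar", stroke="blue")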
ohlc.colors=["lime", "tomato"]
plt.xlabel("Date")
plt.show()
Our ninth and last chart type is the choropleth map. bqplot provides a way to create interactive choropleth maps as well. We'll be using the world happiness dataset that we loaded earlier to plot various choropleth maps.
We first need to create a simple mapping function which takes the map data as input and maps the id of each country to a particular value like the happiness score, life expectancy or perception of corruption of that country. bqplot has a method geo() for generating choropleth maps. It needs a mapping from country id to value, supplied as its color parameter.
We'll follow the below-mentioned steps to generate choropleth maps with bqplot:
Create the map with geo(). It'll initialize the graph with data about each country in the world.
Retrieve the map data from the map object and build a mapping from each country id to the value of the chosen column.
Set this mapping as the color attribute of the map object. We have also set a default color of grey for countries where we don't find a mapping.
def map_data_to_color_mapping(map_data, column="Score"):
    """
    Function to Map Country ID to Column Value from Happiness DataFrame
    """
    # Build a country name -> country id lookup from the map data
    name_to_id_mapping = []
    for entry in map_data:
        if entry["properties"]["name"] == "Russian Federation":
            name_to_id_mapping.append(("Russia", entry["id"]))
        else:
            name_to_id_mapping.append((entry["properties"]["name"], entry["id"]))
    name_to_id_mapping = dict(name_to_id_mapping)
    color = []
    for name, idx in name_to_id_mapping.items():
        # Read the requested column (previously "Score" was always used, ignoring the argument)
        score = happiness_df[happiness_df["Country or region"].str.contains(name)][column].values
        if len(score) > 0:
            color.append((idx, score[0]))
    return dict(color)
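The shape of the input and output of this mapping step can be sketched without bqplot or pandas. The snippet below uses made-up map entries and scores, and it matches names exactly instead of using the substring match of the function above; that swap is an assumption made for this illustration, since substring matching can pair a short name with a longer one that contains it (the string "Niger" is contained in "Nigeria", for example). The names toy_map_data, toy_scores and toy_color_mapping are hypothetical.

```python
# Made-up miniature inputs; the ids and scores are illustrative only.
toy_map_data = [
    {"id": 1, "properties": {"name": "Finland"}},
    {"id": 2, "properties": {"name": "Russian Federation"}},
    {"id": 3, "properties": {"name": "Atlantis"}},
]
toy_scores = {"Finland": 7.1, "Russia": 5.6}


def toy_color_mapping(map_data, scores, renames=None):
    """Map country id -> value, using exact name matches (illustrative sketch)."""
    renames = renames or {}
    mapping = {}
    for entry in map_data:
        name = entry["properties"]["name"]
        # Translate map names that differ from the names used in the dataset.
        name = renames.get(name, name)
        if name in scores:
            mapping[entry["id"]] = scores[name]
    return mapping


toy_mapping = toy_color_mapping(toy_map_data, toy_scores,
                                renames={"Russian Federation": "Russia"})
print(toy_mapping)
```

Countries without an entry in the result, like the made-up "Atlantis" here, are the ones that would be drawn in the default color.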
Below we are generating a choropleth map which depicts the happiness score of each country of the world.
fig = plt.figure(title='World Happiness Report')
plt.scales(scales={'color': bqplot.ColorScale(scheme='Blues')})
choropleth_map = plt.geo(map_data='WorldMap',
                         colors={'default_color': 'Grey'})
map_data = choropleth_map.map_data["objects"]["subunits"]["geometries"]
choropleth_map.color = map_data_to_color_mapping(map_data)
choropleth_map.tooltip = Tooltip(fields=["color"], labels=["Happiness Score"])
fig
Below we are generating a choropleth map which depicts the healthy life expectancy of each country of the world.
fig = plt.figure(title='World Healthy life expectancy Report')
plt.scales(scales={'color': bqplot.ColorScale(scheme='RdYlBu')})
choropleth_map = plt.geo(map_data='WorldMap',
                         colors={'default_color': 'white'})
map_data = choropleth_map.map_data["objects"]["subunits"]["geometries"]
choropleth_map.color = map_data_to_color_mapping(map_data, "Healthy life expectancy")
choropleth_map.tooltip = Tooltip(fields=["color"], labels=["Healthy life expectancy"])
fig
Below we are generating a choropleth map which depicts the perceptions of corruption for each country of the world.
fig = plt.figure(title='World Perceptions of corruption Report')
plt.scales(scales={'color': bqplot.ColorScale(scheme='BrBG')})
choropleth_map = plt.geo(map_data='WorldMap',
                         colors={'default_color': 'white'})
map_data = choropleth_map.map_data["objects"]["subunits"]["geometries"]
choropleth_map.color = map_data_to_color_mapping(map_data, "Perceptions of corruption")
choropleth_map.tooltip = Tooltip(fields=["color"], labels=["Perceptions of corruption"])
fig
This ends our small tutorial introducing the pyplot API of bqplot and the various graphs available through it. Please feel free to let us know your views in the comments section.