Bokeh is an interactive data visualization library for Python that renders its plots in the browser through JavaScript (BokehJS). It provides an easy-to-use interface for quickly designing interactive graphs for in-depth data analysis. We'll start by plotting simple graphs and glyphs (basic shapes), which are available in the bokeh.plotting module. This module comes with a set of default visual styles and tools.
So without further delay, let's get started.
Plotting through the bokeh.plotting module requires a few common imports, depending on where you want the resulting plot to be displayed (Jupyter notebook, new browser tab, or saved to a file). Below are the common steps to follow to create graphs.
1. Create a figure using the bokeh.plotting.figure() function.
2. Call output_notebook() for displaying graphs in a notebook, or output_file() for opening them in a new tab/saving them to a file. Both come from bokeh.io.
3. Call the show() method for showing graphs in a Jupyter notebook/new browser tab, or save() for saving them to a file. Both come from bokeh.io.
As a part of this tutorial, we'll be using output_notebook() to display all graphs inside the notebook.
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.resources import INLINE
output_notebook(resources=INLINE)
After calling output_notebook(), all subsequent calls to show() will display graphs in the notebook; otherwise show() opens graphs in a new browser tab. Successful execution of the above statement shows a message saying "BokehJS 2.0.1 successfully loaded." (the version number reflects the installed release).
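The steps above also mention output_file() and save() for writing a chart to disk instead of a notebook. The following is a minimal sketch of that route and is not part of the original walkthrough: the file name line_demo.html and the sample values are made up for illustration, and the file name and resources are passed directly to save() so that later show() calls are not redirected.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

# Hypothetical output location inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "line_demo.html")

# A tiny figure with made-up values, just to have something to save.
fig = figure(title="Saved to file")
fig.line(x=[1, 2, 3], y=[4, 6, 5])

# Write a standalone HTML page; BokehJS is loaded from the CDN when the page is opened.
save(fig, filename=out_path, resources=CDN, title="Line demo")
```

Opening the resulting file in a browser should show the same interactive chart that show() would display.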
We need to load the datasets that we'll be using throughout this tutorial. Bokeh provides a number of datasets as pandas dataframes as part of its bokeh.sampledata module. The first dataset that we'll be using is the autompg dataset, which has information about car models along with their mpg, number of cylinders, displacement, horsepower, weight, acceleration, year launched, origin, name, and manufacturer.
from bokeh.sampledata.autompg import autompg_clean as autompg_df
autompg_df.head()
Another dataset that we'll be using is the Google stock dataset, which has the open, high, low, and close prices per business day as well as the daily volume.
from bokeh.sampledata.stocks import GOOG as google
import pandas as pd
google_df = pd.DataFrame(google)
google_df["date"] = pd.to_datetime(google_df["date"])
google_df.head()
We'll start by plotting simple scatter plots. Plotting graphs through bokeh generally involves the simple steps below.
1. Create a figure using figure().
2. Call a glyph method (circle(), square(), cross(), etc.) on the figure object created above.
3. Call the show() method, passing it the figure object, to display the graph.
Our first scatter plot consists of the autompg displ column versus the weight column.
fig = figure(plot_width=400, plot_height=400)
fig.circle(
x=autompg_df["displ"], y=autompg_df["weight"]
)
show(fig)
We can see that it just plots the points and lacks a lot of things like an x-axis label, y-axis label, title, etc.
We'll now try various attributes of circle() to improve the plot a little. The default value for the size attribute is 4, which we'll change below along with the circle color and circle edge color. We have also added attributes like alpha (transparency of the circle glyph), line_color (color of the circle's edge), and fill_color (color of the circle's interior). We can modify the x-axis and y-axis labels by accessing the axes through the figure object and setting their axis_label attribute. We can also set the plot width, height, and title through the plot_width, plot_height, and title parameters of figure().
fig = figure(plot_width=400, plot_height=400, title="Displacement vs Weight")
fig.circle(
x=autompg_df["displ"],
y=autompg_df["weight"],
size=10, alpha=0.8,
line_color="red", fill_color="skyblue"
)
fig.xaxis.axis_label="Displacement"
fig.yaxis.axis_label="Weight"
show(fig)
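The figure() calls in this tutorial use plot_width and plot_height, which match the Bokeh 2.0.1 release shown in the loading message. Bokeh 3.0 and later name these parameters width and height. Below is a small sketch, assuming you only need to pick the right keyword names for the installed version; major_version and size_kwargs are helper names introduced here, not part of bokeh.

```python
import bokeh
from bokeh.plotting import figure

# Major version of the installed bokeh release, e.g. 2 for "2.0.1".
major_version = int(bokeh.__version__.split(".")[0])

# Assumption: releases from 3.0 onwards expect width/height,
# older ones accept plot_width/plot_height as used in this tutorial.
if major_version >= 3:
    size_kwargs = {"width": 400, "height": 400}
else:
    size_kwargs = {"plot_width": 400, "plot_height": 400}

fig = figure(title="Displacement vs Weight", **size_kwargs)
```

The same size_kwargs dictionary can then be unpacked into any of the figure() calls that follow.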
We can use glyphs other than circles for plotting as well; bokeh provides a number of other marker glyph methods, such as square() and cross().
Below we are using square() to draw a scatter plot of square glyphs. We have also modified a few of the attributes mentioned above, and we pass an array of random sizes so that each square gets its own size.
import numpy as np
square_sizes = np.random.randint(1,25,size=(autompg_df.shape[0]))
fig = figure(plot_width=400, plot_height=400, title="Displacement vs Weight")
fig.square(
x=autompg_df["displ"],
y=autompg_df["weight"],
size=square_sizes, alpha=0.5,
line_color="orange", fill_color="orange"
)
fig.xaxis.axis_label="Displacement"
fig.yaxis.axis_label="Weight"
show(fig)
Feel free to try the other glyph methods to generate scatter plots with different marker styles.
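One way to experiment with marker styles is the figure's generic scatter() method, which takes the marker name as a parameter. The sketch below is an illustration rather than part of the original tutorial: the point positions are made up, and each marker type is simply drawn on its own row.

```python
from bokeh.plotting import figure

# Marker names accepted by scatter(); each is drawn as its own row of points.
markers = ["circle", "square", "diamond", "triangle", "cross", "asterisk"]

fig = figure(title="Marker styles")
renderers = []
for row, marker in enumerate(markers):
    r = fig.scatter(
        x=[1, 2, 3, 4, 5], y=[row] * 5,  # made-up positions, one row per marker
        marker=marker, size=12, alpha=0.7,
        line_color="navy", fill_color="skyblue",
    )
    renderers.append(r)
```

Passing the figure to show() displays all six rows in one chart.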
Our second plot type will be a line plot. We'll explain it with a few examples below. We can simply pass x and y values to the line() method of the figure object to create a line chart. We'll first create a line chart of random data generated through numpy.
fig = figure(plot_width=600, plot_height=300, title="Sample Line Plot")
fig.line(x=range(10), y=np.random.randint(1,50, 10))
show(fig)
We can change the line color and line style by setting the line_color and line_dash parameters of the line() method.
fig = figure(plot_width=600, plot_height=400, title="Sample Dashed Dotted Line Plot")
fig.line(
x=range(10), y=np.random.randint(1,50, 10),
line_width=3, line_color="lime", line_dash="dotdash"
)
show(fig)
Below we are creating another line chart that depicts changes in the close price of Google stock over time, using the dataframe that we created above. We are also setting various attributes mentioned above to improve the aesthetics of the graph, and we introduce ways to modify the grid settings. We can access the x-grid and y-grid through the figure object and then set parameters like grid_line_color and grid_line_alpha to control the grid line color and transparency. Setting grid_line_color to None hides those grid lines.
fig = figure(plot_width=700, plot_height=400, x_axis_type="datetime", title="Google Stock Prices from 2005 - 2013")
fig.line(
x=google_df.date, y=google_df.close,
line_width=3, line_color="tomato",
)
fig.xaxis.axis_label = 'Time'
fig.yaxis.axis_label = 'Price ($)'
fig.xgrid.grid_line_color=None
fig.ygrid.grid_line_alpha=1.0
show(fig)
Another important glyph method provided for creating line charts is step(), which draws the line as a series of steps.
fig = figure(plot_width=700, plot_height=400, title="Sample Step Chart")
fig.step(
x=range(10), y=np.random.randint(1,10,size=10),
line_color="red", line_width=2
)
show(fig)
Bokeh also provides a method named multi_line(), which can be used to plot multiple lines on the same chart. We need to pass it a list of x arrays and a list of y arrays, with one entry per line. We have also used the line_width parameter, which sets the width of each line in pixels.
fig = figure(plot_width=700, plot_height=400, title="Sample Multi Line Chart")
fig.multi_line(
[list(range(5)), list(range(5,10))], ##Line Xs
[np.random.randint(1,25, size=5), np.random.randint(25,50, size=5)], ##Line Ys
color=["firebrick", "navy"], alpha=[0.6, 0.5], line_width=4)
show(fig)
Bokeh provides a list of methods for creating bar charts: vbar(), hbar(), vbar_stack(), and hbar_stack().
We'll explain each using various examples.
Below is a common bar chart created by calling the vbar() method. It accepts the parameters x and top for setting the x-axis position and the height of each bar respectively. We have also modified the look of the bar chart by setting the common attributes discussed above, and we modify the tick labels of the x-axis by accessing the x-axis through the figure object.
autompg_avg_by_origin = autompg_df.groupby(by="origin").mean(numeric_only=True)
fig = figure(plot_width=300, plot_height=300, title="Average mpg per region")
fig.vbar(x = [1,2,3],
width=0.5,
top=autompg_avg_by_origin.mpg,
fill_color="firebrick", line_color="blue", alpha=0.8)
fig.xaxis.axis_label="Region"
fig.yaxis.axis_label="MPG"
fig.xaxis.ticker = [1, 2, 3]
## groupby() sorts the regions alphabetically, so take the labels from the index
fig.xaxis.major_label_overrides = dict(zip([1, 2, 3], autompg_avg_by_origin.index))
show(fig)
Below is another example of a bar chart, this time with the bars aligned horizontally. We can create a horizontal bar chart using the hbar() method of the figure object. We need to pass the y and right attributes to set the y-axis position and the length of each bar respectively. We have also modified the look of the chart through the common attributes already discussed above.
fig = figure(plot_width=400, plot_height=300, title="Average mpg per region")
fig.hbar(y = [1,2,3],
height=0.5,
right=autompg_avg_by_origin.mpg,
fill_color="skyblue", line_color="red")
fig.xaxis.axis_label="MPG"
fig.yaxis.axis_label="Region"
fig.yaxis.ticker = [1, 2, 3]
## groupby() sorts the regions alphabetically, so take the labels from the index
fig.yaxis.major_label_overrides = dict(zip([1, 2, 3], autompg_avg_by_origin.index))
show(fig)
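Instead of numeric positions with overridden tick labels, bar charts can also be drawn on a categorical axis by passing the category names as x_range to figure(). The sketch below is an alternative to the approach above, not the tutorial's own method, and the bar heights are placeholders rather than the real averages.

```python
from bokeh.plotting import figure

# Region names double as the categorical x-axis values.
regions = ["Asia", "Europe", "North America"]
# Placeholder heights for illustration only, not the real averages.
heights = [3, 2, 1]

fig = figure(x_range=regions, title="Bars on a categorical axis")
bars = fig.vbar(x=regions, top=heights, width=0.5,
                fill_color="firebrick", line_color="blue", alpha=0.8)
fig.xaxis.axis_label = "Region"
```

With a categorical range, each bar is labeled by its own category, so the labels cannot drift out of sync with the data order.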
We can create vertical stacked bar charts by calling the vbar_stack() method on the figure object. We need to pass the dataframe to the source attribute of vbar_stack(), along with a list of the columns whose values will be stacked. We can pass a list of colors to the color attribute to set the color of each stacked bar. We have also modified other attributes to improve the look and feel of the chart. This example also introduces a way to create a legend. We need to import Legend from the bokeh.models module, create a legend object that pairs each label with the renderer returned by vbar_stack(), and then add it to the figure by calling the add_layout() method on the figure object, passing the legend object along with its location.
from bokeh.models import Legend
fig = figure(plot_width=500, plot_height=400, title="Average mpg, accel per region")
v = fig.vbar_stack(['mpg','accel'], x="index",
width=0.6,
color=("lime", "tomato"), alpha=0.7,
source=autompg_avg_by_origin.reset_index())
fig.xaxis.axis_label="Region"
fig.yaxis.axis_label="Average mpg/accel"
fig.xaxis.ticker = [0,1,2]
## groupby() sorts the regions alphabetically, so take the labels from the index
fig.xaxis.major_label_overrides = dict(zip([0, 1, 2], autompg_avg_by_origin.index))
legend = Legend(items=[
("mpg", [v[0]]),
("accel", [v[1]]),
], location=(0, -30))
fig.add_layout(legend, 'right')
show(fig)
Bokeh also provides a method named hbar_stack() to create a horizontal stacked bar chart. We explain its usage below with a simple example. The process of creating a horizontal stacked bar chart is almost the same as that of the vertical one, with a few minor changes in parameter names (y and height instead of x and width).
fig = figure(plot_width=600, plot_height=400, title="Average mpg,accel per region")
h = fig.hbar_stack(['mpg','accel'],
y="index",
height=0.6,
color=("blue", "green"), alpha=0.5,
source=autompg_avg_by_origin.reset_index())
fig.xaxis.axis_label="Average mpg/accel"
fig.yaxis.axis_label="Region"
fig.yaxis.ticker = [0,1,2]
## groupby() sorts the regions alphabetically, so take the labels from the index
fig.yaxis.major_label_overrides = dict(zip([0, 1, 2], autompg_avg_by_origin.index))
legend = Legend(items=[
("mpg", [h[0]]),
("accel", [h[1]]),
], location=(0, -30))
fig.add_layout(legend, 'right')
show(fig)
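For stacked bars, a legend can also be produced without building a Legend object by hand: the stack methods accept a list for legend_label, one entry per stacked column. The following is a sketch under assumed data; the column names, categories, and values are hypothetical.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

stackers = ["first", "second"]  # hypothetical column names
source = ColumnDataSource(data={
    "region": ["A", "B", "C"],  # placeholder categories
    "first": [1, 2, 3],         # made-up values
    "second": [2, 1, 2],
})

fig = figure(x_range=["A", "B", "C"], title="Stacked bars with automatic legend")
renderers = fig.vbar_stack(
    stackers, x="region", width=0.6,
    color=["lime", "tomato"], alpha=0.7,
    source=source,
    legend_label=stackers,  # one legend entry per stacked column
)
fig.legend.location = "top_left"
```

This places the legend inside the plot area; the explicit Legend plus add_layout() route shown above remains the way to place it outside the plot.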
We can also draw rectangles on a graph using the rect() and quad() methods. We explain their usage below with examples.
Below we are creating a scatter plot of rectangles. We need to pass x and y for the center of each rectangle, along with width and height to define its size. We can also rotate the rectangles by setting the angle parameter, which is interpreted in radians unless angle_units is set.
fig = figure(plot_width=400, plot_height=400, title="Sample Rectangle Glyph Chart")
fig.rect(
x=range(5), y=np.random.randint(1,25,size=5),
width=0.2, height=1,
angle=45, angle_units="deg",
color="lawngreen",
)
show(fig)
Bokeh provides a method named quad()
to create squares, rectangle shapes on the chart. We need to pass four lists representing top, bottom, left and right of the shape. Below we are explaining its usage with simple settings.
fig = figure(plot_width=400, plot_height=400, title="Sample Quads Glyph Chart")
fig.quad(top=[1.5, 2.5, 3.5], bottom=[1, 2, 3], left=[1, 2, 3],
right=[1.5, 2.5, 3.5], color="skyblue")
show(fig)
Area plots let us fill regions of a plot. Bokeh provides the methods varea(), harea(), varea_stack(), and harea_stack() for creating area charts.
Below we explain the simple usage of varea(), which fills the area between two lines that share the same x values. We need to pass the common x values along with the y1 and y2 values of the two lines. The area between these two lines will be filled using the color settings passed.
fig = figure(plot_width=400, plot_height=400, title="Sample Area Chart")
fig.varea(x=[1, 2, 3],
y1=autompg_avg_by_origin.accel,
y2=autompg_avg_by_origin.mpg,
fill_color="cyan", alpha=0.5)
show(fig)
Bokeh provides another method named harea(), which fills the area between two lines that share the same y values. It has almost the same settings as varea(), with the parameter names changed to y, x1, and x2.
fig = figure(plot_width=400, plot_height=400, title="Sample Area Chart")
fig.harea(y=[1, 2, 3],
x1=autompg_avg_by_origin.accel,
x2=autompg_avg_by_origin.mpg,
fill_color="orangered", alpha=0.5)
show(fig)
We can stack filled areas on top of each other by calling the varea_stack() method: the first column is filled from the baseline, and each following column is filled on top of the previous one. The example below shows its usage in a simple way. In the same way, we can use harea_stack() to stack areas horizontally.
fig = figure(plot_width=400, plot_height=400, title="Sample Stacked Area Chart")
fig.varea_stack(["accel","mpg"],
x="index",
color=("deeppink", "pink"),alpha=0.5,
source=autompg_avg_by_origin.reset_index())
show(fig)
Bokeh lets us create polygons using the methods patch() and patches(), which create a single patch and multiple patches respectively.
The example below creates a simple polygon using the patch() method. We have also modified the look and feel of the polygon using the common attributes described above.
fig = figure(plot_width=300, plot_height=300, title="Sample Polygon Chart")
fig.patch([1, 2, 3, 4], [6, 8, 8, 7],
alpha=0.5,
line_width=2, line_color="black")
show(fig)
Another method provided by bokeh, patches(), can be used to create multiple polygons on the same chart. We need to pass two lists of lists to patches(): the first holds the x values of each polygon and the second holds the y values of each polygon. The i-th inner lists of both arguments together describe one polygon.
fig = figure(plot_width=300, plot_height=300, title="Sample Multiple Polygon Chart")
fig.patches([[1, 2, 2, ], [2, 1, 1, ]],[[2,3,4],[2,4,5]],
color=["lavender", "violet"],
line_width=2, line_color="black")
show(fig)
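Because patches() takes all x lists first and all y lists second, it can be easier to write each polygon as a list of (x, y) vertices and split the pairs afterwards. The sketch below does that for the same two triangles; the polygons variable and the splitting step are illustrative additions, not something bokeh requires.

```python
from bokeh.plotting import figure

# Each polygon written as a list of (x, y) vertices (same shapes as the example above).
polygons = [
    [(1, 2), (2, 3), (2, 4)],
    [(2, 2), (1, 4), (1, 5)],
]

# patches() wants all x lists first, then all y lists, so split the pairs.
xs = [[x for x, _ in poly] for poly in polygons]
ys = [[y for _, y in poly] for poly in polygons]

fig = figure(title="Polygons from vertex lists")
fig.patches(xs, ys, color=["lavender", "violet"], line_width=2, line_color="black")
```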
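Bokeh provides another method named multi_polygons(), which can be used to create polygon shapes as well, including polygons with holes. Its xs and ys arguments are nested three levels deep: a list of multi-polygons, each a list of polygons, each a list of rings in which the first ring is the exterior and any further rings are holes. Below we describe its usage.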
fig = figure(plot_width=300, plot_height=300, title="Sample Multiple Polygon Chart")
fig.multi_polygons(
xs=[[[ [0, 3, 3, 0],[1, 2, 2], [2, 1, 1] ]]],
ys=[[[ [1, 1, 5, 5],[2, 3, 4], [2, 4, 5] ]]],
color="red", alpha=0.6
)
show(fig)
Until now we have described ways to create simple charts. In the real world, we'll often need to merge more than one type of glyph to create richer graphs. Below we explain how to use more than one glyph on the same figure; bokeh combines them into a single chart.
Below we combine line and circle glyphs to create a combined chart of both. We highlight the joints of the line with circle glyphs.
fig = figure(plot_width=600, plot_height=300, title="Sample Merge Plot")
y = np.random.randint(1,50, 10)
fig.line(x=range(10), y=y, line_width=2)
fig.circle(x=range(10), y=y, color="red", size=10)
show(fig)
We can also combine glyphs of different types to create a scatter chart with different types of markers, as shown below. We are plotting autompg displ vs weight per region, representing the entries of each region with a different glyph. We also introduce a new parameter named legend_label, which sets the legend entry for each glyph. We set the location of the legend to top_left by accessing the legend object through the figure object.
fig = figure(plot_width=400, plot_height=400, title="Displacement vs Weight color-encoded by Origin")
fig.circle(
x=autompg_df[autompg_df["origin"]=="North America"]["displ"],
y=autompg_df[autompg_df["origin"]=="North America"]["weight"],
fill_color="tomato",
size=12, alpha=0.7,
legend_label="North America"
)
fig.diamond(
x=autompg_df[autompg_df["origin"]=="Asia"]["displ"],
y=autompg_df[autompg_df["origin"]=="Asia"]["weight"],
fill_color="lawngreen",
size=14, alpha=0.5,
legend_label="Asia"
)
fig.square(
x=autompg_df[autompg_df["origin"]=="Europe"]["displ"],
y=autompg_df[autompg_df["origin"]=="Europe"]["weight"],
fill_color="skyblue",
size=10, alpha=0.5,
legend_label="Europe"
)
fig.xaxis.axis_label="Displacement"
fig.yaxis.axis_label="Weight"
fig.legend.location = "top_left"
show(fig)
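The three glyph calls above differ only in the region, marker, and color, so the same chart can be built in a loop. The sketch below uses the generic scatter() method with a marker name; the sample points are made up so that the block runs without the autompg data, and samples and styles are names introduced here for illustration.

```python
from bokeh.plotting import figure

# Made-up sample points per region: (displacement values, weight values).
samples = {
    "North America": ([300, 350, 400], [3500, 4000, 4300]),
    "Asia": ([90, 100, 120], [2000, 2100, 2400]),
    "Europe": ([100, 120, 140], [2200, 2500, 2800]),
}
# Marker and fill color for each region, mirroring the choices above.
styles = {
    "North America": ("circle", "tomato"),
    "Asia": ("diamond", "lawngreen"),
    "Europe": ("square", "skyblue"),
}

fig = figure(title="Displacement vs Weight by Origin")
for region, (xs, ys) in samples.items():
    marker, fill = styles[region]
    fig.scatter(x=xs, y=ys, marker=marker, fill_color=fill,
                size=12, alpha=0.6, legend_label=region)
fig.legend.location = "top_left"
```

With the real dataframe, the per-region x and y values would come from filtering autompg_df on the origin column, as in the example above.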
Below is another example of combining glyphs, where we place two bar charts side by side to create a grouped bar chart.
fig = figure(plot_width=400, plot_height=400, title="Mpg and Accel Bar Chart")
fig.vbar(x = [1,3,5],
width=0.8,
top=autompg_avg_by_origin.mpg,
fill_color="lime", line_color="lime",
legend_label="mpg")
fig.vbar(x = [2,4,6],
width=0.8,
top=autompg_avg_by_origin.accel,
fill_color="tomato", line_color="tomato",
legend_label="accel")
fig.xaxis.axis_label="Origin"
fig.yaxis.axis_label="Mpg/Acceleration"
fig.legend.location = "top_right"
fig.xaxis.ticker = [1.5, 3.5, 5.5]
## groupby() sorts the regions alphabetically, so take the labels from the index
fig.xaxis.major_label_overrides = dict(zip([1.5, 3.5, 5.5], autompg_avg_by_origin.index))
show(fig)
This ends our small tutorial on basic plotting with bokeh. Please feel free to let us know your views in the comments section.
If you want to