Bokeh - Basic Interactive Plotting in Python [Jupyter Notebook]

Interactive Plotting in Python using Bokeh

Introduction

Bokeh is an interactive data visualization library for Python that renders its plots in the browser through a JavaScript front end (BokehJS). It provides an easy-to-use interface for quickly building interactive graphs for in-depth data analysis. We'll start by plotting simple graphs and glyphs (basic shapes) available in the bokeh.plotting module, which comes with a set of default visual styles and tools.

So without further delay, let’s get started.

Plotting through the bokeh.plotting module requires a few common imports, which depend on where you want the resulting plot to appear (Jupyter notebook, new browser tab, or a saved file). The common steps for creating a graph are:

  • Create a graph object using the bokeh.plotting.figure() function.
  • Call output_notebook() to display graphs in the notebook, or output_file() to open them in a new tab or save them to a file. Both come from bokeh.io.
  • Call show() to display the graph in the notebook or a new browser tab, or save() to write it to a file. Both come from bokeh.io.

As a part of this tutorial, we'll be using output_notebook() to display all graphs inside the notebook.

In [1]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.resources import INLINE
In [ ]:
output_notebook(resources=INLINE)

After output_notebook() has been called, all subsequent calls to show() display graphs inside the notebook; without it, show() opens graphs in a new browser tab. When the statement above executes successfully, it shows a message such as "BokehJS 2.0.1 successfully loaded."

Loading Dataset

We first need to load the datasets used throughout this tutorial. Bokeh provides several datasets as pandas dataframes in its bokeh.sampledata module. The first one we'll use is the autompg dataset, which holds information about car models: mpg, number of cylinders, displacement, horsepower, weight, acceleration, model year, origin, name, and manufacturer.

In [2]:
from bokeh.sampledata.autompg import autompg_clean as autompg_df
autompg_df.head()
Out[2]:
mpg cyl displ hp weight accel yr origin name mfr
0 18.0 8 307.0 130 3504 12.0 70 North America chevrolet chevelle malibu chevrolet
1 15.0 8 350.0 165 3693 11.5 70 North America buick skylark 320 buick
2 18.0 8 318.0 150 3436 11.0 70 North America plymouth satellite plymouth
3 16.0 8 304.0 150 3433 12.0 70 North America amc rebel sst amc
4 17.0 8 302.0 140 3449 10.5 70 North America ford torino ford

The second dataset is the Google stock dataset, which holds the open, high, low, and close prices per business day along with the daily volume.

In [3]:
from bokeh.sampledata.stocks import GOOG as google
import pandas as pd

google_df = pd.DataFrame(google)
google_df["date"] = pd.to_datetime(google_df["date"])
google_df.head()
Out[3]:
date open high low close volume adj_close
0 2004-08-19 100.00 104.06 95.96 100.34 22351900 100.34
1 2004-08-20 101.01 109.08 100.50 108.31 11428600 108.31
2 2004-08-23 110.75 113.48 109.05 109.40 9137200 109.40
3 2004-08-24 111.24 111.60 103.57 104.87 7631300 104.87
4 2004-08-25 104.96 108.00 103.88 106.00 4598900 106.00

1. Scatter Plots

We'll start by plotting simple scatter plots. Plotting a graph with bokeh generally takes the simple steps below.

  • Create a figure using figure().
  • Call any glyph method (such as circle(), square(), cross(), etc.) on the figure object created above.
  • Call show(), passing it the figure object, to display the graph.

Our first scatter plot shows the displ (displacement) column of autompg against the weight column.

In [ ]:
fig = figure(plot_width=400, plot_height=400)

fig.circle(
            x=autompg_df["displ"], y=autompg_df["weight"]
        )

show(fig)

The result is a bare plot: it lacks an x-axis label, a y-axis label, a title, and so on.

We'll now try various attributes of circle() to improve the plot a little. The default value of the size attribute is 4, which we change below along with the fill and edge colors. We have added the attributes alpha (transparency of the glyph), line_color (outline color of the glyph), and fill_color (fill color of the glyph). We can also modify the x-axis and y-axis labels by accessing the axes through the figure object and setting their axis_label attribute. The plot width, height, and title are set through the plot_width, plot_height, and title parameters of figure(). These are the parameter names in the Bokeh 2.x release used here; Bokeh 3.x renamed plot_width and plot_height to width and height.

In [ ]:
fig = figure(plot_width=400, plot_height=400, title="Displacement vs Weight")

fig.circle(
           x=autompg_df["displ"],
           y=autompg_df["weight"],
           size=10, alpha=0.8,
           line_color="red", fill_color="skyblue"
        )

fig.xaxis.axis_label="Displacement"
fig.yaxis.axis_label="Weight"

show(fig)

We can also use glyphs other than circles for plotting. Some of the glyph methods available are listed below.

  • asterisk()
  • circle()
  • circle_cross()
  • circle_x()
  • cross()
  • dash()
  • diamond()
  • diamond_cross()
  • inverted_triangle()
  • square()
  • square_cross()
  • square_x()
  • triangle()
  • x()

Below we use square() to draw a scatter plot made of square glyphs. We also modify a few of the attributes mentioned above and give each marker a random size.

In [ ]:
import numpy as np

square_sizes = np.random.randint(1,25,size=(autompg_df.shape[0]))

fig = figure(plot_width=400, plot_height=400, title="Displacement vs Weight")

fig.square(
           x=autompg_df["displ"],
           y=autompg_df["weight"],
           size=square_sizes, alpha=0.5,
           line_color="orange", fill_color="orange"
        )

fig.xaxis.axis_label="Displacement"
fig.yaxis.axis_label="Weight"

show(fig)

Feel free to try the other glyphs listed above to generate scatter plots with different marker styles.

2. Line Plots

Our second plot type is the line plot, which we'll explain with a few examples. We can simply pass x and y values to the line() method of the figure object to create a line chart. We'll first create a line chart of random data generated with numpy.

In [ ]:
fig = figure(plot_width=600, plot_height=300, title="Sample Line Plot")

fig.line(x=range(10), y=np.random.randint(1,50, 10))

show(fig)

We can change the line color and dash pattern by setting the line_color and line_dash parameters of the line() method.

In [ ]:
fig = figure(plot_width=600, plot_height=400, title="Sample Dashed Dotted Line Plot")

fig.line(
        x=range(10), y=np.random.randint(1,50, 10),
        line_width=3, line_color="lime", line_dash="dotdash"
        )

show(fig)

Below we create another line chart that shows how the closing price of Google stock changes over time, using the dataframe created above. We also set several of the attributes mentioned above to improve the look of the graph, and introduce a way to modify the grid settings. We can access the x-grid and y-grid through the figure object and set parameters such as grid_line_color and grid_line_alpha to control the color and transparency of the grid lines.

In [ ]:
fig = figure(plot_width=700, plot_height=400, x_axis_type="datetime", title="Google Stock Prices from 2004 - 2013")

fig.line(
        x=google_df.date, y=google_df.close,
        line_width=3, line_color="tomato",
        )

fig.xaxis.axis_label = 'Time'
fig.yaxis.axis_label = 'Price ($)'

fig.xgrid.grid_line_color=None
fig.ygrid.grid_line_alpha=1.0

show(fig)

Another important glyph method for line charts is step(), which draws the line as a series of steps.

In [ ]:
fig = figure(plot_width=700, plot_height=400, title="Sample Step Chart")

fig.step(
        x=range(10), y=np.random.randint(1,10,size=10),
        line_color="red", line_width=2
        )

show(fig)

Bokeh also provides a method named multi_line(), which plots multiple lines on the same chart. We pass it a list of x arrays and a list of y arrays, one pair per line. The line_width parameter sets the width of the lines in pixels.

In [ ]:
fig = figure(plot_width=700, plot_height=400, title="Sample Multi Line Chart")

fig.multi_line(
                [list(range(5)), list(range(5,10))],  ##Line Xs
                [np.random.randint(1,25, size=5), np.random.randint(25,50, size=5)], ##Line Ys
                color=["firebrick", "navy"], alpha=[0.6, 0.5], line_width=4)

show(fig)
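
The list-of-lists layout that multi_line() expects can be easy to get wrong when the data starts out as one series of (x, y) points per line. The following is a minimal plain-Python sketch of that reshaping. The helper name to_multi_line_args and the sample points are hypothetical and are not part of Bokeh.

```python
def to_multi_line_args(series):
    """Split {name: [(x, y), ...]} into the two parallel lists of lists
    (all x lists, all y lists) that multi_line() takes."""
    xs, ys = [], []
    for points in series.values():
        xs.append([x for x, _ in points])
        ys.append([y for _, y in points])
    return xs, ys


# Two made-up lines with three points each (illustrative values only).
series = {
    "first": [(0, 4), (1, 9), (2, 7)],
    "second": [(5, 30), (6, 41), (7, 28)],
}
xs, ys = to_multi_line_args(series)
print(xs)  # [[0, 1, 2], [5, 6, 7]]
print(ys)  # [[4, 9, 7], [30, 41, 28]]
```

The resulting xs and ys could then be passed as the first two arguments of fig.multi_line().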

3. Bar Charts

Bokeh provides a list of methods for creating bar charts.

  • vbar()
  • hbar()
  • vbar_stack()
  • hbar_stack()

We'll explain each with an example.

The bar chart below is a plain vertical bar chart created by calling vbar(). It accepts the parameters x and top, which set the x-axis position and the height of each bar, respectively. We have also modified the look of the bar chart by setting the common attributes discussed above, and we replace the numeric tick labels of the x-axis with region names by accessing the x-axis through the figure object.

In [ ]:
# numeric_only=True skips the text columns (name, mfr), which cannot be averaged
autompg_avg_by_origin = autompg_df.groupby(by="origin").mean(numeric_only=True)

fig = figure(plot_width=300, plot_height=300, title="Average mpg per region")

fig.vbar(x = [1,2,3],
         width=0.5,
         top=autompg_avg_by_origin.mpg,
         fill_color="firebrick", line_color="blue", alpha=0.8)

fig.xaxis.axis_label="Region"
fig.yaxis.axis_label="MPG"

# groupby() sorts the regions alphabetically, so take the labels from the index
fig.xaxis.ticker = [1, 2, 3]
fig.xaxis.major_label_overrides = dict(zip([1, 2, 3], autompg_avg_by_origin.index))

show(fig)
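
Because the bar heights come from a grouped average, the tick labels have to follow the same order as the groups. pandas groupby() sorts the group keys by default, so the order is alphabetical rather than the order in which regions first appear. The sketch below uses plain Python instead of pandas to show the idea. The helper mean_by_group and the sample rows are illustrative assumptions, not values from the autompg dataset.

```python
from collections import defaultdict


def mean_by_group(rows, key, value):
    """Average `value` per distinct `key`, returned in sorted key order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Made-up rows (illustrative values only).
rows = [
    {"origin": "North America", "mpg": 18.0},
    {"origin": "Asia", "mpg": 31.0},
    {"origin": "Europe", "mpg": 26.0},
    {"origin": "Asia", "mpg": 29.0},
]

means = mean_by_group(rows, "origin", "mpg")
# Build the tick-position -> label mapping from the same sorted keys,
# so every label lines up with the bar it describes.
tick_labels = {pos: name for pos, name in enumerate(means, start=1)}
print(means)        # {'Asia': 30.0, 'Europe': 26.0, 'North America': 18.0}
print(tick_labels)  # {1: 'Asia', 2: 'Europe', 3: 'North America'}
```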

Below is another example of a bar chart, this time with horizontal bars. We can create a horizontal bar chart using the hbar() method of the figure object. We need to pass y and right to set the y-axis position and the length of each bar, respectively; height sets the bar thickness. We have also modified the look of the chart using the common attributes already discussed above.

In [ ]:
fig = figure(plot_width=400, plot_height=300, title="Average mpg per region")

fig.hbar(y = [1,2,3],
         height=0.5,
         right=autompg_avg_by_origin.mpg,
         fill_color="skyblue", line_color="red")

fig.xaxis.axis_label="MPG"
fig.yaxis.axis_label="Region"

# groupby() sorts the regions alphabetically, so take the labels from the index
fig.yaxis.ticker = [1, 2, 3]
fig.yaxis.major_label_overrides = dict(zip([1, 2, 3], autompg_avg_by_origin.index))

show(fig)

We can also create vertical stacked bar charts by calling the vbar_stack() method on the figure object. We pass the dataframe through the source parameter, along with a list of the columns whose values will be stacked. A list of colors passed to the color parameter sets the color of each stacked layer. We have also modified other attributes to improve the look and feel of the chart.

This example also introduces a way to create a legend. We import Legend from the bokeh.models module, create a legend object with the settings shown below, and add it to the figure by calling add_layout() on the figure object with the legend object and its location.

In [ ]:
from bokeh.models import Legend

fig = figure(plot_width=500, plot_height=400, title="Average mpg, accel per region")

v = fig.vbar_stack(['mpg','accel'], x="index",
               width=0.6,
               color=("lime", "tomato"), alpha=0.7,
               source=autompg_avg_by_origin.reset_index())

fig.xaxis.axis_label="Region"
fig.yaxis.axis_label="Average mpg/accel"

# groupby() sorts the regions alphabetically, so take the labels from the index
fig.xaxis.ticker = [0,1,2]
fig.xaxis.major_label_overrides = dict(enumerate(autompg_avg_by_origin.index))

legend = Legend(items=[
    ("mpg",   [v[0]]),
    ("accel",   [v[1]]),
], location=(0, -30))

fig.add_layout(legend, 'right')

show(fig)
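
To make the stacking explicit, here is a plain-Python sketch of the arithmetic behind a stacked bar chart: each layer starts where the previous one ended. The helper name stack_layers and the sample numbers are illustrative assumptions; they are neither part of Bokeh nor taken from the autompg data.

```python
def stack_layers(columns, order):
    """Return {name: (bottom, top)} for each stacked column.

    `columns` maps a column name to one value per bar; `order` lists the
    column names from the lowest layer to the highest.
    """
    n_bars = len(columns[order[0]])
    running = [0.0] * n_bars  # current top of every bar
    layers = {}
    for name in order:
        bottom = list(running)
        top = [b + v for b, v in zip(bottom, columns[name])]
        layers[name] = (bottom, top)
        running = top
    return layers


# Made-up averages for three regions (illustrative values only).
sample = {"mpg": [30.0, 27.5, 20.0], "accel": [16.0, 16.5, 15.0]}
layers = stack_layers(sample, ["mpg", "accel"])
print(layers["mpg"])    # ([0.0, 0.0, 0.0], [30.0, 27.5, 20.0])
print(layers["accel"])  # ([30.0, 27.5, 20.0], [46.0, 44.0, 35.0])
```

The second layer sits on top of the first, so the total bar height is the sum of both columns.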

Bokeh also provides a method named hbar_stack() to create a horizontal stacked bar chart. We show its usage below with a simple example. The process is almost the same as for the vertical version, with a few minor changes in parameter names.

In [ ]:
fig = figure(plot_width=600, plot_height=400, title="Average mpg,accel per region")

h = fig.hbar_stack(['mpg','accel'],
               y="index",
               height=0.6,
               color=("blue", "green"), alpha=0.5,
               source=autompg_avg_by_origin.reset_index())

fig.xaxis.axis_label="Average mpg/accel"
fig.yaxis.axis_label="Region"

# groupby() sorts the regions alphabetically, so take the labels from the index
fig.yaxis.ticker = [0,1,2]
fig.yaxis.major_label_overrides = dict(enumerate(autompg_avg_by_origin.index))

legend = Legend(items=[
    ("mpg",   [h[0]]),
    ("accel",   [h[1]]),
], location=(0, -30))

fig.add_layout(legend, 'right')

show(fig)

4. Rectangles

We can also draw rectangles on a graph using the rect() and quad() methods. We explain their usage below with examples.

Below we create a scatter plot that uses rectangles as markers. We pass width and height to define the size of the rectangles, and we can rotate them by setting the angle parameter. Bokeh reads angle in radians unless angle_units is set.

In [ ]:
fig = figure(plot_width=400, plot_height=400, title="Sample Rectangle Glyph Chart")
fig.rect(
         x=range(5), y=np.random.randint(1,25,size=5),
         width=0.2, height=1,
         angle=45, angle_units="deg",  # without angle_units, 45 would be read as radians
         color="lawngreen",
        )

show(fig)
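
Bokeh interprets glyph angles in radians by default, so a rotation thought of in degrees has to be converted or passed together with angle_units="deg". A minimal sketch of the conversion, using only the standard library (the variable names are illustrative):

```python
import math

angle_deg = 45
# Value to pass as angle= when the default units (radians) are in effect.
angle_rad = math.radians(angle_deg)
print(round(angle_rad, 4))  # 0.7854
```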

Bokeh provides a method named quad() to draw squares and rectangles on the chart. We pass four lists giving the top, bottom, left, and right edges of each shape. Below we show its usage with simple settings.

In [ ]:
fig = figure(plot_width=400, plot_height=400, title="Sample Quads Glyph Chart")
fig.quad(top=[1.5, 2.5, 3.5], bottom=[1, 2, 3], left=[1, 2, 3],
       right=[1.5, 2.5, 3.5], color="skyblue")

show(fig)
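
rect() and quad() describe the same kind of shape in two ways: rect() takes a center point plus a width and height, while quad() takes the four edges. The sketch below converts one form into the other for unrotated rectangles. The helper rect_to_quad is hypothetical and not part of Bokeh.

```python
def rect_to_quad(x, y, width, height):
    """Convert a center/size rectangle into the edge form used by quad().

    Rotation is ignored, so this only applies to rectangles with angle 0.
    """
    return {
        "left": x - width / 2,
        "right": x + width / 2,
        "bottom": y - height / 2,
        "top": y + height / 2,
    }


# A 0.5 x 0.5 square centered at (1.25, 1.25) has the same edges as the
# first shape in the quad() example above.
edges = rect_to_quad(x=1.25, y=1.25, width=0.5, height=0.5)
print(edges)  # {'left': 1.0, 'right': 1.5, 'bottom': 1.0, 'top': 1.5}
```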

5. Areas

Area plots let us fill regions of a plot. Bokeh provides the methods below for creating area charts.

  • varea()
  • harea()
  • varea_stack()
  • harea_stack()

Below we show simple usage of varea(), which fills the area between two lines that share the same x values. We pass the common x values along with y1 and y2, the y values of the two lines. The area between the lines is filled using the color settings we pass.

In [ ]:
fig = figure(plot_width=400, plot_height=400, title="Sample Area Chart")

fig.varea(x=[1, 2, 3],
        y1=autompg_avg_by_origin.accel,
        y2=autompg_avg_by_origin.mpg,
        fill_color="cyan", alpha=0.5)

show(fig)

Bokeh provides another method named harea(), which fills the area between two lines that share the same y values. It has almost the same settings as varea(), with small changes in parameter names (y, x1, and x2).

In [ ]:
fig = figure(plot_width=400, plot_height=400, title="Sample Area Chart")

fig.harea(y=[1, 2, 3],
        x1=autompg_avg_by_origin.accel,
        x2=autompg_avg_by_origin.mpg,
        fill_color="orangered", alpha=0.5)

show(fig)

We can also stack filled areas on top of each other by calling the varea_stack() method. The example below shows its usage in a simple way. In the same way, harea_stack() stacks areas horizontally.

In [ ]:
fig = figure(plot_width=400, plot_height=400, title="Sample Stacked Area Chart")

fig.varea_stack(["accel","mpg"],
                x="index",
                color=("deeppink", "pink"),alpha=0.5,
                source=autompg_avg_by_origin.reset_index())

show(fig)

6. Patches

Bokeh lets us create polygons using the methods patch() and patches(), which create a single patch and multiple patches, respectively.

The example below creates a simple polygon using the patch() method. We have also modified the look and feel of the polygon using the common attributes described above.

In [ ]:
fig = figure(plot_width=300, plot_height=300, title="Sample Polygon Chart")

fig.patch([1, 2, 3, 4], [6, 8, 8, 7],
          alpha=0.5,
          line_width=2, line_color="black")

show(fig)

Another method, patches(), creates multiple polygons on the same chart. We pass it two lists: the first holds one list of x values per polygon and the second holds the matching list of y values per polygon, so the i-th entries of both lists together describe the i-th polygon.

In [ ]:
fig = figure(plot_width=300, plot_height=300, title="Sample Multiple Polygon Chart")

fig.patches([[1, 2, 2, ], [2, 1, 1, ]],[[2,3,4],[2,4,5]],
             color=["lavender", "violet"],
             line_width=2, line_color="black")

show(fig)

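Bokeh provides another method named multi_polygons(), which can also be used to create polygon shapes. Its xs and ys arguments are nested more deeply than those of patches(): the outer list holds multi-polygons, each multi-polygon holds polygons, and each polygon holds an exterior ring followed by any number of holes. The example below draws a single polygon with two triangular holes.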

In [ ]:
fig = figure(plot_width=300, plot_height=300, title="Sample Multiple Polygon Chart")

fig.multi_polygons(
                    xs=[[[ [0, 3, 3, 0],[1, 2, 2], [2, 1, 1] ]]],
                    ys=[[[ [1, 1, 5, 5],[2, 3, 4], [2, 4, 5] ]]],
                    color="red", alpha=0.6
                 )

show(fig)

7. Combining Multiple Charts

So far we have described ways to create simple charts. In real-world use, we often need to combine more than one type of glyph to create a richer graph. Below we show how to use more than one glyph on the same figure; bokeh draws them together on the same axes.

Below we combine line and circle glyphs in a single chart, using circles to highlight the points that the line connects.

In [ ]:
fig = figure(plot_width=600, plot_height=300, title="Sample Merge Plot")

y = np.random.randint(1,50, 10)

fig.line(x=range(10), y=y, line_width=2)
fig.circle(x=range(10), y=y, color="red", size=10)

show(fig)

We can also combine glyphs of different types to create a scatter chart with different markers, as shown below. We plot autompg displ vs weight per region, representing the entries of each region with a different glyph. We also introduce a new parameter named legend_label, which sets the legend entry for each glyph. Finally, we place the legend at top_left by accessing the legend object through the figure object.

In [ ]:
fig = figure(plot_width=400, plot_height=400, title="Displacement vs Weight color-encoded by Origin")

fig.circle(
           x=autompg_df[autompg_df["origin"]=="North America"]["displ"],
           y=autompg_df[autompg_df["origin"]=="North America"]["weight"],
           fill_color="tomato",
           size=12, alpha=0.7,
           legend_label="North America"
        )

fig.diamond(
           x=autompg_df[autompg_df["origin"]=="Asia"]["displ"],
           y=autompg_df[autompg_df["origin"]=="Asia"]["weight"],
           fill_color="lawngreen",
           size=14, alpha=0.5,
           legend_label="Asia"
        )

fig.square(
           x=autompg_df[autompg_df["origin"]=="Europe"]["displ"],
           y=autompg_df[autompg_df["origin"]=="Europe"]["weight"],
           fill_color="skyblue",
           size=10, alpha=0.5,
           legend_label="Europe"
        )

fig.xaxis.axis_label="Displacement"
fig.yaxis.axis_label="Weight"

fig.legend.location = "top_left"

show(fig)

Below is another example of combining glyphs: two bar charts are placed next to each other to create a side-by-side bar chart.

In [ ]:
fig = figure(plot_width=400, plot_height=400, title="Mpg and Accel Bar Chart")

fig.vbar(x = [1,3,5],
         width=0.8,
         top=autompg_avg_by_origin.mpg,
         fill_color="lime", line_color="lime",
         legend_label="mpg")

fig.vbar(x = [2,4,6],
         width=0.8,
         top=autompg_avg_by_origin.accel,
         fill_color="tomato", line_color="tomato",
         legend_label="accel")

fig.xaxis.axis_label="Origin"
fig.yaxis.axis_label="Mpg/Acceleration"
fig.legend.location = "top_right"

# groupby() sorts the regions alphabetically, so take the labels from the index
fig.xaxis.ticker = [1.5, 3.5, 5.5]
fig.xaxis.major_label_overrides = dict(zip([1.5, 3.5, 5.5], autompg_avg_by_origin.index))

show(fig)
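
The bar positions and tick locations in a side-by-side chart follow a simple pattern that can be computed instead of typed by hand. The following is a minimal plain-Python sketch of that arithmetic. The helper grouped_bar_positions is a hypothetical name, and the layout (unit spacing starting at 1) is an assumption chosen to match the numbers used above.

```python
def grouped_bar_positions(n_groups, n_series):
    """Return (positions, ticks) for a side-by-side bar chart.

    positions[s] holds the x position of series s in every group;
    ticks holds the center of each group, where its label belongs.
    """
    positions = [
        [1 + g * n_series + s for g in range(n_groups)]
        for s in range(n_series)
    ]
    ticks = [1 + g * n_series + (n_series - 1) / 2 for g in range(n_groups)]
    return positions, ticks


# Three regions with two measures each, as in the chart above.
positions, ticks = grouped_bar_positions(n_groups=3, n_series=2)
print(positions)  # [[1, 3, 5], [2, 4, 6]]
print(ticks)      # [1.5, 3.5, 5.5]
```

Each inner list of positions could be passed as x to one vbar() call, and ticks to the axis ticker.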

This ends our small tutorial on basic plotting with bokeh. Please feel free to let us know your views in the comments section.

Sunny Solanki