pandas is one of the most widely used Python libraries for loading and manipulating structured datasets. A pandas dataframe also offers convenient visualization through its plot() method: we can easily create scatter charts, bar charts, line charts, etc. directly from a dataframe by calling plot() on it with various parameters. Behind the scenes, pandas uses the matplotlib library for all of its visualizations, and because matplotlib produces static charts, the charts generated by a dataframe's .plot() API are static as well. Python has many data visualization libraries, such as plotly, bokeh, and holoviews, that generate interactive plots, so it would be very helpful to generate interactive charts directly from a pandas dataframe.
Cufflinks is one such library: it lets us generate interactive plotly-based charts directly from a pandas dataframe by calling the iplot() or figure() method on it with various parameters. Most of iplot()'s parameters mirror those of plot(), which makes it easy for anyone who already knows plot() to get started. As part of this tutorial, we'll explore the iplot() method to generate various interactive plotly charts from pandas dataframes.
We'll start by importing the necessary libraries. We then set the default cufflinks configuration with set_config_file(), which sets the default theme along with other options such as sharing and offline mode. We can retrieve the list of themes available in cufflinks with the getThemes() method.
import pandas as pd
import numpy as np
import cufflinks as cf
print("List of Cufflinks Themes : ", cf.getThemes())
cf.set_config_file(theme='pearl',sharing='public',offline=True)
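We'll use three datasets for plotting the various charts: the iris flower dataset and the wine dataset, both bundled with scikit-learn, and Apple (AAPL) OHLC stock price data loaded from a CSV file. We load each dataset into a pandas dataframe, which we'll use later for plotting.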
from sklearn.datasets import load_wine, load_iris
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df["FlowerType"] = [iris.target_names[t] for t in iris.target]
iris_df.head()
wine = load_wine()
wine_df = pd.DataFrame(data=wine.data, columns=wine.feature_names)
wine_df["WineType"] = [wine.target_names[t] for t in wine.target]
wine_df.head()
apple_df = pd.read_csv("datasets/AAPL.csv", index_col=0, parse_dates=True)
apple_df.head()
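If datasets/AAPL.csv is not at hand, here is a minimal sketch of a stand-in dataframe with the layout the charts below rely on: a DatetimeIndex plus Open, High, Low, and Close columns. The prices are a hypothetical random walk generated with NumPy, not real Apple data, so the resulting charts will look different from the ones described here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for datasets/AAPL.csv: random-walk prices, NOT real Apple data.
rng = np.random.default_rng(seed=0)
dates = pd.bdate_range("2019-04-01", "2020-03-31", name="Date")  # business days
n = len(dates)

close = 200 + rng.normal(0, 2, n).cumsum()
open_ = close + rng.normal(0, 1, n)
# Keep High above and Low below both Open and Close, as real OHLC bars are.
high = np.maximum(open_, close) + rng.uniform(0, 2, n)
low = np.minimum(open_, close) - rng.uniform(0, 2, n)

apple_df = pd.DataFrame(
    {"Open": open_, "High": high, "Low": low, "Close": close},
    index=dates,
)
print(apple_df.head())
```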
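There are two simple ways to create charts from a pandas dataframe with cufflinks: we can call either the iplot() or the figure() method on the dataframe, passing it the desired chart type. figure() accepts the same parameters as iplot() but returns a plotly figure object instead of displaying the chart directly. We'll now explain how to plot various charts from a dataframe using these two methods.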
The first chart type that we'll create with the iplot() API is a scatter chart. Below we create a scatter chart from the pandas dataframe by calling iplot() on it with the kind parameter set to scatter. We pass different values to the kind parameter to create different types of charts.
We pass column names of the dataframe as the x and y parameters of iplot() to tell it which columns to plot. We also set the mode parameter to markers; otherwise the scatter kind draws a line chart by default. Finally, we give the plot a title and label the x and y axes with the title, xTitle, and yTitle parameters.
iris_df.iplot(kind="scatter",
x="sepal length (cm)", y='sepal width (cm)',
mode='markers',
xTitle="Sepal Length (CM)", yTitle="Sepal Width (CM)",
title="Sepal Length vs Sepal Width Relationship")
Below we create another scatter chart, identical to the previous one except that the points are colored by flower type. This time we use the figure() method, which takes the same parameters as iplot(). We pass the FlowerType column name to the categories parameter so that points are colored according to their flower type. We also override the default theme, pearl, with white by setting the theme parameter.
iris_df.figure(kind="scatter",
x="sepal length (cm)", y='sepal width (cm)',
mode='markers',categories="FlowerType",
theme="white",
xTitle="Sepal Length (CM)", yTitle="Sepal Width (CM)",
title="Sepal Length vs Sepal Width Relationship Color-encoded by Flower Type")
Below we again create the same chart as the first scatter chart, but this time we also add a regression (best-fit) line by setting the bestfit parameter to True, with its color set through bestfit_colors. We have also changed the marker color, size, and symbol with the colors, size, and symbol parameters.
iris_df.iplot(kind="scatter",
x="sepal length (cm)", y='sepal width (cm)',
mode='markers',
colors="tomato", size=8, symbol="circle-open-dot",
bestfit=True, bestfit_colors=["dodgerblue"],
xTitle="Sepal Length (CM)", yTitle="Sepal Width (CM)",
title="Sepal Length vs Sepal Width Relationship along with Best Fit Line")
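To see what kind of line a best-fit overlay represents, here is an ordinary least-squares line computed with numpy.polyfit on toy values. This is a swapped-in illustration of the technique, not cufflinks' own implementation, and the x/y values are made up rather than taken from the iris data.

```python
import numpy as np

# Toy x/y values (hypothetical, not the iris measurements).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Degree-1 least-squares fit returns the coefficients [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

# Points on the fitted line, one per x value.
fitted = slope * x + intercept
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```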
The second chart type that we'll introduce is the bar chart.
We'll first create a dataframe holding the average of each ingredient per wine type. We call the groupby() method on the wine dataframe to group records by WineType and then take the mean of each group. We drop the magnesium and proline columns because their values are much larger than those of the other ingredients and would skew our charts.
avg_wine_df = wine_df.groupby(by=["WineType"]).mean()
avg_wine_df = avg_wine_df.drop(columns=["magnesium", "proline"])
avg_wine_df
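The group-then-average step can be checked on a tiny frame. A minimal sketch, assuming a hypothetical toy dataframe shaped like wine_df with made-up values:

```python
import pandas as pd

# Hypothetical toy data shaped like wine_df (values are made up).
toy = pd.DataFrame({
    "WineType": ["class_0", "class_0", "class_1", "class_1"],
    "alcohol": [14.0, 13.0, 12.0, 12.5],
    "proline": [1000.0, 1200.0, 500.0, 600.0],
})

# Mean of every remaining column per wine type; WineType becomes the index.
toy_avg = toy.groupby(by=["WineType"]).mean()

# Drop the large-valued column so it does not dominate a shared bar axis.
toy_avg = toy_avg.drop(columns=["proline"])
print(toy_avg)
```

Because WineType ends up as the index, iplot() uses it for the bar positions in the charts that follow.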
Below we create our first bar chart by setting the kind parameter to bar. We pass alcohol as the y value to plot the average alcohol per wine type. We also override the default pearl theme with the solar theme, set the gap between bars with bargap, and set the chart size with dimensions.
avg_wine_df.iplot(kind="bar", y="alcohol",
colors=["dodgerblue"],
bargap=0.5,
dimensions=(500,500),
theme="solar",
xTitle="Wine Type", yTitle="Avg. Alcohol", title="Average Alcohol Per Wine Type")
Below we create the same chart as in the previous step, but this time with horizontal bars. We switch the bars from vertical to horizontal by setting the orientation parameter to h, which is why the axis titles are swapped. We also change the bar color, sort the bars with sortbars, and use the polar theme.
avg_wine_df.iplot(kind="bar", y="alcohol",
yTitle="Wine Type", xTitle="Avg. Alcohol", title="Average Alcohol Per Wine Type",
colors=["tomato"], bargap=0.5,
sortbars=True,
dimensions=(500,400),
theme="polar",
orientation="h")
Below we create a grouped (side-by-side) bar chart by calling iplot() on the whole dataframe, which draws one bar per column for each wine type. We set the sortbars parameter to True in order to sort the bars from the highest quantity to the lowest. We also override the default chart theme, pearl, with ggplot.
avg_wine_df.iplot(kind="bar",
sortbars=True,
xTitle="Wine Type", yTitle="Avg. Quantity", title="Average Ingredients Per Wine Type",
theme="ggplot"
)
We can create a stacked bar chart easily by setting barmode
parameter to stack
. Below we have created a stacked bar chart to show the average distribution of ingredients per wine type.
avg_wine_df.iplot(kind="bar",
barmode="stack",
xTitle="Wine Type", yTitle="Avg. Quantity", title="Average Ingredients Per Wine Type",
opacity=1.0,
)
We can create a separate bar chart for each column of the dataframe by setting the subplots parameter to True. We pass a list of column names to the keys parameter to choose which columns to use, so that bar charts are created for only these four columns.
avg_wine_df.iplot(kind="bar",
subplots=True,
sortbars=True,
keys = ["ash", "total_phenols", "hue", "malic_acid"],
xTitle="Wine Type", yTitle="Avg. Quantity", title="Average Ingredients Per Wine Type",
theme="henanigans"
)
The third chart type that we'll introduce is the line chart. We can create a line chart by calling iplot() on the dataframe and specifying which columns to use for the x and y axes. If we don't pass a value for the x axis, the dataframe's index is used instead; in our case, the index holds the dates of the prices. Below we plot a line chart of the Open price over the whole period.
apple_df.iplot(y="Open",
xTitle="Date", yTitle="Price ($)", title="Open Price From Apr,2019 - Mar,2020")
We can plot more than one line on the chart by passing a list of column names to the y parameter; cufflinks adds one line per column, and width sets the line width.
apple_df.iplot(y=["Open", "High", "Low", "Close"],
width=2.0,
xTitle="Date", yTitle="Price ($)", title="OHLC Price From Apr,2019 - Mar,2020")
Below we create a line chart with two lines, where the second line has its own y-axis on the right side. We pass the column name for the secondary axis to the secondary_y parameter and its axis title to secondary_y_title. This is very useful when the quantities that we want to plot are on different scales.
apple_df.iplot(y="Open",
secondary_y="Close", secondary_y_title="Close Price ($)",
xTitle="Date", yTitle="Open Price ($)", title="Open Price From Apr,2019 - Mar,2020")
Below we again create a line chart, but this time with mode set to lines+markers, which draws both the line and the points, with size controlling the marker size. We have also changed the grid color from the default gray to black with gridcolor.
apple_df.iplot(y="Open",
mode="lines+markers", size=4.0,
colors=["dodgerblue"],
gridcolor="black",
xTitle="Date", yTitle="Price ($)", title="Open Price From Apr,2019 - Mar,2020")
Below is another example of subplots, this time drawing a separate line chart for each price column.
apple_df.iplot(y=["Open", "High", "Low", "Close"],
width=2.0,
subplots=True,
xTitle="Date", yTitle="Price ($)", title="OHLC Price From Apr,2019 - Mar,2020")
The fourth chart type that we'll introduce is the area chart. We can create an area chart with the same parameters as a line chart plus one change: setting the fill parameter to True. Below we create an area chart covering the area under Apple's Open price.
apple_df.iplot(y="Open",
fill=True,
xTitle="Date", yTitle="Price ($)", title="Open Price From Apr,2019 - Mar,2020",
)
We can also combine fill with subplots to draw a separate area chart for each of the four price columns.
apple_df.iplot(
keys=["Open", "High", "Low", "Close"],
subplots=True,
fill=True,
xTitle="Date", yTitle="Price ($)", title="OHLC Price From Apr,2019 - Mar,2020")
The fifth chart type is the pie chart. We'll create a new dataframe from the wine dataframe holding the number of samples per wine category. We build it by grouping the original wine dataframe by wine type and calling the count() method to count the samples per wine type.
wine_cnt = wine_df.groupby(by=["WineType"]).count()[["alcohol"]].rename(columns={"alcohol":"Count"}).reset_index()
wine_cnt
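The same count pipeline can be traced on a tiny frame, along with the percentages a pie chart derives from the counts. A minimal sketch on hypothetical toy data; the Percent column is only for illustration and is not needed by iplot().

```python
import pandas as pd

# Hypothetical toy data shaped like wine_df (values are made up).
toy = pd.DataFrame({
    "WineType": ["class_0", "class_1", "class_1", "class_2"],
    "alcohol": [14.0, 12.0, 12.5, 13.0],
})

# Count samples per type, keep one column, rename it, and turn WineType back into a column.
toy_cnt = (
    toy.groupby(by=["WineType"]).count()[["alcohol"]]
    .rename(columns={"alcohol": "Count"})
    .reset_index()
)

# Share of each slice, i.e. what textinfo="percent" displays on the pie.
toy_cnt["Percent"] = 100 * toy_cnt["Count"] / toy_cnt["Count"].sum()
print(toy_cnt)
```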
We can create a pie chart by calling iplot() on the dataframe with the kind parameter set to pie. We also need to specify which column to use for the labels and which for the values. Below we create a pie chart from the wine type count dataframe created in the previous cell. We set the textinfo parameter so that each slice shows its percentage and label, and we set hole to turn the pie into a donut chart.
wine_cnt.iplot(kind="pie",
labels="WineType",
values="Count",
textinfo='percent+label', hole=.4,
)
Below we create the same pie chart as in the previous step with two minor changes: we have removed the hole in the middle, and we have pulled the class_2 slice out slightly to highlight it. The pull parameter takes a list of floats with one value per label; each value is the fraction of the radius by which that slice is pulled out, so only class_2 gets a non-zero value here.
wine_cnt.iplot(kind="pie",
labels="WineType",
values="Count",
textinfo='percent+label',
pull=[0, 0, 0.1],
)
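Hard-coding [0, 0, 0.1] only works while the rows stay in that order. A minimal sketch that derives the pull list from the label column instead, using a hypothetical counts frame with made-up numbers:

```python
import pandas as pd

# Hypothetical counts frame shaped like wine_cnt (numbers are made up).
toy_cnt = pd.DataFrame({
    "WineType": ["class_0", "class_1", "class_2"],
    "Count": [10, 20, 30],
})

# One pull value per row, in label order, so only the class_2 slice moves out.
pull = [0.1 if t == "class_2" else 0.0 for t in toy_cnt["WineType"]]
print(pull)  # [0.0, 0.0, 0.1]
```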
The sixth chart type that we'll introduce is the histogram. We can create a histogram by setting the kind parameter to hist. We pass a list containing the column name to the keys parameter to create a histogram of that column, and we set the number of bins with bins.
wine_df.iplot(kind="hist",
bins=50, colors=["red"],
keys=["alcohol"],
dimensions=(600, 400),
title="Alcohol Histogram")
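To see what a bin count means, here is the same idea with numpy.histogram, which splits the data range into equal-width bins. This is an illustration only: plotly treats the requested number of bins as an upper bound and may choose "nicer" bin edges, so its bars need not match NumPy's exactly. The values are made up.

```python
import numpy as np

# Toy measurements (hypothetical, not the wine alcohol column).
values = np.array([11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0, 14.5])

# Four equal-width bins over [min, max]; the last bin includes its right edge.
counts, edges = np.histogram(values, bins=4)
print(counts)
print(edges)
```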
Below is another histogram example, this time plotting three quantities on the same chart.
wine_df.iplot(kind="hist",
bins=50, colors=["red", "blue", "green", "black"],
keys=["total_phenols", "flavanoids", "ash"],
title="Ash, Total Phenols & Flavanoids Histogram")
The seventh chart type that we'll introduce is the box plot. We can create a box plot from a pandas dataframe by setting the kind parameter to box in the iplot() method. Below we create a box plot of the four iris flower features by passing their column names as a list to the keys parameter. We also set boxpoints to outliers so that only the points lying outside the whiskers are drawn.
iris_df.iplot(kind="box",
keys=iris.feature_names, boxpoints="outliers",
xTitle="Flower Features", title="IRIS Flower Features Box Plot")
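With boxpoints="outliers", only points beyond the whiskers are drawn. A minimal sketch of the common 1.5 × IQR (Tukey) rule for flagging such points, which, as far as we know, is also how plotly places its whiskers; plotly's quartile method may differ slightly from NumPy's default interpolation. The values are made up.

```python
import numpy as np

# Toy sample with one obvious outlier (values are made up).
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0])

# Quartiles using NumPy's default linear interpolation.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Tukey fences: points farther than 1.5 * IQR from the box count as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
print(q1, q3, outliers)
```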
The eighth chart type is the heatmap. We'll first create a correlation dataframe for the wine dataset by calling the corr() method on its numeric columns.
wine_corr_df = wine_df.select_dtypes(include="number").corr()  # skip the text WineType column
wine_corr_df
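Recent pandas versions raise an error when corr() meets a text column such as WineType, which is why only the numeric columns are passed in. A minimal sketch on a hypothetical toy frame with made-up values:

```python
import pandas as pd

# Hypothetical toy frame mixing numeric and text columns (values are made up).
toy = pd.DataFrame({
    "alcohol": [1.0, 2.0, 3.0, 4.0],
    "ash": [2.0, 4.0, 6.0, 8.0],  # exactly 2 * alcohol
    "WineType": ["a", "a", "b", "b"],
})

# Keep numeric columns only, then compute the pairwise Pearson correlations.
toy_corr = toy.select_dtypes(include="number").corr()
print(toy_corr)
```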
Once the correlation dataframe is ready, we can create a heatmap by calling iplot() on it with the kind parameter set to heatmap. We also set the colorscale parameter to Blues, and we set the chart size by passing the width and height as a tuple to the dimensions parameter.
wine_corr_df.iplot(kind="heatmap",
colorscale="Blues",
dimensions=(900,900))
Below we create another heatmap, this time showing the correlations between the features of the iris flower dataset.
iris_df.select_dtypes(include="number").corr().iplot(kind="heatmap",
colorscale="Reds",
dimensions=(500,500))
The ninth chart type that we'll introduce is the candlestick chart. We can create a candlestick chart from a dataframe by calling iplot() on it with candle as the value of the kind parameter. The dataframe needs Open, High, Low, and Close columns, which we pass to the keys parameter in that order. Below we create a candlestick chart of the whole Apple OHLC dataset.
apple_df.iplot(kind="candle", keys=["Open", "High", "Low", "Close"])
Below is another candlestick chart example, where we plot candles for April 2019 only.
apple_df.loc["2019-04"].iplot(kind="candle",
keys=["Open", "High", "Low", "Close"],
)
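Selecting a single month relies on pandas partial-string indexing on a DatetimeIndex; current pandas expects the .loc form for row selection (plain df["2019-04"] row slicing was deprecated and later removed). A minimal sketch on a hypothetical daily frame with made-up values:

```python
import pandas as pd

# Hypothetical daily frame spanning the end of April and start of May 2019.
idx = pd.date_range("2019-04-29", "2019-05-03", freq="D")
toy = pd.DataFrame({"Open": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# Partial-string indexing: every row whose date falls in April 2019.
april = toy.loc["2019-04"]
print(april)
```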
We can create an OHLC chart exactly the same way as a candlestick chart; the only difference is that we set the kind parameter to ohlc.
apple_df.loc["2019-04"].iplot(kind="ohlc",
keys=["Open", "High", "Low", "Close"])
The tenth chart type that we'll plot with cufflinks is the bubble chart, which can represent three dimensions of data: two dimensions give the x and y positions of a scatter plot, and the third sets the size of each point.
Below we create a bubble chart of the first 50 samples of the iris dataframe by setting the kind parameter to bubble. We use sepal length and sepal width as the x and y dimensions and petal width as the size dimension.
iris_df[:50].iplot(kind="bubble", x="sepal length (cm)", y="sepal width (cm)", size="petal width (cm)",
colors=["tomato"],
xTitle="Sepal Length (CM)", yTitle="Sepal Width (CM)",
title="Sepal Length vs Sepal Width Bubble Chart")
We can also create a 3D bubble chart, which can represent four dimensions of data: the first three dimensions define a 3D scatter chart and the fourth sets the size of each point (bubble).
We create a 3D bubble chart by setting the kind parameter to bubble3d in the iplot() method. We use sepal length, sepal width, and petal width to build the 3D scatter chart and petal length to set the point sizes. We also color-encode the points by flower type.
iris_df.iplot(kind="bubble3d",
x="sepal length (cm)", y="sepal width (cm)", z="petal width (cm)",
size="petal length (cm)",
colors=["dodgerblue", "lime", "tomato"], categories="FlowerType",
xTitle="Sepal Length (CM)", yTitle="Sepal Width (CM)", zTitle="Petal Width (CM)",
title="Sepal Length vs Sepal Width vs Petal Width Bubble 3D Chart")
We can also create 3D scatter charts with cufflinks by setting the kind parameter to scatter3d in the iplot() method. Below we create a 3D scatter chart of sepal length, sepal width, and petal width, again with points color-encoded by flower type.
iris_df.iplot(kind="scatter3d",
x="sepal length (cm)", y="sepal width (cm)", z="petal width (cm)",
size=5,
colors=["dodgerblue", "lime", "tomato"], categories="FlowerType",
xTitle="Sepal Length (CM)", yTitle="Sepal Width (CM)", zTitle="Petal Width (CM)",
title="Sepal Length vs Sepal Width vs Petal Width Scatter Chart")
The thirteenth chart type that we'll introduce is the spread chart. Below we create a spread chart of the High and Low prices by setting the kind parameter to spread.
apple_df.iplot(kind="spread", keys=["High", "Low"],
title="High and Low Price Spread Chart")
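Alongside the two lines, a spread chart shows the gap between them. A minimal sketch of that quantity, assuming (from our reading of the cufflinks source, not documented behaviour) that cufflinks computes the first key minus the second and draws positive and negative stretches in different colors. The columns A and B are hypothetical toy series.

```python
import pandas as pd

# Hypothetical toy series standing in for the two keys (values are made up).
toy = pd.DataFrame({"A": [10.0, 11.0, 12.0], "B": [9.0, 11.5, 11.0]})

# Spread = first key minus second key.
spread = toy["A"] - toy["B"]

# Split into positive and negative parts (NaN elsewhere) so each can be colored separately.
positive = spread.where(spread >= 0)
negative = spread.where(spread < 0)
print(pd.DataFrame({"spread": spread, "positive": positive, "negative": negative}))
```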
The fourteenth and last chart type that we'll introduce is the ratio chart. We can create a ratio chart by setting the kind parameter to ratio. Below we create a ratio chart of the Open and Close prices of the Apple OHLC data.
apple_df.iplot(kind="ratio", keys=["Open", "Close",],
title="Open & Close Price Ratio Chart")
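The ratio chart plots one series divided by the other. A minimal sketch of that quantity, assuming (from our reading of the cufflinks source) that the first key is divided by the second; the prices are made up.

```python
import pandas as pd

# Hypothetical Open/Close prices (values are made up).
toy = pd.DataFrame({"Open": [100.0, 102.0], "Close": [101.0, 100.0]})

# Ratio = first key / second key; values above 1 mean the day opened above its close.
ratio = toy["Open"] / toy["Close"]
print(ratio)
```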
This ends our short tutorial on using cufflinks to create plotly charts directly from pandas dataframes. Please feel free to let us know your views in the comments section.
If you want to