How to Plot Parallel Coordinates Plot in Python [Matplotlib & Plotly]?


Parallel coordinates charts are commonly used to visualize and analyze high-dimensional multivariate data. Each data sample is drawn as a polyline crossing a set of parallel axes, where each axis represents one attribute of the sample. For example, the IRIS flowers dataset has 4 dimensions (sepal length, sepal width, petal length, and petal width), so its chart has four parallel axes drawn vertically in a 2D plane, and each flower sample is drawn as a polyline connecting the points on these four axes that correspond to its measurements. It's common practice to scale the data so that all variables fall in the same range. Scaling lets us compare variables that are on totally different scales.
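
To make the construction concrete, here is a minimal from-scratch sketch using only NumPy and matplotlib. The sample values and attribute names are made up for illustration, and this is not meant to mirror how pandas or plotly build the chart internally.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 3 samples with 4 attributes each.
attribute_names = ["attr_a", "attr_b", "attr_c", "attr_d"]
samples = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [6.4, 3.2, 4.5, 1.5],
    [7.2, 3.0, 5.8, 1.6],
])

# One x position per attribute: these are the parallel vertical axes.
x_positions = np.arange(samples.shape[1])

fig, ax = plt.subplots(figsize=(8, 4))
for row in samples:
    # Each sample becomes one polyline through its value on every axis.
    ax.plot(x_positions, row, marker="o")

# Draw the parallel axes themselves as vertical lines.
for x in x_positions:
    ax.axvline(x, color="black", linewidth=1)

ax.set_xticks(x_positions)
ax.set_xticklabels(attribute_names)
```

The library functions covered in the rest of this tutorial handle coloring, legends, and interactivity for us, so a hand-rolled loop like this is rarely needed in practice.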

A parallel coordinates chart can become very cluttered if there are many data points to plot. We can highlight only a few points in the visualization to avoid clutter. We'll cover plotting parallel coordinates charts in Python using pandas (matplotlib) and plotly. We'll load various datasets from scikit-learn in order to explain the plot better.

Radar charts are another option for analyzing and visualizing multivariate data; there the axes are arranged radially instead of in parallel. If you are interested in plotting radar charts in Python, we have already covered them in a detailed tutorial, How to Plot Radar Chart in Python?, and we suggest that you go through it as well. Andrews curves are one more alternative to the parallel coordinates plot: each sample is drawn as a smooth curve defined by a finite Fourier series whose coefficients are the sample's attribute values.
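
For readers curious about the Andrews plot, the sketch below evaluates the usual Andrews curve, f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + ..., for a single sample. The helper name andrews_curve and the sample values are our own choices for illustration; pandas also ships a ready-made pd.plotting.andrews_curves function.

```python
import numpy as np

def andrews_curve(sample, t):
    """Evaluate f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + ... for one sample."""
    sample = np.asarray(sample, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, sample[0] / np.sqrt(2.0))
    for i, coeff in enumerate(sample[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            result = result + coeff * np.sin(harmonic * t)
        else:
            result = result + coeff * np.cos(harmonic * t)
    return result

# Hypothetical 4-attribute sample evaluated on 200 points in [-pi, pi].
t = np.linspace(-np.pi, np.pi, 200)
curve = andrews_curve([5.1, 3.5, 1.4, 0.2], t)
```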

This ends our small introduction to the parallel coordinates chart. We'll now start by importing necessary libraries.

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older version.
from sklearn.datasets import load_iris, load_boston, load_wine
from sklearn.preprocessing import MinMaxScaler

import plotly.express as px
import plotly.graph_objects as go

%matplotlib inline

We'll first load 3 datasets available from scikit-learn.

  • IRIS Flowers Dataset - It has measurements for 3 different IRIS flower types.
  • Wine Dataset - It has information about various constituents of wine like alcohol, malic acid, ash, magnesium, etc. for three different wine categories.
  • Boston Housing Price Dataset - It has information about various attributes of houses and their surrounding areas in Boston, as well as house prices.

All datasets are available from the sklearn.datasets module. We'll load them and keep them as dataframes to use later for the parallel coordinates plots.

We'll also plot charts with scaled data in order to compare them with non-scaled data. We use scikit-learn's MinMaxScaler to scale the data so that each column falls in the range [0, 1]. Once all quantitative variables are in the same range, it becomes easier to compare them on a shared vertical axis. We'll scale the iris, wine, and Boston datasets using MinMaxScaler.
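
As a quick illustration of what this scaling does, the sketch below applies the min-max formula (x - min) / (max - min) by hand with NumPy. This is the transformation MinMaxScaler performs with its default [0, 1] range. The numbers are made up, and the sketch assumes no column is constant (otherwise the division would be by zero).

```python
import numpy as np

# Hypothetical measurements: two columns on very different scales.
X = np.array([
    [5.1, 1065.0],
    [4.9,  735.0],
    [6.3, 1480.0],
])

col_min = X.min(axis=0)
col_max = X.max(axis=0)

# Min-max scaling: (x - min) / (max - min), applied column by column.
X_scaled = (X - col_min) / (col_max - col_min)
print(X_scaled)
```

After scaling, every column has 0 as its smallest value and 1 as its largest, so both columns can share one vertical axis.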

In [2]:
iris = load_iris()
iris_data = np.hstack((iris.data, iris.target.reshape(-1,1)))

iris_df = pd.DataFrame(data=iris_data, columns=iris.feature_names+ ["FlowerType"])
iris_df.head()
Out[2]:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  FlowerType
0                5.1               3.5                1.4               0.2         0.0
1                4.9               3.0                1.4               0.2         0.0
2                4.7               3.2                1.3               0.2         0.0
3                4.6               3.1                1.5               0.2         0.0
4                5.0               3.6                1.4               0.2         0.0
In [3]:
iris_data_scaled = MinMaxScaler().fit_transform(iris.data)
iris_data_scaled = np.hstack((iris_data_scaled, iris.target.reshape(-1,1)))

iris_scaled_df = pd.DataFrame(data=iris_data_scaled, columns=iris.feature_names+ ["FlowerType"])
iris_scaled_df.head()
Out[3]:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  FlowerType
0           0.222222          0.625000           0.067797          0.041667         0.0
1           0.166667          0.416667           0.067797          0.041667         0.0
2           0.111111          0.500000           0.050847          0.041667         0.0
3           0.083333          0.458333           0.084746          0.041667         0.0
4           0.194444          0.666667           0.067797          0.041667         0.0
In [4]:
wine = load_wine()
wine_data = np.hstack((wine.data, wine.target.reshape(-1,1)))

wine_df = pd.DataFrame(data=wine_data, columns=wine.feature_names+ ["WineCategory"])
wine_df.head()
Out[4]:
alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue od280/od315_of_diluted_wines proline WineCategory
0 14.23 1.71 2.43 15.6 127.0 2.80 3.06 0.28 2.29 5.64 1.04 3.92 1065.0 0.0
1 13.20 1.78 2.14 11.2 100.0 2.65 2.76 0.26 1.28 4.38 1.05 3.40 1050.0 0.0
2 13.16 2.36 2.67 18.6 101.0 2.80 3.24 0.30 2.81 5.68 1.03 3.17 1185.0 0.0
3 14.37 1.95 2.50 16.8 113.0 3.85 3.49 0.24 2.18 7.80 0.86 3.45 1480.0 0.0
4 13.24 2.59 2.87 21.0 118.0 2.80 2.69 0.39 1.82 4.32 1.04 2.93 735.0 0.0
In [5]:
wine_data_scaled = MinMaxScaler().fit_transform(wine.data)
wine_data_scaled = np.hstack((wine_data_scaled, wine.target.reshape(-1,1)))

wine_scaled_df = pd.DataFrame(data=wine_data_scaled, columns=wine.feature_names+ ["WineCategory"])
wine_scaled_df.head()
Out[5]:
alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue od280/od315_of_diluted_wines proline WineCategory
0 0.842105 0.191700 0.572193 0.257732 0.619565 0.627586 0.573840 0.283019 0.593060 0.372014 0.455285 0.970696 0.561341 0.0
1 0.571053 0.205534 0.417112 0.030928 0.326087 0.575862 0.510549 0.245283 0.274448 0.264505 0.463415 0.780220 0.550642 0.0
2 0.560526 0.320158 0.700535 0.412371 0.336957 0.627586 0.611814 0.320755 0.757098 0.375427 0.447154 0.695971 0.646933 0.0
3 0.878947 0.239130 0.609626 0.319588 0.467391 0.989655 0.664557 0.207547 0.558360 0.556314 0.308943 0.798535 0.857347 0.0
4 0.581579 0.365613 0.807487 0.536082 0.521739 0.627586 0.495781 0.490566 0.444795 0.259386 0.455285 0.608059 0.325963 0.0
In [6]:
boston = load_boston()
boston_data = np.hstack((boston.data, boston.target.reshape(-1,1)))

boston_df = pd.DataFrame(data=boston_data, columns=boston.feature_names.tolist()+ ["HousePrice"])
boston_df.head()
Out[6]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT HousePrice
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2
In [7]:
boston_data_scaled = MinMaxScaler().fit_transform(boston.data)
boston_data_scaled = np.hstack((boston_data_scaled, boston.target.reshape(-1,1)))

boston_scaled_df = pd.DataFrame(data=boston_data_scaled, columns=boston.feature_names.tolist()+ ["HousePrice"])
boston_scaled_df.head()
Out[7]:
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT HousePrice
0 0.000000 0.18 0.067815 0.0 0.314815 0.577505 0.641607 0.269203 0.000000 0.208015 0.287234 1.000000 0.089680 24.0
1 0.000236 0.00 0.242302 0.0 0.172840 0.547998 0.782698 0.348962 0.043478 0.104962 0.553191 1.000000 0.204470 21.6
2 0.000236 0.00 0.242302 0.0 0.172840 0.694386 0.599382 0.348962 0.043478 0.104962 0.553191 0.989737 0.063466 34.7
3 0.000293 0.00 0.063050 0.0 0.150206 0.658555 0.441813 0.448545 0.086957 0.066794 0.648936 0.994276 0.033389 33.4
4 0.000705 0.00 0.063050 0.0 0.150206 0.687105 0.528321 0.448545 0.086957 0.066794 0.648936 1.000000 0.099338 36.2

We'll be explaining two ways to plot a parallel coordinates chart.

  • Pandas [Matplotlib] - First we'll explain how to use pandas for plotting parallel coordinates charts. Pandas provides a ready-made function for this as part of its plotting module. Pandas uses matplotlib behind the scenes, so all charts will be static.
  • Plotly - Plotly provides two ways to create parallel coordinates charts. Plotly charts are interactive.
    • Plotly Express
    • Plotly Graph Objects

Pandas [Matplotlib]

The pandas.plotting module provides a ready-to-use function named parallel_coordinates for plotting parallel coordinates charts. We need to give it a dataframe holding all the data and the name of the categorical column by which samples will be colored.

It also has a parameter named color which accepts a list of color names, one per category of that column.

IRIS Parallel Coordinates Charts

Below we have provided the iris dataframe as input and the FlowerType column for coloring samples according to flower category. We have also provided color names for the categories.

In [ ]:
pd.plotting.parallel_coordinates(iris_df, "FlowerType", color=["lime", "tomato","dodgerblue"]);


Below we again plot a parallel coordinates chart for the iris data, but with scaled data this time. With scaled data we can make out the differences between samples more clearly. It's advisable to scale data before plotting a parallel coordinates chart. As pandas uses matplotlib behind the scenes, we can decorate charts using matplotlib methods.

In [ ]:
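# The "seaborn" style was renamed "seaborn-v0_8" in matplotlib 3.6; use whichever is available.
seaborn_style = "seaborn-v0_8" if "seaborn-v0_8" in plt.style.available else "seaborn"
with plt.style.context(("ggplot", seaborn_style)):
    fig = plt.figure(figsize=(10,6))
    pd.plotting.parallel_coordinates(iris_scaled_df, "FlowerType",
                                     color=["lime", "tomato","dodgerblue"],
                                     alpha=0.2)

    plt.title("IRIS Flowers Parallel Coordinates Plot [Scaled Data]")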


Below we again plot a parallel coordinates chart using the scaled iris data, but this time we have changed the column order by passing a list of columns to the cols parameter of the parallel_coordinates function. The same parameter lets us leave columns out of the chart: only the columns named in cols are drawn.

In [ ]:
# The "seaborn" style was renamed "seaborn-v0_8" in matplotlib 3.6; use whichever is available.
seaborn_style = "seaborn-v0_8" if "seaborn-v0_8" in plt.style.available else "seaborn"
with plt.style.context(("ggplot", seaborn_style)):
    fig = plt.figure(figsize=(10,6))
    pd.plotting.parallel_coordinates(iris_scaled_df, "FlowerType",
                                     cols= ["petal width (cm)", "sepal width (cm)", "petal length (cm)", "sepal length (cm)"],
                                     color=["lime", "tomato","dodgerblue"],
                                     alpha=0.2,
                                     axvlines_kwds={"color":"red"})
    plt.title("IRIS Flowers Parallel Coordinates Plot [Scaled Data]")


Wine Dataset Parallel Coordinates Chart

Below we have plotted a parallel coordinates chart for the scaled wine dataframe. We can see that a few columns clearly show different values for different categories, whereas for a few others it's not clear.

In [ ]:
with plt.style.context(("ggplot")):
    fig = plt.figure(figsize=(15,8))
    pd.plotting.parallel_coordinates(wine_scaled_df, "WineCategory",
                                     color=["lime", "tomato","dodgerblue"],
                                     alpha=0.3,
                                     axvlines_kwds={"color":"red"})
    plt.xticks(rotation=90)
    plt.title("Wine Categories Parallel Coordinates Plot [Scaled Data]")
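
Instead of judging by eye which columns separate the categories, one rough option is to compare the per-class means of each scaled column. The sketch below uses a tiny made-up dataframe (the column names a, b, and label are hypothetical). It is only a crude heuristic, since it ignores the spread within each class.

```python
import pandas as pd

# Hypothetical scaled data: "a" separates the two classes well, "b" does not.
df = pd.DataFrame({
    "a": [0.05, 0.10, 0.90, 1.00],
    "b": [0.40, 0.60, 0.45, 0.55],
    "label": [0, 0, 1, 1],
})

# Mean of every feature within each class.
class_means = df.groupby("label").mean()

# Gap between the largest and smallest class mean per feature:
# a larger gap suggests the classes sit further apart on that axis.
separation = (class_means.max() - class_means.min()).sort_values(ascending=False)
print(separation)
```

Columns near the top of such a ranking are reasonable candidates to pass to the cols parameter.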


In [ ]:
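# Same chart restricted to a subset of columns through the cols parameter.
# The "seaborn" style was renamed "seaborn-v0_8" in matplotlib 3.6; use whichever is available.
seaborn_style = "seaborn-v0_8" if "seaborn-v0_8" in plt.style.available else "seaborn"
with plt.style.context(("ggplot", seaborn_style)):
    fig = plt.figure(figsize=(15,8))
    pd.plotting.parallel_coordinates(wine_scaled_df, "WineCategory",
                                     cols=["total_phenols", "flavanoids", "nonflavanoid_phenols", "proanthocyanins", "color_intensity", "hue", "od280/od315_of_diluted_wines", "proline"],
                                     color=["lime", "tomato","dodgerblue"],
                                     alpha=0.3,
                                     axvlines_kwds={"color":"red"})
    plt.xticks(rotation=90)
    plt.title("Wine Categories Parallel Coordinates Plot [Scaled Data]")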
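
As mentioned in the introduction, highlighting only a few samples is one way to keep a busy chart readable. A minimal sketch of that idea with pandas follows. The dataframe, the column names, and the rows picked for highlighting are all made up. It relies on parallel_coordinates drawing rows in dataframe order and assigning the colors to classes in order of first appearance.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical scaled data; rows 1 and 3 are the ones we want to stand out.
df = pd.DataFrame({
    "f1": [0.1, 0.4, 0.5, 0.9, 0.3],
    "f2": [0.7, 0.2, 0.6, 0.1, 0.8],
    "f3": [0.3, 0.9, 0.4, 0.5, 0.2],
})
rows_to_highlight = [1, 3]

# Tag each row, then sort so the background rows are drawn first.
df["group"] = "other"
df.loc[rows_to_highlight, "group"] = "highlighted"
df = df.sort_values("group", ascending=False)  # "other" sorts before "highlighted"

fig, ax = plt.subplots(figsize=(8, 4))
# First color goes to "other" (background), second to "highlighted".
pd.plotting.parallel_coordinates(df, "group", color=["lightgray", "crimson"], ax=ax)
```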


Plotly

Plotly is a popular interactive data visualization library. It provides two modules, plotly.express and plotly.graph_objects, for plotting parallel coordinates charts.

Plotly Express

The plotly.express module has a function named parallel_coordinates which accepts a dataframe containing the data and the name of the column used to color the samples.

IRIS Dataset Parallel Coordinates Chart

Below we are creating a parallel coordinates chart for the iris data. We have passed the FlowerType column to the color parameter in order to color samples according to iris flower type.

In [ ]:
cols = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

fig = px.parallel_coordinates(iris_df, color="FlowerType", dimensions=cols,
                              title="IRIS Flowers Parallel Coordinates Plot")
fig.show()


Wine Data Parallel Coordinates Charts

Below we are creating a parallel coordinates chart for the wine dataset. We are providing column names as input to the dimensions parameter of the method.

In [ ]:
cols = ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids',
'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

fig = px.parallel_coordinates(wine_df, color="WineCategory", dimensions=cols,
                              color_continuous_scale=px.colors.diverging.RdYlBu, width=1000,
                              title="Wine Parallel Coordinates Plot")
fig.show()


Below we are again creating a parallel coordinates chart for the wine dataset, but this time with only the last few columns, whose samples clearly show differences based on the wine category. We have also changed the colors of the chart by setting the color_continuous_scale parameter.

In [ ]:
cols = ["total_phenols", "flavanoids", "nonflavanoid_phenols", "proanthocyanins", "color_intensity",
        "hue", "od280/od315_of_diluted_wines", "proline"]

fig = px.parallel_coordinates(wine_df, color="WineCategory", dimensions=cols,
                              color_continuous_scale=px.colors.diverging.Tealrose,
                              title="Wine Parallel Coordinates Plot")
fig.show()


Boston House Price Dataset Parallel Coordinates Chart

Below we are plotting the parallel coordinates chart for the Boston dataset. We are using HousePrice as an attribute to color samples. We can analyze which attributes are contributing to high house prices.

In [ ]:
cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT',]
fig = px.parallel_coordinates(boston_df, color="HousePrice", dimensions=cols,
                              color_continuous_scale=px.colors.sequential.Blues,
                              title="Boston House Price Coordinates Plot")
fig.show()
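
To back up what we see in the chart with numbers, one simple option is to rank attributes by their correlation with the price and use that order for the dimensions list. The sketch below uses made-up values, and the column names rooms, crime, noise, and price are hypothetical stand-ins. Keep in mind that correlation only captures linear association, not cause.

```python
import pandas as pd

# Hypothetical housing-style data; "price" plays the role of HousePrice.
df = pd.DataFrame({
    "rooms": [5.0, 6.0, 7.0, 8.0],
    "crime": [0.9, 0.8, 0.2, 0.1],
    "noise": [0.3, 0.1, 0.4, 0.2],
    "price": [15.0, 21.0, 33.0, 45.0],
})

# Correlation of every attribute with the target column.
corr_with_price = df.corr()["price"].drop("price")

# Order attributes by absolute correlation, strongest first.
ordered_cols = corr_with_price.abs().sort_values(ascending=False).index.tolist()
print(ordered_cols)
```

The resulting list can be passed as the dimensions argument so that the most strongly related attributes appear first.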


Plotly Graph Objects

The second way to generate parallel coordinates charts in plotly is by using the graph_objects module. It provides a class named Parcoords which can be used for plotting parallel coordinates charts. We need to provide values for two important parameters of Parcoords in order to generate the chart:

  • line - It accepts a dictionary in which we pass the values used to color the samples (here, a dataframe column). We can also specify the colorscale to use as part of this dictionary.
  • dimensions - It accepts a list of dictionaries where each dictionary represents one dimension of the data. For each dimension we provide the values to plot and a label for them. We can also provide a range for the axis. The constraintrange attribute lets us highlight only the samples whose values fall in that range.

IRIS Dataset Parallel Coordinates Chart

Below we are plotting parallel coordinates chart for iris dataset.

In [ ]:
fig = go.Figure(data=
    go.Parcoords(
        line = dict(color = iris_df['FlowerType'],
                   colorscale = [[0,'lime'],[0.5,'tomato'],[1,'dodgerblue']]),
        dimensions = list([
            dict(range = [0,8],
                constraintrange = [4,8],
                label = 'Sepal Length', values = iris_df['sepal length (cm)']),
            dict(range = [0,8],
                label = 'Sepal Width', values = iris_df['sepal width (cm)']),
            dict(range = [0,8],
                label = 'Petal Length', values = iris_df['petal length (cm)']),
            dict(range = [0,8],
                label = 'Petal Width', values = iris_df['petal width (cm)'])
        ])
    )
)

fig.update_layout(
    title="IRIS Flowers Parallel Coordinates Plot"
)

fig.show()


In [ ]:
# Build the dimensions list from column names instead of writing each dict by hand.
cols = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)',]

fig = go.Figure(data=
    go.Parcoords(
        line = dict(color = iris_df['FlowerType'],
                   colorscale = [[0,'lime'],[0.5,'tomato'],[1,'dodgerblue']]),
        dimensions = [dict(label=col, values=iris_df[col]) for col in cols]
    )
)

fig.update_layout(
    title="IRIS Flowers Parallel Coordinates Plot"
)

fig.show()


Wine Dataset Parallel Coordinates Chart

Below we are plotting parallel coordinates chart for wine dataset.

In [ ]:
cols = ["total_phenols", "flavanoids", "nonflavanoid_phenols", "proanthocyanins", "color_intensity",
        "hue", "od280/od315_of_diluted_wines", "proline"]


fig = go.Figure(data=
    go.Parcoords(
        line = dict(color = wine_df['WineCategory'],
                   colorscale = [[0,'lime'],[0.5,'tomato'],[1,'dodgerblue']]),
        dimensions = [dict(label=col, values=wine_df[col]) for col in cols]
    )
)

fig.update_layout(
    title="Wine Parallel Coordinates Plot"
)

fig.show()


Boston House Dataset Parallel Coordinates Chart

Below we are plotting the parallel coordinates chart for the Boston dataset.

In [ ]:
cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT',]

fig = go.Figure(data=
    go.Parcoords(
        line = dict(color = boston_df['HousePrice'],
                   colorscale = px.colors.sequential.Oranges),
        dimensions = [dict(label=col, values=boston_df[col]) for col in cols]
    )
)

fig.update_layout(
    title="Boston House Price Coordinates Plot"
)

fig.show()


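Below we are again plotting the parallel coordinates chart for the Boston house price dataset, but this time we set the cmin and cmax parameters of the dictionary given to the line parameter to 25 and 50. HousePrice is recorded in thousands of dollars, so the color scale now spans prices from 25,000 to 50,000. All samples are still drawn; those priced below cmin take the color at the low end of the scale, which makes houses in the 25,000-50,000 range stand out.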

In [ ]:
cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT',]

fig = go.Figure(data=
    go.Parcoords(
        line = dict(color = boston_df['HousePrice'],
                   colorscale = px.colors.sequential.Blues,
                   cmin=25, cmax=50),
        dimensions = [dict(label=col, values=boston_df[col]) for col in cols]
    )
)

fig.update_layout(
    title="Boston House Price Coordinates Plot"
)

fig.show()


This ends our small tutorial on parallel coordinates charts plotting using python. Please feel free to let us know your views in the comments section.


Sunny Solanki