Updated On: Jul-07, 2020

# How to Plot Parallel Coordinates Plot in Python [Matplotlib & Plotly]?

Parallel coordinates charts are commonly used to visualize and analyze high-dimensional multivariate data. Each data sample is drawn as a polyline that connects a set of parallel axes, where each axis represents one attribute of the data. For example, the IRIS flowers dataset records 4 measurements (sepal length, sepal width, petal length, and petal width), so its chart has four vertical parallel axes in a 2D plane, and each flower is drawn as a polyline connecting the points on those four axes that correspond to its measurements. It's common practice to scale the data so that all variables fall in the same range; scaling lets us compare variables that were originally on very different scales.
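
To make the idea concrete, here is a minimal matplotlib sketch that draws the chart by hand, roughly the way `pd.plotting.parallel_coordinates` does: each attribute becomes a vertical axis at x = 0, 1, 2, ..., and each sample becomes one polyline through its values on those axes. The helper name `manual_parallel_coordinates` and the three sample rows are our own illustration, not part of any library.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def manual_parallel_coordinates(data, labels, ax=None):
    """Hypothetical helper: draw each row of `data` as a polyline across parallel axes."""
    data = np.asarray(data, dtype=float)
    if ax is None:
        ax = plt.gca()
    x = np.arange(data.shape[1])            # one x position per attribute
    for row in data:                        # one polyline per sample
        ax.plot(x, row, color="steelblue", alpha=0.6)
    for xi in x:                            # the vertical parallel axes
        ax.axvline(xi, color="black", linewidth=1)
    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    return ax


samples = [[5.1, 3.5, 1.4, 0.2],
           [7.0, 3.2, 4.7, 1.4],
           [6.3, 3.3, 6.0, 2.5]]
fig, ax = plt.subplots()
ax = manual_parallel_coordinates(samples, ["sepal length", "sepal width", "petal length", "petal width"], ax=ax)
```

Because all axes share a single y-scale in this sketch, an attribute with large values would squeeze the others into a thin band, which is exactly why scaling helps.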

A parallel coordinates chart can become very cluttered when many data points are plotted. To avoid clutter, we can highlight only a few samples in the visualization. In this tutorial, we'll plot parallel coordinates charts in Python using pandas (matplotlib) and Plotly, and we'll load several datasets from scikit-learn to illustrate them.
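
One way to reduce clutter, sketched below on the assumption that you already know which rows are interesting, is to draw every sample in light grey and then overlay only the selected rows in color on the same axes, since `pd.plotting.parallel_coordinates` accepts an `ax` argument. The toy dataframe and the `highlight` row list are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "a": [0.1, 0.4, 0.8, 0.3, 0.9],
    "b": [0.5, 0.2, 0.7, 0.9, 0.1],
    "c": [0.3, 0.6, 0.2, 0.4, 0.8],
    "label": ["x", "y", "x", "y", "x"],
})
highlight = [2, 4]  # hypothetical rows we want to emphasize

fig, ax = plt.subplots()
# Background: every sample in light grey, treated as one dummy class "all"
pd.plotting.parallel_coordinates(df.assign(label="all"), "label", color=["lightgrey"], ax=ax)
# Foreground: only the highlighted rows, thicker and in color, without redrawing the axes
pd.plotting.parallel_coordinates(df.loc[highlight], "label", color=["tomato"],
                                 ax=ax, linewidth=2.5, axvlines=False)
# parallel_coordinates calls ax.grid(), which toggles the grid, so set it explicitly
ax.grid(True)
```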

Radar charts are another alternative for analyzing and visualizing multivariate data; they arrange the axes radially instead of in parallel. If you are interested in plotting radar charts in Python, we have already covered them in a detailed tutorial, How to Plot Radar Chart in Python?, which we suggest you go through as well. Andrews curves are one more alternative to parallel coordinates plots: each sample is represented as a finite Fourier series whose coefficients are the sample's attribute values.
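
For reference, an Andrews curve maps a sample x = (x1, x2, x3, ...) to f(t) = x1/√2 + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ... for t in [-π, π]. The NumPy sketch below evaluates that formula directly; the function name `andrews_curve` is our own, and for plotting, pandas also offers a ready-made `pd.plotting.andrews_curves` function.

```python
import numpy as np


def andrews_curve(x, t):
    """Evaluate the Andrews curve of one sample `x` at the angles `t`."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2.0))   # constant term x1 / sqrt(2)
    for i, coef in enumerate(x[1:]):
        k = i // 2 + 1                              # frequencies 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 0:
            result += coef * np.sin(k * t)          # x2, x4, ... multiply sin(k t)
        else:
            result += coef * np.cos(k * t)          # x3, x5, ... multiply cos(k t)
    return result


t = np.linspace(-np.pi, np.pi, 5)
values = andrews_curve([5.1, 3.5, 1.4, 0.2], t)
```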

This concludes our short introduction to parallel coordinates charts. We'll now start by importing the necessary libraries.

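```python
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

# Dataset loaders used below (load_boston is imported separately in the Boston cell)
from sklearn.datasets import load_iris, load_wine
from sklearn.preprocessing import MinMaxScaler

import plotly.express as px
import plotly.graph_objects as go

%matplotlib inline
```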

We'll first load three datasets that ship with scikit-learn:

• IRIS Flowers Dataset - It has sepal and petal measurements for 3 different IRIS flower types.
• Wine Dataset - It has the results of a chemical analysis (alcohol, malic acid, ash, magnesium, etc.) of wines from three different categories.
• Boston Housing Price Dataset - It has information about various attributes of houses and their surrounding areas in Boston, along with house prices.

All three datasets are available from the `sklearn.datasets` module. We'll load each of them into a pandas dataframe to use later in the parallel coordinates plots.

We'll also plot charts with scaled data so that we can compare them with the unscaled versions. We use scikit-learn's `MinMaxScaler` to scale each column's data into the range `[0, 1]`. Once all quantitative variables share the same `[0, 1]` range, it becomes much easier to compare them visually. We'll scale the iris, wine, and Boston datasets with `MinMaxScaler`; only the feature columns are scaled, while the target column is appended unchanged.
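
With its default `feature_range=(0, 1)`, `MinMaxScaler` rescales each column independently as x' = (x - min) / (max - min), using that column's own minimum and maximum. The short check below, on a made-up array, computes the same thing by hand with NumPy; both give [[0, 0], [0.5, 1], [1, 0.5]].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 200.0],
              [3.0, 400.0],
              [5.0, 300.0]])

# Manual min-max scaling, column by column
X_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# scikit-learn's version with the default feature_range=(0, 1)
X_sklearn = MinMaxScaler().fit_transform(X)
```

Unlike the one-line formula, `MinMaxScaler` also guards against constant columns (where max equals min) instead of dividing by zero.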

```python
iris = load_iris()
# Append the target (flower type 0, 1, 2) as the last column
iris_data = np.hstack((iris.data, iris.target.reshape(-1,1)))

iris_df = pd.DataFrame(data=iris_data, columns=iris.feature_names+ ["FlowerType"])
iris_df.head()
```
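
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | FlowerType |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |
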
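
As an aside, scikit-learn 0.23 and later can return these datasets directly as a dataframe with `as_frame=True`; the combined features-plus-target table is then available as the `frame` attribute, with the target in a column named `target`. A sketch of building an equivalent iris dataframe that way, renaming `target` to `FlowerType` to match this tutorial (the target stays an integer column rather than the floats produced by `np.hstack`):

```python
from sklearn.datasets import load_iris

# as_frame=True requires scikit-learn >= 0.23
iris_frame = load_iris(as_frame=True).frame.rename(columns={"target": "FlowerType"})
print(iris_frame.head())
```
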
```python
# Scale only the feature columns to [0, 1], then append the unscaled target
iris_data_scaled = MinMaxScaler().fit_transform(iris.data)
iris_data_scaled = np.hstack((iris_data_scaled, iris.target.reshape(-1,1)))

iris_scaled_df = pd.DataFrame(data=iris_data_scaled, columns=iris.feature_names+ ["FlowerType"])
iris_scaled_df.head()
```

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | FlowerType |
|---|---|---|---|---|---|
| 0 | 0.222222 | 0.625000 | 0.067797 | 0.041667 | 0.0 |
| 1 | 0.166667 | 0.416667 | 0.067797 | 0.041667 | 0.0 |
| 2 | 0.111111 | 0.500000 | 0.050847 | 0.041667 | 0.0 |
| 3 | 0.083333 | 0.458333 | 0.084746 | 0.041667 | 0.0 |
| 4 | 0.194444 | 0.666667 | 0.067797 | 0.041667 | 0.0 |

```python
wine = load_wine()
# Append the target (wine category 0, 1, 2) as the last column
wine_data = np.hstack((wine.data, wine.target.reshape(-1,1)))

wine_df = pd.DataFrame(data=wine_data, columns=wine.feature_names+ ["WineCategory"])
wine_df.head()
```

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | WineCategory |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 | 0.0 |
| 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050.0 | 0.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 | 0.0 |
| 3 | 14.37 | 1.95 | 2.50 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480.0 | 0.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 | 0.0 |

```python
# Scale only the feature columns to [0, 1], then append the unscaled target
wine_data_scaled = MinMaxScaler().fit_transform(wine.data)
wine_data_scaled = np.hstack((wine_data_scaled, wine.target.reshape(-1,1)))

wine_scaled_df = pd.DataFrame(data=wine_data_scaled, columns=wine.feature_names+ ["WineCategory"])
wine_scaled_df.head()
```

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | WineCategory |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.842105 | 0.191700 | 0.572193 | 0.257732 | 0.619565 | 0.627586 | 0.573840 | 0.283019 | 0.593060 | 0.372014 | 0.455285 | 0.970696 | 0.561341 | 0.0 |
| 1 | 0.571053 | 0.205534 | 0.417112 | 0.030928 | 0.326087 | 0.575862 | 0.510549 | 0.245283 | 0.274448 | 0.264505 | 0.463415 | 0.780220 | 0.550642 | 0.0 |
| 2 | 0.560526 | 0.320158 | 0.700535 | 0.412371 | 0.336957 | 0.627586 | 0.611814 | 0.320755 | 0.757098 | 0.375427 | 0.447154 | 0.695971 | 0.646933 | 0.0 |
| 3 | 0.878947 | 0.239130 | 0.609626 | 0.319588 | 0.467391 | 0.989655 | 0.664557 | 0.207547 | 0.558360 | 0.556314 | 0.308943 | 0.798535 | 0.857347 | 0.0 |
| 4 | 0.581579 | 0.365613 | 0.807487 | 0.536082 | 0.521739 | 0.627586 | 0.495781 | 0.490566 | 0.444795 | 0.259386 | 0.455285 | 0.608059 | 0.325963 | 0.0 |

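```python
# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn version
from sklearn.datasets import load_boston

boston = load_boston()
boston_data = np.hstack((boston.data, boston.target.reshape(-1,1)))

boston_df = pd.DataFrame(data=boston_data, columns=boston.feature_names.tolist()+ ["HousePrice"])
boston_df.head()
```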
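
| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | HousePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |
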
```python
# Scale only the feature columns to [0, 1]; HousePrice is appended unscaled
boston_data_scaled = MinMaxScaler().fit_transform(boston.data)
boston_data_scaled = np.hstack((boston_data_scaled, boston.target.reshape(-1,1)))

boston_scaled_df = pd.DataFrame(data=boston_data_scaled, columns=boston.feature_names.tolist()+ ["HousePrice"])
boston_scaled_df.head()
```

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | HousePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 0.18 | 0.067815 | 0.0 | 0.314815 | 0.577505 | 0.641607 | 0.269203 | 0.000000 | 0.208015 | 0.287234 | 1.000000 | 0.089680 | 24.0 |
| 1 | 0.000236 | 0.00 | 0.242302 | 0.0 | 0.172840 | 0.547998 | 0.782698 | 0.348962 | 0.043478 | 0.104962 | 0.553191 | 1.000000 | 0.204470 | 21.6 |
| 2 | 0.000236 | 0.00 | 0.242302 | 0.0 | 0.172840 | 0.694386 | 0.599382 | 0.348962 | 0.043478 | 0.104962 | 0.553191 | 0.989737 | 0.063466 | 34.7 |
| 3 | 0.000293 | 0.00 | 0.063050 | 0.0 | 0.150206 | 0.658555 | 0.441813 | 0.448545 | 0.086957 | 0.066794 | 0.648936 | 0.994276 | 0.033389 | 33.4 |
| 4 | 0.000705 | 0.00 | 0.063050 | 0.0 | 0.150206 | 0.687105 | 0.528321 | 0.448545 | 0.086957 | 0.066794 | 0.648936 | 1.000000 | 0.099338 | 36.2 |

We'll explain two ways to plot a parallel coordinates chart.

• Pandas [Matplotlib] - First, we'll explain how to use pandas to plot a parallel coordinates chart. Pandas provides a ready-made function for this in its plotting module. Pandas uses matplotlib behind the scenes, so all of these charts are static.
• Plotly - Plotly provides two ways to create parallel coordinates charts, and its charts are interactive:
  • Plotly Express
  • Plotly Graph Objects

## Pandas [Matplotlib]

The pandas `plotting` module provides a ready-to-use function named `parallel_coordinates` for plotting parallel coordinates charts. We need to give it a dataframe containing all the data and the name of the categorical column whose categories determine the color of each sample.

It also has a parameter named `color`, which accepts a list of colors to use for the categories of that column.
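
A minimal, self-contained sketch of that call on a made-up dataframe (the column names and values are illustrative only): the class column `kind` decides the color of each polyline, and `color` lists one color per class.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

toy = pd.DataFrame({
    "length": [0.2, 0.4, 0.9, 0.8],
    "width":  [0.7, 0.6, 0.1, 0.3],
    "height": [0.5, 0.4, 0.6, 0.9],
    "kind":   ["small", "small", "large", "large"],
})

fig, ax = plt.subplots()
# One polyline per row, colored by its "kind"; one vertical axis per remaining column
ax = pd.plotting.parallel_coordinates(toy, "kind", color=["tomato", "dodgerblue"], ax=ax)
```

The function returns the matplotlib `Axes`, so titles, limits, and styles can be adjusted afterwards with ordinary matplotlib calls.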

#### IRIS Parallel Coordinates Charts

Below, we have provided the iris dataframe as input and the `FlowerType` column for coloring samples according to flower category. We have also provided color names for the categories.

```python
pd.plotting.parallel_coordinates(iris_df, "FlowerType", color=["lime", "tomato","dodgerblue"]);
```

Below, we plot the parallel coordinates chart for the iris data again, but with scaled data this time. The differences between samples are now much easier to see because all the variables share the same `[0, 1]` range, which is why it's advisable to scale data before plotting a parallel coordinates chart. As pandas uses `matplotlib` behind the scenes, we can customize the chart with matplotlib functions.

```python
# Matplotlib 3.6+ renamed the "seaborn" style to "seaborn-v0_8";
# use "seaborn" instead on older matplotlib versions.
with plt.style.context(("ggplot", "seaborn-v0_8")):
    fig = plt.figure(figsize=(10,6))
    pd.plotting.parallel_coordinates(iris_scaled_df, "FlowerType",
                                     color=["lime", "tomato","dodgerblue"],
                                     alpha=0.2)

    plt.title("IRIS Flowers Parallel Coordinates Plot [Scaled Data]")
```

Below, we plot the scaled iris data again, but this time we change the column order by passing a list of columns to the `cols` parameter of `parallel_coordinates`. The `cols` parameter can also be used to leave columns out: only the columns in the list are included in the chart. We also color the vertical axis lines red through the `axvlines_kwds` parameter.
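
The effect of `cols` can be checked on a small made-up dataframe: the axes follow the order of the list, and a column left out of it (here `b`) is simply not drawn, while `axvlines_kwds` styles the vertical axis lines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

toy = pd.DataFrame({
    "a": [0.1, 0.9],
    "b": [0.5, 0.3],
    "c": [0.8, 0.2],
    "group": ["g1", "g2"],
})

fig, ax = plt.subplots()
# Draw only columns "c" and "a", in that order; column "b" is dropped
ax = pd.plotting.parallel_coordinates(toy, "group", cols=["c", "a"],
                                      color=["lime", "tomato"],
                                      axvlines_kwds={"color": "red"}, ax=ax)
```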

```python
with plt.style.context(("ggplot", "seaborn-v0_8")):  # use "seaborn" on matplotlib < 3.6
    fig = plt.figure(figsize=(10,6))
    pd.plotting.parallel_coordinates(iris_scaled_df, "FlowerType",
                                     cols= ["petal width (cm)", "sepal width (cm)", "petal length (cm)", "sepal length (cm)"],
                                     color=["lime", "tomato","dodgerblue"],
                                     alpha=0.2,
                                     axvlines_kwds={"color":"red"})
    plt.title("IRIS Flowers Parallel Coordinates Plot [Scaled Data]")
```

#### Wine Dataset Parallel Coordinates Chart

Below, we plot the parallel coordinates chart for the scaled wine dataframe. A few columns clearly show different values for different wine categories, whereas for others the separation is not clear.

```python
with plt.style.context("ggplot"):
    fig = plt.figure(figsize=(15,8))
    pd.plotting.parallel_coordinates(wine_scaled_df, "WineCategory",
                                     color=["lime", "tomato","dodgerblue"],
                                     alpha=0.3,
                                     axvlines_kwds={"color":"red"})
    plt.xticks(rotation=90)
    plt.title("Wine Categories Parallel Coordinates Plot [Scaled Data]")
```

Below, we plot only the columns from `total_phenols` to `proline`, where the wine categories are easier to tell apart.

```python
with plt.style.context(("ggplot", "seaborn-v0_8")):  # use "seaborn" on matplotlib < 3.6
    fig = plt.figure(figsize=(15,8))
    pd.plotting.parallel_coordinates(wine_scaled_df, "WineCategory",
                                     cols=["total_phenols", "flavanoids", "nonflavanoid_phenols", "proanthocyanins", "color_intensity", "hue", "od280/od315_of_diluted_wines", "proline"],
                                     color=["lime", "tomato","dodgerblue"],
                                     alpha=0.3,
                                     axvlines_kwds={"color":"red"})
    plt.xticks(rotation=90)
    plt.title("Wine Categories Parallel Coordinates Plot [Scaled Data]")
```

## Plotly

Plotly is a popular interactive data visualization library. It provides two modules, `plotly.express` and `plotly.graph_objects`, for plotting parallel coordinates charts.

### Plotly Express

The `plotly.express` module has a function named `parallel_coordinates`, which accepts a dataframe containing the data and the name of the column used to color the samples according to their categories.

#### IRIS Dataset Parallel Coordinates Chart

Below, we create a parallel coordinates chart for the iris data. We pass the `FlowerType` column to the `color` parameter so that samples are colored according to the iris flower type.

```python
cols = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

fig = px.parallel_coordinates(iris_df, color="FlowerType", dimensions=cols,
                              title="IRIS Flowers Parallel Coordinates Plot")
fig.show()
```

#### Wine Data Parallel Coordinates Charts

Below, we create a parallel coordinates chart for the wine dataset, passing the names of the columns to plot to the `dimensions` parameter.

```python
cols = ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids',
        'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

fig = px.parallel_coordinates(wine_df, color="WineCategory", dimensions=cols,
                              color_continuous_scale=px.colors.diverging.RdYlBu, width=1000,
                              title="Wine Parallel Coordinates Plot")
fig.show()
```

Below, we again create a parallel coordinates chart for the wine dataset, but this time only with the last few columns, where the samples show clearer differences between wine categories. We have also changed the chart colors by setting the `color_continuous_scale` parameter to a different scale.

```python
cols = ["total_phenols", "flavanoids", "nonflavanoid_phenols", "proanthocyanins", "color_intensity",
        "hue", "od280/od315_of_diluted_wines", "proline"]

fig = px.parallel_coordinates(wine_df, color="WineCategory", dimensions=cols,
                              color_continuous_scale=px.colors.diverging.Tealrose,
                              title="Wine Parallel Coordinates Plot")
fig.show()
```

#### Boston House Price Dataset Parallel Coordinates Chart

Below, we plot the parallel coordinates chart for the Boston dataset, using `HousePrice` to color the samples. This helps us see which attribute values tend to go with high house prices.

```python
cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
        'PTRATIO', 'B', 'LSTAT',]
fig = px.parallel_coordinates(boston_df, color="HousePrice", dimensions=cols,
                              color_continuous_scale=px.colors.sequential.Blues,
                              title="Boston House Price Coordinates Plot")
fig.show()
```

### Plotly Graph Objects

The second way to create parallel coordinates charts in Plotly is the `graph_objects` module, which provides a class named `Parcoords` for this chart type. We need to provide values for two important parameters of `Parcoords` to generate the chart:

• `line` - It accepts a dictionary whose `color` key holds the values (for example, a dataframe column) used to color the samples. We can also specify the colors to use through the `colorscale` key of this dictionary.
• `dimensions` - It accepts a list of dictionaries, where each dictionary represents one dimension (axis) of the data. Each dictionary needs the list of values and a label for them. We can also provide a `range` for the axis, and the `constraintrange` attribute lets us highlight only the samples whose values fall within that range.

#### IRIS Dataset Parallel Coordinates Chart

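Below, we plot a parallel coordinates chart for the iris dataset. We give every axis the same `[0, 8]` range and use `constraintrange` to highlight flowers with a sepal length between 4 and 8.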

```python
fig = go.Figure(data=
    go.Parcoords(
        line = dict(color = iris_df['FlowerType'],
                    colorscale = [[0,'lime'],[0.5,'tomato'],[1,'dodgerblue']]),
        dimensions = list([
            dict(range = [0,8],
                 constraintrange = [4,8],
                 label = 'Sepal Length', values = iris_df['sepal length (cm)']),
            dict(range = [0,8],
                 label = 'Sepal Width', values = iris_df['sepal width (cm)']),
            dict(range = [0,8],
                 label = 'Petal Length', values = iris_df['petal length (cm)']),
            dict(range = [0,8],
                 label = 'Petal Width', values = iris_df['petal width (cm)'])
        ])
    )
)

fig.update_layout(
    title="IRIS Flowers Parallel Coordinates Plot"
)

fig.show()
```

We can also build the `dimensions` list with a list comprehension over the column names. Without an explicit `range`, Plotly sets each axis range from the extent of its values.

```python
cols = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)',]

fig = go.Figure(data=
    go.Parcoords(
        line = dict(color = iris_df['FlowerType'],
                    colorscale = [[0,'lime'],[0.5,'tomato'],[1,'dodgerblue']]),
        dimensions = [dict(label=col, values=iris_df[col]) for col in cols]
    )
)

fig.update_layout(
    title="IRIS Flowers Parallel Coordinates Plot"
)

fig.show()
```

#### Wine Dataset Parallel Coordinates Chart

Below, we plot a parallel coordinates chart for the wine dataset.

```python
cols = ["total_phenols", "flavanoids", "nonflavanoid_phenols", "proanthocyanins", "color_intensity",
        "hue", "od280/od315_of_diluted_wines", "proline"]

fig = go.Figure(data=
    go.Parcoords(
        line = dict(color = wine_df['WineCategory'],
                    colorscale = [[0,'lime'],[0.5,'tomato'],[1,'dodgerblue']]),
        dimensions = [dict(label=col, values=wine_df[col]) for col in cols]
    )
)

fig.update_layout(
    title="Wine Parallel Coordinates Plot"
)

fig.show()
```

#### Boston House Dataset Parallel Coordinates Chart

Below we are plotting the parallel coordinates chart for the Boston dataset.

```python
cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
        'PTRATIO', 'B', 'LSTAT',]

fig = go.Figure(data=
    go.Parcoords(
        line = dict(color = boston_df['HousePrice'],
                    colorscale = px.colors.sequential.Oranges),
        dimensions = [dict(label=col, values=boston_df[col]) for col in cols]
    )
)

fig.update_layout(
    title="Boston House Price Coordinates Plot"
)

fig.show()
```

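Below, we plot the Boston house price dataset again, but this time we set the `cmin` and `cmax` keys of the dictionary given to the `line` parameter to 25 and 50. Since prices in this dataset are in thousands of dollars, this stretches the color scale over the 25,000 to 50,000 range: every house is still drawn, but all houses priced at 25 or below get the lowest color of the scale, so the more expensive houses stand out.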

```python
cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
        'PTRATIO', 'B', 'LSTAT',]

fig = go.Figure(data=
    go.Parcoords(
        # cmin/cmax only set the color range; all rows are still plotted
        line = dict(color = boston_df['HousePrice'],
                    colorscale = px.colors.sequential.Blues,
                    cmin=25, cmax=50),
        dimensions = [dict(label=col, values=boston_df[col]) for col in cols]
    )
)

fig.update_layout(
    title="Boston House Price Coordinates Plot"
)

fig.show()
```

This ends our short tutorial on plotting parallel coordinates charts in Python. Please feel free to share your views in the comments section.

Sunny Solanki
