Scikit-Learn - Support Vector Machine

Introduction

A Support Vector Machine (SVM) constructs a hyperplane or a set of hyperplanes in a high-dimensional space, which are then used for classification, regression, or other tasks like outlier detection.

Below is a list of the SVM estimators provided by sklearn that we'll cover.

  • Classification Tasks
    • LinearSVC
    • SVC
    • NuSVC
  • Regression Tasks
    • LinearSVR
    • SVR
    • NuSVR
In [1]:
import numpy as np
import pandas as pd

import sklearn

import warnings

warnings.filterwarnings('ignore')

np.set_printoptions(precision=2)

%matplotlib inline

Load Dataset

We'll load the two datasets mentioned below.

  • Digits Dataset: It has 8x8 images of digits 0-9. We'll use it for classification tasks below.
  • Boston Housing Dataset: It has information about various house properties like the average number of rooms, per capita crime rate in town, etc. We'll use it for regression tasks.

Sklearn provides both of these datasets as a part of the datasets module. We can load them by calling the load_digits() and load_boston() functions. Each returns a dictionary-like Bunch object which can be used to retrieve features and target. Please make a note that load_boston() was deprecated in scikit-learn 1.0 and removed in version 1.2, so the code below needs an older version of sklearn.

In [2]:
from sklearn.datasets import load_boston, load_digits

digits = load_digits()
X_digits, Y_digits = digits.data, digits.target

print('Dataset Size : ',X_digits.shape, Y_digits.shape)
Dataset Size :  (1797, 64) (1797,)
In [3]:
boston = load_boston()
X_boston, Y_boston = boston.data, boston.target

print('Dataset Size : ',X_boston.shape, Y_boston.shape)
Dataset Size :  (506, 13) (506,)
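
On scikit-learn versions where load_boston() is no longer available, the import itself fails. A minimal sketch of one way to keep the regression examples runnable is shown below. It falls back to load_diabetes(), which is a stand-in chosen for this sketch and not the dataset used in this tutorial, so scores will differ from the outputs shown here. The names regression_data, X_reg and Y_reg are hypothetical.

```python
from sklearn.datasets import load_diabetes

try:
    # load_boston is importable only on older scikit-learn versions.
    from sklearn.datasets import load_boston
    regression_data = load_boston()
except ImportError:
    # Stand-in dataset (assumption of this sketch, not the tutorial's data).
    regression_data = load_diabetes()

X_reg, Y_reg = regression_data.data, regression_data.target

print('Dataset Size : ', X_reg.shape, Y_reg.shape)
```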

LinearSVR

The first support vector machine model that we'll introduce is LinearSVR. It is available as a part of the svm module of sklearn. We'll divide the regression dataset into train/test sets, train LinearSVR with default parameters on it, evaluate performance on the test set, and then tune the model by trying various hyperparameters to improve performance further. We'll also introduce important attributes of the trained model which can give useful insights once the model is trained.

Splitting Dataset into Train & Test sets

We'll split the dataset into two parts:

  • Training data which will be used for training the model.
  • Test data against which the accuracy of the trained model will be checked.

The train_test_split function of the model_selection module of sklearn will help us split data into two sets, with 80% for training and 20% for testing. We are also passing a seed (random_state=123) to train_test_split so that we always get the same split and can reproduce results in the future as well.

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :  (404, 13) (102, 13) (404,) (102,)
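
The effect of random_state can be checked directly. The sketch below uses a small made-up array (not the tutorial's data) and splits it twice with the same seed to confirm that both calls return identical train/test sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data used only for this illustration.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

split_a = train_test_split(X, y, train_size=0.80, test_size=0.20, random_state=123)
split_b = train_test_split(X, y, train_size=0.80, test_size=0.20, random_state=123)

# Compare X_train, X_test, y_train, y_test of both calls element by element.
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))

print('Identical Splits : ', same_split)
print('Train/Test Sizes : ', split_a[0].shape, split_a[1].shape)
```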

Fitting Default LinearSVR to Train Data

In [5]:
from sklearn.svm import LinearSVR


linear_svr = LinearSVR(max_iter=1000000)
linear_svr.fit(X_train, Y_train)
Out[5]:
LinearSVR(max_iter=1000000)

Evaluate Model Accuracy on Test Set

In [6]:
Y_preds = linear_svr.predict(X_test)

print(Y_preds[:10])
print(Y_test[:10])

print('Test R^2 Score : %.3f'%linear_svr.score(X_test, Y_test)) ## For regression models, score() returns the R^2 (coefficient of determination).
print('Training R^2 Score : %.3f'%linear_svr.score(X_train, Y_train))
[ 6.41 26.36 37.17 13.61 30.4  38.01 24.36  9.36 13.6  32.05]
[15.  26.6 45.4 20.8 34.9 21.9 28.7  7.2 20.  32.2]
Test R^2 Score : 0.579
Training R^2 Score : 0.709
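
For regression estimators, score() returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below recomputes it by hand on synthetic data from make_regression (an assumption of this example, not the housing data) and compares it with the value returned by score().

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

# Synthetic regression data used only for this illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

model = LinearSVR(max_iter=10000, random_state=0).fit(X, y)
y_pred = model.predict(X)

ss_res = ((y - y_pred) ** 2).sum()    # residual sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print('Manual R^2 : %.3f' % r2_manual)
print('score()    : %.3f' % model.score(X, y))
```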

Important Attributes of Estimator

LinearSVR provides attributes that can give useful insights once the model is trained. Below is a list of attributes available through LinearSVR.

  • coef_ - It returns an array representing weights assigned to each feature by the model. It represents the importance of each feature as per the trained model.
  • intercept_ - It represents the intercept of the linear function.
In [7]:
print("Feature Importances :", linear_svr.coef_)
Feature Importances : [-0.14  0.04  0.03  0.87 -0.72  6.01 -0.03 -0.75  0.13 -0.01 -0.57  0.01
 -0.3 ]
In [8]:
print("Model Intercept :", linear_svr.intercept_)
Model Intercept : [1.49]
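
Since LinearSVR fits a linear function, its predictions can be reproduced from coef_ and intercept_ alone. The sketch below checks this on synthetic data (make_regression is an assumption of this example).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

# Synthetic regression data used only for this illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

model = LinearSVR(max_iter=10000, random_state=0).fit(X, y)

# Linear prediction: dot product with the weights plus the intercept.
manual_preds = X @ model.coef_ + model.intercept_

print('Coefficients Shape : ', model.coef_.shape)
print('Matches predict()  : ', np.allclose(manual_preds, model.predict(X)))
```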

Finetuning Model By Doing Grid Search On Various Hyperparameters

Below is a list of common hyperparameters that need tuning to get the best fit for our data. We'll try various hyperparameter settings on various splits of the training data to find the best fit, which should have almost the same score on both train and test sets, or only a small difference between them.

  • C - It represents the regularization parameter. The strength of regularization is inversely proportional to C, which means that a low C results in high regularization and vice-versa. The default value is 1.0.
  • max_iter - It specifies the maximum number of iterations to run before stopping the algorithm. The default value is 1000.
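
The inverse relation between C and regularization strength can be seen in the size of the learned weights. The sketch below fits LinearSVR on synthetic data (an assumption of this example) with a very small and a very large C and compares the norm of coef_. The two C values are picked only to make the contrast obvious.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

# Synthetic regression data used only for this illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

coef_norms = {}
for C in [1e-4, 100.0]:
    model = LinearSVR(C=C, max_iter=100000, random_state=0).fit(X, y)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print('C = %-8s Norm of coef_ : %.3f' % (C, coef_norms[C]))
```

A low C shrinks the weights strongly towards zero, whereas a high C lets them grow to fit the training data.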

Below, we'll try various values of C to find the best estimator for our dataset by doing 5-fold cross-validation on the training data.

In [9]:
%%time

from sklearn.model_selection import GridSearchCV

params = {
            'C': [0.1, 0.5, 1.0, 10.0],
         }

linear_svr_regressor_grid = GridSearchCV(LinearSVR(random_state=1, max_iter=1000000), param_grid=params, n_jobs=-1, cv=5, verbose=5)
linear_svr_regressor_grid.fit(X_train,Y_train)

print('Train R^2 Score : %.3f'%linear_svr_regressor_grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 Score : %.3f'%linear_svr_regressor_grid.best_estimator_.score(X_test, Y_test))
print('Best R^2 Score Through Grid Search : %.3f'%linear_svr_regressor_grid.best_score_)
print('Best Parameters : ',linear_svr_regressor_grid.best_params_)
Fitting 5 folds for each of 4 candidates, totalling 20 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   18.5s
[Parallel(n_jobs=-1)]: Done  18 out of  20 | elapsed:  1.1min remaining:    7.3s
[Parallel(n_jobs=-1)]: Done  20 out of  20 | elapsed:  1.4min finished
Train R^2 Score : 0.712
Test R^2 Score : 0.584
Best R^2 Score Through Grid Search : 0.708
Best Parameters :  {'C': 1.0}
CPU times: user 11.6 s, sys: 62.7 ms, total: 11.6 s
Wall time: 1min 37s

Printing First Few Cross Validation Results

In [10]:
cross_val_results = pd.DataFrame(linear_svr_regressor_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))

cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 4
Out[10]:
mean_fit_time std_fit_time mean_score_time std_score_time param_C params split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score rank_test_score
0 2.174029 0.053218 0.001128 0.000120 0.1 {'C': 0.1} 0.651777 0.778885 0.700807 0.762146 0.600923 0.698908 0.066665 3
1 7.785498 0.342932 0.000914 0.000020 0.5 {'C': 0.5} 0.662025 0.798491 0.704975 0.762723 0.610880 0.707819 0.067436 2
2 12.054649 0.589841 0.000992 0.000131 1 {'C': 1.0} 0.653940 0.809422 0.704639 0.766378 0.605480 0.707972 0.073673 1
3 33.045799 2.934728 0.000853 0.000116 10 {'C': 10.0} 0.656110 0.756510 0.691831 0.663300 0.615075 0.676565 0.046903 4
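
The cv_results_ dictionary is also where best_score_ and best_params_ come from: they correspond to the candidate with the highest mean_test_score. The sketch below verifies this on a small synthetic dataset (an assumption of this example) so that it runs quickly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVR

# Synthetic regression data used only for this illustration.
X, y = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)

grid = GridSearchCV(LinearSVR(random_state=1, max_iter=10000),
                    param_grid={'C': [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)

mean_scores = grid.cv_results_['mean_test_score']
best = int(np.argmax(mean_scores))  # candidate with the highest mean CV score

print('Best Mean CV Score : %.3f' % mean_scores[best])
print('best_score_        : %.3f' % grid.best_score_)
print('Best Candidate     : ', grid.cv_results_['params'][best])
```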

LinearSVC

The second support vector machine model that we'll introduce is LinearSVC. It is available as a part of the svm module of sklearn. We'll divide the classification dataset into train/test sets, train LinearSVC with default parameters on it, evaluate performance on the test set, and then tune the model by trying various hyperparameters to improve performance further. We'll also introduce important attributes of the trained model which can give useful insights once the model is trained.

Splitting Dataset into Train & Test sets

NOTE

Please make a note that train_test_split also accepts a stratify parameter which prevents unequal distribution of classes between the train and test sets. Passing stratify=Y_digits keeps the proportion of each class the same in both sets (80% of each class's samples in the train set and 20% in the test set), so that no class dominates either set. The split below does not pass it, so class proportions can differ slightly between the two sets.

In [11]:
X_train, X_test, Y_train, Y_test = train_test_split(X_digits, Y_digits, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :  (1437, 64) (360, 64) (1437,) (360,)
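
A stratified version of this split can be sketched as follows. The variable names are hypothetical, and because the split differs from the one above, scores obtained with it will not match the outputs shown in this tutorial.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# stratify=y keeps the share of each digit class equal in both sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.80, test_size=0.20,
                                          random_state=123, stratify=y)

train_frac = np.bincount(y_tr) / len(y_tr)
test_frac = np.bincount(y_te) / len(y_te)
max_gap = np.abs(train_frac - test_frac).max()

print('Train/Test Sizes : ', X_tr.shape, X_te.shape)
print('Largest Gap in Class Share Between Train & Test : %.4f' % max_gap)
```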

Fitting Default LinearSVC to Train Data

In [12]:
from sklearn.svm import LinearSVC

linear_svc = LinearSVC(max_iter=1000000)
linear_svc.fit(X_train, Y_train)
Out[12]:
LinearSVC(max_iter=1000000)

Evaluate Model Accuracy on Test Set

In [13]:
Y_preds = linear_svc.predict(X_test)

print(Y_preds[:15])
print(Y_test[:15])

print('Test Accuracy : %.3f'%(Y_preds == Y_test).mean())
print('Test Accuracy : %.3f'%linear_svc.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%linear_svc.score(X_train, Y_train))
[3 3 4 4 1 3 1 0 7 4 0 0 5 1 6]
[3 3 4 4 1 3 1 0 7 4 0 0 5 1 6]
Test Accuracy : 0.961
Test Accuracy : 0.961
Training Accuracy : 0.998

Important Attributes of Estimator

LinearSVC has the same attributes as LinearSVR. For multi-class problems, coef_ has shape (n_classes, n_features) and intercept_ has one entry per class.

In [14]:
print("Feature Importances Shape :", linear_svc.coef_.shape)
Feature Importances Shape : (10, 64)
In [15]:
print("Model Intercept :", linear_svc.intercept_)
Model Intercept : [-0.   -4.13 -0.01 -0.49  0.01 -0.05 -0.02 -0.02 -2.72 -2.74]
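
Each row of coef_ together with the matching entry of intercept_ defines one linear scoring function per class, and the predicted class is the one with the highest score. The sketch below reproduces decision_function() and predict() from these attributes. Scaling pixel values to the 0-1 range is an assumption of this example to help the solver converge faster; it is not done in the tutorial.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values range from 0 to 16

clf = LinearSVC(max_iter=10000, random_state=0).fit(X, y)

# One score per class for every sample: shape (n_samples, n_classes).
scores = X @ clf.coef_.T + clf.intercept_
manual_preds = clf.classes_[scores.argmax(axis=1)]

print('Scores Shape               : ', scores.shape)
print('Matches decision_function(): ', np.allclose(scores, clf.decision_function(X)))
print('Matches predict()          : ', np.array_equal(manual_preds, clf.predict(X)))
```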

Finetuning Model By Doing Grid Search On Various Hyperparameters

Below is a list of common hyperparameters that need tuning to get the best fit for our data. We'll try various hyperparameter settings on various splits of the training data to find the best fit, which should have almost the same accuracy on both train and test sets, or only a small difference between them.

  • C - It represents the regularization parameter. The strength of regularization is inversely proportional to C, which means that a low C results in high regularization and vice-versa. The default value is 1.0.
  • penalty - It accepts one of the two string values below. It specifies the norm used to penalize the weights of the linear function, which helps prevent it from overfitting the data.
    • l1 Penalty
    • l2 Penalty(default)
  • max_iter - It specifies the maximum number of iterations to run before stopping the algorithm. The default value is 1000.
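
The practical difference between the two penalties is sparsity: the l1 penalty tends to drive many weights to exactly zero. The sketch below compares the number of zero weights under both penalties. Note that penalty='l1' requires dual=False in LinearSVC. The value C=0.01 and the pixel scaling are assumptions chosen for this illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values range from 0 to 16

zero_counts = {}
for penalty in ['l2', 'l1']:
    # dual=False is needed for the l1 penalty; used for both to compare fairly.
    clf = LinearSVC(penalty=penalty, C=0.01, dual=False,
                    max_iter=10000, random_state=0).fit(X, y)
    zero_counts[penalty] = int((clf.coef_ == 0).sum())
    print('Penalty : %s, Zero Weights : %d out of %d'
          % (penalty, zero_counts[penalty], clf.coef_.size))
```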

Below, we'll try various values of C to find the best estimator for our dataset by doing 5-fold cross-validation on the training data.

In [16]:
%%time

params = {
            'C': [0.1, 0.5, 1.0, 10.0],
         }

linear_svc_classifier_grid = GridSearchCV(LinearSVC(random_state=1, max_iter=1000000), param_grid=params, n_jobs=-1, cv=5, verbose=5)
linear_svc_classifier_grid.fit(X_train,Y_train)

print('Train Accuracy : %.3f'%linear_svc_classifier_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%linear_svc_classifier_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%linear_svc_classifier_grid.best_score_)
print('Best Parameters : ',linear_svc_classifier_grid.best_params_)
Fitting 5 folds for each of 4 candidates, totalling 20 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   14.5s
[Parallel(n_jobs=-1)]: Done  18 out of  20 | elapsed:  1.3min remaining:    8.6s
[Parallel(n_jobs=-1)]: Done  20 out of  20 | elapsed:  1.6min finished
Train Accuracy : 0.997
Test Accuracy : 0.969
Best Accuracy Through Grid Search : 0.953
Best Parameters :  {'C': 0.1}
CPU times: user 2.13 s, sys: 23.2 ms, total: 2.15 s
Wall time: 1min 37s

Printing First Few Cross Validation Results

In [17]:
cross_val_results = pd.DataFrame(linear_svc_classifier_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))

cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 4
Out[17]:
mean_fit_time std_fit_time mean_score_time std_score_time param_C params split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score rank_test_score
0 1.597615 0.072143 0.025226 0.014656 0.1 {'C': 0.1} 0.951389 0.961806 0.940767 0.951220 0.958188 0.952674 0.007202 1
1 6.911928 0.794496 0.001103 0.000042 0.5 {'C': 0.5} 0.930556 0.947917 0.926829 0.954704 0.944251 0.940851 0.010545 2
2 13.447654 2.130439 0.001011 0.000039 1 {'C': 1.0} 0.930556 0.940972 0.923345 0.951220 0.937282 0.936675 0.009439 3
3 46.044674 8.834060 0.000922 0.000089 10 {'C': 10.0} 0.923611 0.923611 0.909408 0.937282 0.926829 0.924148 0.008917 4

SVR

The third support vector machine model that we'll introduce is SVR. It is available as a part of the svm module of sklearn. We'll divide the regression dataset into train/test sets, train SVR with default parameters on it, evaluate performance on the test set, and then tune the model by trying various hyperparameters to improve performance further. We'll also introduce important attributes of the trained model which can give useful insights once the model is trained.

Splitting Dataset into Train & Test sets

In [18]:
X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :  (404, 13) (102, 13) (404,) (102,)

Fitting Default SVR to Train Data

In [19]:
from sklearn.svm import SVR

svr = SVR(cache_size=1000)
svr.fit(X_train, Y_train)
Out[19]:
SVR(cache_size=1000)

Evaluate Model Accuracy on Test Set

In [20]:
Y_preds = svr.predict(X_test)

print(Y_preds[:15])
print(Y_test[:15])

print('Test R^2 Score : %.3f'%svr.score(X_test, Y_test)) ## For regression models, score() returns the R^2 (coefficient of determination).
print('Training R^2 Score : %.3f'%svr.score(X_train, Y_train))
[13.24 23.85 24.48 14.8  22.27 15.57 24.27 13.18 22.49 24.57 22.96 24.18
 20.36 19.44 21.17]
[15.  26.6 45.4 20.8 34.9 21.9 28.7  7.2 20.  32.2 24.1 18.5 13.5 27.
 23.1]
Test R^2 Score : 0.103
Training R^2 Score : 0.227

Important Attributes of Estimator

SVR provides attributes that can give useful insights once the model is trained. Below is a list of attributes available through SVR.

  • support_vectors_ - It represents support vectors of the trained model.
  • intercept_ - It represents the constant term (intercept) of the decision function.
In [21]:
print("Support Vectors Shape:", svr.support_vectors_.shape)
Support Vectors Shape: (389, 13)
In [22]:
print("Model Intercept :", svr.intercept_)
Model Intercept : [19.43]
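
Support vectors are simply the training samples that the fitted model retains. The sketch below, run on synthetic data (an assumption of this example), shows that support_vectors_ equals the training rows selected by the support_ index array, and that dual_coef_ holds one coefficient per support vector.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import SVR

# Synthetic regression data used only for this illustration.
X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

svr_model = SVR().fit(X, y)

# support_ holds the indices of training samples kept as support vectors.
same_rows = np.array_equal(svr_model.support_vectors_, X[svr_model.support_])

print('Support Vectors Shape : ', svr_model.support_vectors_.shape)
print('Dual Coefficients Shape : ', svr_model.dual_coef_.shape)
print('support_vectors_ == X[support_] : ', same_rows)
```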

Finetuning Model By Doing Grid Search On Various Hyperparameters

Below is a list of common hyperparameters that need tuning to get the best fit for our data. We'll try various hyperparameter settings on various splits of the training data to find the best fit, which should have almost the same score on both train and test sets, or only a small difference between them.

  • C - It represents the regularization parameter. The strength of regularization is inversely proportional to C, which means that a low C results in high regularization and vice-versa. The default value is 1.0.

  • kernel - It specifies kernel type to be used in SVM. It accepts either one of the below strings or callable.

    • linear
    • poly
    • rbf (default)
    • sigmoid
    • precomputed
  • degree - It accepts integer values and represents a degree for poly kernel. It's ignored when other kernels are used.
  • gamma - It represents kernel coefficient for rbf, poly and sigmoid kernels. It accepts one of the below strings or float as value.
    • scale (default) - 1 / (n_features * X.var())
    • auto - (1/ n_features)
  • cache_size - It accepts float values representing kernel cache size in MB. The default value is 200 MB. It's suggested to increase cache_size based on the RAM of the computer to improve the performance of SVM.
  • max_iter - It specifies the number of iterations to try before stopping the algorithm. The value of -1 represents no limit, and the algorithm runs until convergence.
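
The two string options for gamma can be reproduced by hand from the formulas in the list above. The sketch below computes both values on synthetic data (an assumption of this example) and checks that passing the hand-computed 'scale' value as a float gives the same predictions as gamma='scale'.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import SVR

# Synthetic regression data used only for this illustration.
X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

n_features = X.shape[1]
gamma_scale = 1.0 / (n_features * X.var())  # value used by gamma='scale'
gamma_auto = 1.0 / n_features               # value used by gamma='auto'

preds_named = SVR(gamma='scale').fit(X, y).predict(X)
preds_manual = SVR(gamma=gamma_scale).fit(X, y).predict(X)

print('gamma (scale) : %.4f' % gamma_scale)
print('gamma (auto)  : %.4f' % gamma_auto)
print('Same Predictions : ', np.allclose(preds_named, preds_manual))
```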

Below, we'll try various values of C, kernel, and gamma to find the best estimator for our dataset by doing 5-fold cross-validation on the training data.

In [23]:
%%time

from sklearn.model_selection import GridSearchCV

params = {
            'C': [0.1, 1.0,],
            'kernel': ['linear','rbf', 'sigmoid', ],
            'gamma': ['auto', 'scale']
         }

svr_regressor_grid = GridSearchCV(SVR(cache_size=1000), param_grid=params, n_jobs=-1, cv=5, verbose=5)
svr_regressor_grid.fit(X_train,Y_train)

print('Train R^2 Score : %.3f'%svr_regressor_grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 Score : %.3f'%svr_regressor_grid.best_estimator_.score(X_test, Y_test))
print('Best R^2 Score Through Grid Search : %.3f'%svr_regressor_grid.best_score_)
print('Best Parameters : ',svr_regressor_grid.best_params_)
Fitting 5 folds for each of 12 candidates, totalling 60 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    0.5s
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:   12.7s finished
Train R^2 Score : 0.723
Test R^2 Score : 0.605
Best R^2 Score Through Grid Search : 0.714
Best Parameters :  {'C': 1.0, 'gamma': 'auto', 'kernel': 'linear'}
CPU times: user 4.59 s, sys: 4.72 ms, total: 4.59 s
Wall time: 17.1 s

Printing First Few Cross Validation Results

In [24]:
cross_val_results = pd.DataFrame(svr_regressor_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))

cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 12
Out[24]:
mean_fit_time std_fit_time mean_score_time std_score_time param_C param_gamma param_kernel params split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score rank_test_score
0 0.516886 0.149781 0.001687 0.000075 0.1 auto linear {'C': 0.1, 'gamma': 'auto', 'kernel': 'linear'} 0.702459 0.768035 0.727951 0.734547 0.576907 0.701980 0.065942 3
1 0.013797 0.000271 0.002895 0.000187 0.1 auto rbf {'C': 0.1, 'gamma': 'auto', 'kernel': 'rbf'} -0.013447 0.001137 -0.011739 -0.048919 -0.021288 -0.018851 0.016669 10
2 0.009250 0.001254 0.001722 0.000120 0.1 auto sigmoid {'C': 0.1, 'gamma': 'auto', 'kernel': 'sigmoid'} -0.016672 -0.001444 -0.013125 -0.053354 -0.026054 -0.022130 0.017488 11
3 0.510559 0.135548 0.001627 0.000090 0.1 scale linear {'C': 0.1, 'gamma': 'scale', 'kernel': 'linear'} 0.702459 0.768035 0.727951 0.734547 0.576907 0.701980 0.065942 3
4 0.013602 0.000387 0.002603 0.000111 0.1 scale rbf {'C': 0.1, 'gamma': 'scale', 'kernel': 'rbf'} 0.113661 0.169402 0.108838 0.089386 0.083959 0.113049 0.030331 6

SVC

The fourth support vector machine model that we'll introduce is SVC. It is available as a part of the svm module of sklearn. We'll divide the classification dataset into train/test sets, train SVC with default parameters on it, evaluate performance on the test set, and then tune the model by trying various hyperparameters to improve performance further. We'll also introduce important attributes of the trained model which can give useful insights once the model is trained.

Splitting Dataset into Train & Test sets

In [25]:
X_train, X_test, Y_train, Y_test = train_test_split(X_digits, Y_digits, train_size=0.80, test_size=0.20, random_state=123)

print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :  (1437, 64) (360, 64) (1437,) (360,)

Fitting Default SVC to Train Data

In [26]:
from sklearn.svm import SVC

svc = SVC(cache_size=1000)
svc.fit(X_train, Y_train)
Out[26]:
SVC(cache_size=1000)

Evaluate Model Accuracy on Test Set

In [27]:
Y_preds = svc.predict(X_test)

print(Y_preds[:15])
print(Y_test[:15])

print('Test Accuracy : %.3f'%(Y_preds == Y_test).mean())
print('Test Accuracy : %.3f'%svc.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%svc.score(X_train, Y_train))
[3 3 4 4 1 3 1 0 7 4 0 0 5 1 6]
[3 3 4 4 1 3 1 0 7 4 0 0 5 1 6]
Test Accuracy : 0.989
Test Accuracy : 0.989
Training Accuracy : 0.997

Important Attributes of Estimator

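SVC has the same set of attributes as SVR. For multi-class problems, SVC trains one classifier per pair of classes (one-vs-one) internally, so intercept_ has one entry per pair, which is 45 entries for the 10 digit classes.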

In [28]:
print("Support Vectors :", svc.support_vectors_.shape)
Support Vectors : (660, 64)
In [29]:
#print("Feature Importances :", svc.coef_) ## Only for Linear Kernel
In [30]:
print("Model Intercept :", svc.intercept_)
Model Intercept : [-0.59 -0.33 -0.22 -0.49 -0.47 -0.06 -0.43 -0.31 -0.32  0.4   0.55  0.11
  0.43  0.55  0.07  0.7   0.43  0.07 -0.1   0.01  0.3  -0.13  0.14  0.05
 -0.29  0.09  0.22 -0.27 -0.17  0.02  0.29  0.6   0.06  0.18 -0.03  0.33
 -0.26  0.01 -0.38 -0.35 -0.4  -0.34  0.31 -0.05 -0.32]
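
The 45 intercept values printed above follow from the one-vs-one scheme that SVC uses for multi-class problems: one binary classifier per pair of classes, that is n_classes * (n_classes - 1) / 2 classifiers. A short sketch confirming the count on the digits data (fitted on the full dataset here for brevity):

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

svc_model = SVC().fit(X, y)

n_classes = len(svc_model.classes_)
n_pairs = n_classes * (n_classes - 1) // 2  # number of one-vs-one classifiers

print('Number of Classes      : ', n_classes)
print('Number of Class Pairs  : ', n_pairs)
print('Intercept Shape        : ', svc_model.intercept_.shape)
```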

Finetuning Model By Doing Grid Search On Various Hyperparameters

SVC has the same parameters as SVR.

In [31]:
%%time

params = {
            'C': [0.1, 1.0, ],
            'kernel': ['linear', 'rbf', 'sigmoid',],
            'gamma': ['auto', 'scale']
         }

svc_classifier_grid = GridSearchCV(SVC(cache_size=1000), param_grid=params, n_jobs=-1, cv=5, verbose=5)
svc_classifier_grid.fit(X_train,Y_train)

print('Train Accuracy : %.3f'%svc_classifier_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%svc_classifier_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%svc_classifier_grid.best_score_)
print('Best Parameters : ',svc_classifier_grid.best_params_)
Fitting 5 folds for each of 12 candidates, totalling 60 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  12 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:    3.9s finished
Train Accuracy : 0.997
Test Accuracy : 0.989
Best Accuracy Through Grid Search : 0.988
Best Parameters :  {'C': 1.0, 'gamma': 'scale', 'kernel': 'rbf'}
CPU times: user 411 ms, sys: 16.1 ms, total: 428 ms
Wall time: 4.19 s

Printing First Few Cross Validation Results

In [32]:
cross_val_results = pd.DataFrame(svc_classifier_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))

cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 12
Out[32]:
mean_fit_time std_fit_time mean_score_time std_score_time param_C param_gamma param_kernel params split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score rank_test_score
0 0.052571 0.002183 0.014920 0.000300 0.1 auto linear {'C': 0.1, 'gamma': 'auto', 'kernel': 'linear'} 0.975694 0.986111 0.975610 0.982578 0.975610 0.979121 0.004409 2
1 0.417765 0.015139 0.058806 0.001837 0.1 auto rbf {'C': 0.1, 'gamma': 'auto', 'kernel': 'rbf'} 0.104167 0.104167 0.108014 0.108014 0.104530 0.105778 0.001830 10
2 0.301676 0.010570 0.046517 0.000632 0.1 auto sigmoid {'C': 0.1, 'gamma': 'auto', 'kernel': 'sigmoid'} 0.104167 0.104167 0.108014 0.108014 0.104530 0.105778 0.001830 10
3 0.052814 0.000838 0.015528 0.000516 0.1 scale linear {'C': 0.1, 'gamma': 'scale', 'kernel': 'linear'} 0.975694 0.986111 0.975610 0.982578 0.975610 0.979121 0.004409 2
4 0.228706 0.003448 0.050628 0.000324 0.1 scale rbf {'C': 0.1, 'gamma': 'scale', 'kernel': 'rbf'} 0.923611 0.972222 0.937282 0.954704 0.958188 0.949202 0.016958 6

NuSVR

The fifth support vector machine model that we'll introduce is NuSVR. It is available as a part of the svm module of sklearn. We'll divide the regression dataset into train/test sets, train NuSVR with default parameters on it, evaluate performance on the test set, and then tune the model by trying various hyperparameters to improve performance further. We'll also introduce important attributes of the trained model which can give useful insights once the model is trained.

Splitting Dataset into Train & Test sets

In [33]:
X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :  (404, 13) (102, 13) (404,) (102,)

Fitting Default NuSVR to Train Data

In [34]:
from sklearn.svm import NuSVR

nu_svr = NuSVR(cache_size=1000)
nu_svr.fit(X_train, Y_train)
Out[34]:
NuSVR(cache_size=1000)

Evaluate Model Accuracy on Test Set

In [35]:
Y_preds = nu_svr.predict(X_test)

print(Y_preds[:15])
print(Y_test[:15])

print('Test R^2 Score : %.3f'%nu_svr.score(X_test, Y_test)) ## For regression models, score() returns the R^2 (coefficient of determination).
print('Training R^2 Score : %.3f'%nu_svr.score(X_train, Y_train))
[13.73 24.76 25.28 15.44 23.1  16.22 25.11 13.7  23.71 25.25 23.87 25.04
 21.54 20.78 22.26]
[15.  26.6 45.4 20.8 34.9 21.9 28.7  7.2 20.  32.2 24.1 18.5 13.5 27.
 23.1]
Test R^2 Score : 0.148
Training R^2 Score : 0.255

Important Attributes of Estimator

NuSVR provides attributes that can give useful insights once the model is trained. Below is a list of attributes available through NuSVR.

  • support_vectors_ - It represents support vectors of the trained model.
  • coef_ - It returns an array representing weights assigned to each feature by the model. It is available only when the linear kernel is used.
  • intercept_ - It represents the constant term (intercept) of the decision function.
In [36]:
print("Support Vectors :", nu_svr.support_vectors_.shape)
Support Vectors : (204, 13)
In [37]:
#print("Feature Importances :", nu_svr.coef_) ## Only for Linear Kernel
In [38]:
print("Model Intercept :", nu_svr.intercept_)
Model Intercept : [19.65]

Finetuning Model By Doing Grid Search On Various Hyperparameters

Below is a list of common hyperparameters that need tuning to get the best fit for our data. We'll try various hyperparameter settings on various splits of the training data to find the best fit, which should have almost the same score on both train and test sets, or only a small difference between them.

  • nu - It accepts a float value in the range (0, 1]. It represents an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors.
  • kernel - It specifies kernel type to be used in SVM. It accepts either one of the below strings or callable.
    • linear
    • poly
    • rbf (default)
    • sigmoid
    • precomputed
  • degree - It accepts integer values and represents a degree for poly kernel. It's ignored when other kernels are used.
  • gamma - It represents kernel coefficient for rbf, poly and sigmoid kernels. It accepts one of the below strings or float as value.
    • scale (default) - 1 / (n_features * X.var())
    • auto - (1/ n_features)
  • cache_size - It accepts float values representing kernel cache size in MB. The default value is 200 MB. It's suggested to increase cache_size based on the RAM of the computer to improve the performance of SVM.
  • max_iter - It specifies the number of iteration to try before stopping algorithm. The value of -1 represents no limit and algorithm runs until convergence.
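
The role of nu as a lower bound on the fraction of support vectors can be observed empirically. The sketch below fits NuSVR with a few values of nu on synthetic data (an assumption of this example) and prints the fraction of training samples that end up as support vectors.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import NuSVR

# Synthetic regression data used only for this illustration.
X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)

sv_fractions = {}
for nu in [0.2, 0.5, 0.8]:
    model = NuSVR(nu=nu).fit(X, y)
    # support_ holds the indices of training samples kept as support vectors.
    sv_fractions[nu] = len(model.support_) / len(X)
    print('nu = %.1f, Fraction of Support Vectors : %.3f' % (nu, sv_fractions[nu]))
```

This is consistent with the default model trained earlier, where nu=0.5 produced 204 support vectors out of 404 training samples.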

Below, we'll try various values of nu, kernel, and gamma to find the best estimator for our dataset by doing 5-fold cross-validation on the training data.

In [39]:
%%time

from sklearn.model_selection import GridSearchCV

params = {
            'nu': [0.1, 1.0,],
            'kernel': ['linear', 'rbf', 'sigmoid',],
            'gamma': ['auto', 'scale']
         }

svr_regressor_grid = GridSearchCV(NuSVR(cache_size=1000), param_grid=params, n_jobs=-1, cv=5, verbose=5)
svr_regressor_grid.fit(X_train,Y_train)

print('Train R^2 Score : %.3f'%svr_regressor_grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 Score : %.3f'%svr_regressor_grid.best_estimator_.score(X_test, Y_test))
print('Best R^2 Score Through Grid Search : %.3f'%svr_regressor_grid.best_score_)
print('Best Parameters : ',svr_regressor_grid.best_params_)
Fitting 5 folds for each of 12 candidates, totalling 60 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    9.0s
[Parallel(n_jobs=-1)]: Done  53 out of  60 | elapsed:   25.7s remaining:    3.4s
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:   46.9s finished
Train R^2 Score : 0.725
Test R^2 Score : 0.610
Best R^2 Score Through Grid Search : 0.713
Best Parameters :  {'gamma': 'auto', 'kernel': 'linear', 'nu': 1.0}
CPU times: user 8.76 s, sys: 13.3 ms, total: 8.77 s
Wall time: 55.5 s

Printing First Few Cross Validation Results

In [40]:
cross_val_results = pd.DataFrame(svr_regressor_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))

cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 12
Out[40]:
mean_fit_time std_fit_time mean_score_time std_score_time param_gamma param_kernel param_nu params split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score rank_test_score
0 1.903638 0.435652 0.001060 0.000065 auto linear 0.1 {'gamma': 'auto', 'kernel': 'linear', 'nu': 0.1} 0.682853 0.687668 0.717526 0.655335 0.605221 0.669721 0.037807 3
1 13.480651 5.694576 0.001730 0.000147 auto linear 1 {'gamma': 'auto', 'kernel': 'linear', 'nu': 1.0} 0.679955 0.799852 0.716207 0.757667 0.613447 0.713426 0.064110 1
2 0.003685 0.000221 0.000962 0.000036 auto rbf 0.1 {'gamma': 'auto', 'kernel': 'rbf', 'nu': 0.1} -0.191299 -0.429605 -0.142359 -0.123709 -0.200381 -0.217471 0.109919 12
3 0.017416 0.000186 0.002699 0.000028 auto rbf 1 {'gamma': 'auto', 'kernel': 'rbf', 'nu': 1.0} 0.005197 0.021444 -0.003875 -0.020708 -0.005375 -0.000664 0.013838 7
4 0.002699 0.000031 0.000793 0.000030 auto sigmoid 0.1 {'gamma': 'auto', 'kernel': 'sigmoid', 'nu': 0.1} -0.192127 -0.432815 -0.133293 -0.111075 -0.207635 -0.215389 0.114452 11

NuSVC

The last support vector machine model that we'll introduce is NuSVC. It is available as a part of the svm module of sklearn. We'll divide the classification dataset into train/test sets, train NuSVC with default parameters on it, evaluate performance on the test set, and then tune the model by trying various hyperparameters to improve performance further. We'll also introduce important attributes of the trained model which can give useful insights once the model is trained.

Splitting Dataset into Train & Test sets

In [41]:
X_train, X_test, Y_train, Y_test = train_test_split(X_digits, Y_digits, train_size=0.80, test_size=0.20, random_state=123)

print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :  (1437, 64) (360, 64) (1437,) (360,)

Fitting Default NuSVC to Train Data

In [42]:
from sklearn.svm import NuSVC

nu_svc = NuSVC(cache_size=1000)
nu_svc.fit(X_train, Y_train)
Out[42]:
NuSVC(cache_size=1000)

Evaluate Model Accuracy on Test Set

In [43]:
Y_preds = nu_svc.predict(X_test)

print(Y_preds[:15])
print(Y_test[:15])

print('Test Accuracy : %.3f'%(Y_preds == Y_test).mean())
print('Test Accuracy : %.3f'%nu_svc.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%nu_svc.score(X_train, Y_train))
[3 3 4 4 1 3 1 0 7 4 0 0 5 1 6]
[3 3 4 4 1 3 1 0 7 4 0 0 5 1 6]
Test Accuracy : 0.958
Test Accuracy : 0.958
Training Accuracy : 0.972

Important Attributes of Estimator

NuSVC has the same set of attributes as NuSVR.

In [44]:
print("Support Vectors :", nu_svc.support_vectors_.shape)
Support Vectors : (1215, 64)
In [45]:
#print("Feature Importances :", nu_svc.coef_) ## Only for Linear Kernel
In [46]:
print("Model Intercept :", nu_svc.intercept_)
Model Intercept : [-0.41 -0.28 -0.24 -0.38 -0.31 -0.09 -0.31 -0.36 -0.44  0.19  0.37  0.28
  0.24  0.46  0.11  0.48  0.16  0.14 -0.03  0.07  0.22 -0.08  0.02 -0.04
 -0.11  0.01  0.1  -0.18 -0.13 -0.12  0.07  0.34 -0.02  0.05 -0.01  0.18
 -0.06  0.02 -0.27 -0.2  -0.27 -0.26  0.12  0.03 -0.14]

Finetuning Model By Doing Grid Search On Various Hyperparameters

NuSVC has the same parameters as NuSVR.

In [47]:
%%time

params = {
            'nu': [0.1, 1.0, ],
            'kernel': ['linear', 'rbf', 'sigmoid',],
            'gamma': ['auto', 'scale']
         }

svc_classifier_grid = GridSearchCV(NuSVC(cache_size=1000), param_grid=params, n_jobs=-1, cv=5, verbose=5)
svc_classifier_grid.fit(X_train,Y_train)

print('Train Accuracy : %.3f'%svc_classifier_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%svc_classifier_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%svc_classifier_grid.best_score_)
print('Best Parameters : ',svc_classifier_grid.best_params_)
Fitting 5 folds for each of 12 candidates, totalling 60 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  12 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done  60 out of  60 | elapsed:    1.7s finished
Train Accuracy : 0.997
Test Accuracy : 0.992
Best Accuracy Through Grid Search : 0.989
Best Parameters :  {'gamma': 'scale', 'kernel': 'rbf', 'nu': 0.1}
CPU times: user 422 ms, sys: 16.1 ms, total: 438 ms
Wall time: 1.98 s

Printing First Few Cross Validation Results

In [48]:
cross_val_results = pd.DataFrame(svc_classifier_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))

cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 12
Out[48]:
mean_fit_time std_fit_time mean_score_time std_score_time param_gamma param_kernel param_nu params split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score rank_test_score
0 0.100268 0.000486 0.019382 0.000251 auto linear 0.1 {'gamma': 'auto', 'kernel': 'linear', 'nu': 0.1} 0.968750 0.996528 0.975610 0.989547 0.986063 0.983299 0.009924 2
1 0.001678 0.000147 0.000000 0.000000 auto linear 1 {'gamma': 'auto', 'kernel': 'linear', 'nu': 1.0} NaN NaN NaN NaN NaN NaN NaN 7
2 0.520766 0.032240 0.063453 0.011014 auto rbf 0.1 {'gamma': 'auto', 'kernel': 'rbf', 'nu': 0.1} 0.479167 0.475694 0.466899 0.554007 0.456446 0.486443 0.034685 4
3 0.002256 0.001235 0.000000 0.000000 auto rbf 1 {'gamma': 'auto', 'kernel': 'rbf', 'nu': 1.0} NaN NaN NaN NaN NaN NaN NaN 8
4 0.039665 0.003305 0.006151 0.000455 auto sigmoid 0.1 {'gamma': 'auto', 'kernel': 'sigmoid', 'nu': 0.1} 0.104167 0.104167 0.108014 0.108014 0.104530 0.105778 0.001830 6
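
The NaN scores for nu=1.0 in the table above most likely come from fits that failed: for classification, a given nu is not always feasible when classes have unequal sizes, and GridSearchCV records a failed fit as NaN by default. The sketch below reproduces the failure on a small, deliberately imbalanced toy dataset (an assumption of this example) by catching the ValueError raised during fit.

```python
import numpy as np
from sklearn.svm import NuSVC

# Toy imbalanced dataset: 70 samples of class 0 and 30 samples of class 1.
rng = np.random.RandomState(0)
y = np.array([0] * 70 + [1] * 30)
X = rng.normal(size=(100, 5)) + y[:, None]

fit_errors = {}
for nu in [0.1, 1.0]:
    try:
        NuSVC(nu=nu).fit(X, y)
        fit_errors[nu] = None
    except ValueError as err:
        fit_errors[nu] = str(err)
    print('nu = %.1f, Fit Error : %s' % (nu, fit_errors[nu]))
```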

This ends our small tutorial introducing various SVM estimators available as a part of sklearn. Please feel free to let us know your views in the comments section.

Sunny Solanki