Updated On : May-24,2020 Time Investment : ~30 mins

# Scikit-Learn - Cross-Validation & Hyperparameter Tuning Using Grid Search & Randomized Search

## 1. Cross Validation

We generally split our dataset into train and test sets, train the model on the train set, and evaluate it on the test set. With this approach the model only sees the training portion, which is typically around 4/5 of the data, and the reported performance depends on a single split.

Cross-validation gives a more reliable estimate of how well the model generalizes because every sample gets used for both training and testing. In k-fold cross-validation, several models are trained on different training sets, and each one is evaluated on a test set that does not overlap with the other test sets. The test-set scores are then aggregated (usually averaged) into a single estimate.
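
The idea can be sketched with NumPy alone. This is only an illustration of how the folds are formed, not how scikit-learn implements it; the helper name `k_fold_indices` is hypothetical, and the dataset size of 150 is an assumption chosen to match the iris dataset used later.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, test_idx) pairs whose test sets do not overlap."""
    indices = np.arange(n_samples)
    # array_split also copes with n_samples that is not divisible by n_folds
    folds = np.array_split(indices, n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        # every fold except the k-th one goes into the train set
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train_idx, test_idx

splits = list(k_fold_indices(150, n_folds=5))
for k, (train_idx, test_idx) in enumerate(splits, start=1):
    print('Fold %d : train=%d samples, test=%d samples (test indices %d..%d)'
          % (k, len(train_idx), len(test_idx), test_idx[0], test_idx[-1]))
```

Each sample lands in exactly one test set, so after 5 iterations the whole dataset has been used for evaluation once.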

### Image Explaining 5-Fold Cross Validation

```import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import sklearn

from collections import Counter

np.set_printoptions(precision=2)

%matplotlib inline
```

## Default Classification Tasks Approach

Below we follow the default approach to classification tasks: divide the data into train/test sets, train the model, and evaluate it on the test set. We try only one train/test combination without any cross-validation. This does not explore the data fully, so the resulting score may not reflect how well the model generalizes.

```from sklearn import datasets

iris = datasets.load_iris()  # load the iris dataset before unpacking features and labels
X_iris, Y_iris = iris.data, iris.target
print('Dataset Size : ', X_iris.shape, Y_iris.shape)
```
```Dataset Size :  (150, 4) (150,)
```

### Splitting Datasets Into Train/Test Sets

```from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X_iris, Y_iris, train_size=0.80, test_size=0.20, random_state=12, stratify=Y_iris)
print('Train/Test Sizes : ',X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```
```Train/Test Sizes :  (120, 4) (30, 4) (120,) (30,)
```

### Training Model

```from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
```
```KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')```

### Evaluating Model On Test Set

```print('Train Accuracy : %.2f'%knn.score(X_train, Y_train))
print('Test Accuracy : %.2f'%knn.score(X_test, Y_test))
```
```Train Accuracy : 0.96
Test Accuracy : 1.00
```

## Default Regression Tasks Approach

Below we follow the same default approach for a regression task: divide the data into train/test sets, train the model, and evaluate it on the test set. Again we try only one train/test combination without any cross-validation, so the resulting score may not reflect how well the model generalizes.

```boston = datasets.load_boston()  # Note: load_boston was removed in scikit-learn 1.2, so this needs an older version
X_boston, Y_boston = boston.data, boston.target
print('Dataset Size : ', X_boston.shape, Y_boston.shape)
```
```Dataset Size :  (506, 13) (506,)
```

### Splitting Datasets Into Train/Test Sets

```from sklearn.neighbors import KNeighborsRegressor

X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston, train_size=0.80, test_size=0.20, random_state=12)
print('Train/Test Sizes : ',X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```
```Train/Test Sizes :  (404, 13) (102, 13) (404,) (102,)
```

### Training Model

```knn = KNeighborsRegressor()
knn.fit(X_train, Y_train)
```
```KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')```

### Evaluating Model On Test Set

```print('Train R^2 Score : %.2f'%knn.score(X_train, Y_train))
print('Test R^2 Score : %.2f'%knn.score(X_test, Y_test))
```
```Train R^2 Score : 0.71
Test R^2 Score : 0.54
```

Both examples above evaluate the model on a single train/test split, so the score depends on which samples happened to land in the test set. A different split could give a noticeably better or worse score. It's therefore worth evaluating several train/test combinations to get an estimate that generalizes well.
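
A minimal sketch of this effect, assuming the bundled iris dataset and a default `KNeighborsClassifier`. The seed values 0 to 4 are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = []
for seed in range(5):
    # Same data, same model, only the random train/test split changes
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.20, random_state=seed, stratify=y)
    scores.append(KNeighborsClassifier().fit(X_tr, y_tr).score(X_te, y_te))

print('Test accuracy per split : ', ['%.2f' % s for s in scores])
print('Min / Max               : %.2f / %.2f' % (min(scores), max(scores)))
```

The gap between the minimum and maximum shows how much a single-split score can move just because of the split itself.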

`sklearn` also provides various splitting strategies as mentioned below:

• KFold
• StratifiedKFold
• ShuffleSplit
• StratifiedShuffleSplit

`sklearn` provides the `cross_val_score` function, which evaluates an estimator on each train/test split produced by a splitting strategy and returns the test score of every split.

`sklearn` also provides the `cross_validate` function. It works like `cross_val_score` but returns a dictionary with the fit time, score time, and test score of each split, and it can evaluate several metrics at once.
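
The difference between the two return values can be sketched as follows. The choice of iris, `KNeighborsClassifier`, and the two metrics (`accuracy`, `f1_macro`) is an assumption made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# cross_val_score: one array with the test score of each split
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)

# cross_validate: a dictionary, here with two metrics and train scores as well
results = cross_validate(KNeighborsClassifier(), X, y, cv=5,
                         scoring=['accuracy', 'f1_macro'],
                         return_train_score=True)

print('cross_val_score     : ', scores)
print('cross_validate keys : ', sorted(results.keys()))

# The default score of a classifier is accuracy, so both calls should agree
same = np.allclose(results['test_accuracy'], scores)
print('Same test accuracy from both calls : ', same)
```

With several metrics, the dictionary keys are named `test_<metric>` and `train_<metric>` instead of the single `test_score` key.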

Below we try `StratifiedKFold` and `StratifiedShuffleSplit` on the classification dataset (iris), and `KFold` and `ShuffleSplit` on the regression dataset (boston).

## KFold

K-Fold is the most common form of cross-validation. The dataset is divided into k folds (typically 5 or 10). In each iteration, one fold is held out as the test set and the remaining folds are combined to form the train set.

```from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.model_selection import KFold,StratifiedKFold, ShuffleSplit, StratifiedShuffleSplit
```
```print('Regression With Default CV (cv=5) : ', cross_val_score(KNeighborsRegressor(), X_boston, Y_boston, cv=5)) # An integer cv uses KFold for regressors
print('Regression With KFold Cross Validation : ', cross_val_score(KNeighborsRegressor(), X_boston, Y_boston, cv=KFold(n_splits=5)))
```
```Regression With Default CV (cv=5) :  [-1.11  0.15 -0.43 -0.01 -0.17]
Regression With KFold Cross Validation :  [-1.11  0.15 -0.43 -0.01 -0.17]
```
```print('Regression With Default CV (cv=5) : \n', cross_validate(KNeighborsRegressor(), X_boston, Y_boston, cv=5)) # An integer cv uses KFold for regressors
print('\nRegression With KFold Cross Validation : \n', cross_validate(KNeighborsRegressor(), X_boston, Y_boston, cv=KFold(n_splits=5)))
```
```Regression With Default CV (cv=5) :
{'fit_time': array([0., 0., 0., 0., 0.]), 'score_time': array([0.  , 0.01, 0.  , 0.  , 0.  ]), 'test_score': array([-1.11,  0.15, -0.43, -0.01, -0.17])}

Regression With KFold Cross Validation :
{'fit_time': array([0., 0., 0., 0., 0.]), 'score_time': array([0., 0., 0., 0., 0.]), 'test_score': array([-1.11,  0.15, -0.43, -0.01, -0.17])}
```

Below we split the iris classification dataset with `KFold` and print the class distribution of the train and test sets of each split. The printed values are fractions of the whole dataset (class counts divided by 150), so a class can contribute at most 0.33. The classes are clearly unbalanced across splits: because iris is sorted by class and `KFold` takes consecutive samples, the test set of the first split contains only the first class. Ideally, every class should have the same share in the train and test sets as in the whole dataset. If a class makes up 30% of the dataset, it should make up about 30% of both the train set and the test set.

Hence we should generally use `StratifiedKFold` for classification datasets and `KFold` for regression datasets.

```kfold = KFold(n_splits=5)
for i, (train_indexes, test_indexes) in enumerate(kfold.split(X_iris)):
    # Boolean mask over all samples: True where the sample is in the test set of this split
    mask = np.array([(False if j in train_indexes else True) for j in range(len(Y_iris))])
    # Class counts are divided by the full dataset size (150), not by the split size
    print('Split[%d] Train Index Distribution by class : '%(i+1),np.bincount(Y_iris[train_indexes])/len(Y_iris))
    print('Split[%d] Test Index Distribution by class : '%(i+1), np.bincount(Y_iris[test_indexes])/len(Y_iris))
```
```Split[1] Train Index Distribution by class :  [0.13 0.33 0.33]
Split[1] Test Index Distribution by class :  [0.2]
Split[2] Train Index Distribution by class :  [0.2  0.27 0.33]
Split[2] Test Index Distribution by class :  [0.13 0.07]
Split[3] Train Index Distribution by class :  [0.33 0.13 0.33]
Split[3] Test Index Distribution by class :  [0.  0.2]
Split[4] Train Index Distribution by class :  [0.33 0.27 0.2 ]
Split[4] Test Index Distribution by class :  [0.   0.07 0.13]
Split[5] Train Index Distribution by class :  [0.33 0.33 0.13]
Split[5] Test Index Distribution by class :  [0.  0.  0.2]
```

### Visualizing Splits Of KFold

Below we visualize the splits created by `KFold` in the previous step, using the train/test assignment recorded for each split. The Y-axis of the plot represents the split number. In the first split, the first 30 samples form the test set and the remaining 120 samples form the train set. In the second split, the next 30 samples form the test set, and so on.

```with plt.style.context(('seaborn', 'ggplot')):
    plt.yticks(range(5), range(1,6))
    plt.grid(None);
```
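
The data behind such a plot is a boolean matrix with one row per split and one column per sample. The sketch below builds that matrix and prints a coarse text version of it. The variable name `masks` and the text rendering are assumptions made for illustration, not the plotting code used for the figure.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 150
masks = np.zeros((5, n_samples), dtype=bool)

# Row i is True at the positions that belong to the test set of split i
for i, (_, test_idx) in enumerate(KFold(n_splits=5).split(np.zeros((n_samples, 1)))):
    masks[i, test_idx] = True

# Coarse text rendering: one character per block of 10 consecutive samples
for i, row in enumerate(masks, start=1):
    blocks = row.reshape(-1, 10).any(axis=1)
    print('Split %d : %s' % (i, ''.join('#' if b else '.' for b in blocks)))

# With matplotlib available, plt.imshow(masks, aspect='auto') would draw the same matrix
```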

## StratifiedKFold

`StratifiedKFold` is commonly used for classification tasks. It works like `KFold`, except that it keeps the class distribution of each train/test set the same as in the original dataset. So if a class makes up 30% of the samples in the original dataset, it will make up about 30% of both the train set and the test set after splitting.

```print('Classifying With Default CV (cv=5) : ', cross_val_score(KNeighborsClassifier(), X_iris, Y_iris, cv=5)) ## It uses StratifiedKFold default
print('Classifying With Stratified KFold Cross Validation : ', cross_val_score(KNeighborsClassifier(), X_iris, Y_iris, cv=StratifiedKFold(n_splits=5)))
```
```Classifying With Default CV (cv=5) :  [0.97 1.   0.93 0.97 1.  ]
Classifying With Stratified KFold Cross Validation :  [0.97 1.   0.93 0.97 1.  ]
```
```print('Classifying With Default CV (cv=5) : \n', cross_validate(KNeighborsClassifier(), X_iris, Y_iris, cv=5)) ## It uses StratifiedKFold default
print('\nClassifying With Stratified KFold Cross Validation : \n', cross_validate(KNeighborsClassifier(), X_iris, Y_iris, cv=StratifiedKFold(n_splits=5)))
```
```Classifying With Default CV (cv=5) :
{'fit_time': array([0., 0., 0., 0., 0.]), 'score_time': array([0., 0., 0., 0., 0.]), 'test_score': array([0.97, 1.  , 0.93, 0.97, 1.  ])}

Classifying With Stratified KFold Cross Validation :
{'fit_time': array([0., 0., 0., 0., 0.]), 'score_time': array([0., 0., 0., 0., 0.]), 'test_score': array([0.97, 1.  , 0.93, 0.97, 1.  ])}
```

When `cv` is an integer such as 5, `cross_val_score` divides the dataset into 5 folds and, in each iteration, uses one fold as the test set and the remaining folds as the train set. For an integer `cv` it uses `StratifiedKFold` when the estimator is a classifier with binary or multiclass targets, and `KFold` otherwise (for example, for regression problems). This is why the two calls above return identical scores.
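
This choice of splitter can be inspected with the `check_cv` helper from `sklearn.model_selection`. The tiny label arrays below are made up purely for illustration.

```python
import numpy as np
from sklearn.model_selection import check_cv

y_class = np.array([0, 1, 2] * 10)     # made-up multiclass labels
y_reg = np.linspace(0.0, 1.0, 30)      # made-up continuous targets

# check_cv turns an integer cv into a concrete splitter object
cv_clf = check_cv(5, y_class, classifier=True)
cv_reg = check_cv(5, y_reg, classifier=False)

print('Classifier with cv=5 : ', type(cv_clf).__name__)
print('Regressor with cv=5  : ', type(cv_reg).__name__)
```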

Below we split the classification dataset with `StratifiedKFold` and print the class distribution of the train and test sets of each split. Here every class has the same share in each train and test set.

```skfold = StratifiedKFold(n_splits=5)
for i, (train_indexes, test_indexes) in enumerate(skfold.split(X_iris, Y_iris)):
    print('Split[%d] Train Index Distribution by class : '%(i+1),np.bincount(Y_iris[train_indexes])/len(Y_iris))
    print('Split[%d] Test Index Distribution by class : '%(i+1), np.bincount(Y_iris[test_indexes])/len(Y_iris))
    mask = np.array([(False if j in train_indexes else True) for j in range(len(Y_iris))])
```
```Split[1] Train Index Distribution by class :  [0.27 0.27 0.27]
Split[1] Test Index Distribution by class :  [0.07 0.07 0.07]
Split[2] Train Index Distribution by class :  [0.27 0.27 0.27]
Split[2] Test Index Distribution by class :  [0.07 0.07 0.07]
Split[3] Train Index Distribution by class :  [0.27 0.27 0.27]
Split[3] Test Index Distribution by class :  [0.07 0.07 0.07]
Split[4] Train Index Distribution by class :  [0.27 0.27 0.27]
Split[4] Test Index Distribution by class :  [0.07 0.07 0.07]
Split[5] Train Index Distribution by class :  [0.27 0.27 0.27]
Split[5] Test Index Distribution by class :  [0.07 0.07 0.07]
```

### Visualizing Splits Of StratifiedKFold

Below we visualize the splits created by `StratifiedKFold` in the previous step, using the train/test assignment recorded for each split. The Y-axis of the plot represents the split number. In each split, 30 samples (10 from each class) form the test set and the remaining 120 samples form the train set, so class proportions are maintained. Each following split takes the next 10 samples of each class as its test set, and so on.

```with plt.style.context(('seaborn', 'ggplot')):
    plt.yticks(range(5), range(1,6))
    plt.grid(None);
```

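## ShuffleSplit

`ShuffleSplit`, as its name suggests, shuffles the data and splits it at randomly selected indices. Unlike `KFold`, the test set of each split is drawn independently, so test sets of different splits may overlap. It's commonly used for regression tasks.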

```print('Regression With Default CV (cv=5) : ', cross_val_score(KNeighborsRegressor(), X_boston, Y_boston, cv=5)) # Default KFold CV
print('Regression With ShuffleSplit Cross Validation : ', cross_val_score(KNeighborsRegressor(), X_boston, Y_boston, cv=ShuffleSplit(n_splits=5)))
```
```Regression With Default CV (cv=5) :  [-1.11  0.15 -0.43 -0.01 -0.17]
Regression With ShuffleSplit Cross Validation :  [0.48 0.64 0.54 0.68 0.54]
```
```print('Regression With Default CV (cv=5) : \n', cross_validate(KNeighborsRegressor(), X_boston, Y_boston, cv=5)) # Default KFold CV
print('\nRegression With ShuffleSplit Cross Validation : \n', cross_validate(KNeighborsRegressor(), X_boston, Y_boston, cv=ShuffleSplit(n_splits=5)))
```
```Regression With Default CV (cv=5) :
{'fit_time': array([0., 0., 0., 0., 0.]), 'score_time': array([0., 0., 0., 0., 0.]), 'test_score': array([-1.11,  0.15, -0.43, -0.01, -0.17])}

Regression With ShuffleSplit Cross Validation :
{'fit_time': array([0., 0., 0., 0., 0.]), 'score_time': array([0., 0., 0., 0., 0.]), 'test_score': array([0.24, 0.17, 0.65, 0.55, 0.53])}
```

Below we split the classification dataset with `ShuffleSplit` and print the class distribution of the train and test sets of each split, again as fractions of the whole dataset. Note that the classes are not evenly represented in the train and test sets. Hence we should generally use `StratifiedShuffleSplit` for classification datasets and `ShuffleSplit` for regression datasets.

```shuffle_split = ShuffleSplit(n_splits=5)
for i, (train_indexes, test_indexes) in enumerate(shuffle_split.split(X_iris)):
    print('Split[%d] Train Index Distribution by class : '%(i+1),np.bincount(Y_iris[train_indexes])/len(Y_iris))
    print('Split[%d] Test Index Distribution by class : '%(i+1), np.bincount(Y_iris[test_indexes])/len(Y_iris))
    mask = np.array([(False if j in train_indexes else True) for j in range(len(Y_iris))])
```
```Split[1] Train Index Distribution by class :  [0.31 0.31 0.28]
Split[1] Test Index Distribution by class :  [0.02 0.03 0.05]
Split[2] Train Index Distribution by class :  [0.29 0.29 0.31]
Split[2] Test Index Distribution by class :  [0.04 0.04 0.02]
Split[3] Train Index Distribution by class :  [0.31 0.29 0.29]
Split[3] Test Index Distribution by class :  [0.02 0.04 0.04]
Split[4] Train Index Distribution by class :  [0.3 0.3 0.3]
Split[4] Test Index Distribution by class :  [0.03 0.03 0.03]
Split[5] Train Index Distribution by class :  [0.29 0.28 0.33]
Split[5] Test Index Distribution by class :  [0.04 0.05 0.01]
```
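
Because `ShuffleSplit` draws every split independently, some samples can appear in several test sets while others never get tested. The sketch below counts this. The values `test_size=0.2` and `random_state=123` are assumptions chosen for illustration, and the dummy feature matrix only supplies the number of samples.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X_dummy = np.zeros((150, 1))   # only the number of rows matters for splitting

ss = ShuffleSplit(n_splits=5, test_size=0.2, random_state=123)
test_sets = [test_idx for _, test_idx in ss.split(X_dummy)]

# How many times each sample was placed in a test set
times_tested = np.bincount(np.concatenate(test_sets), minlength=150)
print('Test set sizes                : ', [len(t) for t in test_sets])
print('Samples never tested          : ', int((times_tested == 0).sum()))
print('Samples tested more than once : ', int((times_tested > 1).sum()))

# Fixing random_state makes the splits reproducible across runs
repeat = [t for _, t in ShuffleSplit(n_splits=5, test_size=0.2, random_state=123).split(X_dummy)]
print('Reproducible : ', all(np.array_equal(a, b) for a, b in zip(test_sets, repeat)))
```

Without a fixed `random_state`, each run produces different splits, which is why the `ShuffleSplit` scores in this section change from one call to the next.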

### Visualizing Splits Of ShuffleSplit

We can see from the visualization below that `ShuffleSplit` selects samples randomly, unlike `KFold`, which selects them in consecutive blocks.

```with plt.style.context(('seaborn', 'ggplot')):
    plt.yticks(range(5), range(1,6))
    plt.grid(None);
```

## StratifiedShuffleSplit

`StratifiedShuffleSplit` works like `ShuffleSplit` but is designed for classification tasks, where we need to maintain class proportions after splitting the data.

```print('Classifying With Default CV (cv=5) : ', cross_val_score(KNeighborsClassifier(), X_iris, Y_iris, cv=5)) ## It uses StratifiedKFold default
print('Classifying With StratifiedShuffleSplit Cross Validation : ', cross_val_score(KNeighborsClassifier(), X_iris, Y_iris, cv=StratifiedShuffleSplit(n_splits=5)))
```
```Classifying With Default CV (cv=5) :  [0.97 1.   0.93 0.97 1.  ]
Classifying With StratifiedShuffleSplit Cross Validation :  [1. 1. 1. 1. 1.]
```
```print('Classifying With Default CV (cv=5) : \n', cross_validate(KNeighborsClassifier(), X_iris, Y_iris, cv=5)) ## It uses StratifiedKFold default
print('\nClassifying With StratifiedShuffleSplit Cross Validation : \n', cross_validate(KNeighborsClassifier(), X_iris, Y_iris, cv=StratifiedShuffleSplit(n_splits=5)))
```
```Classifying With Default CV (cv=5) :
{'fit_time': array([0., 0., 0., 0., 0.]), 'score_time': array([0., 0., 0., 0., 0.]), 'test_score': array([0.97, 1.  , 0.93, 0.97, 1.  ])}

Classifying With StratifiedShuffleSplit Cross Validation :
{'fit_time': array([0., 0., 0., 0., 0.]), 'score_time': array([0., 0., 0., 0., 0.]), 'test_score': array([0.93, 1.  , 0.87, 1.  , 1.  ])}
```

Below we split the classification dataset with `StratifiedShuffleSplit` and print the class distribution of the train and test sets of each split. Here every class has the same share in each train and test set.

```shuffle_split = StratifiedShuffleSplit(n_splits=5)
for i, (train_indexes, test_indexes) in enumerate(shuffle_split.split(X_iris, Y_iris)):
    print('Split[%d] Train Index Distribution by class : '%(i+1),np.bincount(Y_iris[train_indexes])/len(Y_iris))
    print('Split[%d] Test Index Distribution by class : '%(i+1), np.bincount(Y_iris[test_indexes])/len(Y_iris))
    mask = np.array([(False if j in train_indexes else True) for j in range(len(Y_iris))])
```
```Split[1] Train Index Distribution by class :  [0.3 0.3 0.3]
Split[1] Test Index Distribution by class :  [0.03 0.03 0.03]
Split[2] Train Index Distribution by class :  [0.3 0.3 0.3]
Split[2] Test Index Distribution by class :  [0.03 0.03 0.03]
Split[3] Train Index Distribution by class :  [0.3 0.3 0.3]
Split[3] Test Index Distribution by class :  [0.03 0.03 0.03]
Split[4] Train Index Distribution by class :  [0.3 0.3 0.3]
Split[4] Test Index Distribution by class :  [0.03 0.03 0.03]
Split[5] Train Index Distribution by class :  [0.3 0.3 0.3]
Split[5] Test Index Distribution by class :  [0.03 0.03 0.03]
```

### Visualizing Splits Of StratifiedShuffleSplit

```with plt.style.context(('seaborn', 'ggplot')):
    plt.yticks(range(5), range(1,6))
    plt.grid(None);
```

`sklearn` also provides the `validation_curve` function, which takes a single hyperparameter and a list of values for it, and returns the train and test scores of every cross-validation split for each value. It's generally used for plotting purposes.

```from sklearn.model_selection import validation_curve

n_neighbors = [1, 3, 5, 10, 20, 50]
# Iris is a classification dataset, so we use a classifier; its default score is accuracy
train_scores, test_scores = validation_curve(KNeighborsClassifier(), X_iris, Y_iris, param_name="n_neighbors",
                                             param_range=n_neighbors, cv=StratifiedShuffleSplit(n_splits=5, random_state=123))

# Scores have shape (n_param_values, n_splits), so we average over the splits
with plt.style.context(('seaborn', 'ggplot')):
    plt.plot(n_neighbors, train_scores.mean(axis=1), label="train accuracy")
    plt.plot(n_neighbors, test_scores.mean(axis=1), label="test accuracy")
    plt.ylabel('Accuracy')
    plt.xlabel('Number of neighbors')
    #plt.xlim([50, 0])
    plt.legend(loc="best");
```
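
The arrays returned by `validation_curve` can also be used without plotting, for example to pick the value with the best mean test score. Below is a small sketch of that, assuming the same iris data and parameter values as above; the variable name `best_k` is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedShuffleSplit, validation_curve
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
n_neighbors = [1, 3, 5, 10, 20, 50]

train_scores, test_scores = validation_curve(
    KNeighborsClassifier(), X, y,
    param_name='n_neighbors', param_range=n_neighbors,
    cv=StratifiedShuffleSplit(n_splits=5, random_state=123))

# One row per parameter value, one column per cross-validation split
print('Score array shapes : ', train_scores.shape, test_scores.shape)

mean_test = test_scores.mean(axis=1)
best_k = n_neighbors[int(np.argmax(mean_test))]
for k, score in zip(n_neighbors, mean_test):
    print('n_neighbors=%2d  mean test accuracy=%.3f' % (k, score))
print('Best n_neighbors by mean test accuracy : ', best_k)
```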

## 2. Hyperparameter Tuning Using Grid Search & Randomized Search

Most machine learning models have more than one hyperparameter, and most of them ship with default values. A model trained with the defaults may not fit the data well: it can overfit or underfit. We need to find a proper trade-off between overfitting and underfitting by searching over various values of the model's hyperparameters.

Grid search tries every combination of the values given for a list of hyperparameters, records the performance of the model for each combination using an evaluation metric, and keeps track of the best-performing hyperparameters. We can do this by hand by writing one nested loop per hyperparameter.

```X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston,
                                                    train_size=0.80,
                                                    test_size=0.20,
                                                    random_state=12)
```
```from sklearn.ensemble import RandomForestRegressor

# Note: max_features='auto' (all features, for regressors) was removed in scikit-learn 1.3;
# on newer versions use 1.0 instead.
best_score = 0.0
best_params = {'max_depth': None, 'max_features': 'auto','n_estimators': 10}
for max_depth in [None, 2,3,5]:
    for max_features in ['auto','sqrt', 'log2']:
        for n_estimators in [10,100]:
            # Average R^2 over 5 shuffled splits of the training data
            score = cross_val_score(RandomForestRegressor(n_estimators=n_estimators,
                                                          max_features=max_features,
                                                          max_depth=max_depth,
                                                          random_state=123
                                                          ),
                                    X_train,
                                    Y_train,
                                    cv=ShuffleSplit(n_splits=5, random_state=123),
                                    n_jobs=-1).mean()
            # Remember the best combination seen so far
            if score > best_score:
                best_score= score
                best_params['max_depth'],best_params['max_features'], best_params['n_estimators'] = max_depth, max_features, n_estimators

            print('max_depth : %s, max_features : %s, n_estimators : %s , Average R^2 Score : %.2f'%(str(max_depth), max_features, str(n_estimators), score))

print('\nBest Score : %.2f, Best Params : %s'%(best_score, str(best_params)))
```
```max_depth : None, max_features : auto, n_estimators : 10 , Average R^2 Score : 0.89
max_depth : None, max_features : auto, n_estimators : 100 , Average R^2 Score : 0.90
max_depth : None, max_features : sqrt, n_estimators : 10 , Average R^2 Score : 0.85
max_depth : None, max_features : sqrt, n_estimators : 100 , Average R^2 Score : 0.88
max_depth : None, max_features : log2, n_estimators : 10 , Average R^2 Score : 0.85
max_depth : None, max_features : log2, n_estimators : 100 , Average R^2 Score : 0.88
max_depth : 2, max_features : auto, n_estimators : 10 , Average R^2 Score : 0.68
max_depth : 2, max_features : auto, n_estimators : 100 , Average R^2 Score : 0.68
max_depth : 2, max_features : sqrt, n_estimators : 10 , Average R^2 Score : 0.57
max_depth : 2, max_features : sqrt, n_estimators : 100 , Average R^2 Score : 0.60
max_depth : 2, max_features : log2, n_estimators : 10 , Average R^2 Score : 0.57
max_depth : 2, max_features : log2, n_estimators : 100 , Average R^2 Score : 0.60
max_depth : 3, max_features : auto, n_estimators : 10 , Average R^2 Score : 0.81
max_depth : 3, max_features : auto, n_estimators : 100 , Average R^2 Score : 0.83
max_depth : 3, max_features : sqrt, n_estimators : 10 , Average R^2 Score : 0.76
max_depth : 3, max_features : sqrt, n_estimators : 100 , Average R^2 Score : 0.73
max_depth : 3, max_features : log2, n_estimators : 10 , Average R^2 Score : 0.76
max_depth : 3, max_features : log2, n_estimators : 100 , Average R^2 Score : 0.73
max_depth : 5, max_features : auto, n_estimators : 10 , Average R^2 Score : 0.87
max_depth : 5, max_features : auto, n_estimators : 100 , Average R^2 Score : 0.89
max_depth : 5, max_features : sqrt, n_estimators : 10 , Average R^2 Score : 0.78
max_depth : 5, max_features : sqrt, n_estimators : 100 , Average R^2 Score : 0.81
max_depth : 5, max_features : log2, n_estimators : 10 , Average R^2 Score : 0.78
max_depth : 5, max_features : log2, n_estimators : 100 , Average R^2 Score : 0.81

Best Score : 0.90, Best Params : {'max_depth': None, 'max_features': 'auto', 'n_estimators': 100}
```
```rf_best = RandomForestRegressor(**best_params)
rf_best.fit(X_train, Y_train)

print("Test R^2 Score : ", rf_best.score(X_test, Y_test))
```
```Test R^2 Score :  0.8749413705189064
```
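
The three nested loops above simply enumerate every combination of the candidate values. The same enumeration can be sketched with `ParameterGrid` from `sklearn.model_selection`, which yields one dictionary per combination. The grid below copies the values used in the loops; `ParameterGrid` only enumerates them and does not validate them against any estimator.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'max_depth': [None, 2, 3, 5],
              'max_features': ['auto', 'sqrt', 'log2'],
              'n_estimators': [10, 100]}

# 4 * 3 * 2 combinations, each as a dict that can be unpacked into an estimator
combos = list(ParameterGrid(param_grid))
print('Number of combinations : ', len(combos))
for params in combos[:3]:
    print(params)
```

Each dictionary can be passed as `RandomForestRegressor(**params)`, which replaces one level of nesting per hyperparameter with a single loop.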

### GridSearchCV

`sklearn` provides the `GridSearchCV` class, which takes the hyperparameters and their candidate values as a dictionary, tries all combinations on the model, and keeps track of the results for each cross-validation split.

```from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

grid = GridSearchCV(RandomForestRegressor(random_state=123),
                    param_grid = {'max_depth': [None, 2,3,5], 'max_features' : ['auto','sqrt', 'log2'], 'n_estimators': [10,100],},
                    cv = ShuffleSplit(n_splits=5, random_state=123),
                    verbose=50,
                    n_jobs=-1)

grid.fit(X_train, Y_train)

print('\nBest R^2 Score : %.2f'%grid.best_score_, ' Best Params : ', str(grid.best_params_))
```
```Fitting 5 folds for each of 24 candidates, totalling 120 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=-1)]: Batch computation too fast (0.0339s.) Setting batch_size=10.
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=-1)]: Done   3 tasks      | elapsed:    0.1s
[Parallel(n_jobs=-1)]: Done   4 tasks      | elapsed:    0.1s
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    0.1s
[Parallel(n_jobs=-1)]: Done   6 tasks      | elapsed:    0.2s
[Parallel(n_jobs=-1)]: Done   7 tasks      | elapsed:    0.2s
[Parallel(n_jobs=-1)]: Done   8 tasks      | elapsed:    0.3s
[Parallel(n_jobs=-1)]: Done  18 tasks      | elapsed:    0.7s
[Parallel(n_jobs=-1)]: Done  28 tasks      | elapsed:    0.8s
[Parallel(n_jobs=-1)]: Done  38 tasks      | elapsed:    0.8s
[Parallel(n_jobs=-1)]: Done  48 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done  58 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done  68 out of 120 | elapsed:    1.3s remaining:    1.0s
[Parallel(n_jobs=-1)]: Done 110 out of 120 | elapsed:    1.8s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done 120 out of 120 | elapsed:    1.9s finished

Best R^2 Score : 0.90  Best Params :  {'max_depth': None, 'max_features': 'auto', 'n_estimators': 100}
```

The fitted grid object also keeps track of every hyperparameter combination tried on every cross-validation split in its `cv_results_` attribute: the per-split scores, the mean and standard deviation of the scores, and the mean and standard deviation of the fit and score times. It also ranks the combinations by performance, with the best one ranked 1, the next one 2, and so on.

```grid.cv_results_.keys()
```
`dict_keys(['mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time', 'param_max_depth', 'param_max_features', 'param_n_estimators', 'params', 'split0_test_score', 'split1_test_score', 'split2_test_score', 'split3_test_score', 'split4_test_score', 'mean_test_score', 'std_test_score', 'rank_test_score'])`
```pd.DataFrame(grid.cv_results_)[['param_max_depth', 'param_max_features', 'param_n_estimators','mean_test_score', 'rank_test_score']]
```
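
| | param_max_depth | param_max_features | param_n_estimators | mean_test_score | rank_test_score |
|---|---|---|---|---|---|
| 0 | None | auto | 10 | 0.890995 | 2 |
| 1 | None | auto | 100 | 0.902970 | 1 |
| 2 | None | sqrt | 10 | 0.848199 | 7 |
| 3 | None | sqrt | 100 | 0.875427 | 4 |
| 4 | None | log2 | 10 | 0.848199 | 7 |
| 5 | None | log2 | 100 | 0.875427 | 4 |
| 6 | 2 | auto | 10 | 0.684550 | 19 |
| 7 | 2 | auto | 100 | 0.681664 | 20 |
| 8 | 2 | sqrt | 10 | 0.566652 | 23 |
| 9 | 2 | sqrt | 100 | 0.598139 | 21 |
| 10 | 2 | log2 | 10 | 0.566652 | 23 |
| 11 | 2 | log2 | 100 | 0.598139 | 21 |
| 12 | 3 | auto | 10 | 0.812761 | 10 |
| 13 | 3 | auto | 100 | 0.825652 | 9 |
| 14 | 3 | sqrt | 10 | 0.757299 | 15 |
| 15 | 3 | sqrt | 100 | 0.727955 | 17 |
| 16 | 3 | log2 | 10 | 0.757299 | 15 |
| 17 | 3 | log2 | 100 | 0.727955 | 17 |
| 18 | 5 | auto | 10 | 0.873375 | 6 |
| 19 | 5 | auto | 100 | 0.885054 | 3 |
| 20 | 5 | sqrt | 10 | 0.783386 | 13 |
| 21 | 5 | sqrt | 100 | 0.811359 | 11 |
| 22 | 5 | log2 | 10 | 0.783386 | 13 |
| 23 | 5 | log2 | 100 | 0.811359 | 11 |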
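
Since `cv_results_` converts cleanly to a dataframe, the ranking can be read off by sorting on `rank_test_score`. The sketch below shows this on a small synthetic problem. The data from `make_regression`, the reduced grid, and the variable names `search` and `top` are all assumptions made so the example runs quickly on its own.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, ShuffleSplit

# Small synthetic regression problem (an assumption, used only to keep the sketch fast)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

search = GridSearchCV(RandomForestRegressor(random_state=123),
                      param_grid={'max_depth': [None, 2, 3], 'n_estimators': [5, 20]},
                      cv=ShuffleSplit(n_splits=3, random_state=123))
search.fit(X, y)

results = pd.DataFrame(search.cv_results_)
# Best combinations first, with the spread of their scores across splits
top = (results[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]
       .sort_values('rank_test_score')
       .head(3))
print(top)
```

Looking at `std_test_score` next to `mean_test_score` helps judge whether the gap between two top-ranked settings is larger than the variation between splits.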

The grid object also keeps the best model as the `best_estimator_` attribute so that it can be used for further predictions.

```grid.best_estimator_
```
```RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100,
n_jobs=None, oob_score=False, random_state=123, verbose=0,
warm_start=False)```
```print('First Few preds : ', grid.predict(X_boston)[:5])
print('Actual Values   : ', Y_boston[:5])
```
```First Few preds :  [25.36 23.09 35.49 33.91 35.45]
Actual Values   :  [24.  21.6 34.7 33.4 36.2]
```
```print("Test R^2 Score : ", grid.score(X_test, Y_test))
```
```Test R^2 Score :  0.8723191006047755
```

### RandomizedSearchCV

`RandomizedSearchCV` is another approach to hyperparameter tuning. Unlike `GridSearchCV`, which tries all the parameter settings passed to it, `RandomizedSearchCV` tries only a specified number of settings sampled from the parameter search space. Its `n_iter` parameter (an integer) sets how many parameter settings are sampled and tried on the model. Below we demonstrate it on the Boston housing dataset, using the train/test split created in the `GridSearchCV` section.

```from sklearn.model_selection import RandomizedSearchCV

grid = RandomizedSearchCV(RandomForestRegressor(random_state=123), n_iter=5,
                          param_distributions = {'max_depth': [None, 2,3,5], 'max_features' : ['auto','sqrt', 'log2'], 'n_estimators': [10,100],},
                          cv = ShuffleSplit(n_splits=5, random_state=123),
                          verbose=50,
                          n_jobs=-1)

grid.fit(X_train, Y_train)

print('\nBest R^2 Score : %.2f'%grid.best_score_, ' Best Params : ', str(grid.best_params_))
```
```Fitting 5 folds for each of 5 candidates, totalling 25 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=-1)]: Batch computation too fast (0.0148s.) Setting batch_size=26.
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    0.0s
[Parallel(n_jobs=-1)]: Done   3 out of  25 | elapsed:    0.0s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done   4 out of  25 | elapsed:    0.0s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done   5 out of  25 | elapsed:    0.0s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done   6 out of  25 | elapsed:    0.0s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done   7 out of  25 | elapsed:    0.0s remaining:    0.1s
[Parallel(n_jobs=-1)]: Done   8 out of  25 | elapsed:    0.1s remaining:    0.1s
[Parallel(n_jobs=-1)]: Done  25 out of  25 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  25 out of  25 | elapsed:    0.2s finished

Best R^2 Score : 0.85  Best Params :  {'n_estimators': 10, 'max_features': 'log2', 'max_depth': None}
```

We can see from the above output that although the search space contains 24 possible parameter settings, only 5 of them are tried. The output reports 25 fits in total because each of the 5 settings is cross-validated on 5 splits.

Below we are printing results of each parameter setting converted to pandas dataframe.

```pd.DataFrame(grid.cv_results_)[['param_max_depth', 'param_max_features', 'param_n_estimators','mean_test_score', 'rank_test_score']]
```
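
| | param_max_depth | param_max_features | param_n_estimators | mean_test_score | rank_test_score |
|---|---|---|---|---|---|
| 0 | 3 | sqrt | 10 | 0.757299 | 4 |
| 1 | 5 | log2 | 10 | 0.783386 | 3 |
| 2 | 2 | sqrt | 10 | 0.566652 | 5 |
| 3 | None | log2 | 10 | 0.848199 | 1 |
| 4 | 3 | auto | 10 | 0.812761 | 2 |
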
```grid.best_estimator_
```
```RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='log2', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10,
n_jobs=None, oob_score=False, random_state=123, verbose=0,
warm_start=False)```
```print('First Few preds : ', grid.predict(X_boston)[:5])
print('Actual Values   : ', Y_boston[:5])
```
```First Few preds :  [23.9  25.11 37.14 33.9  34.87]
Actual Values   :  [24.  21.6 34.7 33.4 36.2]
```
```print("Test R^2 Score : ", grid.score(X_test, Y_test))
```
```Test R^2 Score :  0.8680488725558099
```
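
`RandomizedSearchCV` can also sample from distributions instead of fixed lists, which is useful when a hyperparameter has a wide numeric range. Below is a sketch using `scipy.stats.randint`. The synthetic data, the ranges, and passing `random_state` to make the sampling reproducible are assumptions made for illustration.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, ShuffleSplit

# Small synthetic regression problem (an assumption, used only to keep the sketch fast)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=123),
    param_distributions={'max_depth': randint(2, 8),       # integers 2..7
                         'n_estimators': randint(5, 30)},  # integers 5..29
    n_iter=5,                                              # number of sampled settings
    cv=ShuffleSplit(n_splits=3, random_state=123),
    random_state=123)                                      # reproducible sampling
search.fit(X, y)

print('Settings tried : ', len(search.cv_results_['params']))
print('Best Params    : ', search.best_params_)
print('Best R^2 Score : %.2f' % search.best_score_)
```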

This ends our small tutorial on cross-validation and hyperparameter tuning with grid search and randomized search using scikit-learn. Please feel free to let us know your views in the comments section.

Sunny Solanki
