Regression is a process where we try to predict a continuous target variable based on one or more independent variables (features). Scikit-Learn offers various regression models for performing regression learning. Typical regression problems include:

- Predicting house prices from other attributes like area, number of bedrooms, number of washrooms, parking facility, etc.
- Predicting stock prices based on other attributes.
- Predicting future sales of a particular item.
- Predicting temperature.
- And many more.

We'll use the following scikit-learn regression models for our purpose: `LinearRegression`, `Ridge`, `Lasso`, and `ElasticNet`.

Scikit-Learn also ships with a few built-in datasets that we can load directly into memory. We'll use one such dataset, the Boston Housing dataset, and predict house prices from the other attributes in the dataset.

We start by importing the necessary libraries.

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import warnings
import sys
print("Python Version : ",sys.version)
print("Scikit-Learn Version : ",sklearn.__version__)
warnings.filterwarnings("ignore") ## Silence all warnings (e.g. FutureWarning, ConvergenceWarning) to keep the output readable.
np.set_printoptions(precision=3)
## The magic command below renders plots inline, inside the current notebook.
## Another option is %matplotlib notebook, which renders interactive plots instead.
%matplotlib inline
```

In the Linear Regression model, we fit a line (a hyperplane when there are several features) through the data so that the sum of squared vertical distances between the data points and the line is as small as possible. Once we have found this best-fitting line, we use it to make predictions on unseen data.

It's also known as `Ordinary Least Squares` (OLS) because the optimization minimizes the sum of squared differences between the predicted and actual target values of the training set.
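To make the least-squares idea concrete, here is a minimal sketch on synthetic data (the coefficients 2, -3 and intercept 5 are made up for illustration). It solves the least-squares problem directly with NumPy's `np.linalg.lstsq`, after appending a column of ones so that the last weight acts as the intercept, and checks that `LinearRegression` arrives at the same line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): y = 2*x1 - 3*x2 + 5 + small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = X_demo @ np.array([2.0, -3.0]) + 5.0 + rng.normal(scale=0.1, size=100)

# OLS by hand: add a column of ones so the last weight is the intercept,
# then minimize ||A w - y||^2 with NumPy's least-squares solver.
A = np.hstack([X_demo, np.ones((X_demo.shape[0], 1))])
w, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
coef_manual, intercept_manual = w[:-1], w[-1]

# scikit-learn's LinearRegression solves the same minimization problem.
model = LinearRegression().fit(X_demo, y_demo)
print("manual :", coef_manual, intercept_manual)
print("sklearn:", model.coef_, model.intercept_)
```

Both approaches should report weights close to 2 and -3 and an intercept close to 5, since the noise is small.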

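We'll load the Boston housing data provided by scikit-learn. `load_boston()` returns a `Bunch` object, which behaves like a dictionary whose keys can also be accessed as attributes. We'll also print some details about the dataset.

**Note:** `load_boston` was deprecated in scikit-learn 1.0 and removed in version 1.2, so the code below requires an older scikit-learn release.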

```
from sklearn.datasets import load_boston ## function for loading boston data.
boston = load_boston()
#print(type(boston)) ## It returns Bunch object which is similar to dictionary.
#print(boston.DESCR) ## DESCR attribute describes dataset.
print('Feature Names : ' + str(boston.feature_names))
print('Dataset shape : ' + str(boston.data.shape))
print('Target shape : ' + str(boston.target.shape))
```
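Because `load_boston` is not available in scikit-learn 1.2 and later, the sketch below swaps in `load_diabetes`, a different regression dataset that is bundled with scikit-learn (a substitution, not the dataset used in this tutorial), to show the same `Bunch` access patterns. The remaining steps could be followed with `X = diabetes.data` and `Y = diabetes.target`, although the numbers would differ from the ones discussed here.

```python
from sklearn.datasets import load_diabetes

# load_diabetes ships with scikit-learn, so no download is needed.
diabetes = load_diabetes()

# A Bunch supports both dictionary-style and attribute-style access.
print(type(diabetes).__name__)            # Bunch
print(diabetes['data'] is diabetes.data)  # True: both return the same array
print('Feature Names : ' + str(diabetes.feature_names))
print('Dataset shape : ' + str(diabetes.data.shape))
print('Target shape : ' + str(diabetes.target.shape))
```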

We'll split the dataset into two parts:

- `Training data`, which will be used for training the model.
- `Test data`, against which the performance of the trained model will be checked.

The `train_test_split` function of sklearn's `model_selection` module will help us split the data into two sets, with `80%` for training and `20%` for testing. We also pass a seed (`random_state=123`) to `train_test_split` so that we always get the same split and can reproduce the results in the future.

```
from sklearn.model_selection import train_test_split # Function for splitting dataset into train/test set.
X = boston.data
Y = boston.target
## We can specify either train_size or test_size; sklearn infers the other. Both are included here for explanation purposes.
## random_state makes the split reproducible. Without it, a different split is generated every time.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, test_size = 0.2, random_state = 123)
print('Train & Test sizes : ',X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```
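The effect of `random_state` can be checked on a small made-up array (the toy data below is purely illustrative): two calls with the same seed return identical splits, and `test_size=0.2` on 50 samples leaves 40 for training and 10 for testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): 50 samples with 2 features each.
X_toy = np.arange(100).reshape(50, 2)
y_toy = np.arange(50)

# The same random_state produces identical splits on every call.
split_a = train_test_split(X_toy, y_toy, test_size=0.2, random_state=123)
split_b = train_test_split(X_toy, y_toy, test_size=0.2, random_state=123)
X_tr, X_te, y_tr, y_te = split_a

print(X_tr.shape, X_te.shape)  # (40, 2) (10, 2)
print(all(np.array_equal(a, b) for a, b in zip(split_a, split_b)))  # True
```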

Below we initialize the `LinearRegression` model, a basic model used extensively for regression tasks.

```
from sklearn.linear_model import LinearRegression ## Linear Regression Implementation
linear_regressor = LinearRegression()
linear_regressor
```

We train the model by passing the training data and training labels to `fit()`. After training, it returns the trained regressor object itself.

```
linear_regressor.fit(X_train,Y_train)
```
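Since `fit()` returns the estimator itself, fitting and predicting can be chained in one expression. A small sketch on made-up data where `y = 3x + 1` holds exactly (the data is hypothetical, chosen so the answer is easy to check):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny illustrative dataset where y = 3*x + 1 exactly.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = 3 * X_demo.ravel() + 1

reg = LinearRegression()
returned = reg.fit(X_demo, y_demo)
print(returned is reg)  # True: fit() returns the fitted estimator itself

# Because fit() returns self, fitting and predicting can be chained.
pred = LinearRegression().fit(X_demo, y_demo).predict(np.array([[4.0]]))
print(pred)
```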

Almost all models in the Scikit-Learn API provide a `predict()` method, which can be used to predict the target variable for the samples passed to it (here, the test set).

Below we compare the housing prices predicted by our model with the actual house prices of the test data.

```
y_test_pred = linear_regressor.predict(X_test)
print('First Few Actual Housing Prices(Test Data) : ' + str(Y_test[:5]))
print('First Few Predicted Housing Prices(Test Data) : ' + str(y_test_pred[:5]))
```

Scikit-Learn's LinearRegression model has a `score()` method which returns the coefficient of determination $R^2$ for the data and target values passed to it. The best possible score is 1.0. A score of 0 corresponds to always predicting the mean of the target, and the score can be negative when the model performs even worse than that.

**Note:** Do not confuse $R^2$ with MSE, as the two are quite different. MSE can be calculated with the `mean_squared_error` function provided by the `metrics` module of sklearn.

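**Formula of $R^2$:**

$R^2 = 1 - \frac{u}{v}$

where

$u = \sum_{i} (y_{true,i} - y_{pred,i})^2$ is the residual sum of squares, `((y_true - y_pred) ** 2).sum()`. Note that this is not the MSE: the MSE is $u/n$ for $n$ samples.

$v = \sum_{i} (y_{true,i} - \bar{y}_{true})^2$ is the total sum of squares, `((y_true - y_true.mean()) ** 2).sum()`.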

```
print('R^2 Score on Test Data : %.3f'%linear_regressor.score(X_test, Y_test))
```
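The formula above can be verified by hand. This sketch uses a few made-up values (not from the Boston dataset), computes $u$, $v$ and $R^2$ with NumPy, and compares them with `r2_score` and `mean_squared_error` from `sklearn.metrics`. It also shows that a constant prediction far from the data yields a negative $R^2$.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

# Illustrative values only (not from the Boston dataset).
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
r2_manual = 1 - u / v
mse = u / len(y_true)                      # MSE is the *mean* squared residual

print('R^2 (manual)  : %.4f' % r2_manual)
print('R^2 (sklearn) : %.4f' % r2_score(y_true, y_pred))
print('MSE (manual)  : %.4f' % mse)
print('MSE (sklearn) : %.4f' % mean_squared_error(y_true, y_pred))

# Predicting a constant far from the data is worse than predicting the mean.
r2_bad = r2_score(y_true, np.full_like(y_true, 10.0))
print('R^2 of a bad constant prediction : %.4f' % r2_bad)
```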

As discussed above, linear regression (Ordinary Least Squares) fits a line through the data that minimizes the squared differences between predicted and actual labels. We can access the parameters of that line, one weight per feature plus the intercept, through the `coef_` and `intercept_` attributes of the regressor.

```
print('Weight Coefficients : '+ str(linear_regressor.coef_))
print('\nY-Axis Intercept : '+ str(linear_regressor.intercept_))
```
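With 13 features the fitted "line" is really a hyperplane: each prediction is the dot product of a sample's feature vector with `coef_`, plus `intercept_`. A quick sketch on synthetic data (3 made-up features standing in for the Boston ones) confirms that this reproduces `predict()`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data with 3 features (stand-in for the 13 Boston features).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(30, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.2, size=30)

reg = LinearRegression().fit(X_demo, y_demo)

# A linear model's prediction is a weighted sum of the features plus the intercept.
manual_pred = X_demo @ reg.coef_ + reg.intercept_
print(np.allclose(manual_pred, reg.predict(X_demo)))  # True
```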

```
sorted_labels_acc_to_test_y = list(sorted(zip(Y_test, y_test_pred), key=lambda x: x[0])) ## Sort (actual, predicted) pairs by actual price.
sorted_test_y, sorted_test_preds = zip(*sorted_labels_acc_to_test_y)
## On matplotlib 3.6 and later, the 'seaborn' style is named 'seaborn-v0_8'.
with plt.style.context(('ggplot', 'seaborn')):
    plt.scatter(range(len(sorted_test_y)),sorted_test_y, s=75, alpha=0.7, label='Actual')
    plt.scatter(range(len(sorted_test_preds)), sorted_test_preds, s=75, alpha=0.7, label='Prediction')
    plt.ylabel('House Price')
    plt.title('Actual vs Predicted House Prices of Test Data')
    plt.legend(loc='best')
```

Ridge regression is another estimator where we introduce regularization (`L2 regularization`) in the cost minimization function: in addition to the squared error, it penalizes the sum of squared weights, scaled by `alpha`. This penalty shrinks all weights toward zero without making them exactly zero, so every feature typically keeps a small, non-zero weight.
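For the case without an intercept, the ridge cost $\lVert y - Xw \rVert^2 + \alpha \lVert w \rVert^2$ has the closed-form minimizer $w = (X^T X + \alpha I)^{-1} X^T y$. The sketch below (synthetic data, `fit_intercept=False` assumed to keep the formula simple) computes it with NumPy, compares it with `Ridge`, and shows that the ridge weight vector is shorter than the plain least-squares one.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Illustrative data; fit_intercept=False keeps the closed form simple.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
y_demo = X_demo @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=50)
alpha = 10.0

# Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2, whose minimizer is
# w = (X^T X + alpha * I)^(-1) X^T y.
w_closed = np.linalg.solve(X_demo.T @ X_demo + alpha * np.eye(X_demo.shape[1]), X_demo.T @ y_demo)

ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X_demo, y_demo)
w_ols = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]  # unregularized solution

print('closed form :', w_closed)
print('sklearn     :', ridge.coef_)
print('norms (OLS vs ridge): %.3f vs %.3f' % (np.linalg.norm(w_ols), np.linalg.norm(w_closed)))
```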

```
from sklearn.linear_model import Ridge ## Ridge Regression Implementation
ridge_regressor = Ridge()
ridge_regressor
```

```
ridge_regressor.fit(X_train,Y_train)
```

```
y_test_pred = ridge_regressor.predict(X_test)
print('First Few Actual Housing Prices(Test Data) : ' + str(Y_test[:5]))
print('First Few Predicted Housing Prices(Test Data) : ' + str(y_test_pred[:5]))
print('\nR^2 Score on Test Data : %.3f'%ridge_regressor.score(X_test, Y_test))
```

```
sorted_labels_acc_to_test_y = list(sorted(zip(Y_test, y_test_pred), key=lambda x: x[0])) ## Sort (actual, predicted) pairs by actual price.
sorted_test_y, sorted_test_preds = zip(*sorted_labels_acc_to_test_y)
## On matplotlib 3.6 and later, the 'seaborn' style is named 'seaborn-v0_8'.
with plt.style.context(('ggplot', 'seaborn')):
    plt.scatter(range(len(sorted_test_y)),sorted_test_y, s=75, alpha=0.7, label='Actual')
    plt.scatter(range(len(sorted_test_preds)), sorted_test_preds, s=75, alpha=0.7, label='Prediction')
    plt.ylabel('House Price')
    plt.title('Actual vs Predicted House Prices of Test Data')
    plt.legend(loc='best')
```

Below is a list of hyperparameters that we can tune to get the best estimator for our data.

**fit_intercept** - A boolean specifying whether to include an `intercept` in the model ($y = mx + c$, where `c` is the intercept). `default=True`

**alpha** - Regularization strength; larger values mean stronger regularization, which helps reduce overfitting. It must be a non-negative float. `default=1.0`

**solver** - Algorithm used for optimization. It accepts a string from the list ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga'] (recent versions also accept 'lbfgs'). `default='auto'`

**max_iter** - Maximum number of iterations for the iterative solvers. `default=None`, in which case the limit depends on the solver (for example, 1000 for 'sag').
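To see what `alpha` does in practice, the sketch below (synthetic data with arbitrarily chosen coefficients) fits `Ridge` with increasing `alpha` and prints the Euclidean norm of the learned weight vector. Stronger regularization should produce steadily smaller weights.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Illustrative data only.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(80, 5))
y_demo = X_demo @ np.array([3.0, -1.0, 2.0, 0.5, -2.5]) + rng.normal(size=80)

alphas = [0.1, 1.0, 10.0, 100.0, 1000.0]
norms = []
for a in alphas:
    coef = Ridge(alpha=a).fit(X_demo, y_demo).coef_
    norms.append(np.linalg.norm(coef))  # size of the weight vector
    print('alpha=%-7g ||w|| = %.3f' % (a, norms[-1]))
```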

`GridSearchCV` is a wrapper class provided by sklearn which loops through all parameter combinations provided as the `param_grid` parameter, using the number of cross-validation folds given by the `cv` parameter. It evaluates model performance for every combination and stores all results in the `cv_results_` attribute. By default it also refits the combination with the best mean cross-validated score on the whole training set, storing that model in the `best_estimator_` attribute, the winning parameters in `best_params_`, and its mean cross-validated score in `best_score_`.
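Conceptually, a grid search is a loop over every parameter combination with k-fold cross-validation. The sketch below (synthetic data from `make_regression`; the grid values are arbitrary) does this by hand with `ParameterGrid` and `cross_val_score` and checks that `GridSearchCV` picks the same combination. It illustrates the idea rather than reproducing `GridSearchCV`'s internals.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score

# Synthetic regression data (illustrative only).
X_demo, y_demo = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)
param_grid = {'alpha': [0.01, 1.0, 100.0], 'fit_intercept': [True, False]}

# Manual version: score every combination with 3-fold CV (R^2 per fold).
manual_scores = {}
for params in ParameterGrid(param_grid):
    scores = cross_val_score(Ridge(**params), X_demo, y_demo, cv=3)
    manual_scores[tuple(sorted(params.items()))] = scores.mean()
best_manual = max(manual_scores, key=manual_scores.get)

grid_demo = GridSearchCV(Ridge(), param_grid=param_grid, cv=3).fit(X_demo, y_demo)
print('Manual best :', dict(best_manual), '%.4f' % manual_scores[best_manual])
print('GridSearchCV:', grid_demo.best_params_, '%.4f' % grid_demo.best_score_)
```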

**Note:** Many estimators accept an `n_jobs` parameter, which sets the number of CPU cores to use for parallel processing; a value of `-1` uses all available cores. We are also using `%%time`, a Jupyter notebook cell magic command that prints the time taken by the cell to run. The time will differ between computers depending on their configuration.

```
%%time
from sklearn.model_selection import GridSearchCV
params = {'alpha' : [500, 200, 100, 50,10, 1, 0.1, 0.01],
'fit_intercept': [True, False],
'solver': ['svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']}
grid = GridSearchCV(Ridge(random_state=1), param_grid=params, cv=3, n_jobs=-1)
grid.fit(X_train, Y_train)
print('Train R^2 : %.3f'%grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 : %.3f'%grid.best_estimator_.score(X_test, Y_test))
print('Best Score Through Grid Search : %.3f'%grid.best_score_)
print('Best Parameters : ',grid.best_params_)
```

The GridSearchCV object stores every parameter combination tried, along with the results for each data split, in its `cv_results_` attribute as a dictionary. Below we load these cross-validation results into a pandas DataFrame and print the first few entries.

```
cross_val_results = pd.DataFrame(grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.
```
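Printing the whole `cv_results_` table can be noisy. A sketch of a more focused view, on synthetic data with an arbitrary `alpha` grid: keep only the `params`, `mean_test_score`, `std_test_score` and `rank_test_score` columns and sort by rank, so the best combination (rank 1) appears first.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic data (illustrative only).
X_demo, y_demo = make_regression(n_samples=100, n_features=8, noise=5.0, random_state=1)
grid_demo = GridSearchCV(Lasso(), param_grid={'alpha': [0.1, 1.0, 10.0]}, cv=3)
grid_demo.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays with one entry per parameter combination.
results = pd.DataFrame(grid_demo.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score'))
```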

Lasso regression is another estimator, where we introduce L1 regularization in the cost minimization function: it penalizes the sum of the absolute values of the weights. L1 regularization drives the coefficients of features that have little influence on the target to exactly zero, which effectively performs feature selection.
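The contrast with Ridge is easy to observe on synthetic data where only some features matter. In the sketch below (generated with `make_regression`, 3 informative features out of 10, values chosen for illustration), Lasso sets several coefficients to exactly zero while Ridge only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data where only 3 of the 10 features actually influence y.
X_demo, y_demo = make_regression(n_samples=200, n_features=10, n_informative=3,
                                 noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X_demo, y_demo)
ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)

print('Lasso coefficients:', np.round(lasso.coef_, 2))
print('Ridge coefficients:', np.round(ridge.coef_, 2))
print('Exact zeros -> Lasso: %d, Ridge: %d'
      % ((lasso.coef_ == 0).sum(), (ridge.coef_ == 0).sum()))
```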

```
from sklearn.linear_model import Lasso
lasso_regressor = Lasso()
lasso_regressor
```

```
lasso_regressor.fit(X_train,Y_train)
```

```
y_test_pred = lasso_regressor.predict(X_test)
print('First Few Actual Housing Prices(Test Data) : ' + str(Y_test[:5]))
print('First Few Predicted Housing Prices(Test Data) : ' + str(y_test_pred[:5]))
print('\nR^2 Score on Test Data : %.3f'%lasso_regressor.score(X_test, Y_test))
```

```
sorted_labels_acc_to_test_y = list(sorted(zip(Y_test, y_test_pred), key=lambda x: x[0])) ## Sort (actual, predicted) pairs by actual price.
sorted_test_y, sorted_test_preds = zip(*sorted_labels_acc_to_test_y)
## On matplotlib 3.6 and later, the 'seaborn' style is named 'seaborn-v0_8'.
with plt.style.context(('ggplot', 'seaborn')):
    plt.scatter(range(len(sorted_test_y)),sorted_test_y, s=75, alpha=0.7, label='Actual')
    plt.scatter(range(len(sorted_test_preds)), sorted_test_preds, s=75, alpha=0.7, label='Prediction')
    plt.ylabel('House Price')
    plt.title('Actual vs Predicted House Prices of Test Data')
    plt.legend(loc='best')
```

Lasso has the same main hyperparameters to tune as ridge regression (`alpha`, `fit_intercept`, `max_iter`), except that it has no `solver` parameter: scikit-learn always fits it with coordinate descent.
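Because Lasso has no closed-form solution, it is fitted iteratively. The following is a from-scratch sketch of cyclic coordinate descent with soft-thresholding for the objective $\frac{1}{2n}\lVert y - Xw \rVert^2 + \alpha \lVert w \rVert_1$ (no intercept). The helper names `soft_threshold` and `lasso_coordinate_descent` are hypothetical, written only for illustration; scikit-learn's own Cython implementation differs in details such as its duality-gap stopping rule. On this synthetic data it should agree with `Lasso` up to small numerical differences.

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_sweeps=1000):
    """Hypothetical helper: cyclic coordinate descent for
    (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1, without an intercept."""
    n, p = X.shape
    w = np.zeros(p)
    col_norms = (X ** 2).sum(axis=0) / n
    residual = y.copy()                    # equals y - X @ w while w = 0
    for _ in range(n_sweeps):
        for j in range(p):
            residual += X[:, j] * w[j]     # drop feature j's current contribution
            rho = X[:, j] @ residual / n   # correlation of feature j with the rest
            w[j] = soft_threshold(rho, alpha) / col_norms[j]
            residual -= X[:, j] * w[j]     # add back the updated contribution
    return w

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = X_demo @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)

w_cd = lasso_coordinate_descent(X_demo, y_demo, alpha=0.1)
sk = Lasso(alpha=0.1, fit_intercept=False, tol=1e-8, max_iter=100000).fit(X_demo, y_demo)
print('from scratch:', np.round(w_cd, 4))
print('scikit-learn:', np.round(sk.coef_, 4))
```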

```
%%time
from sklearn.model_selection import GridSearchCV
params = {'alpha' : [500, 200, 100, 50,10, 1, 0.1, 0.01],
'fit_intercept': [True, False],
}
grid = GridSearchCV(Lasso(random_state=1), param_grid=params, cv=3, n_jobs=-1)
grid.fit(X_train, Y_train)
print('Train R^2 : %.3f'%grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 : %.3f'%grid.best_estimator_.score(X_test, Y_test))
print('Best Score Through Grid Search : %.3f'%grid.best_score_)
print('Best Parameters : ',grid.best_params_)
```

```
cross_val_results = pd.DataFrame(grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.
```

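ElasticNet is another estimator, which combines both L1 and L2 penalties. It is useful when several features are correlated with one another: Lasso tends to pick only one feature from such a group, while ElasticNet tends to keep related features together.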

```
from sklearn.linear_model import ElasticNet
elasticnet_regressor = ElasticNet()
elasticnet_regressor
```

```
elasticnet_regressor.fit(X_train,Y_train)
```

```
y_test_pred = elasticnet_regressor.predict(X_test)
print('First Few Actual Housing Prices(Test Data) : ' + str(Y_test[:5]))
print('First Few Predicted Housing Prices(Test Data) : ' + str(y_test_pred[:5]))
print('\nR^2 Score on Test Data : %.3f'%elasticnet_regressor.score(X_test, Y_test))
```

```
sorted_labels_acc_to_test_y = list(sorted(zip(Y_test, y_test_pred), key=lambda x: x[0])) ## Sort (actual, predicted) pairs by actual price.
sorted_test_y, sorted_test_preds = zip(*sorted_labels_acc_to_test_y)
## On matplotlib 3.6 and later, the 'seaborn' style is named 'seaborn-v0_8'.
with plt.style.context(('ggplot', 'seaborn')):
    plt.scatter(range(len(sorted_test_y)),sorted_test_y, s=75, alpha=0.7, label='Actual')
    plt.scatter(range(len(sorted_test_preds)), sorted_test_preds, s=75, alpha=0.7, label='Prediction')
    plt.ylabel('House Price')
    plt.title('Actual vs Predicted House Prices of Test Data')
    plt.legend(loc='best')
```

ElasticNet has the same parameters as Lasso (so, like Lasso, it has no `solver` option), plus one extra parameter which controls the proportion of L1 and L2 penalties used in the regression model.

**l1_ratio** - A float in the range [0, 1] that controls the mix of L1 and L2 penalties. A value of 0 gives a pure L2 penalty and a value of 1 gives a pure L1 penalty; values in between combine both. `default=0.5`
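In scikit-learn's ElasticNet objective, the penalty term is $\alpha \cdot \text{l1\_ratio} \cdot \lVert w \rVert_1 + \frac{1}{2} \alpha (1 - \text{l1\_ratio}) \lVert w \rVert_2^2$. The sketch below writes that penalty as a small hypothetical helper (`elastic_net_penalty`, for illustration only) and checks, on synthetic data, that with `l1_ratio=1.0` the L2 part vanishes and ElasticNet gives the same coefficients as Lasso.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

def elastic_net_penalty(w, alpha, l1_ratio):
    """Hypothetical helper: penalty term of scikit-learn's ElasticNet objective,
    alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2."""
    return alpha * l1_ratio * np.abs(w).sum() + 0.5 * alpha * (1 - l1_ratio) * (w ** 2).sum()

# Synthetic data (illustrative only).
X_demo, y_demo = make_regression(n_samples=100, n_features=6, noise=5.0, random_state=2)

# With l1_ratio=1.0 the L2 part vanishes, so ElasticNet reduces to Lasso.
enet_l1 = ElasticNet(alpha=0.5, l1_ratio=1.0).fit(X_demo, y_demo)
lasso = Lasso(alpha=0.5).fit(X_demo, y_demo)
print(np.allclose(enet_l1.coef_, lasso.coef_))  # True

w = np.array([1.0, -2.0, 0.0])
print(elastic_net_penalty(w, alpha=1.0, l1_ratio=0.5))  # 0.5*3 + 0.25*5 = 2.75
```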

```
%%time
from sklearn.model_selection import GridSearchCV
params = {'alpha' : [500, 200, 100, 50,10, 1, 0.1, 0.01],
'fit_intercept': [True, False],
'l1_ratio': [0,0.3, 0.5, 0.7, 1.0]
}
grid = GridSearchCV(ElasticNet(random_state=1), param_grid=params, cv=3, n_jobs=-1)
grid.fit(X_train, Y_train)
print('Train R^2 : %.3f'%grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 : %.3f'%grid.best_estimator_.score(X_test, Y_test))
print('Best Score Through Grid Search : %.3f'%grid.best_score_)
print('Best Parameters : ',grid.best_params_)
```

```
cross_val_results = pd.DataFrame(grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.
```

Please make a note of the $R^2$ scores calculated by each model on the train and test data. In our runs, Ridge performed better than Linear Regression, Lasso performed better than both Ridge and Linear Regression, and ElasticNet performed about the same as Lasso or slightly better. One can try these models on other datasets to compare their performance.
