## Homework 1: Lasso and Ridge regression on the Advertising data set

In this homework you will perform lasso and ridge regression on the Advertising data set.
The following cell imports the data set (adjust the path as necessary).

In [2]:
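import pandas as pd

# Load the data; the first CSV column is used as the row index
adv = pd.read_csv('./datasets/Advertising.csv', index_col=0)
# The first three columns are the predictors, the fourth is the response
X = adv.values[:, 0:3]
y = adv.values[:, 3]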

**Task**: Perform both lasso and ridge regression on the data.
Read the documentation for the functions `RidgeCV` and `LassoCV` from `sklearn.linear_model`.
These functions select the regularization parameter by cross-validation, as we did by hand in Problem 8.
Use 5-fold cross-validation and the following range for $\alpha$: `np.logspace(-4,4,100)`.

**Solution**:

In [3]:
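import pandas as pd
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

# Reload the data so that this cell runs on its own
adv = pd.read_csv('./datasets/Advertising.csv', index_col=0)
X = adv.values[:, 0:3]
y = adv.values[:, 3]

# Candidate regularization strengths: 100 values, log-spaced from 1e-4 to 1e4
Alpha = np.logspace(-4, 4, 100)

# Ridge regression, alpha chosen by 5-fold cross-validation
ridge_poly = RidgeCV(cv=5, alphas=Alpha)
ridge_poly.fit(X, y)

# Lasso, alpha chosen by 5-fold cross-validation over the same grid
lasso_poly = LassoCV(cv=5, alphas=Alpha)
lasso_poly.fit(X, y)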

LassoCV(alphas=array([1.00000e-04, 1.20450e-04, ..., 8.30218e+03, 1.00000e+04]),
    copy_X=True, cv=5, eps=0.001, fit_intercept=True, max_iter=1000,
    n_alphas=100, n_jobs=None, normalize=False, positive=False,
    precompute='auto', random_state=None, selection='cyclic', tol=0.0001,
    verbose=False)
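
After fitting, it can help to check which value of $\alpha$ cross-validation actually picked. The sketch below does this on synthetic data, since the Advertising file is not assumed to be available here; `X_demo`, `y_demo`, `ridge_demo` and `lasso_demo` are illustrative names, not part of the homework. Depending on the scikit-learn version, the lasso may emit convergence warnings for the smallest values of $\alpha$.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

# Synthetic stand-in for the Advertising data (an assumption, not the real data):
# 200 samples, 3 predictors, and the third predictor has no effect on the response.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 100, size=(200, 3))
y_demo = 3.0 + 0.05 * X_demo[:, 0] + 0.2 * X_demo[:, 1] + rng.normal(0, 1.0, size=200)

alphas = np.logspace(-4, 4, 100)

ridge_demo = RidgeCV(alphas=alphas, cv=5).fit(X_demo, y_demo)
lasso_demo = LassoCV(alphas=alphas, cv=5).fit(X_demo, y_demo)

# alpha_ holds the regularization strength selected by cross-validation
print("ridge alpha:", ridge_demo.alpha_)
print("lasso alpha:", lasso_demo.alpha_)
```

If the selected value sits at either end of the grid, widening the range passed to `alphas` is usually worth a try.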

**Task**: Compute the $R^2$-scores for both lasso and ridge regression. Which coefficients are selected by the lasso?

**Solution**:

In [4]:
ridge_poly.coef_

array([ 4.57629772e-02,  1.84966340e-01, -1.75482797e-04])

In [5]:
ridge_poly.score(X,y)

0.8971206766074686

In [6]:
lasso_poly.coef_

array([0.04557567, 0.17930643, 0.        ])

In [7]:
lasso_poly.score(X,y)

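0.8965664034716233

Both models reach an $R^2$-score of about 0.897 on the data they were fit on. The lasso sets the third coefficient exactly to zero, so it selects the first two predictors and drops the third (the last column of `X`).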
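
For reference, the `score` method of these regressors returns the coefficient of determination, $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$, evaluated on whatever data is passed in. A minimal sketch on synthetic data (the names `X_demo`, `y_demo` and `model` are illustrative, and a plain `Ridge` with a fixed `alpha` is used only to keep the example short):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration, not the Advertising data)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0, 0.0]) + rng.normal(0, 0.5, size=100)

model = Ridge(alpha=1.0).fit(X_demo, y_demo)
y_hat = model.predict(X_demo)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_demo - y_hat) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The manual value should agree with the built-in score
print(r2_manual, model.score(X_demo, y_demo))
```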

**Task**: At the end of Exercise 3 and in the lecture, we noticed that there may be interactions or higher-order effects among the predictor variables.
Use the function `PolynomialFeatures` from `sklearn.preprocessing` to add quadratic terms as well as interaction terms between the predictors. How does this improve your $R^2$-score?

**Solution**:

In [8]:
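from sklearn.preprocessing import PolynomialFeatures

# Expand the 3 predictors into all monomials up to degree 2
# (constant column, linear terms, squares, and pairwise interactions)
deg2 = PolynomialFeatures(degree=2)
X2 = deg2.fit_transform(X)

# Refit both models on the expanded features; this overwrites the earlier fits
ridge_poly = RidgeCV(cv=5, alphas=Alpha)
ridge_poly.fit(X2, y)

lasso_poly = LassoCV(cv=5, alphas=Alpha)
lasso_poly.fit(X2, y);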

In [9]:
ridge_poly.coef_

array([ 0.00000000e+00,  4.96893182e-02,  9.42785581e-03,  4.92727481e-03,
       -1.05025820e-04,  1.12424510e-03, -4.32094721e-05,  2.66284411e-04,
        1.08818342e-04,  2.06637263e-05])

In [10]:
ridge_poly.score(X2,y)

0.986396878676749

In [11]:
lasso_poly.coef_

array([ 0.00000000e+00,  5.01397308e-02,  0.00000000e+00,  0.00000000e+00,
       -1.07173978e-04,  1.12719843e-03, -3.85449940e-05,  4.07309225e-04,
        1.71195821e-04,  4.70557602e-05])

In [12]:
lasso_poly.score(X2,y)

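0.9862333741123677

Measured on the same data used for fitting, adding the quadratic and interaction terms raises the $R^2$-score from about 0.897 to about 0.986 for both ridge and lasso. The first coefficient in each array belongs to the constant column that `PolynomialFeatures` adds by default; it is zero because the models fit the intercept separately.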
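
The scores above are computed on the data the models were trained on. One possible way to check that the gain from the extra terms is not just due to the larger feature set is to compare the two models on held-out data. The following is only a sketch on synthetic data with a built-in interaction; the data, the 75/25 split and all variable names are assumptions and not part of the homework.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic response with an interaction between the first two predictors
rng = np.random.default_rng(2)
X_demo = rng.uniform(0, 10, size=(300, 3))
y_demo = (1.0 + 0.5 * X_demo[:, 0] + 0.3 * X_demo[:, 1]
          + 0.2 * X_demo[:, 0] * X_demo[:, 1] + rng.normal(0, 0.5, size=300))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0)

alphas = np.logspace(-4, 4, 100)

# Model on the raw predictors
linear = RidgeCV(alphas=alphas, cv=5).fit(X_tr, y_tr)

# Model on the degree-2 expansion; fit the expansion on the training part only
poly = PolynomialFeatures(degree=2)
quad = RidgeCV(alphas=alphas, cv=5).fit(poly.fit_transform(X_tr), y_tr)

# R^2 on data that neither model saw during fitting
r2_linear = linear.score(X_te, y_te)
r2_quad = quad.score(poly.transform(X_te), y_te)
print(r2_linear, r2_quad)
```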