{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Homework 1: Lasso and ridge regression on the Advertising data set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this homework you will fit lasso and ridge regression models to the Advertising data set.\n", "The following cell imports the data set (adjust the path as necessary)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "adv = pd.read_csv('./datasets/Advertising.csv', index_col=0)\n", "X = adv.values[:,0:3]  # predictors: first three columns\n", "y = adv.values[:,3]  # response: fourth column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Perform both lasso and ridge regression on the data.\n", "Read the documentation for the estimators `RidgeCV` and `LassoCV` from `sklearn.linear_model`.\n", "These estimators select the regularization parameter $\\alpha$ by cross-validation, as we did by hand in Problem 8.\n", "Use 5-fold cross-validation and the following range for $\\alpha$: `np.logspace(-4,4,100)`."
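] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cell below is a minimal sketch of how `RidgeCV` and `LassoCV` search a grid of $\\alpha$ values with 5-fold cross-validation. It runs on hypothetical synthetic data; the names `X_demo`, `y_demo`, `alphas`, `ridge_demo` and `lasso_demo` are illustrative only and are not part of the homework. The selected value is read from the fitted `alpha_` attribute." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.linear_model import LassoCV, RidgeCV\n", "\n", "# Hypothetical synthetic data: 3 predictors, the third has no effect on y\n", "rng = np.random.RandomState(0)\n", "X_demo = rng.normal(size=(200, 3))\n", "y_demo = 3.0 * X_demo[:, 0] + 1.5 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)\n", "\n", "# Same candidate grid as in the task: 100 values from 1e-4 to 1e4\n", "alphas = np.logspace(-4, 4, 100)\n", "\n", "# Each estimator runs 5-fold CV over the grid, then refits with the best alpha\n", "ridge_demo = RidgeCV(alphas=alphas, cv=5).fit(X_demo, y_demo)\n", "lasso_demo = LassoCV(alphas=alphas, cv=5).fit(X_demo, y_demo)\n", "\n", "# The CV-selected regularization strength is stored in alpha_\n", "print('ridge alpha:', ridge_demo.alpha_)\n", "print('lasso alpha:', lasso_demo.alpha_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After cross-validation, both estimators refit on the full data set with the selected $\\alpha$. The ridge and lasso penalties depend on the scale of the predictors, so the selected $\\alpha$ and the coefficients change if the columns of `X` are rescaled."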
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution**:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LassoCV(alphas=array([1.00000e-04, 1.20450e-04, ..., 8.30218e+03, 1.00000e+04]),\n", " copy_X=True, cv=5, eps=0.001, fit_intercept=True, max_iter=1000,\n", " n_alphas=100, n_jobs=None, normalize=False, positive=False,\n", " precompute='auto', random_state=None, selection='cyclic', tol=0.0001,\n", " verbose=False)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "from sklearn.linear_model import LassoCV, RidgeCV\n", "\n", "adv = pd.read_csv('./datasets/Advertising.csv', index_col=0)\n", "X = adv.values[:,0:3]\n", "y = adv.values[:,3]\n", "\n", "# Candidate regularization strengths: 100 values from 1e-4 to 1e4\n", "Alpha = np.logspace(-4,4,100)\n", "\n", "# Note: despite the _poly suffix, these models use only the three original predictors\n", "ridge_poly = RidgeCV(cv=5, alphas=Alpha)  # alpha chosen by 5-fold CV\n", "ridge_poly.fit(X,y)\n", "\n", "lasso_poly = LassoCV(cv=5, alphas=Alpha)  # alpha chosen by 5-fold CV\n", "lasso_poly.fit(X,y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Compute the $R^2$-scores for both lasso and ridge regression. Which predictors does the lasso select, i.e., which of its coefficients are nonzero?"
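] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch, on hypothetical synthetic data (`X_demo`, `y_demo`, `lasso_demo`, `r2` and `selected` are illustrative names only), of the two quantities this task asks for: `score` returns the $R^2$ of the model's predictions on the data passed to it, and the predictors kept by the lasso are those with nonzero entries in `coef_`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.linear_model import LassoCV\n", "from sklearn.metrics import r2_score\n", "\n", "# Hypothetical synthetic data: only the first two predictors affect y\n", "rng = np.random.RandomState(0)\n", "X_demo = rng.normal(size=(200, 3))\n", "y_demo = 3.0 * X_demo[:, 0] + 1.5 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)\n", "\n", "lasso_demo = LassoCV(alphas=np.logspace(-4, 4, 100), cv=5).fit(X_demo, y_demo)\n", "\n", "# score() is the R^2 of the predictions, identical to r2_score on the same data\n", "r2 = lasso_demo.score(X_demo, y_demo)\n", "assert np.isclose(r2, r2_score(y_demo, lasso_demo.predict(X_demo)))\n", "\n", "# Predictors selected by the lasso: indices of the nonzero coefficients\n", "selected = np.flatnonzero(lasso_demo.coef_)\n", "print('R^2:', r2)\n", "print('selected predictor indices:', selected)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because `score` is evaluated here on the same data the model was fitted to, the resulting $R^2$ is an in-sample measure; estimating performance on new data would require a held-out set or cross-validation."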
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution**:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "array([ 4.57629772e-02, 1.84966340e-01, -1.75482797e-04])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Ridge coefficients (ridge shrinks coefficients but does not set them exactly to 0)\n", "ridge_poly.coef_" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0.8971206766074686" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# In-sample R^2 (evaluated on the training data)\n", "ridge_poly.score(X,y)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.04557567, 0.17930643, 0. ])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Lasso coefficients: the third predictor's coefficient is exactly 0, so the lasso drops it\n", "lasso_poly.coef_" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "0.8965664034716233" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# In-sample R^2 of the lasso fit\n", "lasso_poly.score(X,y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: At the end of Exercise 3 and in the lecture, we noticed that there may be interactions or higher-order effects among the predictor variables.\n", "Use `PolynomialFeatures` from `sklearn.preprocessing` to add quadratic terms as well as interaction terms between the predictors. How much does this improve your $R^2$-scores?"
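] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of the column layout that `PolynomialFeatures(degree=2)` produces for three predictors. It uses a single hypothetical sample $x_0=1$, $x_1=2$, $x_2=3$ (the names `x`, `full` and `no_bias` are illustrative only) so that each generated term can be read off directly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.preprocessing import PolynomialFeatures\n", "\n", "# One hypothetical sample with three predictors: x0=1, x1=2, x2=3\n", "x = np.array([[1.0, 2.0, 3.0]])\n", "\n", "# Default include_bias=True prepends a constant column of ones\n", "full = PolynomialFeatures(degree=2).fit_transform(x)\n", "# Column order: 1, x0, x1, x2, x0^2, x0*x1, x0*x2, x1^2, x1*x2, x2^2\n", "print(full)  # [[1. 1. 2. 3. 1. 2. 3. 4. 6. 9.]]\n", "\n", "# include_bias=False keeps only the 9 non-constant terms\n", "no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)\n", "print(no_bias.shape)  # (1, 9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the default `include_bias=True` the first generated column is constant. Since `RidgeCV` and `LassoCV` fit their own intercept by default (`fit_intercept=True`), this column adds no information and receives a coefficient of 0; passing `include_bias=False` drops it."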
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution**:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import PolynomialFeatures\n", "\n", "# Degree-2 expansion: constant, linear, squared and pairwise interaction terms\n", "deg2 = PolynomialFeatures(degree=2)\n", "X2 = deg2.fit_transform(X)  # 3 predictors -> 10 columns\n", "\n", "# Refit both models on the expanded features with the same CV setup\n", "ridge_poly = RidgeCV(cv=5,alphas=Alpha)\n", "ridge_poly.fit(X2,y)\n", "\n", "lasso_poly = LassoCV(cv=5, alphas=Alpha)\n", "lasso_poly.fit(X2,y);" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "array([ 0.00000000e+00, 4.96893182e-02, 9.42785581e-03, 4.92727481e-03,\n", " -1.05025820e-04, 1.12424510e-03, -4.32094721e-05, 2.66284411e-04,\n", " 1.08818342e-04, 2.06637263e-05])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ridge_poly.coef_" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0.986396878676749" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ridge_poly.score(X2,y)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.00000000e+00, 5.01397308e-02, 0.00000000e+00, 0.00000000e+00,\n", " -1.07173978e-04, 1.12719843e-03, -3.85449940e-05, 4.07309225e-04,\n", " 1.71195821e-04, 4.70557602e-05])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The lasso sets the linear terms of the second and third predictors to 0\n", "lasso_poly.coef_" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "0.9862333741123677" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lasso_poly.score(X2,y)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": 
"python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }