{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menu bar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menu bar, select Cell$\\rightarrow$Run All).\n", "\n", "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name, email address and user name below.\n", "\n", "Rename this problem sheet as follows:\n", "\n", "    ps{number of lab}_{your user name}_problem{number of problem sheet in this lab}\n", "\n", "For example:\n", "\n", "    ps2_blja_problem1\n", "\n", "Submit your homework within one week, i.e., by next Monday, 9 a.m." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "NAME = \"\"\n", "EMAIL = \"\"\n", "USERNAME = \"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Data Science" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lab 12 - Lasso and ridge regression on the Advertising data set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this problem you will perform lasso and ridge regression on the Advertising data set.\n", "The following cell loads the data set (adjust the path as necessary)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "adv = pd.read_csv('./Advertising.csv', index_col=0)\n", "# predictors: the TV, radio and newspaper advertising budgets (columns 0-2)\n", "X = adv.values[:, 0:3]\n", "# response: sales (column 3)\n", "y = adv.values[:, 3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task (2 points)**: Perform both lasso and ridge regression on the data.\n", "Read the documentation for the estimator classes `RidgeCV` and `LassoCV` from `sklearn.linear_model`.\n", "These estimators select the regularization parameter $\\alpha$ by cross-validation, as we did by hand in Problem 8.\n", "Use 5-fold cross-validation and the following candidate values for $\\alpha$: `np.logspace(-4,4,100)`. Leave all other parameters at their default values.\n", "\n", "Fit both models on the full data set and store them as `ridge_model` and `lasso_model`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "2ca0ba50b87a862844fddd9928eb4353", "grade": false, "grade_id": "cell-97825af95e882d93", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.linear_model import LassoCV, RidgeCV\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "172cee96d23e72df337743af168fd8a0", "grade": true, "grade_id": "cell-3c7260bb60d04644", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert type(ridge_model) == RidgeCV\n", "assert type(lasso_model) == LassoCV\n", "assert abs(ridge_model.coef_.mean() - 0.07685127823946546) < 1e-8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task (1 point)**: Compute the $R^2$ scores of the ridge and the lasso model on the full data set and store them in `r2_ridge` and `r2_lasso`, respectively." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "7a5b61a809bae2e06baed84e9f4871c5", "grade": false, "grade_id": "cell-e674e10749bf7258", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "5ab4bc8a69e999085a589c551330e82f", "grade": true, "grade_id": "cell-95a81a0abd58874e", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert abs(r2_ridge - 0.8971206766074686) < 1e-8\n", "assert abs(r2_lasso - 0.8965664034716233) < 1e-8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task (1 point)**: Which coefficients does the lasso select, i.e., which coefficients are nonzero?\n", "Store their indices in the list `lasso_sel`, using Python's zero-based indexing (the first coefficient has index 0)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "8dce06034ec03480580950d050e4e9fa", "grade": false, "grade_id": "cell-6e7d3fd7404629d0", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "d4d581387f5f2425be24f14a2aa30892", "grade": true, "grade_id": "cell-bd4d8f08c56c81aa", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert type(lasso_sel) == list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task (1 point)**: In the lecture we noticed that there may be interaction effects between the predictors, as well as higher-order (e.g., quadratic) effects.\n", "Use the transformer class `PolynomialFeatures` from `sklearn.preprocessing` to add quadratic terms and pairwise interaction terms of the predictors, and store the resulting array as `X2`. Keep the default parameters of `PolynomialFeatures`, so that `X2` also contains a constant column of ones." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "57552a9f80dd76cf40195f889302b9f9", "grade": false, "grade_id": "cell-dbc78ac847f93e4d", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "d4ecacf285feb51a648bb4f74c1d4cc3", "grade": true, "grade_id": "cell-69bfed8b2a1a1e4c", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert X2.shape == (200, 10)\n", "assert abs(X2.mean() - 4023.66445) < 1e-8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task (1 point)**: Again, use `RidgeCV` and `LassoCV` with 5-fold cross-validation and the same candidate values for $\\alpha$ as above, this time to fit both models on `X2`.\n", "Store them as `ridge_model2` and `lasso_model2`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "d500660f36898b08d979658f5e778210", "grade": false, "grade_id": "cell-3056024899b7f417", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "874202e738915a6906ee0b0d2ba01030", "grade": true, "grade_id": "cell-53f1cdb01f2fc159", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert type(ridge_model2) == RidgeCV\n", "assert type(lasso_model2) == LassoCV\n", "assert abs(ridge_model2.coef_.mean() - 0.006541622512541504) < 1e-8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task (1 point)**: How much do the additional features improve the $R^2$ score? Compute the $R^2$ scores of the new models on `X2` and store them in the variables `r2_ridge_model2` and `r2_lasso_model2`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "d8425f69649cf9d7c570e5afae51021f", "grade": false, "grade_id": "cell-13c0907c2dfc57e0", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "5845ac8e0e2c8acd3ab27133cc7b4559", "grade": true, "grade_id": "cell-6cac1728dc167fd8", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert abs(r2_ridge_model2 - 0.986396878676749) < 1e-8\n", "assert abs(r2_lasso_model2 - 0.9862333741123677) < 1e-8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task (1 point)**: Which columns of the extended array `X2` are selected by the trained lasso model `lasso_model2`? Use Python's zero-based indexing and store the column indices in the list `lasso_sel2`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "d7847757f14aa8371432f93ce915ccc4", "grade": false, "grade_id": "cell-6e317cd4db97bcc9", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "0689a37ad3a2a3dd5b37368cdcb2f4e6", "grade": true, "grade_id": "cell-a7d36986eb7699ae", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert type(lasso_sel2) == list\n", "assert lasso_sel2 == [1, 4, 5, 6, 7, 8, 9]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }