{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n",
    "\n",
    "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name below.\n",
    "\n",
    "Rename this problem sheet as follows:\n",
    "\n",
    "    ps{number of lab}_{your user name}_problem{number of problem sheet in this lab}\n",
    "    \n",
    "for example\n",
    "    \n",
    "    ps2_blja_problem1\n",
    "\n",
    "Submit your homework within one week until next Monday, 9 a.m."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "NAME = \"\"\n",
    "EMAIL = \"\"\n",
    "USERNAME = \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction to Data Science"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab 12 - Lasso and Ridge regression on the Advertising data set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this problem you will perform lasso and ridge regression on the advertising data set.\n",
    "The following cell imports the data set (adjust the path as necessary)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "adv = pd.read_csv('./Advertising.csv', index_col=0)\n",
    "X = adv.values[:,0:3]\n",
    "y = adv.values[:,3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task (2 points)**: Perform both lasso and ridge regression on the data.\n",
    "Read the documentation for the functions `RidgeCV` and `LassoCV` from `sklearn.linear_model`.\n",
    "These functions select the regularization parameter by cross-validation, as we did by hand in Problem 8.\n",
    "Use 5-fold cross-validation and the following range for $\\alpha$: `np.logspace(-4,4,100)`. The remaining default parameters should be left as they are.\n",
    "\n",
    "Train your models `ridge_model` and `lasso_model` using all of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "2ca0ba50b87a862844fddd9928eb4353",
     "grade": false,
     "grade_id": "cell-97825af95e882d93",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.linear_model import LassoCV, RidgeCV\n",
    "\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "172cee96d23e72df337743af168fd8a0",
     "grade": true,
     "grade_id": "cell-3c7260bb60d04644",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert type(ridge_model) == RidgeCV\n",
    "assert type(lasso_model) == LassoCV\n",
    "assert abs(ridge_model.coef_.mean() - 0.07685127823946546) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task (1 point)**: Compute the $R^2$-scores for both lasso and ridge regression and store them in `r2_ridge` and `r2_lasso`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "7a5b61a809bae2e06baed84e9f4871c5",
     "grade": false,
     "grade_id": "cell-e674e10749bf7258",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "5ab4bc8a69e999085a589c551330e82f",
     "grade": true,
     "grade_id": "cell-95a81a0abd58874e",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert abs(r2_ridge - 0.8971206766074686) < 1e-8\n",
    "assert abs(r2_lasso - 0.8965664034716233) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which coefficients are selected by the lasso?\n",
    "Store them in the list `lasso_sel` and use the enumeration common in python, i.e., start with 0 instead of 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "8dce06034ec03480580950d050e4e9fa",
     "grade": false,
     "grade_id": "cell-6e7d3fd7404629d0",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d4d581387f5f2425be24f14a2aa30892",
     "grade": true,
     "grade_id": "cell-bd4d8f08c56c81aa",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert type(lasso_sel) == list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task (1 point)**: In the lecture we noticed that there may exist interactions or higher order variations among the predictor variables.\n",
    "Use the function `PolynomialFeatures` from `sklearn.preprocessing` to add quadratic terms as well as interaction terms between the predictors and store this array as `X2`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "57552a9f80dd76cf40195f889302b9f9",
     "grade": false,
     "grade_id": "cell-dbc78ac847f93e4d",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d4ecacf285feb51a648bb4f74c1d4cc3",
     "grade": true,
     "grade_id": "cell-69bfed8b2a1a1e4c",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert X2.shape == (200, 10)\n",
    "assert abs(X2.mean() - 4023.66445) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task (1 point)**: Again, use `RidgeCV` and `LassoCV` with 5-fold cross-validation and the same range of possible $\\alpha$-values from above to train these two models.\n",
    "Store them as `ridge_model2` and `lasso_model2`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d500660f36898b08d979658f5e778210",
     "grade": false,
     "grade_id": "cell-3056024899b7f417",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "874202e738915a6906ee0b0d2ba01030",
     "grade": true,
     "grade_id": "cell-53f1cdb01f2fc159",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert type(ridge_model2) == RidgeCV\n",
    "assert type(lasso_model2) == LassoCV\n",
    "assert abs(ridge_model2.coef_.mean() - 0.006541622512541504) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task (1 point)**: How does this improve your $R^2$-score? Store the $R^2$-scores in variables `r2_ridge_model2` and `r2_lasso_model2`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d8425f69649cf9d7c570e5afae51021f",
     "grade": false,
     "grade_id": "cell-13c0907c2dfc57e0",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "5845ac8e0e2c8acd3ab27133cc7b4559",
     "grade": true,
     "grade_id": "cell-6cac1728dc167fd8",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert abs(r2_ridge_model2 - 0.986396878676749) < 1e-8\n",
    "assert abs(r2_lasso_model2 - 0.9862333741123677) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task (1 point)**: What columns of the enhanced array `X2` are selected by the trained lasso model. Use python enumeration and store your answer in the list `lasso_sel2`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d7847757f14aa8371432f93ce915ccc4",
     "grade": false,
     "grade_id": "cell-6e317cd4db97bcc9",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "0689a37ad3a2a3dd5b37368cdcb2f4e6",
     "grade": true,
     "grade_id": "cell-a7d36986eb7699ae",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert type(lasso_sel2) == list\n",
    "assert lasso_sel2 == [1, 4, 5, 6, 7, 8, 9]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}