{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n",
    "\n",
    "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name below.\n",
    "\n",
    "Rename this problem sheet as follows:\n",
    "\n",
    "    ps{number of lab}_{your user name}_problem{number of problem sheet in this lab}\n",
    "    \n",
    "for example\n",
    "    \n",
    "    ps2_blja_problem1\n",
    "\n",
    "Submit your homework within one week, by next Monday at 9 a.m."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "NAME = \"\"\n",
    "EMAIL = \"\"\n",
    "USERNAME = \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "c1d10555f944a7d1cf302f22ca3825f7",
     "grade": false,
     "grade_id": "cell-2c0be3d551ee40ab",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "# Introduction to Data Science\n",
    "## Lab 4: Further aspects of linear regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "6c4344f7c14157f80995777e849d035d",
     "grade": false,
     "grade_id": "cell-6e6033c0dc952d4a",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "### Part A - Limitations of the t-test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "3fb22f0f77f31ca26dc8fce8dae156e2",
     "grade": false,
     "grade_id": "cell-6e50608863f3b9af",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "In this notebook, we investigate the limitations of a single-variable **t-test** for the predictor coefficients $\\beta$ in a linear regression setting.\n",
    "Recall the following statements from the lecture (Slide 105):\n",
    "* Does a single small $p$-value indicate that at least one variable is relevant? No.\n",
    "* Example: suppose there are $p=100$ predictors and $H_0 : \\beta_1 = \\dots = \\beta_p = 0$ is true. Then, by chance alone, about $5\\%$ of the $p$-values fall below $0.05$, so it is almost guaranteed that the $p$-value of at least one variable is below $0.05$.\n",
    "* Thus, for large $p$, looking only at the $p$-values of individual $t$-statistics tends to discover spurious relationships.\n",
    "\n",
    "In what follows, we use slightly different values than in the above-mentioned example, setting $n = 100$ and $p = 20$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "20cfc0dfdc9f2f4d47ee527a4d26ba5a",
     "grade": false,
     "grade_id": "cell-9bf98be6407d8b03",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Set parameters n (number of training samples) and p (number of predictor variables)\n",
    "n = 100\n",
    "p = 20"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "aab988416708ad77fdb83c3e4da77d67",
     "grade": false,
     "grade_id": "cell-6d855e4e79bf6669",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "For this purpose, we generate random uncorrelated input and output vectors.\n",
    "\n",
    "**Task**: Write the function `drawSample` that generates **uniformly distributed** arrays of random variables:\n",
    "* $X$ should be of size (n, p+1) with values in $[0,1]$; the first column is reserved for the intercept and should contain only ones\n",
    "* $y$ should be of size (n,) with values in $[-0.5,0.5]$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "7aa585db04266dd7eea6fc43e5f75839",
     "grade": false,
     "grade_id": "cell-8b1fa66b64d265fb",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def drawSample(n,p):\n",
    "    \"\"\" This function draws a\n",
    "    sample for our experiment. \"\"\"\n",
    "    \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()\n",
    "    \n",
    "    return (X,y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "054f97fab92420b6f28b53f990dea3fb",
     "grade": true,
     "grade_id": "cell-c21e8f578a5dd23f",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert drawSample(40,4)[0].shape == (40,5), 'Wrong shape of X'\n",
    "assert drawSample(40,4)[1].shape == (40,), 'Wrong shape of y'\n",
    "assert all(drawSample(40,4)[0][:,0]==1), 'Check the first column of X'\n",
    "assert drawSample(40,4)[1].min() > -0.5 and drawSample(40,4)[1].max() < 0.5, 'Wrong range of y'\n",
    "assert drawSample(40,4)[0].min() > 0 and drawSample(40,4)[0].max() <= 1, 'Wrong range of X'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "220a4f82353a9b6305bf253437852e67",
     "grade": false,
     "grade_id": "cell-6c4fe93ee7103cf8",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "The following function computes single-variable t-statistics for the model\n",
    "$$ y \\approx X \\beta $$\n",
    "whose parameters $\\beta \\in \\mathbb{R}^{p+1}$ are estimated via\n",
    "$$ \\hat \\beta = (X^\\top X)^{-1} X^\\top y. $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "58b2a92b0a16f788fca540080f1f61f7",
     "grade": false,
     "grade_id": "cell-888372a43aa00783",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "from scipy.stats import t\n",
    "\n",
    "def printTStatistic(X, y, p_threshold = 0.10, print_table=True):\n",
    "    n, m = X.shape\n",
    "    p = m - 1\n",
    "\n",
    "    # Invert X^T * X\n",
    "    V = np.linalg.inv((X.T).dot(X))\n",
    "    \n",
    "\n",
    "    # Compute regression coefficients beta\n",
    "    beta = V.dot( X.T.dot(y) )\n",
    "\n",
    "    # Extract diagonal of matrix (X^T * X)^-1\n",
    "    v = V.diagonal()\n",
    "\n",
    "    # Predict y using beta\n",
    "    y_pred = X.dot(beta)\n",
    "\n",
    "    # Compute estimate of sigma\n",
    "    sigma_hat = np.sqrt( 1./(n-p-1) * np.power(y - y_pred,2).sum() )\n",
    "\n",
    "    # Compute the standard errors\n",
    "    SE = np.sqrt(v) * sigma_hat\n",
    "\n",
    "    # Compute the values of the t-statistic\n",
    "    t_vals = beta / SE\n",
    "\n",
    "    # Compute the corresponding p values\n",
    "    p_vals = 2*t.cdf(-np.absolute(t_vals), n-p-1)\n",
    "\n",
    "    if print_table:\n",
    "        \n",
    "        # Print header\n",
    "        print('|  Coefficient  | Estimate |    SE    | t-statistic |  p-value  | p < %4.2f |' % p_threshold)\n",
    "        print('----------------------------------------------------------------------------')\n",
    "        \n",
    "        # Print \n",
    "        for i in range(p+1):\n",
    "            pval = p_vals[i]\n",
    "            if pval < 0.0001:\n",
    "                pval_str = '< 0.0001'\n",
    "            else:\n",
    "                pval_str = '  %5.4f' % pval\n",
    "            print('|    beta_%02d    |  %6.3f  |  %6.4f  |    %5.2f    | %s  |     %d    |' % (i, beta[i], SE[i], t_vals[i], pval_str, pval < p_threshold))\n",
    "    \n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "c00a6dfcd14ba5b82ecd3b1e734dfa41",
     "grade": false,
     "grade_id": "cell-982d858d0c6e6425",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task**: Test the function `printTStatistic` using an example drawn with your function `drawSample`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "7e24ae0ba1409b728f705385c3b77c28",
     "grade": true,
     "grade_id": "cell-92e5e11d02c2543d",
     "locked": false,
     "points": 1,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "6358fe85b9acff1f9ab728f98f59f362",
     "grade": false,
     "grade_id": "cell-54186beed1ed6c35",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "Now we want to find out how many predictor variables are statistically significant at a threshold of $0.10$ in our setting with `n = 100` and `p = 20`.\n",
    "\n",
    "**Task**: Expand the function `printTStatistic` from above. It should **return the proportion of significant predictor variables** at a certain threshold `p_threshold`. Test it using the example below; execute the next cell multiple times (by hitting `Ctrl + Enter`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "44956a27060c2be1608c5e6aa875677c",
     "grade": true,
     "grade_id": "cell-cec12dcb08dc21c7",
     "locked": false,
     "points": 1,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "cb5b56cc5e6504f7b5b7bf3964d9ea86",
     "grade": false,
     "grade_id": "cell-0ada6a2275e9f16e",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task**: Write a small script that carries out the experiment `1000` times and computes the mean proportion of significant values in our experiment. It should be around `p_threshold`.\n",
    "\n",
    "**Hint 1**: Use the keyword argument `print_table` to suppress the printing of the tables.\n",
    "\n",
    "**Hint 2**: You can collect the returned values in a list initialized by `vals = []`. You can append a new value `new_val` using `vals.append(new_val)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "e7d2eff96689491da971484e4e87d239",
     "grade": true,
     "grade_id": "cell-900fd83ad0f07f69",
     "locked": false,
     "points": 2,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "612e5cbde5a2cfc9bb18094f6997d14d",
     "grade": false,
     "grade_id": "cell-ade74e5fc814e08d",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "### Part B: \"Nonlinear\" linear regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "f60c62a1297b2604afb2f429fcfa3b20",
     "grade": false,
     "grade_id": "cell-56275c08b9d03356",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "The goal of this problem is to approximate given data points $(x_i,y_i)$ for $i=1,\\ldots,n$ by polynomials of degree $p$.\n",
    "This can be done by solving the linear regression problem:\n",
    "\n",
    "$$\n",
    " y_i \\approx \\beta_0 + \\beta_1 \\, x_i + \\beta_2 \\, x_i^2 + \\ldots + \\beta_p \\, x_i^p\n",
    "$$\n",
    "\n",
    "By splitting our data into a training and test data set, we want to illustrate graphically the problem of overfitting."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "f31b01b4889c82cd5383e7e54c083fcd",
     "grade": false,
     "grade_id": "cell-160b12607c089949",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task**: Define the 'unknown' function\n",
    "\n",
    "$$\n",
    "f(x) = \\sin(10 \\, x) + 5 \\, \\cos(3 \\, x)\n",
    "$$\n",
    "\n",
    "using `numpy`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "57cbeb5aec80bf01037439faa3650392",
     "grade": false,
     "grade_id": "cell-f885a94e9c772a30",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Define the 'unknown' function f\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "e40d5dd536dc0711c06b7c67fe619e73",
     "grade": true,
     "grade_id": "cell-97535adf15a536a9",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert(np.abs(f(np.pi)+5) < 1e-8)\n",
    "assert(np.abs(f(np.pi/2)) < 1e-8)\n",
    "assert(np.abs(f(0)-5) < 1e-8)\n",
    "assert(np.abs(f(2) - 5.71) < 1e-2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "e230df4127b89855d831e0c900a7b70c",
     "grade": false,
     "grade_id": "cell-07c30dcca25eb800",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task**: Generate a uniformly distributed random vector `x` of size `n = 200` with values in $[0,1)$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "27a8aca60ea1224422f2aee65d4f5e72",
     "grade": false,
     "grade_id": "cell-a3beb882c6199b0e",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# Set random seed to make random variables 'predictable'\n",
    "np.random.seed(0)\n",
    "\n",
    "# Generate uniformly distributed data samples over [0,1)\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "7350cdd1a4746421372f7fd9e5e3a854",
     "grade": true,
     "grade_id": "cell-1c0f265beb29c4ea",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert n == 200\n",
    "assert np.abs(x.mean() - 0.5004377979051402) < 1e-8\n",
    "assert x.shape == (200,)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "8402eeac2809fc5f90720a049f75d773",
     "grade": false,
     "grade_id": "cell-3b0982ae444d1c27",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task**: Determine the vector `y` as follows:\n",
    "\n",
    "$$\n",
    "y_i = f(x_i) + \\varepsilon \\, \\eta_i\n",
    "$$\n",
    "\n",
    "where the $\\eta_i$ are standard normally distributed and $\\varepsilon = 1$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "8a13460794c88d0d71743504633bc9e8",
     "grade": false,
     "grade_id": "cell-a98eacb6c66093ab",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "65975f7f4595a7a59b3065060086906c",
     "grade": true,
     "grade_id": "cell-f632b8c0c5988c29",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert y.shape == (200,)\n",
    "assert np.abs(y.mean() - 0.2748887916140714) < 1e-8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "f311ed30f080dba05e3d87c82b81b580",
     "grade": false,
     "grade_id": "cell-e0628aef205798bf",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task**: Generate one figure with the following data:\n",
    "* mark the **data points** $(x_i,y_i)$ as black circles\n",
    "* draw the **population line** (the line representing the *unknown* function $f$) as a red solid line\n",
    "* draw the **regression line** for a fitted polynomial with polynomial degree `p = 20` as a blue dashed line\n",
    "\n",
    "**Hint**: Use the functions `np.polyfit` and `np.polyval` to determine the regression line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d8a3a96894091b4fda3cf4bab166c598",
     "grade": true,
     "grade_id": "cell-374277908e484443",
     "locked": false,
     "points": 4,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "plt.rcParams['figure.figsize'] = (15,8)\n",
    "\n",
    "fig = plt.figure(1, clear = True)\n",
    "\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "1ca7f33500a62feff146a47a7621c95e",
     "grade": false,
     "grade_id": "cell-c482bc8c1c9bf128",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task**: Split the dataset $(x,y)$ into a training set and a test set using `np.split`:\n",
    "- the training set should contain `ntrain` samples\n",
    "- the test set should contain `n - ntrain` samples\n",
    "\n",
    "Choose `ntrain = 80`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "65986c9c8b187b17a2d3b11737c2b263",
     "grade": false,
     "grade_id": "cell-8f48e7bce19fd7cb",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "71ab6e5dcb6c75b5e0a32f14cc366fe4",
     "grade": true,
     "grade_id": "cell-7e8109022605f809",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "assert(ntrain == 80)\n",
    "assert(xtrain.shape == (80,))\n",
    "assert(xtest.shape == (120,))\n",
    "assert(ytrain.shape == (80,))\n",
    "assert(ytest.shape == (120,))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we want to fit polynomial models with varying polynomial degrees ($p= 0,\\ldots,20$).\n",
    "As a quality measure, we store the training MSE (mean squared error) and the test MSE.\n",
    "\n",
    "**Note**: You can ignore the `RankWarning`s!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "6784154946fcab7912a32ea128d0cdf7",
     "grade": false,
     "grade_id": "cell-fc2d52ce5ecc280d",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def computeMSE(y, fhatx):\n",
    "    \"\"\"Return the mean squared error between the observations y and the predictions fhatx.\"\"\"\n",
    "    return np.mean(np.power(y-fhatx,2))\n",
    "\n",
    "# Initialize lists that contain test and training mean squared errors\n",
    "MSEtrain = []\n",
    "MSEtest = []\n",
    "\n",
    "# Set range for different degrees\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "\n",
    "for j in deg_range:\n",
    "    \n",
    "    # Fit polynomial of degree 'j' on training data\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()\n",
    "    \n",
    "    # Append training and test MSE to the corresponding lists\n",
    "    # YOUR CODE HERE\n",
    "    raise NotImplementedError()\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "markdown",
     "checksum": "aa23629891f618a5069e22dbed92206e",
     "grade": false,
     "grade_id": "cell-6f0e245c3fb999fe",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "source": [
    "**Task**: Generate one figure that contains\n",
    "- the test MSE in a logarithmic plot as a blue dashed line\n",
    "- the training MSE in a logarithmic plot as a red solid line\n",
    "\n",
    "against the polynomial degree.\n",
    "You should use the function `plt.semilogy` and set meaningful `label`s."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "1329d429604f5a7c4f75dc6ec57d3ab5",
     "grade": true,
     "grade_id": "cell-ca2617631310e831",
     "locked": false,
     "points": 4,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "fig = plt.figure(2, clear=True)\n",
    "# YOUR CODE HERE\n",
    "raise NotImplementedError()\n",
    "plt.legend()\n",
    "plt.xlabel(\"Polynomial degree\")\n",
    "plt.ylabel(\"MSE\")\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}