{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Data Science\n", "## Lab 14: Nonlinear regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part A: Spline regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this problem, you will implement a general spline regression.\n", "Recall that a spline of degree $d$ is a piecewise polynomial of degree $d$ with continuity of derivatives of orders $0, 1, \\ldots, d-1$.\n", "\n", "In the case of a cubic spline ($d = 3$) with $K$ knots, this regression function can be expressed by\n", "\n", "$$ f(x_i) = \\beta_0 + \\beta_1 \\, b_1(x_i) + \\ldots + \\beta_{K+3} \\, b_1(x_i) $$\n", "\n", "using appropriate basis functions.\n", "\n", "From Slide 355, you know that we can start off with monomials $x, x^2, x^3$ and then add for each knot $\\xi$ one **truncated monomial**\n", "\n", "$$ h(x,\\xi) = (x-\\xi)_+^3 = \\begin{cases} (x - \\xi)^3 & \\text{if } x > \\xi, \\\\ 0 & \\text{otherwise.} \\end{cases} $$\n", "\n", "\n", "A one-dimensional example is provided below.\n", "Shown are 100 samples of an artificial data set together with their (unknown) population line." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "# Implement 'unknown' function\n", "def y_func(x):\n", " return np.sin(8*x) / np.exp((5*x))\n", "\n", "# Number of samples\n", "n = 100\n", "\n", "# Degree of randomness\n", "eps = 0.1\n", "np.random.seed(1)\n", "\n", "# Draw samples\n", "x = np.random.rand(n)\n", "y = y_func(x) + eps * np.random.randn(n)\n", "\n", "# Plot samples\n", "plt.scatter(x, y, label='Samples')\n", "\n", "# Plot population line\n", "xlin = np.linspace(0,1,100)\n", "plt.plot(xlin, y_func(xlin),'r--', label='Population line')\n", "plt.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Complete the implementation of the function `generateSplineRegressionMatrix`\n", "that implements spline regression of degree $d$.\n", "Each column should contain one of the basis functions evaluated in all \n", "of the knots $\\xi_i$, $i=1,\\ldots,K$.\n", "\n", "**Hint**: If you have no idea how to get started, have a look at **Part B**. There, we perform a **global cubic regression**.\n", "This can be adapted to perform a **cubic spline regression**." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "f286726b154df7207bd2aa7db5d30fec", "grade": false, "grade_id": "cell-0751201643d9b229", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "def generateSplineRegressionMatrix(x, xi, d = 3):\n", "\n", " # Get number of samples\n", " n = len(x)\n", " \n", " # Get number of knots\n", " K = len(xi)\n", " \n", " # Initialize matrix with predictor variables 1\n", " X = np.ones((n,1))\n", "\n", " # TASK: Append columns for the monomials x^1, ..., x^d\n", " # YOUR CODE HERE\n", " raise NotImplementedError()\n", " \n", " # TASK: Append one column for each knot using the \n", " # truncated monomial (x - xi_k)_+^d\n", " # YOUR CODE HERE\n", " raise NotImplementedError()\n", " return X" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "8a912089b5c3cf007827d935df5cd749", "grade": true, "grade_id": "cell-241da47f399a0327", "locked": true, "points": 0, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "xi = np.linspace(0,1,4)[1:-1]\n", "X = generateSplineRegressionMatrix(x, xi, 2)\n", "assert abs(X.mean() - 0.3831258879866572) < 1e-8\n", "\n", "xi = np.linspace(0,1,6)[1:-1]\n", "X = generateSplineRegressionMatrix(x, xi, 4)\n", "assert X.shape == (100,9)\n", "assert abs(X.std() - 0.3610267467203577) < 1e-9" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Estimate the function `y_func` using **cubic** spline regression with $K = 4$ equidistant knots, i.e., $\\xi_1=0.2, \\xi_2=0.4, \\xi_3=0.6, \\xi_4=0.8$.\n", "Store your regression matrix as `X`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "258f968df64f55d60990e40f4d1d1377", "grade": false, "grade_id": "cell-1356bfd72144750b", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6a10832bf1d1e890cf504bbd9e9b1511", "grade": true, "grade_id": "cell-76f96871567adea9", "locked": true, "points": 0, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "assert abs(X.mean() - 0.2732634981273543) < 1e-8\n", "assert X.shape == (100,8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Solve the regression problem and plot the regression line together with the population line and the data set." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "ea8347123eacf74fc8f723dbfabfa315", "grade": false, "grade_id": "cell-a6fd26c3954d4415", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part B: Global spline regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Hint**: If you have no idea how to get started, the following code cells might help you.\n", "They perform a **global cubic regression** and can be adapted to perform a **cubic spline regression**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def generatePolynomialRegressionMatrix(x, d = 3):\n", " \n", " # Get number of samples\n", " n = len(x)\n", " \n", " # Initialize matrix with predictor variables 1 (for the intercept)\n", " X = np.ones((n,1))\n", "\n", " # Append columns for the monomials x^1, ..., x^d\n", " for i in range(d):\n", " X = np.hstack((X,np.power(x,i+1).reshape(n,1)))\n", " \n", " return X" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set degree of spline regression\n", "d = 3\n", "\n", "# Generate Spline matrix\n", "X = generatePolynomialRegressionMatrix(x, d)\n", "\n", "# Solve the normal equation, remember the @ sign\n", "# performs the ordinary matrix multiplication for numpy arrays.\n", "# You should also avoid to invert the matrix X^T * X!\n", "\n", "beta = np.linalg.solve(X.T @ X, X.T @ y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.scatter(x, y, label='Samples')\n", "\n", "Xreg = generatePolynomialRegressionMatrix(xlin,d)\n", "\n", "plt.plot(xlin, Xreg @ beta, 'g-', label = 'Regression line')\n", "plt.plot(xlin, y_func(xlin), 'r--', label='Population line')\n", "plt.legend();" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }