{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise 1 - Implementing the logistic function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**:\n", "Implement the logistic function\n", "\n", "$$ \\sigma(x) = \\frac{e^x}{1+e^x} = \\frac{1}{1+e^{-x}}$$" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "#def sigma(x):\n", "#    # Put your definition here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we want to investigate how the shape of the logistic function changes for an affine input, i.e.,\n", "\n", "$$ \\sigma(\\beta_0 + \\beta_1 x) $$\n", "\n", "for different values of $\\beta_0$ and $\\beta_1$.\n", "\n", "**Task**: Take your time and try different values.\n", "What happens for negative/positive values of $\\beta_1$?\n", "What role does $\\beta_0$ play?\n", "\n", "**You have nothing to implement here; just run the cells below.**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Logistic function evaluated at the affine input b0 + b1*x\n", "def my_sigma(b0, b1, x):\n", "    return sigma(b0 + b1 * x)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "6fbac7ea43c144f9b598ee6e128d045d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(FloatSlider(value=0.0, description='b0', max=10.0, min=-10.0, step=1.0), FloatSlider(val…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "from ipywidgets import interactive\n", "\n", "def f(b0, b1):\n", "    plt.figure(1)\n", "    x = np.linspace(-10, 10, 1001)\n", "    plt.plot(x, sigma(b0 + b1*x))\n", "    plt.plot(x, 0.5*np.ones(x.shape))  # horizontal reference line at p = 0.5\n", "    plt.ylim(-0.1, 1.1)\n", "    plt.xlabel('x')\n", "    plt.ylabel('p(x)')\n", "    plt.show()\n", "\n", "interactive_plot = interactive(f, b0=(-10.0, 10.0, 1.0), b1=(-3., 3., 0.2))\n", 
"output = interactive_plot.children[-1]\n", "output.layout.height = '350px'\n", "interactive_plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise 2 - Logistic regression in practice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this lab, we want to investigate the `Default` data set known from the lecture.\n", "We first load the necessary modules.\n", "The command\n", "\n", "    plt.rcParams['figure.figsize'] = [13, 5]\n", "\n", "changes the default figure size (width and height in inches)." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "plt.rcParams['figure.figsize'] = [13, 5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Download the file `Default.csv` from the webpage and read it using the `pandas` function `read_csv`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Inspect the data using the methods you've learned so far, e.g., `describe`, `hist`, `head`, etc." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observation**: If you call `describe`, you will see that the columns `default` and `student` are not part of the summary.\n", "This is because `read_csv` reads these values in as strings (`'Yes'`/`'No'`). We know from the lecture that these variables are categorical (in particular binary)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To process these values, we replace the strings in the columns `default` and `student` by the Boolean values `False` and `True`.\n", "There are many ways to accomplish this; the easiest might be\n", "\n", "    D.replace(to_replace='No', value=False, inplace=True)\n", "\n", "**Task**: Replace every `'No'` and `'Yes'` in the `DataFrame` by the values `False` and `True`, respectively." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we want to plot both the `income` and the `balance` predictors as boxplots grouped by `default` status.\n", "\n", "**Task**: Complete the plotting command in the following cell. What do you observe?" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(1,2)\n", "D.boxplot(column='balance',by='default', ax=ax[0]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer**:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we want to fit a logistic regression model to our data.\n", "Use the class `LogisticRegression` from the module `sklearn.linear_model`.\n", "It is used in the same way as `LinearRegression`: create an instance and call its `fit` method.\n", "\n", "You can find the documentation of this class [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).\n", "There are many optional arguments; the most important might be the inconspicuous parameter `C`, the inverse of the regularization strength used by the algorithm that solves the maximum likelihood problem.\n", "\n", "We will discuss regularization later in the lecture as well as in the labs. For now, it suffices to keep the following in mind:\n", "\n", "**The larger you choose `C`, the less the problem is regularized.**\n", "\n", "**Task**: Fit a logistic regression model that predicts the probability of `default` using `balance` as the predictor. You should obtain the following values: $\\beta_0 \\approx -10.6513$, $\\beta_\\text{balance} \\approx 0.0055$.\n", "\n", "In this and the upcoming problems, use the following optional parameters:\n", "* set the regularization parameter `C=1e10` (scientific notation for $C = 10^{10}$, i.e., very large, so there is essentially no regularization)\n", "* set the stopping tolerance to `tol=1e-10`\n", "* set the solver to `solver='liblinear'`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Print the intercept as well as the coefficients in a nice way." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**:\n", "Predict the probability of `default` for a `balance` of $\\$1{,}000$ and $\\$2{,}000$, respectively.\n", "Use the method `predict_proba`.\n", "Interpret the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we also want to incorporate the predictors `income` and `student`. This works in the same way as before." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import LogisticRegression\n", "\n", "# Same settings as before: essentially no regularization, tight tolerance\n", "lr2 = LogisticRegression(solver='liblinear', tol=1e-10, C=1e10)\n", "X = D.loc[:, ['balance', 'income', 'student']]\n", "y = D.loc[:, 'default']\n", "reg2 = lr2.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Print the intercept as well as the coefficients in a nice way." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**:\n", "What is the probability of default for a student and for a non-student, each with a credit card balance of $\\$1{,}500$ and an income of $\\$40{,}000$?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer**:" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }