{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem sheet 12 - Support Vector Machines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the lecture about [support vector machines](https://www.tu-chemnitz.de/mathematik/numa/lehre/ds-2018/Slides/ds-intro-chapter9.pdf), we introduced three classifiers all based on the same principle: they try to determine a ($(d-1)$-dimensional) hyperplane to separate the data.\n", "\n", "The first one discussed was the **maximal margin classifier**. This classifier only works if the data is completely separable, otherwise, the optimization problem\n", "\n", "$$ \\max_{\\beta_0, \\beta_1, \\ldots, \\beta_p, M} M\\\\\n", "\\begin{aligned}\n", "\\text{such that }& & \\sum_{j=0}^p \\beta_j^2 &= 1 \\\\\n", "\\text{and }& & y_i \\, (\\beta^\\top x_i) &\\ge M, \\quad \\text{for } i=1,\\ldots,N\n", "\\end{aligned}$$\n", "\n", "has no admissable solution.\n", "Furthemore, it is quite unstable with respect to adding new observations.\n", "\n", "Therefore, we quickly derived a second algorithm, known as the **support vector classifier** or **soft-margin classifier**.\n", "This classifier allows some observations to be on the wrong side of margin or even on the wrong side of the hyperplane.\n", "The associated optimization problems reads\n", "\n", "$$ \\max_{\\beta_0, \\beta_1, \\ldots, \\beta_p, M, \\varepsilon_1, \\ldots, \\varepsilon_N} M\\\\\n", "\\begin{aligned}\n", "\\text{such that }& & \\sum_{j=0}^p \\beta_j^2 &= 1 \\\\\n", "\\text{and }& & y_i \\, (\\beta^\\top x_i) &\\ge M \\, (1 - \\varepsilon_i), \\quad \\text{for } i=1,\\ldots,N,\\\\\n", "\\text{and }& & \\varepsilon_i \\ge 0, \\sum_{i=1}^N \\varepsilon_i &\\le C \\quad \\text{for } i=1, \\ldots, N,\n", "\\end{aligned}$$\n", "\n", "where $C \\ge 0$ is a tuning parameter.\n", "Furthermore, we see that for $C = 0$ the **soft-margin classifier** simplifies to the **maximal margin classifier**.\n", "This is an important observation, and helps us with our first problem." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 12.1 Maximal margin classification\n", "\n", "In this problem, we examine maximal margin classification. We can generate a test problem by exploiting the `samples_generator` functionality from `sklearn`, i.e., \n", "\n", " from sklearn.datasets.samples_generator import make_blobs\n", " X, y = make_blobs(n_samples=60, centers=2,\n", " random_state=0, cluster_std=0.60)\n", " \n", "**Task**: Complete the following code cell. Generate a classification sample data set with $60$ samples, two classes, random state $0$ and a standard devitiation within the classes of $0.6$.\n", "You can use `plt.scatter` with the option `c=y` to colorize the blobs in accordance with their class." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "## TODO: Generate data and make scatter plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function takes a model as input and plots the decision function." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plotDecisionFunction(model, ax=None):\n", " \"\"\"Plot the decision function of a SVC model in 2 dimensions\"\"\"\n", " \n", " if ax is None:\n", " ax = plt.gca()\n", " xlim = ax.get_xlim()\n", " ylim = ax.get_ylim()\n", " \n", " x = np.linspace(xlim[0], xlim[1], 30)\n", " y = np.linspace(ylim[0], ylim[1], 30)\n", " X,Y = np.meshgrid(x, y)\n", " xy = np.vstack([X.ravel(), Y.ravel()]).T\n", " P = model.decision_function(xy).reshape(X.shape)\n", " \n", " # Plot decision boundary and margins\n", " ax.contour(X, Y, P, colors='k',\n", " levels=[-1, 0, 1], alpha=0.5,\n", " linestyles=['--', '-', '--'])\n", " \n", " # Plot support vectors\n", " ax.scatter(model.support_vectors_[:, 0],\n", " model.support_vectors_[:, 1],\n", " s=30, linewidth=1, facecolors='none', edgecolor='red');\n", " ax.set_xlim(xlim)\n", " ax.set_ylim(ylim)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Implement a maximal margin classifier for the test problem from above.\n", "You can use the function `SVC` from `sklearn.svm` with `kernel='linear'` and a parameter `C=1e-10`. One cannot choose $C=0$, but $C=10^{-10}$ resembles the maximal margin classifier very closely." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## TODO: Train your model, then the following code becomes executable\n", "\n", "#plt.scatter(X[:, 0], X[:, 1], c=y, s=15, cmap='viridis')\n", "#plotDecisionFunction(model);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 12.2 - Soft-margin classification\n", "\n", "As explained in the lecture, the tuning parameter $C$ determines how many misclassifications are allowed.\n", "Check the following two claims from lecture slide 427:\n", "- C large: margin wide, many observations violate margin, many support vectors, many observations involved determining hyperplane, classifier has low variance, potentially high bias\n", "- C small: fewer support vectors, lower bias, higher variance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: \n", "Use the following interactive plot to see how the margin as well as the decision boundary changes with varying values of $C$.\n", "What do you observe?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ipywidgets import interactive\n", "def mySVC(myC):\n", " model = SVC(kernel='linear', C=myC)\n", " model.fit(X, y)\n", " plt.scatter(X[:, 0], X[:, 1], c=y, s=15, cmap='viridis')\n", " plotDecisionFunction(model);\n", "\n", "interactive_plot = interactive(mySVC, myC=(0.005,1.5,0.02))\n", "output = interactive_plot.children[-1]\n", "output.layout.height = '300px'\n", "interactive_plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observation**: In contrast to the lecture, the margin is small for large values of $C$ and wide otherwise.\n", "This is due to the fact that the tuning parameter $C$ used in the lecture is not the same as the tuning parameter $C$ in the function `SVC`, see the [scikit learn documentation](https://scikit-learn.org/stable/modules/svm.html#svc).\n", "\n", "You will learn more in the **Optimization in Machine Learning** class about this reformulation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we generate a second data set." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate data with three centers\n", "X, y = make_blobs(n_samples=60, centers=4,\n", " random_state=0, cluster_std=0.60)\n", "# Change class 2 also into class 0, and class 3 into class 1\n", "y[y==2] = 0\n", "y[y==3] = 1\n", "plt.scatter(X[:, 0], X[:, 1], c=y, s=15, cmap='viridis');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Copy the code from above and generate an interactive chart that visualizes the **soft-margin classifier** for different penalty parameters $C$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the classes cannot be separated by an affine linear function. At first glance it appears as if the function `plotDecisionFunction()` may have highlighted too many support vectors, especially the blue ones close to the right boundary, but they are indeed all support vectors; all samples in the lower left corner (below the separating line) are assigned to the blue group." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem 12.3 - Support vector machine\n", "\n", "We now consider the support vector machine, which generalizes the soft-margin classifier through the introduction of basis functions (the soft-margin classifier corresponds in the choice of a linear basis function)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: \n", "Experiment with different basis functions, e.g., `'poly', 'rbf'` and `'sigmoid'`, to obtain a better separation between the two classes.\n", "See the [documentation](https://scikit-learn.org/stable/modules/svm.html#kernel-functions) for further information." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }