{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Homework 5 - Classification of flower petal shapes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This data set consists of the petal and sepal lengths and widths of 3 different types of irises (Setosa, Versicolour, and Virginica), stored in a 150x4 `numpy.ndarray`.\n", "\n", "The rows are the samples and the columns are:\n", "\n", "1. sepal length,\n", "2. sepal width,\n", "3. petal length and\n", "4. petal width." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from sklearn import datasets\n", "iris = datasets.load_iris()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data comes as a dictionary-like object. You can access the predictors using `iris.data` and the classes using `iris.target`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "X = iris.data\n", "y = iris.target" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: How many samples are in the data set?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution**:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "150" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Plot the sepal length on the x-axis and the sepal width on the y-axis. Color each of the three types of irises differently.\n", "Add a legend that gives the correct iris type (0-Setosa, 1-Versicolour, 2-Virginica)." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution**:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Figure: sepal width against sepal length, one color per iris type]" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "for l in range(3):\n", "    # Boolean mask selecting the samples of class l\n", "    idx = (y==l)\n", "    plt.plot(X[idx,0],X[idx,1],'+')\n", "plt.xlabel('Sepal length')\n", "plt.ylabel('Sepal width')\n", "plt.legend(['Setosa','Versicolour','Virginica']);" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Task**:\n", "Split your data into a training and a test set.\n", "Put the first 40 samples within each class in the training set and the remaining samples in a test data set." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution**:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "# The 150 samples are ordered by class, 50 per class\n", "train_idx = np.hstack([np.arange(0,40),np.arange(50,90),np.arange(100,140)])\n", "test_idx = np.hstack([np.arange(40,50),np.arange(90,100),np.arange(140,150)])\n", "Xtrain, ytrain = X[train_idx,:],y[train_idx]\n", "Xtest, ytest = X[test_idx,:],y[test_idx]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the lecture you've heard about the classification method called\n", "*Linear discriminant analysis (LDA)*.\n", "\n", "**Task**: Find a way to perform a linear discriminant analysis using `scikit-learn`.\n", "\n", "Perform an LDA using only the first two predictors, i.e., `sepal length` and `sepal width`."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution**:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LinearDiscriminantAnalysis(n_components=None, priors=None, shrinkage=None,\n", " solver='svd', store_covariance=False, tol=0.0001)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n", "lda = LinearDiscriminantAnalysis()\n", "lda.fit(Xtrain[:,0:2],ytrain)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: What is the proportion of correctly classified irises in the *test* data set?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution**:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Proportion of correct classifications: 0.8666666666666667\n" ] } ], "source": [ "# Fraction of test samples whose predicted class matches the true class\n", "prop1 = np.mean(lda.predict(Xtest[:,0:2]) == ytest)\n", "print('Proportion of correct classifications: ', prop1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Now, incorporate all of the predictors. How does the proportion of correct classifications change?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution**:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Proportion of correct classifications: 1.0\n" ] } ], "source": [ "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n", "lda2 = LinearDiscriminantAnalysis()\n", "lda2.fit(Xtrain,ytrain)\n", "# Fraction of test samples whose predicted class matches the true class\n", "prop2 = np.mean(lda2.predict(Xtest) == ytest)\n", "print('Proportion of correct classifications: ', prop2)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }