{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Homework 6 - Cross-validation for parameter selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the lecture you have learned about the $K$-nearest neighbor classifier. It often performs very well, although its computational cost is high for high-dimensional problems.\n", "\n", "Without worrying about the implementation details, we want to learn about another application of cross-validation: **parameter tuning**.\n", "\n", "In this homework, we use cross-validation to tune the parameter $K$, i.e., the number of neighbors used in the $K$-nearest neighbor classifier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by importing the iris dataset known from homework 5." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_iris\n", "iris = load_iris()\n", "X = iris.data\n", "y = iris.target" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, you should try the class `KNeighborsClassifier` from the module `sklearn.neighbors`.\n", "\n", "**Task**: Fit a model using the $K$-nearest neighbor classifier.\n", "Use $K=5$ and compute the accuracy of the model on the data used for fitting, i.e., the proportion of correct classifications." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should observe an accuracy of $96.67\\%$." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to use cross-validation to tune the parameter $K$.\n", "You should use the function `cross_val_score` from the `sklearn.model_selection` module to get a reliable estimate of the accuracy for a given value of $K$ (number of neighbors).\n", "A good choice for the optional parameter `cv`, which sets the number of folds used for the cross-validation, is 8.\n", "You should also set the optional parameter `scoring` so that the function returns an array containing the accuracy of each fold.\n", "\n", "**Task**:\n", "Complete the following cell.\n", "Perform $K$-nearest neighbor classification for every $K=1,\\ldots,25$ using cross-validation with 8 folds.\n", "For each $K$, store the mean of the accuracy scores in the list `k_scores`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import cross_val_score\n", "n_fold = 8\n", "k_range = list(range(1, 26))\n", "k_scores = []\n", "\n", "# Use a for-loop to perform the task" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: What value of $K$ maximizes the cross-validated accuracy?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Plot the obtained accuracy estimates against the parameter values $K$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Compare your best $K$-nearest neighbor model with linear discriminant analysis using cross-validation with 8 folds." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }