Introduction to Data Science (4V, 2Ü: 4 weekly lecture hours, 2 weekly exercise hours), Prof. Ernst, winter semester (WS) 2018/19
Content
Tentative list of course topics:
	
- Introduction: What is Data Science
- Learning Theory
- Regression
- Neural Networks
- Classification
- Clustering and Tree-Based Methods
- Support Vectors
- Unsupervised Learning
Notices
	
Extra lab session 
	    In place of the lecture on Thursday, December 20, there will be an extra lab 
	    session during the regular class time in the computer pool. 
		 		
Class cancellations 
	    There will be no class on Monday, December 17. 
		 		
Rescheduled lecture 
		To make up for the lecture on December 10 lost to the railway strike, 
		we will have a lecture in place of the lab session, 13:35-15:05 on Tuesday, December 11, 
		in the computer pool. 
		 		
Temporary lecture room change 
		On Thursday, November 1 and Monday, November 5, the lecture will take place in Room 2/N101. 
		 		
New time for Lab Exercises 
		Beginning on Tuesday, October 23, 2018, the labs will start 10 minutes earlier, i.e., at 13:35. 
		 		
Cancellation 
		There will be no lab exercise session on Tuesday, October 16. 
		 		
Note 
		To participate in the lab exercises, all students need an account with the MRZ (Mathematics Computing Center). 
		If you do not already have one, please apply 
		by following this page 
		and collect your login credentials from Ms. Margit Matt (Rh39, Room 704).
		 
		 
First Lab Exercises 
		Tuesday, October 9, 2018. 
		 		
First Class 
		Monday, October 8, 2018. 
		 			
Listing of this course in the electronic Vorlesungsverzeichnis (course directory):
    
                    
    
Lecture
Literature
- James, Witten, Hastie & Tibshirani. An Introduction to Statistical Learning – with Applications in R. Springer 2013. Available online at this page.
- Here's a continually updated annotated reading list for the course (16.01.2019).
Slides
- What is Data Science? (05.02.2019)
- Learning Theory (05.02.2019)
- Linear Regression (05.02.2019)
- Classification (05.02.2019)
- Resampling Methods (05.02.2019)
- Linear Model Selection and Regularization (05.02.2019)
- Nonlinear Regression Models (05.02.2019)
- Tree-Based Methods (05.02.2019)
- Support Vector Machines (05.02.2019)
- Unsupervised Learning (05.02.2019)
- All slides (05.02.2019)
Exercises
Installation of Programming Environment under Linux (64 bit)
If you want to do the homework on your own computer, you may replicate the programming environment used in the labs. Get miniconda from this web page and follow the steps in the installation dialogue. Next, download the specification file spec-file.txt used in the labs and create a conda environment (under Linux):
conda create --name DS2018 --file spec-file.txt
Installation of Programming Environment under Windows and MacOS
Download miniconda for your platform by following this link and follow the installation instructions. Next, download the .yml file listing the packages used in the labs and create a conda environment from a miniconda/Anaconda shell:
conda env create -f DS2018.yml
If your plots are not displayed in the browser, a package may be missing. 
After activating the correct environment, the following command may help in some cases:
python -m ipykernel install --user 
Please refer to Conda (Installation under Windows, Linux and MacOS) and Conda (Managing environments) for further information.
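If plots still do not appear, it can help to check which packages the active environment is actually able to import. The following is only a minimal sketch using the Python standard library; the package names in `candidates` are assumptions chosen for illustration and are not taken from spec-file.txt or DS2018.yml.

```python
import importlib.util


def missing_packages(names):
    """Return those top-level package names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: adjust it to the packages your notebooks really use.
    candidates = ["ipykernel", "matplotlib", "notebook"]
    missing = missing_packages(candidates)
    if missing:
        print("Not importable in this environment:", ", ".join(missing))
    else:
        print("All listed packages can be imported.")
```

Run the script with the same Python interpreter that starts the notebook, i.e., after the environment has been activated; otherwise the result describes a different installation.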
Material
To start the Jupyter notebooks, open a terminal and source our conda environment DS2018 via
source /LOCAL/Software/DataScience2018/setup_env
Next, change the directory to your exercise folder and download the jupyter notebook (right click and "Save link as") into this folder.
Finally, start the notebook via the following command (make sure that (DS2018) appears in front of your username in the prompt):
jupyter notebook Problem_Sheet_XX.ipynb
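The environment can also be checked from inside Python, for example in the first cell of a notebook. The sketch below assumes that activation sets the variable `CONDA_DEFAULT_ENV`, as a standard `conda activate` does; whether the lab script setup_env does the same is not confirmed here. The helper names `active_conda_env` and `is_course_env` are made up for this illustration.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is set.

    Assumption: activation exports CONDA_DEFAULT_ENV. For environments that
    are activated by path, only the last path component is returned.
    """
    env = os.environ if environ is None else environ
    value = env.get("CONDA_DEFAULT_ENV", "")
    if not value:
        return None
    return os.path.basename(os.path.normpath(value))


def is_course_env(environ=None, expected="DS2018"):
    """Check whether the active environment carries the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    print("Active conda environment:", active_conda_env())
    print("Course environment active:", is_course_env())
```

If the second line prints False although the prompt shows (DS2018), the variable is probably not set by the activation script, and the prompt remains the more reliable indicator.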
- Problem sheet 1
- Problem sheet 2
- Problem sheet 3 (jupyter notebook including homework 3)
- Solution to Problem sheet 3
- Problem sheet 4 (jupyter notebook)
- Solution to Problem sheet 4
- Homework 4 (jupyter notebook)
- Solution to Homework 4 (jupyter notebook)
- Problem sheet 5 (jupyter notebook)
- Solution to Problem sheet 5 (jupyter notebook)
- Homework 5 (jupyter notebook)
- Solution to Homework 5 (jupyter notebook)
- Problem sheet 6 (jupyter notebook)
- Solution to Problem sheet 6 (jupyter notebook)
- Homework 6 (jupyter notebook)
- Introduction to R (R script by Vincent Rost)
- Problem sheet and homework 7 (jupyter notebook)
- Solution to Problem sheet 7 (jupyter notebook)
- Problem sheet 8 (jupyter notebook)
- Solution to Problem sheet 8 (jupyter notebook)
- Homework 8 (jupyter notebook)
- Solution to Homework 8 (jupyter notebook)
- Problem sheet 9 (jupyter notebook)
- Solution to Problem sheet 9 (jupyter notebook)
- Homework 9 (jupyter notebook)
- Solution to Homework 9 (jupyter notebook)
- Problem sheet 10 (jupyter notebook)
- Solution to Problem sheet 10 (jupyter notebook)
- Using GEO data in Python (jupyter notebook by Thomas Kranzkowski)
- Homework 10 (jupyter notebook)
- Solution to Homework 10 (jupyter notebook)
- Problem sheet 11 (jupyter notebook)
- Solution to Problem sheet 11 (jupyter notebook)
- Problem sheet 12 (jupyter notebook)
- Solution to Problem sheet 12 (jupyter notebook)
- Homework 12 (jupyter notebook)
- Solution to Homework 12 (jupyter notebook)
- Problem sheet 13 (jupyter notebook)