# Practical Data Science 🛠️

EECS 398-003, Fall 2024 at the **University of Michigan**

**Lecture**: TuTh 1:30-3:00PM, 1013 DOW; **Discussion**: F 2:30-3:30PM, 1670 BBB

4 credits; ULCS for Computer Science majors, Advanced Technical Elective or Application Elective for Data Science majors, and Flexible Technical Elective for Electrical Engineering majors

### Suraj Rampure

he/him

Welcome! 👋 I’m a new teaching faculty member in Computer Science and Engineering at U-M, and I’m super excited to be offering this course in the fall. Hope to see you in class!

Skills and tools for building practical data science projects, along with their theoretical underpinnings. The course covers `pandas`, `numpy`, `scikit-learn`, `BeautifulSoup`, and Jupyter Notebooks, as well as the math behind loss functions, gradient descent, linear and logistic regression, and other key ideas in machine learning.

### Content

This course will train students to use industry-standard tools to solve real-world problems, while giving them an understanding of how these tools work under the hood. After taking this course, students will be prepared to build data science portfolios, participate in research across campus, and succeed in data science internships.

The course will be split roughly into two halves. While its topics may overlap with those of existing courses, it will take a very practical approach.

**Data Wrangling**

- Python and Jupyter Notebooks
- `numpy` arrays
- Tabular Data Manipulation in `pandas`
- Exploratory Data Analysis and Data Visualization
- Web Scraping and APIs
- SQL
- Regular Expressions and Text Processing

**Applied Machine Learning**

- Linear Regression through Linear Algebra
- Feature Engineering in `scikit-learn`
- Regularization and Cross-Validation
- Gradient Descent
- Logistic Regression
- Decision Trees and Random Forests
- Unsupervised Learning

The course will be based on courses I’ve taught in the past at both UC San Diego and UC Berkeley, including:

- DSC 80: Practice of Data Science (most similar)
- DSC 40A: Theoretical Foundations of Data Science
- Data 100: Principles and Techniques of Data Science

If the lectures, videos, and assignments linked above seem interesting to you, I think you will enjoy this course. With that said, I will tailor the course to best suit the needs of enrolled students. For instance, I know that students in the course won’t necessarily have seen Python before, so we will introduce it as necessary.

### Format and Assessment

Lectures will be held in-person and recorded, and attendance will not be required. Discussions will be held in-person and run by GSIs/IAs, and attendance will also not be required. Office hours will largely be in-person, but there will be some remote options as well.

Students will be expected to complete **weekly homework assignments**, which will consist mostly of programming assignments in Python and Jupyter Notebooks, with theoretical questions sprinkled throughout. The course will have **one midterm exam** (date TBD) and **one final exam** (4-6PM on Thursday, December 12th), both of which will be held in-person.

### Prerequisites

The course is open to students from all majors.

The enforced prerequisites are discrete math (EECS 203), programming (EECS 280), calculus I, calculus II, and linear algebra. A probability and statistics course is an advisory prerequisite. Options include DATASCI 101, STATS 206, STATS 250, STATS 280, STATS 412, IOE 265, or ECON 451.

If you’re interested in the class but don’t meet one of the prerequisites, email me and we can chat about your background. I encourage students of all backgrounds who are curious about data science to reach out! We will provide review material for the necessary linear algebra and probability/statistics material closer to the start of the semester.

### Frequently Asked Questions

**If I plan to take, or have already taken, a dedicated machine learning course (such as EECS 445), should I still take this course?**

Yes! The first half of this course introduces students to several tools and skills that aren’t typically covered in other machine learning courses, like using more sophisticated features in `pandas`, scraping data from the internet, finding patterns in text data, etc. While the second half of the class does overlap a bit with more traditional machine learning courses, this course covers the content from a much more practical perspective. Students who have already seen machine learning will reinforce their understanding of the relevant concepts through hands-on, real-world examples (e.g. hyperparameter tuning in `sklearn`). Students who haven’t already seen machine learning will develop an intuition for how various machine learning algorithms work, both practically and mathematically, giving them a strong foundation on which further machine learning courses can build.

**What specific topics from linear algebra will the course use?**

In addition to matrix-vector multiplication, we will expect students to be familiar with the ideas of linear independence, spans, projections, and orthogonality. We will review these ideas when necessary, but it will help to have seen them already. As mentioned above, we will provide links to linear algebra review material closer to the start of the semester.

### Examples

The plots below are interactive, and involve examples we’ll work on in the class.

How do we find this “plane of best fit” efficiently? How do we use it to make predictions?
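As a rough preview of the kind of answer the course builds toward, here is a minimal sketch that fits a plane with `numpy`. It is not the course's actual example: the data is a made-up toy set generated from a known plane, and the names `X`, `w`, and `predict` are chosen purely for illustration. It assumes the least squares view of linear regression, where the fitted plane leaves a residual orthogonal to the columns of the design matrix.

```python
import numpy as np

# Hypothetical toy data: five points lying exactly on z = 1 + 2x + 3y.
x = np.array([0.0, 1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
z = 1 + 2 * x + 3 * y

# Design matrix: a column of ones (for the intercept), then x, then y.
X = np.column_stack([np.ones_like(x), x, y])

# Least squares solution: the w that minimizes ||z - Xw||^2.
w, *_ = np.linalg.lstsq(X, z, rcond=None)

# At the solution, the residual is orthogonal to every column of X.
residual = z - X @ w


def predict(x_new, y_new):
    """Predict z for a new (x, y) point using the fitted plane."""
    return w[0] + w[1] * x_new + w[2] * y_new
```

Because the toy points lie exactly on a plane, the fit recovers that plane's intercept and slopes; with real, noisy data the same call would return the plane that minimizes the sum of squared vertical errors instead.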

What are the trends in power outages over time?