📖 Syllabus

Table of contents

  1. Overview
    1. Instructor
    2. Content
    3. Credit
    4. Prerequisites
  2. Getting Started
    1. Computer and Network Recommendations
    2. Websites
    3. Development Environment
    4. Forms
  3. Communication
  4. Course Components
    1. Lectures
    2. Discussions
    3. Homeworks
    4. Office Hours
    5. Weekly Schedule
  5. Exams
    1. Redemption Policy
  6. Grades
    1. Assignment Weights
    2. Letter Grades
    3. Late Policy, Slip Days, and Drops
    4. Regrade Requests
  7. Collaboration and Academic Integrity
    1. Use of Generative Artificial Intelligence
    2. Honor Code
  8. Student Support and Well-Being
    1. Accommodations
    2. Diversity and Inclusion
    3. Campus Resources
  9. Acknowledgements
  10. Disclaimer

Overview

Instructor

See the 👩‍🏫 Staff page for contact information.

Content

Watch this video to learn more about the course.

This course covers the skills and tools needed to build practical data science projects, along with their theoretical underpinnings. Topics include pandas, numpy, scikit-learn, BeautifulSoup, and Jupyter Notebooks, as well as the math behind loss functions, gradient descent, linear and logistic regression, and other key ideas in machine learning.

This course will train students to use industry-standard tools to solve real-world problems, while giving them an understanding of how these tools work under the hood. After taking this course, students will be prepared to build data science portfolios, participate in research across campus, and succeed in data science internships.

The course will roughly be split in two halves; while the topics in it may overlap with other existing courses, it will take a very practical approach. Refer to the course homepage for a more specific list of topics.

Data Wrangling:

  • Python and Jupyter Notebooks
  • numpy arrays
  • Tabular Data Manipulation in pandas
  • Exploratory Data Analysis and Data Visualization
  • Web Scraping and APIs
  • SQL
  • Regular Expressions and Text Processing

Applied Machine Learning:

  • Linear Regression through Linear Algebra
  • Feature Engineering in scikit-learn
  • Regularization and Cross-Validation
  • Gradient Descent
  • Logistic Regression
  • Decision Trees and Random Forests
  • Unsupervised Learning

Credit

This course is worth 4 credits, and counts for the following major requirements:

  • Computer Science: Upper-Level CS (ULCS) Elective
  • Data Science: Advanced Technical Elective or Application Elective
  • Electrical Engineering: Flexible Technical Elective

Prerequisites

The course is open to students from all majors.

The enforced prerequisites are discrete math (EECS 203), programming (EECS 280), calculus I, calculus II, and linear algebra. A probability and statistics course is an advisory prerequisite. Options include DATASCI 101, STATS 206, STATS 250, STATS 280, STATS 412, IOE 265, or ECON 451.

All students, especially those who haven’t taken a linear algebra course, should work through LARDS: Linear Algebra Review for Data Science before the semester starts.


Getting Started

The course website, practicaldsc.org, will contain links to all course content. There are also a few things you’ll need to do to get set up.

Computer and Network Recommendations

Make sure you have a laptop consistent with CAEN recommendations.

Test your internet connection with the UM Custom Speedtest website and make sure it meets the minimum requirements for any UM service. You’ll need more bandwidth if there will be multiple simultaneous users in your household.

Resources for help with computing equipment:

You may also use computer workstations in CAEN labs on campus or connect remotely.

Websites

You’ll need to make accounts on the following sites:

  • Ed: We’ll be using Ed as our course message and discussion board. (Think of Ed as a replacement for Piazza.) More details are in the Communication section below. If you didn’t already get an invitation to our Ed course, sign up here.

  • Gradescope: You’ll submit all assignments to Gradescope. This is where all of your grades will live as well. Most of the assignments will be coding assignments. Parts of these assignments will be manually graded, but most of them will be autograded. You should have received an email invitation for Gradescope, but if not please let us know as soon as possible (preferably via Ed). Note that we will not be using the EECS-specific “Autograder” platform.

  • GitHub: You will access all course content (lecture slides and assignments) by pulling our course GitHub repository. That repo is here: github.com/practicaldsc/fa24. Don’t worry if you’ve never used Git before – the ⚙️ Environment Setup page will walk you through all of the necessary steps.

Note that we will not be using Canvas for anything this semester (so please don’t try to send us messages on Canvas!).

Development Environment

As soon as you are able to, follow the steps on the ⚙️ Environment Setup page of the course website to set up your programming environment for the course. Discussion 1 will be dedicated to making sure you’ve followed these steps.

Forms

Please fill out the required Welcome Survey no later than Monday, September 2nd. It tells us a bit more about your background and whether you need an alternate exam.


Communication

This semester, we’ll be using Ed as our course message board. You will be added to Ed automatically; use the invite link in the section above if you weren’t added.

If you have a question about anything to do with the course — if you’re stuck on a problem, didn’t understand something from lecture, want clarification on course logistics, or just have a general question about data science — you can make a post on Ed. We only ask that if your question includes some or all of an answer (even if you’re not sure it’s right), please make your post private so that others cannot see it. You can also post anonymously to other students if you prefer.

Course staff will regularly check Ed and try to answer any questions that you have. You’re also encouraged to answer questions asked by other students. Explaining something is a great way to solidify your understanding of it!

Please don’t email individual staff members; instead, make a private or public Ed post.


Course Components

Lectures

Lectures will be held in-person on Tuesdays and Thursdays from 1:30-3:00PM in 1500 EECS. Attendance is not required, though you are encouraged to attend in-person if you are able to. Lectures will be recorded and made publicly available.

Note that more students are enrolled in the course than the lecture room can fit, but usually, enough students consume material remotely that we don’t anticipate in-person space to be a bottleneck for anyone wishing to attend the class in-person. Still, you may be unable to find a seat in the first few lectures.

Lecture slides/notebooks will be your main resource in this class. You can access them, along with all course materials, by pulling from the course GitHub repository, github.com/practicaldsc/fa24; instructions on how to do this are in the ⚙️ Environment Setup page. We will also link HTML previews of each lecture notebook from the course homepage; you can use these to annotate the lecture notebooks with a tablet, if you’d like.

Supplementary readings will be made available on the course homepage, drawn from a variety of online resources. Additional supplementary resources can be found in the Readings section of the Resources tab of the course website.

Discussions

There are two discussion sections, held in-person on Fridays starting in Week 1:

  • Fridays, 12:30-1:30PM, 2147 GGBL (capacity: 40)
  • Fridays, 2:30-3:30PM, 1670 BBB (capacity: 100)

You can attend either discussion session, but if space fills up, priority will be given to students officially enrolled in that section.

The first discussion includes some useful instruction and tips for using Jupyter Notebooks, the programming environment we’ll be using in this course. It should be helpful to get you set up and comfortable with the technology you’ll be using all semester.

Subsequent discussion sections will be focused on exam preparation. Students will work through problems from past exams in related courses and be able to get help from course staff. Attending discussion and working through practice problems gives you direct experience with the types of theoretical questions you will see on exams.

Discussion sections will not be recorded. The purpose of discussion is to give you hands-on problem-solving experience, so you really need to attend and participate to reap the benefits.

Homeworks

There will be ~12 homework assignments due weekly throughout the semester. Expect each homework assignment to take ~8-10 hours to complete. Even though this is a programming-heavy class, expect it to have the workload patterns of a more theoretical class – that is, expect a constant, moderate level of work each week, rather than having some weeks with very little work and some extremely heavy weeks.

Homework assignments will usually be in the form of Jupyter Notebooks, where you will write Python code and answer free-response and multiple-choice questions, though some homework assignments may include work that must be done on pen-and-paper to assess more theoretical concepts. For programming problems, public tests will be provided to make sure you’re on the right track; however, your submission will be graded using an autograder with hidden tests. We may also grade your code on style.

Each homework is worth the same amount, but the lowest homework will be dropped when calculating your final score. Homeworks will usually be released on Friday mornings and usually due on Thursdays at 11:59PM, though this schedule is subject to change. You will access homework by pulling the course GitHub repository, and will submit your completed homework as Jupyter Notebooks to Gradescope (we will not be using the EECS autograder). You may turn them in as many times as you like before the deadline, and only the most recent submission will be graded, so it’s a good idea to submit early and often.

Note that all homeworks are to be completed individually; apart from the Portfolio Homework described below, there are no group assignments in this course, and you may not work with partners. However, you’re encouraged to discuss all assignments with others at a conceptual level in office hours and study groups. See the Collaboration and Academic Integrity section for more details.

🆕 The last homework of the semester is now called the Portfolio Homework. In the Portfolio Homework, you’ll work on an open-ended investigation of a dataset of your choosing from a fixed set of options, using the tools from both halves of the semester. Your work will culminate in a public-facing website that you can share with friends, family, and on your resume! We will release more details about the Portfolio Homework in early to mid-November to give you at least 3 weeks to work on it, but for now, note that the Portfolio Homework is different from other homework assignments in the following ways:

  • You can work with one partner (but don’t have to).
  • You cannot use slip days on the Portfolio Homework (since it’s due at the end of the semester and we need time to grade it).
  • You cannot drop the Portfolio Homework – it will be a part of your overall homework grade no matter what. Your lowest score among Homeworks 1-11 will be the one that is dropped.
  • Otherwise, it is worth the same amount as other homeworks.
  • The Portfolio Homework will not be autograded at all; it will be fully manually graded.
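As a rough illustration of the drop rule above, here is a minimal Python sketch. It assumes every homework is scored as a percentage and that each counts equally; the function name and the example scores are made up for illustration and are not part of any course tooling.

```python
def homework_average(regular_scores, portfolio_score):
    """Average homework score (in percent) after dropping the lowest regular homework.

    regular_scores: percent scores on the non-Portfolio homeworks (hypothetical input format).
    portfolio_score: percent score on the Portfolio Homework, which is never dropped.
    """
    if not regular_scores:
        raise ValueError("need at least one regular homework score")
    kept = sorted(regular_scores)[1:]      # drop the single lowest regular score
    counted = kept + [portfolio_score]     # the Portfolio Homework always counts
    return sum(counted) / len(counted)


# Made-up scores for Homeworks 1-11 and the Portfolio Homework.
example_regular = [100, 90, 80, 95, 85, 100, 70, 90, 100, 95, 60]
example_average = homework_average(example_regular, portfolio_score=50)
print(round(example_average, 2))  # prints 86.82
```

In this made-up example, the Portfolio Homework score (50) is lower than the lowest regular score (60), but it is the 60 that gets dropped.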

Office Hours

To get help on assignments and concepts, course staff will be hosting several office hours per week. The vast majority of office hours will be held in-person on North Campus, though we will hold a few Central Campus and remote office hours, depending on staff availability. See the Calendar tab of the course website for the most up-to-date schedule and directions.

We use the term “office hours” but really, office hours are held in a common room where you can come to work on assignments, meet your classmates, and get help from course staff. We don’t bite and we would love to see you in office hours!

Office hours are your chance to ask for general help, get clarification on homeworks, and review previous homeworks. Course staff will not tell you if your answer is correct, and it is inappropriate to ask. Here are some really good questions to ask instead:

  • I got confused about a concept in class. Can you explain it?
  • When the assignment says X, does it mean A or B?
  • My code is giving a weird error - can you help me understand why?
  • I can’t get this test to pass, so I must be doing something wrong. Can you help me figure it out?
  • My code is doing something different than what I expected. Can you explain what is happening?

Questions that you should never ask us:

  • Is this the right answer?
  • Can you check my code and make sure it is right?
  • What is the answer?
  • What’s going to be on the exam?

Your primary motivation when interacting with course staff should be learning.

Weekly Schedule

To summarize all of the events and deadlines, refer to this general weekly schedule (which is subject to change in any given week):

  • Monday: no regular events
  • Tuesday: Lecture
  • Wednesday: no regular events
  • Thursday: Lecture; Homework N - 1 due at 11:59PM
  • Friday: Discussion; Homework N released

Exams

This class has one Midterm Exam and one Final Exam. Exams are cumulative, though the Final Exam will emphasize material after the Midterm Exam. The exams will feature a mix of multiple choice, select all, short answer, and long answer questions, including questions that require you to write code and do math.

  • Midterm Exam: Wednesday, October 9th, 7-9PM

  • Final Exam: Thursday, December 12th, 4-6PM

Both exams will be administered in-person. If you have conflicts with either of the exams, please let us know on the Welcome Survey. We may provide alternate exam times for students with a valid, documented conflict with a required activity in another course or official university-affiliated activity, or to help students avoid negative academic consequences when their religious obligations conflict with academic requirements.

Redemption Policy

The Final Exam will consist of two parts: a “Midterm” section and a “post-Midterm” section. Both parts are required, in that they both count towards your Final Exam score.

If you do better on the “Midterm” section of the Final Exam than you did on the original Midterm Exam, your score on the “Midterm” section will replace your original Midterm Exam score. This lowers the stakes of the Midterm Exam and gives you two opportunities to demonstrate your understanding of the content from the first half of the course. This also means that you can miss the Midterm Exam for any reason and have the score be replaced by your score on the “Midterm” section of the Final Exam (though we do not recommend this).

You must take the Final Exam to pass the course.
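The redemption rule above can be sketched in a few lines of Python. This is only an illustration: it assumes both scores are compared as percentages (the syllabus does not spell out how the comparison is made), and the function and parameter names are hypothetical.

```python
def effective_midterm_score(midterm_pct, final_midterm_section_pct):
    """Midterm Exam score (in percent) after applying the redemption policy.

    midterm_pct: percent score on the original Midterm Exam, or None if it was missed.
    final_midterm_section_pct: percent score on the "Midterm" section of the Final Exam.
    """
    if midterm_pct is None:
        # A missed Midterm Exam is replaced by the "Midterm" section of the Final Exam.
        return final_midterm_section_pct
    # Otherwise, the better of the two scores counts as the Midterm Exam score.
    return max(midterm_pct, final_midterm_section_pct)


print(effective_midterm_score(70, 85))    # prints 85
print(effective_midterm_score(90, 80))    # prints 90
print(effective_midterm_score(None, 75))  # prints 75
```

Note that the “Midterm” section still counts towards your Final Exam score either way; the sketch only covers which number is used for the Midterm Exam component.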


Grades

Assignment Weights

Here is how we’ll compute your grade:

  • Homeworks (50%): lowest score dropped (though the Portfolio Homework cannot be dropped); 8 slip days available to use, with a max of 2 per homework (0 allowed on the Portfolio Homework)
  • Midterm Exam (20%): see the Redemption Policy above
  • Final Exam (30%)
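To make the weights concrete, here is a small Python sketch of the weighted overall score. It assumes each component has already been reduced to a single percentage (after drops and redemption); the names and example numbers are hypothetical, and the result is not a letter grade, since letter grades are not a fixed function of this number.

```python
# Component weights as listed in this section.
WEIGHTS = {"homeworks": 0.50, "midterm": 0.20, "final": 0.30}


def overall_score(component_pcts):
    """Weighted overall percent from per-component percent scores.

    component_pcts: dict with keys "homeworks", "midterm", and "final" (hypothetical format).
    """
    missing = set(WEIGHTS) - set(component_pcts)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_pcts[name] for name in WEIGHTS)


# Made-up example: 90% on homeworks, 80% on the Midterm Exam, 85% on the Final Exam.
example_overall = overall_score({"homeworks": 90, "midterm": 80, "final": 85})
print(round(example_overall, 2))  # prints 86.5
```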

Letter Grades

Grading for this class is not curved in the sense that the average is set at (say) a B+ and half of the class must receive a grade lower than that. If everyone does well and shows mastery of the material, everyone can receive an A (this would be awesome!). If no one does well (this is unlikely), then everyone can receive a C.

Grading for this class is curved in the sense that we do not have a pre-defined mapping from homework and exam scores to a final letter grade. There is no pre-determined score (e.g., 90% of all possible points) that earns an A or a B or a C or any other grade. To determine the final grade, we will ask questions like “Did this student master the material?”. With that said, grades will not be any stricter than the standard grading scale (where an A+ is a 97+, A is 93+, A- is 90+, etc). For instance, the threshold for an “A” will never be higher than 93%.

Try your best not to worry about grades, and we’ll reciprocate by being fair. We’re in this together ❤️.

Late Policy, Slip Days, and Drops

All homeworks must be submitted by 11:59PM Ann Arbor time on the due date to be considered on time. You may turn them in as many times as you like before the deadline, and only the most recent submission will be graded, so it’s a good habit to submit early and often. If you make a submission after the deadline, your assignment will be counted as late.

You have 8 “slip days” to use throughout the semester. A slip day extends the deadline of a homework by 24 hours. You may use up to 2 slip days on any one homework assignment, except for the Portfolio Homework, which you cannot use slip days on (since it’s due at the end of the semester and we need time to manually grade it).

Slip days are designed to be a transparent and predictable source of leniency in deadlines. You can use a slip day if you are too busy to complete a homework on its original due date (or if you forgot about it). But slip days are also meant for things like the internet going down at 11:58PM just as you go to submit your homework. Slip days are meant to be used in exceptional circumstances, so you probably should not need to use all 8. If you have something going on in your life that is impeding your ability to do your classwork on time, please reach out to us as soon as possible so we can work something out; the earlier you let us know that something’s going on, the more we can do to help.

Slip days are applied automatically at the end of the semester, and you don’t need to ask in order to use one. It’s your responsibility to keep track of how many you have left. If you’ve run out of slip days and submit an assignment late, that homework may still be graded, but you will receive a 0 on it when we calculate grades at the end of the semester.
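If you’d like to keep track of your own slip days, the bookkeeping might look something like the Python sketch below. It is only an illustration under several assumptions that this syllabus does not state: lateness is measured in hours past the original deadline, slip days are consumed in the order homeworks are listed, and a homework that would need more slip days than are allowed or remaining is treated as not accepted (a 0). All names and example numbers are hypothetical.

```python
import math

TOTAL_SLIP_DAYS = 8             # slip days available for the semester
MAX_SLIP_DAYS_PER_HOMEWORK = 2  # cap on slip days for any one homework


def slip_days_needed(hours_late):
    """Number of 24-hour slip days needed to cover a submission hours_late hours late."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_slip_days(hours_late_by_homework, portfolio_name="Portfolio"):
    """Return (accepted, remaining) for a dict of homework name -> hours late.

    accepted maps each homework name to True if it counts as on time after slip days.
    remaining is the number of slip days left afterwards.
    """
    remaining = TOTAL_SLIP_DAYS
    accepted = {}
    for name, hours_late in hours_late_by_homework.items():
        needed = slip_days_needed(hours_late)
        # No slip days may be used on the Portfolio Homework.
        limit = 0 if name == portfolio_name else MAX_SLIP_DAYS_PER_HOMEWORK
        if needed <= min(limit, remaining):
            remaining -= needed
            accepted[name] = True
        else:
            accepted[name] = False  # assumption: treated as a 0 in this sketch
    return accepted, remaining


# Made-up example: hours past the deadline for each submission.
example_accepted, example_remaining = apply_slip_days(
    {"HW1": 0, "HW2": 5, "HW3": 30, "HW4": 50, "Portfolio": 1}
)
print(example_accepted)
print(example_remaining)  # prints 5
```

Here HW2 uses 1 slip day and HW3 uses 2, while HW4 (more than 48 hours late) and the late Portfolio Homework are not accepted under the sketch’s assumptions.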

Regrade Requests

Most homeworks are entirely autograded, but some questions are manually graded. If you feel that there is an error in the autograder or that the manual grader has made a mistake, you may submit a regrade request within three days of the grades being released. If you do not submit a regrade request within three days, your original grade will be final.

Regrade Requests for Manually Graded Problems

To submit a regrade request for a manually graded problem, make the regrade request directly on Gradescope. Note that part of your grade is clarity, so if your answer was mostly right but unclear you may still not be eligible for full credit.

Regrade Requests for Autograded Problems

To submit an autograder regrade request, please fill out the Autograder Regrade Request Form.

Note that it’s rare that something is wrong with the autograder, and if that’s the case, we’ll typically fix the necessary test cases and re-run the autograder for the entire class.


Collaboration and Academic Integrity

This will be a tough, but rewarding course. While you will be challenged this semester, we will be offering you plenty of support through office hours and Ed. Make good use of these resources, and you will be able to succeed in this course.

In this course, you can read books, surf the web, talk to your friends and course staff to get help understanding the concepts you need to know to complete your assignments. However, all work you submit must be your own, original work; collaboration must not result in solutions that are identifiably similar to other solutions, past or present.

Encouraged collaboration:

  • Discussing the general approach to homeworks
  • Talking about problem-solving strategies or issues you ran into and how you solved them
  • Discussing the answers to exams with other students who have already taken the exam after the exam is complete
  • Using code provided in class, by the textbook or any other assigned reading or video, with attribution
  • Google searching for documentation on Python or pandas
  • Working together with other students on homeworks without copying or sharing answers
  • Posting a question about your approach to a problem on Ed, without sharing your code

Unacceptable collaboration:

  • Using or submitting code acquired from other students, the web, or any other resource not officially sanctioned by this course
  • Posting your code online, including on Ed, unless privately to course staff only
  • Having any other person complete any part of your assignment on your behalf
  • Completing an assignment on behalf of someone else
  • Providing code, exam questions, or solutions to any other student in the course
  • Collaborating with others on exams

If you are unsure about what constitutes an honor code violation, please contact the course staff with questions. The best way to avoid problems is by using your best judgement. Here are some suggestions for completing your work:

  • Don’t look at or discuss the details of another student’s code for a homework you are working on, and don’t let another student look at your code.
  • Don’t start with someone else’s code and make changes to it, or in any way share code with other students.
  • If you are talking to another student about a homework, don’t take notes, and wait an hour afterward before you write any code.

Use of Generative Artificial Intelligence

Our course policy on the use of GenAI tools for homeworks is simple: you can use these tools to build an understanding of course material and to assist you on assignments, keeping in mind that no tool is a substitute for a strong understanding of course concepts.

Some examples of responsible use of generative AI include autocompleting repetitive/boilerplate code and suggesting edge cases. Creating large sections of code you do not understand yourself is using generative AI in an irresponsible way.

Remember that during the exams you will write code without the help of generative AI tools, and exams are worth half your grade.

Honor Code

We report suspected violations to the Engineering Honor Council. To identify violations, we use both manual inspection and automated software to compare present solutions with each other, with past solutions, and with code found online. The Honor Council determines whether a violation of academic standards has occurred, as well as any sanctions. Read the Honor Code for detailed definitions of cheating, plagiarism, and other forms of academic misconduct.


Student Support and Well-Being

Accommodations

If you need, or think you might need, an accommodation for a disability, please let us know during the first three weeks of the semester. Some aspects of this course may be modified to facilitate your participation and progress. As soon as you make us aware of your needs, we can work with the Services for Students with Disabilities (SSD) office to help us determine appropriate academic accommodations. SSD (ssd.umich.edu; 734-763-3000) recommends accommodations through a Verified Individualized Services and Accommodations (VISA) form. Any information you provide is private and confidential and will be treated as such.

Diversity and Inclusion

It is our intention that students from all backgrounds and perspectives will be well served by this course, and that the diversity that students bring to this class will be viewed as an asset. We welcome individuals of all ages, backgrounds, beliefs, ethnicities, genders, gender identities, gender expressions, national origins, religious affiliations, sexual orientations, socioeconomic background, family education level, ability - and other visible and nonvisible differences. All members of this class are expected to contribute to a respectful, welcoming, and inclusive environment for every other member of the class. Your suggestions are encouraged and appreciated.

Campus Resources

As a student, you may experience a range of issues that can negatively impact your learning, such as anxiety, depression, interpersonal or sexual violence, difficulty eating or sleeping, loss/grief, and/or alcohol/drug problems. These mental health concerns or stressful events may lead to diminished academic performance and affect your ability to participate in day-to-day activities.

In order to support you during such challenging times, the University of Michigan provides a number of confidential resources to all enrolled students, many of which are listed here. Some particularly useful resources include:


Acknowledgements

This course is being offered for the first time at the University of Michigan. With that said, many of the materials we will use are adapted from content created by countless other instructors for courses at other institutions, in particular:

  • DSC 10, DSC 40A, and DSC 80 at the University of California, San Diego
  • Data 6 and Data 100 at the University of California, Berkeley

Language in this syllabus has been adopted from other courses as well, including EECS 203, EECS 280, EECS 376, and EECS 485 here at the University of Michigan, and CSE 160 at the University of Washington.


Disclaimer

While we try to do our best to plan ahead, unfortunately, sometimes circumstances do arise that necessitate a policy change. When this happens, the change will be announced, and this document will be updated with the new policy.

We appreciate any and all feedback, given that this course is new and evolving. If you’d like to provide us with anonymous feedback at any point, you can do so at this form. Thank you!