from lec_utils import *
from lec03_utils import *
from PIL import Image
Announcements 📣¶
- Homework 1 is due on Thursday.
Post on Ed or come to Office Hours for help!
- The Welcome Survey was due yesterday; please fill it out if you haven't already!
EECS 370 has the same midterm time as us. If you're in 370, sign up to take their alternate midterm exam the following day. - We released a 🎥 walkthrough video of the "area codes" dictionary example from Lecture 2.
- We also posted the solutions to the exercises from Lecture 2 and Discussion 1 in our public GitHub repository. Starting today, I'll post "blank" versions of lecture notebooks before class, and "filled" versions of notebooks with the code that I write after class, so that you can code along with me.
- Check out the new Resources tab on the course website, with links to lots of supplementary resources and past exams from similar classes!
Agenda¶
- Recap:
for
-loops and dictionaries. numpy
arrays.- Multidimensional arrays and linear algebra.
- Randomness and simulation.
Aside: Python Tutor¶
Python Tutor, linked on the Resources tab of the course website, visualizes the execution of Python code.
# The mystery example from Lecture 2.
from IPython.display import IFrame
IFrame(width="800",
height="500",
frameborder="0",
src="https://pythontutor.com/iframe-embed.html#code=def%20mystery%28vals%29%3A%0A%20%20%20%20vals%5B-1%5D%20%3D%2015%0A%20%20%20%20return%20vals.append%28'BBB'%29%0A%20%20%20%20%0Acreature%20%3D%20%5B1,%202,%203%5D%0A%0Amystery%28creature%29%0Amystery%28creature%29%0Amystery%28creature%29&codeDivHeight=400&codeDivWidth=350&cumulative=false&curInstr=14&heapPrimitives=nevernest&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false")
Question 🤔 (Answer at practicaldsc.org/q)
How much progress have you made on Homework 1?
No judgement!
- A. I've submitted it!
- B. I've finished more than half of it.
- C. I've finished a few questions.
- D. I've looked at it, but haven't written any code yet.
- E. Haven't started at all.
Recap: for
-loops and dictionaries¶
(source)
for
-loops in Python¶
- In Python, you can loop over any iterable. Strings, lists, and dictionaries are all examples of iterables.
- All of the following are valid ways to write a
for
-loop.
for value in "this is a string":
for element in lst: # Assume lst is a list.
for i in range(len(lst)):
- One of the more common
for
-loop examples you may have seen in earlier classes involved performing some operation to every element of a sequence, e.g. doubling the numbers in a list.
def double(vals):
new_vals = []
for val in vals:
new_vals.append(vals * 2)
return new_vals
- We are going to avoid ❌ these kinds of
for
-loops in this class, because there are much faster ways of achieving the same goal innumpy
andpandas
. We'll see these soon.
while
-loops will come up sparingly.
But conceptually, you should know how they work!
List comprehension¶
In the situations when we do want to perform some operation to every element in a list, a common pattern is the list comprehension.
vals = [2, -1, 9, 4, 3, 8]
[val ** 2 for val in vals]
[4, 1, 81, 16, 9, 64]
[val ** 2 for val in vals if val % 2 == 0]
[4, 16, 64]
[val ** 2 if val % 2 == 0 else val + 1 for val in vals]
[4, 0, 10, 16, 4, 64]
Dictionaries¶
- A dictionary stores a collection of key-value pairs.
They are the equivalent of a map in C++.
{
curly brackets}
denote the start and end of a dictionary, a colon:
is used to denote a single key value pair, and a comma,
is used to separate key-value pairs.
dog = {'name': 'Junior', 'age': 15, 4: ['kibble', 'treat']}
dog
{'name': 'Junior', 'age': 15, 4: ['kibble', 'treat']}
- We retrieve a value in a dictionary using its key.
dog['name']
'Junior'
dog['height']
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) Cell In[9], line 1 ----> 1 dog['height'] KeyError: 'height'
- After creation, we can add or change key-value pairs.
dog['color'] = 'beige'
dog['tricks'] = {
'easy': ['roll over', 'paw'],
'medium': ['jump']
}
dog
{'name': 'Junior', 'age': 15, 4: ['kibble', 'treat'], 'color': 'beige', 'tricks': {'easy': ['roll over', 'paw'], 'medium': ['jump']}}
- A dictionary's keys must be immutable (numbers, strings, Booleans), while its values can be anything.
# Here, we're trying to add a value with a key of [1, 2].
# Since [1, 2] is mutable, it can't be used as a key.
dog[[1, 2]] = 'does this work?'
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[12], line 3 1 # Here, we're trying to add a value with a key of [1, 2]. 2 # Since [1, 2] is mutable, it can't be used as a key. ----> 3 dog[[1, 2]] = 'does this work?' TypeError: unhashable type: 'list'
Activity
Complete the implementation of the function find_anagrams
, which takes in words
, a list of strings, and returns a dictionary describing the anagrams present in words
. Example behavior is given below.
>>> find_anagrams(['dog', 'hello', 'enlist', 'silent', 'a gentleman', 'god', 'elegant man', 'listen'])
{'dgo': ['dog', 'god'],
'ehllo': ['hello'],
'eilnst': ['enlist', 'silent', 'listen'],
' aaeeglmnnt': ['a gentleman', 'elegant man']}
example_words = ['dog', 'hello', 'enlist', 'silent', 'a gentleman', 'god', 'elegant man', 'listen']
def find_anagrams(words):
out = {}
for word in words:
word_sorted = ''.join(sorted(word))
if word_sorted in out:
out[word_sorted].append(word)
else:
out[word_sorted] = [word]
return out
numpy
arrays¶
Import statements¶
- We use
import
statements to add the objects (values, functions, classes) defined in other modules to our programs. There are a few different ways toimport
.
Other terms I'll use for "module" are "library" and "package".
- Option 1:
import module
.
Now, everytime we want to use a name in module
, we must write module.<name>
.
import math
math.sqrt(15)
3.872983346207417
sqrt(15)
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[16], line 1 ----> 1 sqrt(15) NameError: name 'sqrt' is not defined
- Option 2:
import module as m
.
Now, everytime we want to use a name in module
, we can write m.<name>
instead of module.<name>
.
# This is the standard way that we will import numpy.
import numpy as np
np.pi
3.141592653589793
np.linalg.inv([[2, 1],
[3, 4]])
array([[ 0.8, -0.2], [-0.6, 0.4]])
- Option 3:
from module import ...
.
This way, we explicitly state the names we want to import from module
.
To import everything, write from module import *
.
# Importing a particular function from the requests module.
from requests import get
# This typically fills up the namespace with a lot of unnecessary names, so use sparingly.
from math import *
sqrt
<function math.sqrt(x, /)>
NumPy¶
- NumPy (pronounced "num pie") is a Python library (module) that provides support for arrays and operations on them.
- The
pandas
library, which we will use for tabular data manipulation, works in conjunction withnumpy
.
- To use
numpy
, we need to import it. It's usually imported asnp
(but doesn't have to be!)
We also had to install it on your computer first, but you already did that when you set up your environment.
import numpy as np
Arrays¶
- The core data structure in
numpy
is the array. Moving forward, "array" will always refer to anumpy
array.
- One way to instantiate an array is to pass a list as an argument to the function
np.array
.
np.array([4, 9, 1, 2])
array([4, 9, 1, 2])
- Arrays, unlike lists, must be homogenous – all elements must be of the same type.
# All elements are converted to strings!
np.array([1961, 'michigan'])
array(['1961', 'michigan'], dtype='<U21')
Array-number arithmetic¶
- Arrays make it easy to perform the same operation to every element without a
for
-loop. This behavior is formally known as "broadcasting", but we often say these operations are vectorized.
temps = [68, 72, 65, 64, 62, 61, 59, 64, 64, 63, 65, 62]
temps
[68, 72, 65, 64, 62, 61, 59, 64, 64, 63, 65, 62]
temp_array = np.array(temps)
# Increase all temperatures by 3 degrees.
temp_array + 3
array([71, 75, 68, 67, 65, 64, 62, 67, 67, 66, 68, 65])
# Halve all temperatures.
temp_array / 2
array([34. , 36. , 32.5, 32. , 31. , 30.5, 29.5, 32. , 32. , 31.5, 32.5, 31. ])
# Convert all temperatures to Celsius.
(5 / 9) * (temp_array - 32)
array([20. , 22.22, 18.33, 17.78, 16.67, 16.11, 15. , 17.78, 17.78, 17.22, 18.33, 16.67])
- Note: In none of the above cells did we actually modify
temp_array
! Each of those expressions created a new array. To actually changetemp_array
, we need to reassign it to a new array.
temp_array
array([68, 72, 65, 64, 62, 61, 59, 64, 64, 63, 65, 62])
temp_array = (5 / 9) * (temp_array - 32)
# Now in Celsius!
temp_array
array([20. , 22.22, 18.33, 17.78, 16.67, 16.11, 15. , 17.78, 17.78, 17.22, 18.33, 16.67])
⚠️ The dangers of unnecessary for
-loops¶
- Under the hood,
numpy
is implemented in C and Fortran, which are compiled languages that are much faster than Python. As a result, these vectorized operations are much quicker than if we used a vanilla Pythonfor
-loop.
Also, the fact that arrays must be homogenous lend themselves to more efficient representations in memory.
- We can time code in a Jupyter Notebook. Let's try and square a long sequence of integers and see how long it takes with a Python loop:
%%timeit
squares = []
for i in range(1_000_000):
squares.append(i * i)
46.7 ms ± 790 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
- In vanilla Python, this takes about 0.04 seconds per loop. In
numpy
:
%%timeit
squares = np.arange(1_000_000) ** 2
1.51 ms ± 47.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
- Only takes about 0.001 seconds per loop, more than 40x faster!
Element-wise arithmetic¶
- We can apply arithmetic operations to multiple arrays, provided they have the same length.
- The result is computed element-wise, which means that the arithmetic operation is applied to one pair of elements from each array at a time.
a = np.array([4, 5, -1])
b = np.array([2, 3, 2])
a + b
array([6, 8, 1])
a / b
array([ 2. , 1.67, -0.5 ])
a ** 2 + b ** 2
array([20, 34, 5])
arr = np.array([3, 8, 4, -3.2])
(2 ** arr).sum()
280.108818820412
(2 ** arr).mean()
70.027204705103
(2 ** arr).max()
256.0
(2 ** arr).argmax()
1
# An attribute, not a method.
arr.shape
(4,)
Question 🤔 (Answer at practicaldsc.org/q)
What questions do we have about arrays so far?
Activity
🎉 Congrats! 🎉 You won the lottery 💰. Here's how your payout works: on the first day of September, you are paid \$0.01. Every day thereafter, your pay doubles, so on the second day you're paid \$0.02, on the third day you're paid \$0.04, on the fourth day you're paid \$0.08, and so on.
September has 30 days.
Write a one-line expression that uses the numbers 2
and 30
, along with the function np.arange
and at least one array method, that computes the total amount in dollars you will be paid in September. No for
-loops or list comprehensions allowed!
*Note*: We have a 🎥 walkthrough video of this problem, but don't watch it until you've tried it yourself!
(2 ** np.arange(30) / 100).sum()
10737418.23
Boolean filtering¶
- Comparisons with arrays yield Boolean arrays! These can be used to answer questions about the values in an array.
temp_array
array([20. , 22.22, 18.33, 17.78, 16.67, 16.11, 15. , 17.78, 17.78, 17.22, 18.33, 16.67])
temp_array >= 18
array([ True, True, True, False, False, False, False, False, False, False, True, False])
- How many values are greater than or equal to 18?
(temp_array >= 18).sum()
4
- What fraction of values are greater than or equal to 18?
(temp_array >= 18).mean()
0.3333333333333333
- Which values are greater than or equal to 18?
temp_array[temp_array >= 18]
array([20. , 22.22, 18.33, 18.33])
- Which values are between 18 and 20?
# Note the parentheses!
temp_array[(temp_array >= 18) & (temp_array <= 20)]
array([20. , 18.33, 18.33])
# WRONG!
temp_array[(temp_array >= 18) and (temp_array <= 20)]
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[53], line 2 1 # WRONG! ----> 2 temp_array[(temp_array >= 18) and (temp_array <= 20)] ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Note: & and | vs. and and or¶
- In Python, the standard symbols for "and" and "or" are, literally,
and
andor
.
if (5 > 3 and 'h' + 'i' == 'hi') or (-2 > 0):
print('success')
success
- But, when taking the element-wise and/or of two arrays, the standard operators don't work. Instead, use the bitwise operators:
&
for "and",|
for "or".
temp_array
array([20. , 22.22, 18.33, 17.78, 16.67, 16.11, 15. , 17.78, 17.78, 17.22, 18.33, 16.67])
# Don't forget parentheses when using multiple conditions!
temp_array[(temp_array % 2 == 0) | (temp_array == temp_array.min())]
array([20., 15.])
- Read more about this here.
Multidimensional arrays and linear algebra¶
Multidimensional arrays¶
- A matrix can be represented in code using a two dimensional (2D) array.
- 2D arrays also resemble tables, or DataFrames, so it's worthwhile to study how they work.
nums = np.array([
[5, 1, 9, 7],
[9, 8, 2, 3],
[2, 5, 0, 4]
])
nums
array([[5, 1, 9, 7], [9, 8, 2, 3], [2, 5, 0, 4]])
# nums has 3 rows and 4 columns.
nums.shape
(3, 4)
- In addition to creating 2D arrays from scratch, we can also create 2D arrays by reshaping other arrays.
# Here, we're asking to reshape np.arange(1, 7)
# so that it has 2 rows and 3 columns.
a = np.arange(1, 7).reshape((2, 3))
a
array([[1, 2, 3], [4, 5, 6]])
Operations along axes¶
- In 2D arrays (and DataFrames), axis 0 refers to the rows (up and down) and axis 1 refers to the columns (left and right).
a
array([[1, 2, 3], [4, 5, 6]])
- If we specify
axis=0
,a.sum
will "compress" along axis 0.
a.sum(axis=0)
array([5, 7, 9])
- If we specify
axis=1
,a.sum
will "compress" along axis 1.
a.sum(axis=1)
array([ 6, 15])
Selecting rows and columns from 2D arrays¶
- You can use
[
square brackets]
to slice rows and columns out of an array, too.
- The general convention is:
array[<row positions>, <column positions>]
a
array([[1, 2, 3], [4, 5, 6]])
# Accesses row 0 and all columns.
a[0, :]
array([1, 2, 3])
# Same as the above.
a[0]
array([1, 2, 3])
# Accesses all rows and column 1.
a[:, 1]
array([2, 5])
# Access all rows and columns 0 and 2.
a[:, [0, 2]]
array([[1, 3], [4, 6]])
# Accesses row 0 and columns 1 and onwards.
a[0, 1:]
array([2, 3])
Activity
Suppose we run the cell below.s = (5, 3)
grid = np.ones(s) * 2 * np.arange(1, 16).reshape(s)
grid[-1, 1:].sum()
What is the output of the cell? Try and answer without writing any code.
s = (5, 3)
grid = np.ones(s) * 2 * np.arange(1, 16).reshape(s)
grid[-1, 1:].sum()
58.0
Linear algebra review¶
- Arrays are used to perform computation in the context of linear algebra! For example, let's work through Practice Question 8.1 from LARDS, but this time using code.
Consider the vectors $\vec u$ and $\vec v$, defined below:
$$\vec u = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \qquad \vec v = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$
show_projection_plot()