```
from lec_utils import *
from lec03_utils import *
from PIL import Image
```

### Announcements 📣

- Homework 1 is due on **Thursday**.

  Post on Ed or come to Office Hours for help!

- The Welcome Survey was due yesterday; **please fill it out if you haven't already**!
- EECS 370 has the same midterm time as us. If you're in 370, sign up to take their alternate midterm exam the following day.
- We released a 🎥 walkthrough video of the "area codes" dictionary example from Lecture 2.
- We also posted the solutions to the exercises from Lecture 2 and Discussion 1 in our public GitHub repository.
- **Starting today, I'll post "blank" versions of lecture notebooks before class, and "filled" versions of notebooks with the code that I write** *after* class, so that you can code along with me.
- Check out the new Resources tab on the course website, with links to lots of supplementary resources and past exams from similar classes!

### Agenda

- Recap: `for`-loops and dictionaries.
- `numpy` arrays.
- Multidimensional arrays and linear algebra.
- Randomness and simulation.

### Aside: Python Tutor

Python Tutor, linked on the Resources tab of the course website, visualizes the execution of Python code.

```
# The mystery example from Lecture 2.
from IPython.display import IFrame
IFrame(width="800",
height="500",
frameborder="0",
src="https://pythontutor.com/iframe-embed.html#code=def%20mystery%28vals%29%3A%0A%20%20%20%20vals%5B-1%5D%20%3D%2015%0A%20%20%20%20return%20vals.append%28'BBB'%29%0A%20%20%20%20%0Acreature%20%3D%20%5B1,%202,%203%5D%0A%0Amystery%28creature%29%0Amystery%28creature%29%0Amystery%28creature%29&codeDivHeight=400&codeDivWidth=350&cumulative=false&curInstr=14&heapPrimitives=nevernest&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false")
```

### Question 🤔 (Answer at practicaldsc.org/q)

How much progress have you made on Homework 1?

No judgement!

- A. I've submitted it!
- B. I've finished more than half of it.
- C. I've finished a few questions.
- D. I've looked at it, but haven't written any code yet.
- E. Haven't started at all.

## Recap: `for`-loops and dictionaries


### `for`-loops in Python

- In Python, you can loop over any **iterable**. Strings, lists, and dictionaries are all examples of iterables.

- All of the following are valid ways to write a `for`-loop.

```
for value in "this is a string":
for element in lst: # Assume lst is a list.
for i in range(len(lst)):
```

- One of the more common `for`-loop examples you may have seen in earlier classes involved performing some operation to every element of a sequence, e.g. doubling the numbers in a list.

```
def double(vals):
    new_vals = []
    for val in vals:
        # Double each element (val), not the whole list (vals).
        new_vals.append(val * 2)
    return new_vals
```

- We are going to **avoid ❌** these kinds of `for`-loops in this class, because there are **much faster** ways of achieving the same goal in `numpy` and `pandas`. We'll see these soon.

- `while`-loops will come up sparingly.

  But conceptually, you should know how they work!

### List comprehension

In the situations when we do want to perform some operation to every element in a list, a common pattern is the **list comprehension**.

```
vals = [2, -1, 9, 4, 3, 8]
```

```
[val ** 2 for val in vals]
```

[4, 1, 81, 16, 9, 64]

```
[val ** 2 for val in vals if val % 2 == 0]
```

[4, 16, 64]

```
[val ** 2 if val % 2 == 0 else val + 1 for val in vals]
```

[4, 0, 10, 16, 4, 64]
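
To make the pattern concrete, here is a minimal sketch of what the filtered comprehension above is shorthand for. The names `squares_loop` and `squares_comp` are made up for this illustration.

```python
vals = [2, -1, 9, 4, 3, 8]

# The loop version: build up a new list one element at a time.
squares_loop = []
for val in vals:
    if val % 2 == 0:
        squares_loop.append(val ** 2)

# The comprehension version: same result, in a single expression.
squares_comp = [val ** 2 for val in vals if val % 2 == 0]

print(squares_loop == squares_comp)  # True
```

Note that an `if` at the end of a comprehension filters elements out, while an `if`/`else` before the `for` chooses between two expressions and keeps every element.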

### Dictionaries

- A dictionary stores a collection of key-value pairs.

  They are the equivalent of a map in C++.

- `{` curly brackets `}` denote the start and end of a dictionary, a colon `:` is used to denote a single key-value pair, and a comma `,` is used to separate key-value pairs.

```
dog = {'name': 'Junior', 'age': 15, 4: ['kibble', 'treat']}
dog
```

{'name': 'Junior', 'age': 15, 4: ['kibble', 'treat']}

- We retrieve a value in a dictionary using its key.

```
dog['name']
```

'Junior'

```
dog['height']
```

--------------------------------------------------------------------------- KeyError Traceback (most recent call last) Cell In[9], line 1 ----> 1 dog['height'] KeyError: 'height'
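
If a key might be missing, there are ways to look it up without triggering a `KeyError`. The sketch below uses a smaller, made-up version of `dog` and two standard dictionary features: the `in` operator and the `.get` method.

```python
dog = {'name': 'Junior', 'age': 15}

# `in` checks whether a key exists without raising an error.
print('height' in dog)               # False

# .get returns a fallback (None by default) when the key is missing.
print(dog.get('height'))             # None
print(dog.get('height', 'unknown'))  # unknown
print(dog.get('name'))               # Junior
```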

- After creation, we can add or change key-value pairs.

```
dog['color'] = 'beige'
dog['tricks'] = {
'easy': ['roll over', 'paw'],
'medium': ['jump']
}
```

```
dog
```

{'name': 'Junior', 'age': 15, 4: ['kibble', 'treat'], 'color': 'beige', 'tricks': {'easy': ['roll over', 'paw'], 'medium': ['jump']}}

- A dictionary's keys must be immutable, or more precisely hashable (e.g. numbers, strings, Booleans), while its values can be anything.

```
# Here, we're trying to add a value with a key of [1, 2].
# Since [1, 2] is mutable, it can't be used as a key.
dog[[1, 2]] = 'does this work?'
```

--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[12], line 3 1 # Here, we're trying to add a value with a key of [1, 2]. 2 # Since [1, 2] is mutable, it can't be used as a key. ----> 3 dog[[1, 2]] = 'does this work?' TypeError: unhashable type: 'list'
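
As a contrast to the failing cell above, here is a small sketch showing that a tuple, which is immutable, can serve as a key. The dictionary is a cut-down, made-up version of `dog`, and `list_key_worked` is a name introduced only for this example.

```python
dog = {'name': 'Junior', 'age': 15}

# A tuple of numbers is immutable (and hashable), so it can be a key.
dog[(1, 2)] = 'this works'

# A list is mutable, so using it as a key raises a TypeError.
try:
    dog[[1, 2]] = 'does this work?'
    list_key_worked = True
except TypeError:
    list_key_worked = False

print(dog[(1, 2)], list_key_worked)
```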

### Activity

Complete the implementation of the function `find_anagrams`, which takes in `words`, a list of strings, and returns a dictionary describing the anagrams present in `words`. Example behavior is given below.

```
>>> find_anagrams(['dog', 'hello', 'enlist', 'silent', 'a gentleman', 'god', 'elegant man', 'listen'])
{'dgo': ['dog', 'god'],
'ehllo': ['hello'],
'eilnst': ['enlist', 'silent', 'listen'],
' aaeeglmnnt': ['a gentleman', 'elegant man']}
```

```
example_words = ['dog', 'hello', 'enlist', 'silent', 'a gentleman', 'god', 'elegant man', 'listen']

def find_anagrams(words):
    out = {}
    for word in words:
        # Anagrams share the same letters, so they share the same sorted string.
        word_sorted = ''.join(sorted(word))
        if word_sorted in out:
            out[word_sorted].append(word)
        else:
            out[word_sorted] = [word]
    return out
```
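
One possible variation on the solution, not the version written in lecture, uses `collections.defaultdict` from the standard library so that the `if`/`else` check for a missing key is no longer needed. The function name `find_anagrams_defaultdict` is hypothetical.

```python
from collections import defaultdict

def find_anagrams_defaultdict(words):
    # Hypothetical variant: missing keys automatically start as empty lists.
    out = defaultdict(list)
    for word in words:
        out[''.join(sorted(word))].append(word)
    # Convert back to a plain dictionary before returning.
    return dict(out)

example_words = ['dog', 'hello', 'enlist', 'silent', 'a gentleman', 'god', 'elegant man', 'listen']
print(find_anagrams_defaultdict(example_words))
```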


## `numpy` arrays

### Import statements

- We use `import` statements to add the objects (values, functions, classes) defined in other modules to our programs. There are a few different ways to `import`.

  Other terms I'll use for "module" are "library" and "package".

- **Option 1**: `import module`.

  Now, every time we want to use a name in `module`, we must write `module.<name>`.

```
import math
```

```
math.sqrt(15)
```

3.872983346207417

```
sqrt(15)
```

--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[16], line 1 ----> 1 sqrt(15) NameError: name 'sqrt' is not defined

- **Option 2**: `import module as m`.

  Now, every time we want to use a name in `module`, we can write `m.<name>` instead of `module.<name>`.

```
# This is the standard way that we will import numpy.
import numpy as np
```

```
np.pi
```

3.141592653589793

```
np.linalg.inv([[2, 1],
[3, 4]])
```

array([[ 0.8, -0.2], [-0.6, 0.4]])

- **Option 3**: `from module import ...`.

  This way, we explicitly state the names we want to import from `module`.

  To import everything, write `from module import *`.

```
# Importing a particular function from the requests module.
from requests import get
```

```
# This typically fills up the namespace with a lot of unnecessary names, so use sparingly.
from math import *
```

```
sqrt
```

<function math.sqrt(x, /)>

### NumPy

- NumPy (pronounced "num pie") is a Python library (module) that provides support for **arrays** and operations on them.

- The `pandas` library, which we will use for tabular data manipulation, works in conjunction with `numpy`.

- To use `numpy`, we need to import it. It's usually imported as `np` (but doesn't have to be!)

  We also had to install it on your computer first, but you already did that when you set up your environment.

```
import numpy as np
```

### Arrays

- The core data structure in `numpy` is the array. Moving forward, "array" will always refer to a `numpy` array.

- One way to instantiate an array is to pass a list as an argument to the function `np.array`.

```
np.array([4, 9, 1, 2])
```

array([4, 9, 1, 2])

- Arrays, unlike lists, must be **homogeneous** – all elements must be of the same type.

```
# All elements are converted to strings!
np.array([1961, 'michigan'])
```

array(['1961', 'michigan'], dtype='<U21')
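
The same conversion happens with numbers. As a small sketch with made-up values, mixing integers and a float produces an array in which every element is stored as a float; the `dtype` attribute reports the shared type.

```python
import numpy as np

ints = np.array([4, 9, 1, 2])
mixed = np.array([4, 9.5, 1, 2])   # One float forces every element to be a float.

print(ints.dtype.kind)   # 'i', meaning some integer type
print(mixed.dtype)       # float64
print(mixed)
```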

### Array-number arithmetic

- Arrays make it easy to perform the same operation to every element **without a `for`-loop**. This behavior is formally known as "broadcasting", but we often say these operations are **vectorized**.

```
temps = [68, 72, 65, 64, 62, 61, 59, 64, 64, 63, 65, 62]
temps
```

[68, 72, 65, 64, 62, 61, 59, 64, 64, 63, 65, 62]

```
temp_array = np.array(temps)
```

```
# Increase all temperatures by 3 degrees.
temp_array + 3
```

array([71, 75, 68, 67, 65, 64, 62, 67, 67, 66, 68, 65])

```
# Halve all temperatures.
temp_array / 2
```

array([34. , 36. , 32.5, 32. , 31. , 30.5, 29.5, 32. , 32. , 31.5, 32.5, 31. ])

```
# Convert all temperatures to Celsius.
(5 / 9) * (temp_array - 32)
```

array([20. , 22.22, 18.33, 17.78, 16.67, 16.11, 15. , 17.78, 17.78, 17.22, 18.33, 16.67])

**Note**: In none of the above cells did we actually modify `temp_array`! Each of those expressions created a new array. To actually change `temp_array`, we need to reassign it to a new array.

```
temp_array
```

array([68, 72, 65, 64, 62, 61, 59, 64, 64, 63, 65, 62])

```
temp_array = (5 / 9) * (temp_array - 32)
```

```
# Now in Celsius!
temp_array
```

array([20. , 22.22, 18.33, 17.78, 16.67, 16.11, 15. , 17.78, 17.78, 17.22, 18.33, 16.67])
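
The difference between computing a new array and reassigning a name can be seen in a short sketch. The array `temps_f` and its three values are made up for this illustration.

```python
import numpy as np

temps_f = np.array([68, 72, 65])

warmer = temps_f + 3     # Creates a new array; temps_f is untouched.
print(temps_f)           # [68 72 65]
print(warmer)            # [71 75 68]

temps_f = temps_f + 3    # Reassignment: the name now refers to the new array.
print(temps_f)           # [71 75 68]
```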

### ⚠️ The dangers of unnecessary `for`-loops

- Under the hood, `numpy` is implemented in C and Fortran, which are compiled languages that are much faster than Python. As a result, these **vectorized** operations are much quicker than if we used a vanilla Python `for`-loop.

  Also, the fact that arrays must be homogeneous lends itself to more efficient representations in memory.

- We can time code in a Jupyter Notebook. Let's try and square a long sequence of integers and see how long it takes with a Python loop:

```
%%timeit
squares = []
for i in range(1_000_000):
    squares.append(i * i)
```

46.7 ms ± 790 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

- In vanilla Python, this takes about 0.047 seconds per loop. In `numpy`:

```
%%timeit
squares = np.arange(1_000_000) ** 2
```

1.51 ms ± 47.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

- This only takes about 0.0015 seconds per loop, roughly 30x faster!
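
Outside of a notebook, where the `%%timeit` magic isn't available, a similar comparison can be sketched with the standard library's `timeit` module. The function names and the smaller size `n` below are choices made for this example, and the timings will vary from machine to machine.

```python
import timeit
import numpy as np

n = 100_000  # Smaller than the 1,000,000 used above so that this runs quickly.

def squares_loop():
    squares = []
    for i in range(n):
        squares.append(i * i)
    return squares

def squares_numpy():
    # dtype=np.int64 avoids overflow on platforms where the default integer is 32-bit.
    return np.arange(n, dtype=np.int64) ** 2

# timeit.timeit returns the total number of seconds taken across `number` calls.
loop_time = timeit.timeit(squares_loop, number=10)
numpy_time = timeit.timeit(squares_numpy, number=10)

print(f"loop:  {loop_time / 10:.5f} seconds per call")
print(f"numpy: {numpy_time / 10:.5f} seconds per call")
```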

### Element-wise arithmetic

- We can apply arithmetic operations to multiple arrays, provided they have the same length.
- The result is computed **element-wise**, which means that the arithmetic operation is applied to one pair of elements from each array at a time.

```
a = np.array([4, 5, -1])
b = np.array([2, 3, 2])
```

```
a + b
```

array([6, 8, 1])

```
a / b
```

array([ 2. , 1.67, -0.5 ])

```
a ** 2 + b ** 2
```

array([20, 34, 5])
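
To see why the lengths need to match, here is a sketch of what happens when they don't. The array `c` is hypothetical, and `mismatch_error` is a name introduced only to capture the error message.

```python
import numpy as np

a = np.array([4, 5, -1])
c = np.array([10, 20])   # Hypothetical array with a different length than a.

try:
    a + c
    mismatch_error = None
except ValueError as e:
    # numpy can't pair up 3 elements with 2 elements.
    mismatch_error = str(e)

print(mismatch_error)
```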

```
arr = np.array([3, 8, 4, -3.2])
```

```
(2 ** arr).sum()
```

280.108818820412

```
(2 ** arr).mean()
```

70.027204705103

```
(2 ** arr).max()
```

256.0

```
(2 ** arr).argmax()
```

1

```
# An attribute, not a method.
arr.shape
```

(4,)

### Question 🤔 (Answer at practicaldsc.org/q)

What questions do we have about arrays so far?

### Activity

🎉 Congrats! 🎉 You won the lottery 💰. Here's how your payout works: on the first day of September, you are paid \$0.01. Every day thereafter, your pay doubles, so on the second day you're paid \$0.02, on the third day you're paid \$0.04, on the fourth day you're paid \$0.08, and so on.

September has 30 days.

Write a **one-line expression** that uses the numbers `2` and `30`, along with the function `np.arange` and at least one array method, that computes the total amount **in dollars** you will be paid in September. No `for`-loops or list comprehensions allowed!

***Note***: We have a 🎥 walkthrough video of this problem, but don't watch it until you've tried it yourself!

```
(2 ** np.arange(30) / 100).sum()
```

10737418.23
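
As a sanity check on this answer, the daily payouts in cents form a geometric series, $1 + 2 + 4 + \dots + 2^{29} = 2^{30} - 1$. The sketch below compares the array-based total against that closed form; the variable names are made up for this example.

```python
import numpy as np

total_from_array = (2 ** np.arange(30) / 100).sum()

# Geometric series: 1 + 2 + ... + 2**29 = 2**30 - 1 cents, converted to dollars.
total_closed_form = (2 ** 30 - 1) / 100

print(total_closed_form)
print(np.isclose(total_from_array, total_closed_form))
```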

### Boolean filtering

- Comparisons with arrays yield **Boolean** arrays! These can be used to answer questions about the values in an array.

```
temp_array
```

array([20. , 22.22, 18.33, 17.78, 16.67, 16.11, 15. , 17.78, 17.78, 17.22, 18.33, 16.67])

```
temp_array >= 18
```

array([ True, True, True, False, False, False, False, False, False, False, True, False])

- How many values are greater than or equal to 18?

```
(temp_array >= 18).sum()
```

4

- What fraction of values are greater than or equal to 18?

```
(temp_array >= 18).mean()
```

0.3333333333333333

- Which values are greater than or equal to 18?

```
temp_array[temp_array >= 18]
```

array([20. , 22.22, 18.33, 18.33])

- Which values are between 18 and 20?

```
# Note the parentheses!
temp_array[(temp_array >= 18) & (temp_array <= 20)]
```

array([20. , 18.33, 18.33])

```
# WRONG!
temp_array[(temp_array >= 18) and (temp_array <= 20)]
```

--------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[53], line 2 1 # WRONG! ----> 2 temp_array[(temp_array >= 18) and (temp_array <= 20)] ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

### Note: `&` and `|` vs. `and` and `or`

- In Python, the standard symbols for "and" and "or" are, literally, `and` and `or`.

```
if (5 > 3 and 'h' + 'i' == 'hi') or (-2 > 0):
    print('success')
```

success

- But, when taking the **element-wise** and/or of two arrays, the standard operators don't work. Instead, use the **bitwise** operators: `&` for "and", `|` for "or".

```
temp_array
```

array([20. , 22.22, 18.33, 17.78, 16.67, 16.11, 15. , 17.78, 17.78, 17.22, 18.33, 16.67])

```
# Don't forget parentheses when using multiple conditions!
temp_array[(temp_array % 2 == 0) | (temp_array == temp_array.min())]
```

array([20., 15.])
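
For comparison, `numpy` also provides the functions `np.logical_and` and `np.logical_or`, which give the same results as `&` and `|` on Boolean arrays, and `~` gives the element-wise "not". The sketch below uses a short, made-up array `temps_c` rather than `temp_array`.

```python
import numpy as np

temps_c = np.array([20.0, 22.5, 18.0, 15.0, 17.5])  # Made-up values for illustration.

# & on Boolean arrays matches np.logical_and.
with_operator = temps_c[(temps_c >= 18) & (temps_c <= 20)]
with_function = temps_c[np.logical_and(temps_c >= 18, temps_c <= 20)]

# ~ flips every Boolean, selecting the values outside of the range.
outside = temps_c[~((temps_c >= 18) & (temps_c <= 20))]

print(with_operator, with_function, outside)
```

The parentheses around each comparison are needed because `&` and `|` bind more tightly than comparison operators like `>=` in Python.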

- Read more about this here.

## Multidimensional arrays and linear algebra

### Multidimensional arrays

- A matrix can be represented in code using a two dimensional (2D) array.

- 2D arrays also resemble tables, or DataFrames, so it's worthwhile to study how they work.

```
nums = np.array([
[5, 1, 9, 7],
[9, 8, 2, 3],
[2, 5, 0, 4]
])
nums
```

array([[5, 1, 9, 7], [9, 8, 2, 3], [2, 5, 0, 4]])

```
# nums has 3 rows and 4 columns.
nums.shape
```

(3, 4)

- In addition to creating 2D arrays from scratch, we can also create 2D arrays by *reshaping* other arrays.

```
# Here, we're asking to reshape np.arange(1, 7)
# so that it has 2 rows and 3 columns.
a = np.arange(1, 7).reshape((2, 3))
a
```

array([[1, 2, 3], [4, 5, 6]])
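
Reshaping only works when the new shape holds exactly as many elements as the original array. The sketch below shows two related behaviors of `reshape`: passing `-1` to let `numpy` infer one dimension, and the error raised when the sizes don't match. The names `six`, `three_rows`, and `reshape_failed` are made up for this example.

```python
import numpy as np

six = np.arange(1, 7)

# -1 asks numpy to infer that dimension: 6 elements / 3 rows = 2 columns.
three_rows = six.reshape((3, -1))
print(three_rows.shape)   # (3, 2)

# 6 elements can't fill a 4-by-2 grid, so this raises a ValueError.
try:
    six.reshape((4, 2))
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(reshape_failed)     # True
```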

### Operations along axes

- In 2D arrays (and DataFrames), axis 0 refers to the rows (up and down) and axis 1 refers to the columns (left and right).

```
a
```

array([[1, 2, 3], [4, 5, 6]])

- If we specify `axis=0`, `a.sum` will "compress" along axis 0.

```
a.sum(axis=0)
```

array([5, 7, 9])

- If we specify `axis=1`, `a.sum` will "compress" along axis 1.

```
a.sum(axis=1)
```

array([ 6, 15])
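
One way to build intuition for "compressing" is to reproduce each result by hand. In this sketch, `col_totals` and `row_totals` are names introduced for illustration.

```python
import numpy as np

a = np.arange(1, 7).reshape((2, 3))

# Summing along axis 0 collapses the rows: it adds row 0 and row 1 together.
col_totals = a[0] + a[1]
print(np.array_equal(col_totals, a.sum(axis=0)))   # True

# Summing along axis 1 collapses the columns: one total per row.
row_totals = a[:, 0] + a[:, 1] + a[:, 2]
print(np.array_equal(row_totals, a.sum(axis=1)))   # True

# With no axis argument, every element is summed.
print(a.sum())   # 21
```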

### Selecting rows and columns from 2D arrays

- You can use `[` square brackets `]` to **slice** rows and columns out of an array, too.

- The general convention is:

```
array[<row positions>, <column positions>]
```

```
a
```

array([[1, 2, 3], [4, 5, 6]])

```
# Accesses row 0 and all columns.
a[0, :]
```

array([1, 2, 3])

```
# Same as the above.
a[0]
```

array([1, 2, 3])

```
# Accesses all rows and column 1.
a[:, 1]
```

array([2, 5])

```
# Access all rows and columns 0 and 2.
a[:, [0, 2]]
```

array([[1, 3], [4, 6]])

```
# Accesses row 0 and columns 1 and onwards.
a[0, 1:]
```

array([2, 3])
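
A few more selections follow the same `array[<row positions>, <column positions>]` convention. This sketch is an addition to the examples above: it shows negative positions, selecting a single element, and Boolean filtering applied to rows. The variable names are made up.

```python
import numpy as np

a = np.arange(1, 7).reshape((2, 3))

last_row = a[-1, :]          # Negative positions count from the end: [4, 5, 6].
single_value = a[1, 2]       # Row 1, column 2: 6.
big_rows = a[a[:, 0] > 1]    # Keeps only the rows whose first entry exceeds 1.

print(last_row, single_value, big_rows)
```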

### Activity

Suppose we run the cell below.

```
s = (5, 3)
grid = np.ones(s) * 2 * np.arange(1, 16).reshape(s)
grid[-1, 1:].sum()
```

What is the output of the cell? **Try and answer without writing any code.**

```
s = (5, 3)
grid = np.ones(s) * 2 * np.arange(1, 16).reshape(s)
grid[-1, 1:].sum()
```

58.0
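
To see where 58.0 comes from, the expression can be broken into steps. The intermediate names `base`, `last_row`, and `tail` are introduced only for this walkthrough.

```python
import numpy as np

s = (5, 3)
base = np.arange(1, 16).reshape(s)   # Rows: [1, 2, 3], [4, 5, 6], ..., [13, 14, 15].
grid = np.ones(s) * 2 * base         # np.ones(s) * 2 is all 2.0s, so every entry doubles.

last_row = grid[-1]                  # [26., 28., 30.]
tail = last_row[1:]                  # Drop column 0: [28., 30.]
print(tail.sum())                    # 58.0
```

The result is a float because `np.ones` creates an array of floats, and multiplying floats by integers yields floats.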


### Linear algebra review

- Arrays are used to perform computation in the context of linear algebra! For example, let's work through Practice Question 8.1 from LARDS, but this time using code.

Consider the vectors $\vec u$ and $\vec v$, defined below:

$$\vec u = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \qquad \vec v = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$

```
show_projection_plot()
```
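
As a starting point, here is a minimal sketch that defines $\vec u$ and $\vec v$ as arrays and computes two standard quantities, the dot product and a norm. It is not a solution to the practice question itself, whose parts aren't reproduced here.

```python
import numpy as np

u = np.array([1, 0, 0])
v = np.array([0, 1, 1])

dot = np.dot(u, v)            # 1*0 + 0*1 + 0*1 = 0, so u and v are orthogonal.
norm_v = np.linalg.norm(v)    # sqrt(0**2 + 1**2 + 1**2) = sqrt(2).

print(dot, norm_v)
```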