from lec_utils import *

schedule = pd.read_csv('data/2024-schedule.tsv', sep='\t') 
schedule.head()

import requests

res = requests.get('https://events.umich.edu')

res

<Response [200]>

type(res.text)

str

len(res.text)

219874

print(res.text[:2000])

<!DOCTYPE html>
<html lang="en">
<!-- 
     :::    :::   :::   :::          :::::::::: :::     ::: :::::::::: ::::    ::: ::::::::::: :::::::: 
    :+:    :+:  :+:+: :+:+:         :+:        :+:     :+: :+:        :+:+:   :+:     :+:    :+:    :+: 
   +:+    +:+ +:+ +:+:+ +:+        +:+        +:+     +:+ +:+        :+:+:+  +:+     +:+    +:+         
  +#+    +:+ +#+  +:+  +#+        +#++:++#   +#+     +:+ +#++:++#   +#+ +:+ +#+     +#+    +#++:++#++   
 +#+    +#+ +#+       +#+        +#+         +#+   +#+  +#+        +#+  +#+#+#     +#+           +#+    
#+#    #+# #+#       #+#        #+#          #+#+#+#   #+#        #+#   #+#+#     #+#    #+#    #+#     
########  ###       ###        ##########     ###     ########## ###    ####     ###     ########    
Version: 6.0 - Mustard's Retreat
-->
    <head>
        <meta name="viewport" content="initial-scale=1, maximum-scale=1">
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
                <title>Happening at the University of Michigan | Happening @ Michigan</title>
                <link rel="icon" sizes="16x16" href="/favicon.ico" />
						    <meta property="og:title" content="Happening @ Michigan" />
		  						    <meta property="og:image" content="default-image.png" />
        							<link rel="stylesheet" href="/css/jquery-ui-custom.css" /> <link rel="stylesheet" href="/bundles/umevents/css/jquery-ui.structure.min.css" />
<link rel="stylesheet" href="/bundles/umevents/css/jquery-ui.theme.min.css" />
<link rel="stylesheet" href="/css/main.css" /> 
							<script type="text/javascript" src="/js-dist/jquery.min.js"></script>
<script type="text/javascript" src="/js-dist/jquery-ui.min.js"></script>
<script type="text/javascript" src="/js/modals.js"></script>
<script type="text/javascript" src="/js/infoPoint.js"></script>
<script type="text/javascript" src="/js/jquery.unveil.js"></script>
<script type="text/javascript" src="/js/jquery.windowaction.js"></script>
<script type="text/javas
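Because `res.text` is an ordinary Python string, plain string methods already work on it. The sketch below pulls the page title out of a single line copied from the output above; `html_line` and `page_title` are illustrative names, not part of the original notebook. This kind of slicing is brittle (it assumes exactly one `<title>` tag with no attributes), which is why a real HTML parser is usually the better tool.

```python
# One line copied from the page source printed above.
html_line = '<title>Happening at the University of Michigan | Happening @ Michigan</title>'

# Locate the text between the opening and closing <title> tags.
start = html_line.find('<title>') + len('<title>')
end = html_line.find('</title>')
page_title = html_line[start:end]

print(page_title)
```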

post_res = requests.post('https://httpbin.org/post',
                         data={'name': 'Go Blue'})
post_res

<Response [200]>

post_res.text

'{\n  "args": {}, \n  "data": "", \n  "files": {}, \n  "form": {\n    "name": "Go Blue"\n  }, \n  "headers": {\n    "Accept": "*/*", \n    "Accept-Encoding": "gzip, deflate, br", \n    "Content-Length": "12", \n    "Content-Type": "application/x-www-form-urlencoded", \n    "Host": "httpbin.org", \n    "User-Agent": "python-requests/2.32.3", \n    "X-Amzn-Trace-Id": "Root=1-66f2db61-2d8211151c1be3905e7cf53c"\n  }, \n  "json": null, \n  "origin": "35.3.45.217", \n  "url": "https://httpbin.org/post"\n}\n'

post_res.json()

{'args': {},
 'data': '',
 'files': {},
 'form': {'name': 'Go Blue'},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate, br',
  'Content-Length': '12',
  'Content-Type': 'application/x-www-form-urlencoded',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.32.3',
  'X-Amzn-Trace-Id': 'Root=1-66f2db61-2d8211151c1be3905e7cf53c'},
 'json': None,
 'origin': '35.3.45.217',
 'url': 'https://httpbin.org/post'}
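As a rough illustration of what `.json()` does, the sketch below parses a shortened copy of the response text with the standard-library `json` module (only a few keys are kept; `response_text` is an illustrative name). It also URL-encodes the form data with `urllib.parse.urlencode`, which is an assumption about how the request body was built, to show where the `Content-Length` of 12 in the headers could come from.

```python
import json
from urllib.parse import urlencode

# A shortened copy of the text httpbin sent back (most keys omitted).
response_text = '{\n  "form": {\n    "name": "Go Blue"\n  }, \n  "json": null, \n  "url": "https://httpbin.org/post"\n}\n'

# Parsing the JSON string by hand gives a regular Python dictionary.
parsed = json.loads(response_text)
print(type(parsed))
print(parsed['form']['name'])
print(parsed['json'])  # JSON null becomes Python None

# Form data is sent URL-encoded; spaces become '+'.
encoded_body = urlencode({'name': 'Go Blue'})
print(encoded_body, len(encoded_body))
```

The encoded body `name=Go+Blue` is 12 characters long, which is consistent with the `'Content-Length': '12'` header echoed back by httpbin.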

yt_res = requests.post('https://youtube.com',
                       data={'name': 'Go Blue'})
yt_res

<Response [400]>

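# This takes the text of yt_res and renders it as an HTML document within our notebook!
# HTML lives in IPython.display; the import is repeated here in case lec_utils does not provide it.
from IPython.display import HTML
HTML(yt_res.text)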

res = requests.get('https://cse.engin.umich.edu/people/faculty/') 
res.status_code

403
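The three status codes seen so far (200, 400, 403) can be looked up with the standard-library `http.HTTPStatus` enum. The helper `describe_status` below is a hypothetical convenience function, not something from `requests` or the original notebook; it simply attaches the standard phrase and a coarse category to a code.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    phrase = HTTPStatus(code).phrase
    if 200 <= code < 300:
        kind = 'success'
    elif 400 <= code < 500:
        kind = 'client error'
    elif 500 <= code < 600:
        kind = 'server error'
    else:
        kind = 'other'
    return f'{code} {phrase} ({kind})'

for code in [200, 400, 403]:
    print(describe_status(code))
```

A 403 means the server understood the request but refused to serve it, so `res.text` here is likely an error page, not the faculty listing.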

from IPython.display import HTML
HTML(res.text)

!cat data/lec09_ex1.html

<html>
  <head>
    <title>Page title</title>
  </head>

  <body>
    <h1>This is a heading</h1>
    <p>This is a paragraph.</p>
    <p>This is <b>another</b> paragraph.</p>
  </body>
</html>

HTML('data/lec09_ex1.html')

<p style="color: red">Look at my red text!</p>

<img src="cool-visualization.png" alt="My box plot that I'm super proud of." width=500>

Click <a href="https://study.practicaldsc.org">this link</a> to access past exams.

<div class="background">
          <h3>This is a heading</h3>
          <p>This is a paragraph.</p>
        </div>

html_string = '''
<html>
    <body>
      <div id="content">
        <h1>Heading here</h1>
        <p>My First paragraph</p>
        <p>My <em>second</em> paragraph</p>
        <hr>
      </div>
      <div id="nav">
        <ul>
          <li>item 1</li>
          <li>item 2</li>
          <li>item 3</li>
        </ul>
      </div>
    </body>
</html>
'''.strip()

HTML(html_string)

# We also could have used:
# import bs4
# But, then we'd need to use bs4.BeautifulSoup every time.
from bs4 import BeautifulSoup

BeautifulSoup?

soup = BeautifulSoup(html_string) 
soup

<html><head></head><body>
      <div id="content">
        <h1>Heading here</h1>
        <p>My First paragraph</p>
        <p>My <em>second</em> paragraph</p>
        <hr/>
      </div>
      <div id="nav">
        <ul>
          <li>item 1</li>
          <li>item 2</li>
          <li>item 3</li>
        </ul>
      </div>
    
</body></html>

type(soup)

bs4.BeautifulSoup

print(soup.text)

      
        Heading here
        My First paragraph
        My second paragraph
        
      
      
        
          item 1
          item 2
          item 3

soup.find('div')

<div id="content">
        <h1>Heading here</h1>
        <p>My First paragraph</p>
        <p>My <em>second</em> paragraph</p>
        <hr/>
      </div>

soup.find('div', attrs={'id': 'nav'})

<div id="nav">
        <ul>
          <li>item 1</li>
          <li>item 2</li>
          <li>item 3</li>
        </ul>
      </div>
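Beautiful Soup offers a few equivalent ways to express the same lookup. The sketch below runs on a small stand-in document (`html_doc` and `demo_soup` are illustrative names) and names `'html.parser'` explicitly, which is an assumption, since the notebook's own call does not specify a parser.

```python
from bs4 import BeautifulSoup

# A small stand-in document with the same two div ids as the example above.
html_doc = '<div id="content"><p>First</p></div><div id="nav"><ul><li>item 1</li></ul></div>'
demo_soup = BeautifulSoup(html_doc, 'html.parser')

# 1. Attribute filter passed as a dictionary (as in the cell above).
by_attrs = demo_soup.find('div', attrs={'id': 'nav'})

# 2. The same filter passed as a keyword argument.
by_keyword = demo_soup.find('div', id='nav')

# 3. A CSS selector: a div whose id is "nav".
by_selector = demo_soup.select_one('div#nav')

for tag in [by_attrs, by_keyword, by_selector]:
    print(tag.name, tag.get('id'))
```

The dictionary form is the most general of the three, since it also works for attribute names that are not valid Python keyword arguments.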

# The ul child is not at the top of the tree, but we can still find it.
soup.find('ul')

<ul>
          <li>item 1</li>
          <li>item 2</li>
          <li>item 3</li>
        </ul>

soup

<html><head></head><body>
      <div id="content">
        <h1>Heading here</h1>
        <p>My First paragraph</p>
        <p>My <em>second</em> paragraph</p>
        <hr/>
      </div>
      <div id="nav">
        <ul>
          <li>item 1</li>
          <li>item 2</li>
          <li>item 3</li>
        </ul>
      </div>
    
</body></html>

soup.find_all('div')

[<div id="content">
         <h1>Heading here</h1>
         <p>My First paragraph</p>
         <p>My <em>second</em> paragraph</p>
         <hr/>
       </div>,
 <div id="nav">
         <ul>
           <li>item 1</li>
           <li>item 2</li>
           <li>item 3</li>
         </ul>
       </div>]

soup.find_all('li')

[<li>item 1</li>, <li>item 2</li>, <li>item 3</li>]

[x.text for x in soup.find_all('li')]

['item 1', 'item 2', 'item 3']

soup.find('p')

<p>My First paragraph</p>

soup.find('p').text

'My First paragraph'

soup.find('div')

<div id="content">
        <h1>Heading here</h1>
        <p>My First paragraph</p>
        <p>My <em>second</em> paragraph</p>
        <hr/>
      </div>

soup.find('div').text

'\n        Heading here\n        My First paragraph\n        My second paragraph\n        \n      '
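The `.text` of a tag keeps all of the whitespace from the source, as the output above shows. One way to tidy it, sketched below on a stand-in copy of the content div (`demo_soup` and `clean_text` are illustrative names, and `'html.parser'` is an assumed parser choice), is `get_text` with a separator and `strip=True`.

```python
from bs4 import BeautifulSoup

html_doc = '''
<div id="content">
  <h1>Heading here</h1>
  <p>My First paragraph</p>
  <p>My <em>second</em> paragraph</p>
</div>
'''
demo_soup = BeautifulSoup(html_doc, 'html.parser')
div = demo_soup.find('div')

# Strip each piece of text, drop the empty ones, and join the rest with spaces.
clean_text = div.get_text(' ', strip=True)
print(clean_text)
```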

soup.find('div').attrs

{'id': 'content'}

soup.find('div').get('id')

'content'

soup

<html><head></head><body>
      <div id="content">
        <h1>Heading here</h1>
        <p>My First paragraph</p>
        <p>My <em>second</em> paragraph</p>
        <hr/>
      </div>
      <div id="nav">
        <ul>
          <li>item 1</li>
          <li>item 2</li>
          <li>item 3</li>
        </ul>
      </div>
    
</body></html>

# Two tags in the document have an 'id' attribute, but the top-level soup object has none,
# so this returns None and the notebook displays nothing.
soup.get('id')

soup.find('div').get('id')

'content'
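The difference between a present and a missing attribute can be made explicit. The sketch below, on a small stand-in document with an assumed `'html.parser'` parser, contrasts `.get`, which returns `None` (or a default) when the attribute is absent, with square-bracket access, which raises a `KeyError`.

```python
from bs4 import BeautifulSoup

demo_soup = BeautifulSoup('<div id="content"><p>My First paragraph</p></div>', 'html.parser')
div = demo_soup.find('div')
p = demo_soup.find('p')

# Both forms work when the attribute exists.
print(div.get('id'))
print(div['id'])

# .get returns None (or a supplied default) when the attribute is missing.
print(p.get('id'))
print(p.get('id', 'no id'))

# Square brackets raise a KeyError instead.
try:
    p['id']
except KeyError:
    print('p has no id attribute')
```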

<head>
    <title>3*Canada-2022-06-04</title>
</head>
<body>
    <h1>Spotify Top 3 - Canada</h1>
    <table>
        <tr class='heading'>
            <th>Rank</th>
            <th>Artist(s)</th> 
            <th>Song</th>
        </tr>
        <tr class=1>
            <td>1</td>
            <td>Harry Styles</td> 
            <td>As It Was</td>
        </tr>
        <tr class=2>
            <td>2</td>
            <td>Jack Harlow</td> 
            <td>First Class</td>
        </tr>
        <tr class=3>
            <td>3</td>
            <td>Kendrick Lamar</td> 
            <td>N95</td>
        </tr>
    </table>
</body>
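A table like the one above can be turned into a list of dictionaries with `find_all`. This is a minimal sketch, not necessarily how the lecture goes on to do it: `table_html` is a copy of just the `<table>` element, `'html.parser'` is an assumed parser choice, and `columns` and `records` are illustrative names.

```python
from bs4 import BeautifulSoup

table_html = '''
<table>
    <tr class='heading'>
        <th>Rank</th>
        <th>Artist(s)</th>
        <th>Song</th>
    </tr>
    <tr class=1>
        <td>1</td>
        <td>Harry Styles</td>
        <td>As It Was</td>
    </tr>
    <tr class=2>
        <td>2</td>
        <td>Jack Harlow</td>
        <td>First Class</td>
    </tr>
    <tr class=3>
        <td>3</td>
        <td>Kendrick Lamar</td>
        <td>N95</td>
    </tr>
</table>
'''

table_soup = BeautifulSoup(table_html, 'html.parser')

# Column names come from the <th> cells in the heading row.
columns = [th.text for th in table_soup.find_all('th')]

# Each remaining row contributes one record; the heading row has no <td> cells.
records = []
for tr in table_soup.find_all('tr'):
    cells = [td.text.strip() for td in tr.find_all('td')]
    if cells:
        records.append(dict(zip(columns, cells)))

for record in records:
    print(record)
```

Every value is still a string at this point; `records` could then be passed to `pd.DataFrame` and the `Rank` column converted to integers.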

<!DOCTYPE html>
<html lang="en"><head>
	<meta charset="utf-8"/>
	<title>Quotes to Scrape</title>
    <link href="/static/bootstrap.min.css" rel="stylesheet"/>
    <link href="/static/main.css" rel="stylesheet"/>
</head>
<body>
    <div class="container">
        <div class="row header-box">
            <div class="col-md-8">
                <h1>
                    <a href="/" style="text-decoration: none">Quotes to Scrape</a>
                </h1>
            </div>
            <div class="col-md-4">
                <p>
                
                    <a href="/login">Login</a>
                
                </p>
            </div>
        </div>
    

<div class="row">
    <div class="col-md-8">

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole and that goes for everything. Just because you fail once, doesn't mean you're gonna fail at everything. Keep trying, hold on, and always, always, always believe in yourself, because if you don't, then who will, sweetie? So keep your head high, keep your chin up, and most importantly, keep smiling, because life's a beautiful thing and there's so much to smile about.”</span>
        <span>by <small class="author" itemprop="author">Marilyn Monroe</small>
        <a href="/author/Marilyn-Monroe">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="friends,heartbreak,inspirational,life,love,sisters" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/friends/page/1/">friends</a>
            
            <a class="tag" href="/tag/heartbreak/page/1/">heartbreak</a>
            
            <a class="tag" href="/tag/inspirational/page/1/">inspirational</a>
            
            <a class="tag" href="/tag/life/page/1/">life</a>
            
            <a class="tag" href="/tag/love/page/1/">love</a>
            
            <a class="tag" href="/tag/sisters/page/1/">sisters</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“It takes a great deal of bravery to stand up to our enemies, but just as much to stand up to our friends.”</span>
        <span>by <small class="author" itemprop="author">J.K. Rowling</small>
        <a href="/author/J-K-Rowling">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="courage,friends" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/courage/page/1/">courage</a>
            
            <a class="tag" href="/tag/friends/page/1/">friends</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“If you can't explain it to a six year old, you don't understand it yourself.”</span>
        <span>by <small class="author" itemprop="author">Albert Einstein</small>
        <a href="/author/Albert-Einstein">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="simplicity,understand" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/simplicity/page/1/">simplicity</a>
            
            <a class="tag" href="/tag/understand/page/1/">understand</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“You may not be her first, her last, or her only. She loved before she may love again. But if she loves you now, what else matters? She's not perfect—you aren't either, and the two of you may never be perfect together but if she can make you laugh, cause you to think twice, and admit to being human and making mistakes, hold onto her and give her the most you can. She may not be thinking about you every second of the day, but she will give you a part of her that she knows you can break—her heart. So don't hurt her, don't change her, don't analyze and don't expect more than she can give. Smile when she makes you happy, let her know when she makes you mad, and miss her when she's not there.”</span>
        <span>by <small class="author" itemprop="author">Bob Marley</small>
        <a href="/author/Bob-Marley">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="love" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/love/page/1/">love</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“I like nonsense, it wakes up the brain cells. Fantasy is a necessary ingredient in living.”</span>
        <span>by <small class="author" itemprop="author">Dr. Seuss</small>
        <a href="/author/Dr-Seuss">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="fantasy" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/fantasy/page/1/">fantasy</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“I may not have gone where I intended to go, but I think I have ended up where I needed to be.”</span>
        <span>by <small class="author" itemprop="author">Douglas Adams</small>
        <a href="/author/Douglas-Adams">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="life,navigation" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/life/page/1/">life</a>
            
            <a class="tag" href="/tag/navigation/page/1/">navigation</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“The opposite of love is not hate, it's indifference. The opposite of art is not ugliness, it's indifference. The opposite of faith is not heresy, it's indifference. And the opposite of life is not death, it's indifference.”</span>
        <span>by <small class="author" itemprop="author">Elie Wiesel</small>
        <a href="/author/Elie-Wiesel">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="activism,apathy,hate,indifference,inspirational,love,opposite,philosophy" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/activism/page/1/">activism</a>
            
            <a class="tag" href="/tag/apathy/page/1/">apathy</a>
            
            <a class="tag" href="/tag/hate/page/1/">hate</a>
            
            <a class="tag" href="/tag/indifference/page/1/">indifference</a>
            
            <a class="tag" href="/tag/inspirational/page/1/">inspirational</a>
            
            <a class="tag" href="/tag/love/page/1/">love</a>
            
            <a class="tag" href="/tag/opposite/page/1/">opposite</a>
            
            <a class="tag" href="/tag/philosophy/page/1/">philosophy</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“It is not a lack of love, but a lack of friendship that makes unhappy marriages.”</span>
        <span>by <small class="author" itemprop="author">Friedrich Nietzsche</small>
        <a href="/author/Friedrich-Nietzsche">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="friendship,lack-of-friendship,lack-of-love,love,marriage,unhappy-marriage" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/friendship/page/1/">friendship</a>
            
            <a class="tag" href="/tag/lack-of-friendship/page/1/">lack-of-friendship</a>
            
            <a class="tag" href="/tag/lack-of-love/page/1/">lack-of-love</a>
            
            <a class="tag" href="/tag/love/page/1/">love</a>
            
            <a class="tag" href="/tag/marriage/page/1/">marriage</a>
            
            <a class="tag" href="/tag/unhappy-marriage/page/1/">unhappy-marriage</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“Good friends, good books, and a sleepy conscience: this is the ideal life.”</span>
        <span>by <small class="author" itemprop="author">Mark Twain</small>
        <a href="/author/Mark-Twain">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="books,contentment,friends,friendship,life" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/books/page/1/">books</a>
            
            <a class="tag" href="/tag/contentment/page/1/">contentment</a>
            
            <a class="tag" href="/tag/friends/page/1/">friends</a>
            
            <a class="tag" href="/tag/friendship/page/1/">friendship</a>
            
            <a class="tag" href="/tag/life/page/1/">life</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“Life is what happens to us while we are making other plans.”</span>
        <span>by <small class="author" itemprop="author">Allen Saunders</small>
        <a href="/author/Allen-Saunders">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="fate,life,misattributed-john-lennon,planning,plans" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/fate/page/1/">fate</a>
            
            <a class="tag" href="/tag/life/page/1/">life</a>
            
            <a class="tag" href="/tag/misattributed-john-lennon/page/1/">misattributed-john-lennon</a>
            
            <a class="tag" href="/tag/planning/page/1/">planning</a>
            
            <a class="tag" href="/tag/plans/page/1/">plans</a>
            
        </div>
    </div>

    <nav>
        <ul class="pager">
            
            <li class="previous">
                <a href="/page/1/"><span aria-hidden="true">←</span> Previous</a>
            </li>
            
            
            <li class="next">
                <a href="/page/3/">Next <span aria-hidden="true">→</span></a>
            </li>
            
        </ul>
    </nav>
    </div>
    <div class="col-md-4 tags-box">
        
            <h2>Top Ten tags</h2>
            
            <span class="tag-item">
            <a class="tag" href="/tag/love/" style="font-size: 28px">love</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/inspirational/" style="font-size: 26px">inspirational</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/life/" style="font-size: 26px">life</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/humor/" style="font-size: 24px">humor</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/books/" style="font-size: 22px">books</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/reading/" style="font-size: 14px">reading</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/friendship/" style="font-size: 10px">friendship</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/friends/" style="font-size: 8px">friends</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/truth/" style="font-size: 8px">truth</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/simile/" style="font-size: 6px">simile</a>
            </span>
            
        
    </div>
</div>

    </div>
    <footer class="footer">
        <div class="container">
            <p class="text-muted">
                Quotes by: <a href="https://www.goodreads.com/quotes">GoodReads.com</a>
            </p>
            <p class="copyright">
                Made with <span class="zyte">❤</span> by <a class="zyte" href="https://www.zyte.com">Zyte</a>
            </p>
        </div>
    </footer>

</body></html>

len(soup.find_all("td"))

soup.find("tr").get("class")

https://quotes.toscrape.com/page/2

def download_page(i):
    '''Downloads page i of https://quotes.toscrape.com and returns it as a BeautifulSoup object.'''
    url = f'https://quotes.toscrape.com/page/{i}'
    res = requests.get(url)
    return BeautifulSoup(res.text)

soup = download_page(2) 
soup

<!DOCTYPE html>
<html lang="en"><head>
	<meta charset="utf-8"/>
	<title>Quotes to Scrape</title>
    <link href="/static/bootstrap.min.css" rel="stylesheet"/>
    <link href="/static/main.css" rel="stylesheet"/>
</head>
<body>
    <div class="container">
        <div class="row header-box">
            <div class="col-md-8">
                <h1>
                    <a href="/" style="text-decoration: none">Quotes to Scrape</a>
                </h1>
            </div>
            <div class="col-md-4">
                <p>
                
                    <a href="/login">Login</a>
                
                </p>
            </div>
        </div>
    

<div class="row">
    <div class="col-md-8">

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole and that goes for everything. Just because you fail once, doesn't mean you're gonna fail at everything. Keep trying, hold on, and always, always, always believe in yourself, because if you don't, then who will, sweetie? So keep your head high, keep your chin up, and most importantly, keep smiling, because life's a beautiful thing and there's so much to smile about.”</span>
        <span>by <small class="author" itemprop="author">Marilyn Monroe</small>
        <a href="/author/Marilyn-Monroe">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="friends,heartbreak,inspirational,life,love,sisters" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/friends/page/1/">friends</a>
            
            <a class="tag" href="/tag/heartbreak/page/1/">heartbreak</a>
            
            <a class="tag" href="/tag/inspirational/page/1/">inspirational</a>
            
            <a class="tag" href="/tag/life/page/1/">life</a>
            
            <a class="tag" href="/tag/love/page/1/">love</a>
            
            <a class="tag" href="/tag/sisters/page/1/">sisters</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“It takes a great deal of bravery to stand up to our enemies, but just as much to stand up to our friends.”</span>
        <span>by <small class="author" itemprop="author">J.K. Rowling</small>
        <a href="/author/J-K-Rowling">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="courage,friends" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/courage/page/1/">courage</a>
            
            <a class="tag" href="/tag/friends/page/1/">friends</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“If you can't explain it to a six year old, you don't understand it yourself.”</span>
        <span>by <small class="author" itemprop="author">Albert Einstein</small>
        <a href="/author/Albert-Einstein">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="simplicity,understand" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/simplicity/page/1/">simplicity</a>
            
            <a class="tag" href="/tag/understand/page/1/">understand</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“You may not be her first, her last, or her only. She loved before she may love again. But if she loves you now, what else matters? She's not perfect—you aren't either, and the two of you may never be perfect together but if she can make you laugh, cause you to think twice, and admit to being human and making mistakes, hold onto her and give her the most you can. She may not be thinking about you every second of the day, but she will give you a part of her that she knows you can break—her heart. So don't hurt her, don't change her, don't analyze and don't expect more than she can give. Smile when she makes you happy, let her know when she makes you mad, and miss her when she's not there.”</span>
        <span>by <small class="author" itemprop="author">Bob Marley</small>
        <a href="/author/Bob-Marley">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="love" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/love/page/1/">love</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“I like nonsense, it wakes up the brain cells. Fantasy is a necessary ingredient in living.”</span>
        <span>by <small class="author" itemprop="author">Dr. Seuss</small>
        <a href="/author/Dr-Seuss">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="fantasy" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/fantasy/page/1/">fantasy</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“I may not have gone where I intended to go, but I think I have ended up where I needed to be.”</span>
        <span>by <small class="author" itemprop="author">Douglas Adams</small>
        <a href="/author/Douglas-Adams">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="life,navigation" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/life/page/1/">life</a>
            
            <a class="tag" href="/tag/navigation/page/1/">navigation</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“The opposite of love is not hate, it's indifference. The opposite of art is not ugliness, it's indifference. The opposite of faith is not heresy, it's indifference. And the opposite of life is not death, it's indifference.”</span>
        <span>by <small class="author" itemprop="author">Elie Wiesel</small>
        <a href="/author/Elie-Wiesel">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="activism,apathy,hate,indifference,inspirational,love,opposite,philosophy" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/activism/page/1/">activism</a>
            
            <a class="tag" href="/tag/apathy/page/1/">apathy</a>
            
            <a class="tag" href="/tag/hate/page/1/">hate</a>
            
            <a class="tag" href="/tag/indifference/page/1/">indifference</a>
            
            <a class="tag" href="/tag/inspirational/page/1/">inspirational</a>
            
            <a class="tag" href="/tag/love/page/1/">love</a>
            
            <a class="tag" href="/tag/opposite/page/1/">opposite</a>
            
            <a class="tag" href="/tag/philosophy/page/1/">philosophy</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“It is not a lack of love, but a lack of friendship that makes unhappy marriages.”</span>
        <span>by <small class="author" itemprop="author">Friedrich Nietzsche</small>
        <a href="/author/Friedrich-Nietzsche">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="friendship,lack-of-friendship,lack-of-love,love,marriage,unhappy-marriage" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/friendship/page/1/">friendship</a>
            
            <a class="tag" href="/tag/lack-of-friendship/page/1/">lack-of-friendship</a>
            
            <a class="tag" href="/tag/lack-of-love/page/1/">lack-of-love</a>
            
            <a class="tag" href="/tag/love/page/1/">love</a>
            
            <a class="tag" href="/tag/marriage/page/1/">marriage</a>
            
            <a class="tag" href="/tag/unhappy-marriage/page/1/">unhappy-marriage</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“Good friends, good books, and a sleepy conscience: this is the ideal life.”</span>
        <span>by <small class="author" itemprop="author">Mark Twain</small>
        <a href="/author/Mark-Twain">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="books,contentment,friends,friendship,life" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/books/page/1/">books</a>
            
            <a class="tag" href="/tag/contentment/page/1/">contentment</a>
            
            <a class="tag" href="/tag/friends/page/1/">friends</a>
            
            <a class="tag" href="/tag/friendship/page/1/">friendship</a>
            
            <a class="tag" href="/tag/life/page/1/">life</a>
            
        </div>
    </div>

    <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“Life is what happens to us while we are making other plans.”</span>
        <span>by <small class="author" itemprop="author">Allen Saunders</small>
        <a href="/author/Allen-Saunders">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="fate,life,misattributed-john-lennon,planning,plans" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/fate/page/1/">fate</a>
            
            <a class="tag" href="/tag/life/page/1/">life</a>
            
            <a class="tag" href="/tag/misattributed-john-lennon/page/1/">misattributed-john-lennon</a>
            
            <a class="tag" href="/tag/planning/page/1/">planning</a>
            
            <a class="tag" href="/tag/plans/page/1/">plans</a>
            
        </div>
    </div>

    <nav>
        <ul class="pager">
            
            <li class="previous">
                <a href="/page/1/"><span aria-hidden="true">←</span> Previous</a>
            </li>
            
            
            <li class="next">
                <a href="/page/3/">Next <span aria-hidden="true">→</span></a>
            </li>
            
        </ul>
    </nav>
    </div>
    <div class="col-md-4 tags-box">
        
            <h2>Top Ten tags</h2>
            
            <span class="tag-item">
            <a class="tag" href="/tag/love/" style="font-size: 28px">love</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/inspirational/" style="font-size: 26px">inspirational</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/life/" style="font-size: 26px">life</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/humor/" style="font-size: 24px">humor</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/books/" style="font-size: 22px">books</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/reading/" style="font-size: 14px">reading</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/friendship/" style="font-size: 10px">friendship</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/friends/" style="font-size: 8px">friends</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/truth/" style="font-size: 8px">truth</a>
            </span>
            
            <span class="tag-item">
            <a class="tag" href="/tag/simile/" style="font-size: 6px">simile</a>
            </span>
            
        
    </div>
</div>

    </div>
    <footer class="footer">
        <div class="container">
            <p class="text-muted">
                Quotes by: <a href="https://www.goodreads.com/quotes">GoodReads.com</a>
            </p>
            <p class="copyright">
                Made with <span class="zyte">❤</span> by <a class="zyte" href="https://www.zyte.com">Zyte</a>
            </p>
        </div>
    </footer>

</body></html>

divs = soup.find_all('div', class_='quote') 
# The above is a shortcut for the following, just for when the attribute key is class:
# divs = soup.find_all('div', attrs={'class': 'quote'})
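
As a quick check of the shortcut in the comment above, the sketch below runs both spellings on a small hand-written HTML snippet and compares the results. The snippet and the names `demo`, `by_keyword`, and `by_attrs` are made up for illustration and are not part of the scraped page. It also shows that `class_` matches an element if any one of its classes is the one requested.

```python
from bs4 import BeautifulSoup

# A made-up snippet, just for illustration (not taken from the site).
html = '''
<div class="quote featured">first</div>
<div class="quote">second</div>
<div class="sidebar">not a quote</div>
'''
demo = BeautifulSoup(html, 'html.parser')

# The keyword shortcut and the explicit attrs dictionary.
by_keyword = demo.find_all('div', class_='quote')
by_attrs = demo.find_all('div', attrs={'class': 'quote'})

# Both spellings select the same elements, including the <div> with two classes.
same = by_keyword == by_attrs
texts = [d.text for d in by_keyword]
print(same, texts)
```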

divs[0]

<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole and that goes for everything. Just because you fail once, doesn't mean you're gonna fail at everything. Keep trying, hold on, and always, always, always believe in yourself, because if you don't, then who will, sweetie? So keep your head high, keep your chin up, and most importantly, keep smiling, because life's a beautiful thing and there's so much to smile about.”</span>
        <span>by <small class="author" itemprop="author">Marilyn Monroe</small>
        <a href="/author/Marilyn-Monroe">(about)</a>
        </span>
        <div class="tags">
            Tags:
            <meta class="keywords" content="friends,heartbreak,inspirational,life,love,sisters" itemprop="keywords"/> 
            
            <a class="tag" href="/tag/friends/page/1/">friends</a>
            
            <a class="tag" href="/tag/heartbreak/page/1/">heartbreak</a>
            
            <a class="tag" href="/tag/inspirational/page/1/">inspirational</a>
            
            <a class="tag" href="/tag/life/page/1/">life</a>
            
            <a class="tag" href="/tag/love/page/1/">love</a>
            
            <a class="tag" href="/tag/sisters/page/1/">sisters</a>
            
        </div>
    </div>

# The quote.
divs[0].find('span', class_='text').text

"“This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole and that goes for everything. Just because you fail once, doesn't mean you're gonna fail at everything. Keep trying, hold on, and always, always, always believe in yourself, because if you don't, then who will, sweetie? So keep your head high, keep your chin up, and most importantly, keep smiling, because life's a beautiful thing and there's so much to smile about.”"

# The author.
divs[0].find('small', class_='author').text

'Marilyn Monroe'

# The URL for the author.
divs[0].find('a').get('href')

'/author/Marilyn-Monroe'

# The quote's tags.
divs[0].find('meta', class_='keywords').get('content')

'friends,heartbreak,inspirational,life,love,sisters'
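
The tags come back as one comma-separated string. If a list is more convenient later on, for instance to check whether a quote carries a particular tag, one option is to split on the commas. This is only a sketch using the string shown above; `tag_list` is a name introduced here.

```python
tags = 'friends,heartbreak,inspirational,life,love,sisters'

# Split the comma-separated string into a list of individual tags.
tag_list = tags.split(',')

# Membership in the list checks whole tags; membership in the string checks substrings.
whole_tag_match = 'friend' in tag_list
substring_match = 'friend' in tags
print(tag_list, whole_tag_match, substring_match)
```

Checking against the list avoids accidental matches such as `'friend'` being found inside `'friends'`.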

def process_quote(div):
    quote = div.find('span', class_='text').text
    author = div.find('small', class_='author').text
    author_url = 'https://quotes.toscrape.com' + div.find('a').get('href')
    tags = div.find('meta', class_='keywords').get('content')
    return {'quote': quote, 'author': author, 'author_url': author_url, 'tags': tags}

# Make sure everything here looks correct based on what's on the webpage!
process_quote(divs[4])

{'quote': '“I like nonsense, it wakes up the brain cells. Fantasy is a necessary ingredient in living.”',
 'author': 'Dr. Seuss',
 'author_url': 'https://quotes.toscrape.com/author/Dr-Seuss',
 'tags': 'fantasy'}
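
`process_quote` builds the author URL by gluing the site's address onto the `href`. That works here because every author link on the page is a relative path starting with `/`. A possible alternative, sketched below, is `urljoin` from Python's standard library, which also leaves links that are already absolute untouched. `BASE_URL` is a name introduced for this example.

```python
from urllib.parse import urljoin

BASE_URL = 'https://quotes.toscrape.com'  # same address used in process_quote

# A relative href, like the ones in the quote divs.
author_url = urljoin(BASE_URL, '/author/Dr-Seuss')

# An href that is already a full URL is returned unchanged.
already_absolute = urljoin(BASE_URL, 'https://www.goodreads.com/quotes')

print(author_url)
print(already_absolute)
```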

def process_page(divs):
    return pd.DataFrame([process_quote(div) for div in divs])

process_page(divs)

def make_quote_df(n):
    '''Returns a DataFrame containing the quotes on the first n pages of https://quotes.toscrape.com/.''' # This is called a docstring!
    dfs = []
    for i in range(1, n+1):
        # Download page i and create a BeautifulSoup object.
        soup = download_page(i)
        # Create DataFrame using the information in that page.
        divs = soup.find_all('div', class_='quote')
        df = process_page(divs)
        # Append DataFrame to dfs.
        dfs.append(df)
    # Stitch all DataFrames together.
    return pd.concat(dfs).reset_index(drop=True)

quotes = make_quote_df(3)
quotes.head()

quotes[quotes['author'] == 'Albert Einstein']
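
`make_quote_df` needs to be told how many pages to download. Since each page's pager includes a `<li class="next">` item, a scraper could instead keep going until that item is missing. The sketch below shows only the check itself, on two made-up pager snippets modeled on the markup shown earlier. `has_next_page` is a hypothetical helper, and the assumption that the site's last page omits the "Next" item has not been verified here.

```python
from bs4 import BeautifulSoup

def has_next_page(page_soup):
    '''Returns True if the page's pager contains a "Next" item.'''
    # find returns None when nothing matches, so compare against None.
    return page_soup.find('li', class_='next') is not None

# Made-up pager snippets, modeled on the <ul class="pager"> markup above.
middle = BeautifulSoup(
    '<ul class="pager">'
    '<li class="previous"><a href="/page/1/">Previous</a></li>'
    '<li class="next"><a href="/page/3/">Next</a></li>'
    '</ul>',
    'html.parser',
)
last = BeautifulSoup(
    '<ul class="pager">'
    '<li class="previous"><a href="/page/9/">Previous</a></li>'
    '</ul>',
    'html.parser',
)

print(has_next_page(middle), has_next_page(last))
```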

'Rush Orthopedics Live Surgery Q&A with Dr. Verma'

'2024-09-24 8:30'

'Virtual'

{'title': 'CommuniTea',
 'time': Timestamp('2024-09-24 12:00:00'),
 'location': 'Trotter Multicultural Center-Sankofa Lounge'}

'NoneType' object has no attribute 'find'

res = requests.get('https://events.umich.edu')
res

<Response [200]>

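# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(res.text, 'html.parser')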

divs = soup.find_all(class_='col-xs-12')

len(divs)

89

divs[0]

<div class="col-xs-12 col-sm-4 col-md-4 col-lg-2 flex no-pad">
                        
<div class="event-listing-grid event-single">
<time class="time-banner" datetime="2024-09-24 8:30"><i class="fa fa-clock-o"></i> Sep 24, 2024 8:30am</time>


 <div class="list-image">
                                 <a href="/event/124208">
                        <img alt="livestream" class="icon" height="128" src="/images/umicons_livestream.svg" width="128"/>
    
            <h5> Livestream / Virtual </h5>
            </a>
             </div>
 
 <div class="event-info">
    <div class="event-title"><h3>
       <a href="/event/124208" title="Rush Orthopedics Live Surgery Q&amp;A with Dr. Verma">
    Rush Orthopedics Live Surgery Q&amp;A with Dr....
    </a></h3>
                    </div>
  <ul class="event-details">
    
    <li class="item">
        <a href="/list?filter=locations:1" title="Virtual"><i class="fa fa-location-arrow fa-fw"></i><span> Virtual</span></a>
    </li>
             
                     <li class="item"><a href="/group/3815" title="LSA Opportunity Hub"><i class="fa fa-group fa-fw"></i><span>
        LSA Opportunity Hub
    </span></a></li>
                     <li class="item"><a href="/group/4442" title="LSA Transfer Student Center"><i class="fa fa-group fa-fw"></i><span>
        LSA Transfer Student Center
    </span></a></li>
            
        
    <li class="item"><a href="/list?filter=alltypes:24"><i class="fa fa-list fa-fw"></i><span> Livestream / Virtual </span></a></li>
    
                          
         <li class="item"><a href="https://lsa-umich.12twenty.com/events/30006101217151">
                     <i class="fa fa-link fa-fw"></i>
                  
         <span>RSVP Here</span>
         </a></li>
    
    
 </ul>   

<!--
    <p>
    Get views from the operating room through a live-streamed surgery with Dr. Nikhil Verma, a surgeon who specializes in the treatment of the shoulder,...
    (
        2024-09-24 8:30am
    )
    </p>
-->


 </div>

</div>
                    </div>

divs[0].find('div', class_='event-title').find('a').get('title')

'Rush Orthopedics Live Surgery Q&A with Dr. Verma'

divs[0].find('time').get('datetime')

'2024-09-24 8:30'

divs[0].find('ul').find('a').get('title')

'Virtual'

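def process_event(div):
    # The full event name is the title attribute of the link inside the event-title <div>.
    title = div.find('div', class_='event-title').find('a').get('title')
    # The first link in the details list holds the location.
    location = div.find('ul').find('a').get('title')
    # Convert the datetime string to a pandas Timestamp so it can be sorted and compared.
    time = pd.to_datetime(div.find('time').get('datetime'))
    return {'title': title, 'time': time, 'location': location}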

process_event(divs[12])

{'title': 'CommuniTea',
 'time': Timestamp('2024-09-24 12:00:00'),
 'location': 'Trotter Multicultural Center-Sankofa Lounge'}
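
One reason `process_event` converts the `datetime` attribute with `pd.to_datetime` is that the raw strings are not zero-padded, so comparing them as text can give the wrong order. The snippet below is a small sketch of that point, using the two time strings that appear in the outputs above; the variable names are illustrative only.

```python
import pandas as pd

raw_morning = '2024-09-24 8:30'
raw_noon = '2024-09-24 12:00'

# Comparing the raw strings is lexicographic: '8' sorts after '1'.
string_says_earlier = raw_morning < raw_noon

# Comparing Timestamps uses the actual points in time.
ts_morning = pd.to_datetime(raw_morning)
ts_noon = pd.to_datetime(raw_noon)
timestamp_says_earlier = ts_morning < ts_noon

print(string_says_earlier, timestamp_says_earlier)
```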

row_list = []
for div in divs:
    try:
        row_list.append(process_event(div))
    except Exception as e:
        # Some matching <div>s are not events; print the error and skip them.
        print(e)

'NoneType' object has no attribute 'find'
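
The message above appears because `find` returns `None` when nothing matches, and calling `.find` on `None` then fails. As a hedged sketch of an alternative to `try`/`except`, the example below checks for `None` at each step. The HTML string and the helper `safe_title` are hypothetical and only loosely mimic the structure of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page: the second column contains no event.
html = """
<div class="col-xs-12">
  <time datetime="2024-09-24 8:30"></time>
  <div class="event-title"><h3><a title="Example event">Example...</a></h3></div>
  <ul><li><a title="Virtual"><span>Virtual</span></a></li></ul>
</div>
<div class="col-xs-12"><p>No event here</p></div>
"""

soup = BeautifulSoup(html, 'html.parser')
divs = soup.find_all(class_='col-xs-12')

def safe_title(div):
    """Return the event title, or None if the expected tags are missing."""
    title_div = div.find('div', class_='event-title')
    if title_div is None:
        return None
    link = title_div.find('a')
    return None if link is None else link.get('title')

titles = [safe_title(div) for div in divs]
print(titles)
```

Explicit checks make it clear which element was missing, while the `try`/`except` loop is shorter; either approach skips the non-event `<div>`.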

events = pd.DataFrame(row_list) 
events.head()

# Which events are in-person today?
events[~events['location'].isin(['Virtual', ''])]
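
To see how the filter above works, here is a minimal sketch on a three-row DataFrame. The rows are made up for illustration (only the column names match `events`): `isin` marks rows whose location is `'Virtual'` or empty, and `~` flips that Boolean mask.

```python
import pandas as pd

# Hypothetical stand-in for the scraped events DataFrame.
events = pd.DataFrame({
    'title': ['Event A', 'Event B', 'Event C'],
    'time': pd.to_datetime(['2024-09-24 08:30', '2024-09-24 09:00', '2024-09-24 11:30']),
    'location': ['Virtual', '', 'Lorch Hall'],
})

# True where the location is 'Virtual' or missing (empty string).
not_in_person = events['location'].isin(['Virtual', ''])

# ~ negates the mask, keeping only rows with a physical location.
in_person = events[~not_in_person]
print(in_person)
```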

	Date	Time	At	Opponent	Location	Tournament	Result
0	Aug 31 (Sat)	7:30 PM	Home	Fresno State	Ann Arbor, Mich.	NaN	W 30-10
1	Sep 7 (Sat)	Noon	Home	#3 Texas	Ann Arbor, Mich.	NaN	L 12-31
2	Sep 14 (Sat)	Noon	Home	Arkansas State	Ann Arbor, Mich.	NaN	W 28-18
3	Sep 21 (Sat)	3:30 PM	Home	#11 USC	Ann Arbor, Mich.	NaN	W 27-24
4	Sep 28 (Sat)	Noon	Home	Minnesota	Ann Arbor, Mich.	NaN	-

	title	time	location
0	International Students Career Series: Coffee Chat with the University Career Center	Sep 23, 2024 9:00am	University Career Center, 3200 Student Activities Building, Program Room (3003), 515 E Jefferson St, Ann Arbor, MI, United States
1	Alaska Teachers & Personnel Informational Meeting	Sep 23, 2024 10:00am
2	Michigan in Washington Fall 2024 Application Deadline	Sep 23, 2024 10:00am
3	EEB Prelim Seminar Series - Evolution of “Collecting” Behavior in Deep Sea Carrier Snails	Sep 23, 2024 10:30am	Biological Sciences Building
4	Huron Affinity Group Overview (iMatter Teams)	Sep 23, 2024 11:00am

Element	Description
`<html>`	the document
`<head>`	the header
`<body>`	the body
`<div>`	a logical division of the document
`<span>`	an inline logical division
`<p>`	a paragraph
`<a>`	an anchor (hyperlink)
`<h1>, <h2>, ...`	heading(s)
`<img>`	an image

	quote	author	author_url	tags
0	“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”	Albert Einstein	https://quotes.toscrape.com/author/Albert-Einstein	change,deep-thoughts,thinking,world
1	“It is our choices, Harry, that show what we truly are, far more than our abilities.”	J.K. Rowling	https://quotes.toscrape.com/author/J-K-Rowling	abilities,choices
2	“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”	Albert Einstein	https://quotes.toscrape.com/author/Albert-Einstein	inspirational,life,live,miracle,miracles

	quote	author	author_url	tags
0	“The world as we have created it is a process ...	Albert Einstein	https://quotes.toscrape.com/author/Albert-Eins...	change,deep-thoughts,thinking,world
1	“It is our choices, Harry, that show what we t...	J.K. Rowling	https://quotes.toscrape.com/author/J-K-Rowling	abilities,choices
2	“There are only two ways to live your life. On...	Albert Einstein	https://quotes.toscrape.com/author/Albert-Eins...	inspirational,life,live,miracle,miracles
3	“The person, be it gentleman or lady, who has ...	Jane Austen	https://quotes.toscrape.com/author/Jane-Austen	aliteracy,books,classic,humor
4	“Imperfection is beauty, madness is genius and...	Marilyn Monroe	https://quotes.toscrape.com/author/Marilyn-Monroe	be-yourself,inspirational

	quote	author	author_url	tags
0	“This life is what you make it. No matter what...	Marilyn Monroe	https://quotes.toscrape.com/author/Marilyn-Monroe	friends,heartbreak,inspirational,life,love,sis...
1	“It takes a great deal of bravery to stand up ...	J.K. Rowling	https://quotes.toscrape.com/author/J-K-Rowling	courage,friends
2	“If you can't explain it to a six year old, yo...	Albert Einstein	https://quotes.toscrape.com/author/Albert-Eins...	simplicity,understand
...	...	...	...	...
7	“It is not a lack of love, but a lack of frien...	Friedrich Nietzsche	https://quotes.toscrape.com/author/Friedrich-N...	friendship,lack-of-friendship,lack-of-love,lov...
8	“Good friends, good books, and a sleepy consci...	Mark Twain	https://quotes.toscrape.com/author/Mark-Twain	books,contentment,friends,friendship,life
9	“Life is what happens to us while we are makin...	Allen Saunders	https://quotes.toscrape.com/author/Allen-Saunders	fate,life,misattributed-john-lennon,planning,p...

	title	time	location
0	Rush Orthopedics Live Surgery Q&A with Dr. Verma	2024-09-24 08:30:00	Virtual
1	2024 Investment Banking Coffee Chats at Univer...	2024-09-24 09:00:00
2	2024 Morgan Stanley Global Capital Markets Cof...	2024-09-24 09:00:00
3	2024 Morgan Stanley Institutional Equity Coffe...	2024-09-24 09:00:00
4	Framing & Facilitating High Stakes Discussions...	2024-09-24 10:00:00	Virtual

	title	time	location
6	Materials Science and Engineering Career Fair	2024-09-24 10:00:00	Pierpont Commons
8	Macro Seminar: Tuesday, September 24	2024-09-24 11:30:00	Lorch Hall
9	2024 Fall Job & Internship Fair: In Person	2024-09-24 12:00:00	530 South State Street, Ann Arbor, Michigan 48...
...	...	...	...
85	OrgLead 24-25	2024-09-24 19:30:00	Michigan Union - Pendleton (2nd Floor)
86	Mosher-Jordan (2024-2025) (Housing)	2024-09-24 20:00:00	Cesar Chavez Lounge
87	Symphony Band	2024-09-24 20:00:00	Hill Auditorium

Lecture 9¶

Web Scraping¶

EECS 398-003: Practical Data Science, Fall 2024¶

Announcements 📣¶

Agenda¶

Recap: Handling missing values¶

Summary of imputation techniques¶

Activity

Missingness mechanisms¶

How do we know if data are MCAR?¶

Question 🤔 (Answer at practicaldsc.org/q)

Introduction to HTTP¶

Data sources¶

Manual copy-pasting¶

Programmatically accessing data¶

Goal¶

The request-response model¶

Consequences of the request-response model¶

HTTP request methods¶

Example: GET requests via requests¶

Reference Slide¶

Example: POST requests via requests¶

Something went wrong

HTTP status codes¶

403 Forbidden

Reference Slide¶

Handling unsuccessful requests¶

The structure of HTML¶

Scraping vs. APIs¶

What is HTML?¶

An example webpage¶

This is a heading

The anatomy of HTML documents¶

Example: Pages and trees¶

Reference Slide¶

Useful tags to know¶

Reference Slide¶

Example tags and attributes¶

Question 🤔 (Answer at practicaldsc.org/q)

Parsing HTML¶

Beautiful Soup 🍜¶

Example HTML document¶

Heading here

Instantiating BeautifulSoup objects¶

Finding elements in a BeautifulSoup object¶

Using find¶

Using find_all¶

Node attributes¶

Activity

Example: Scraping quotes¶

Example: Scraping quotes¶

Organizing our work¶

Downloading a single page¶

Parsing a single page¶

Parsing a single quote, and then a single page¶

Putting it all together¶

Reference Slide¶

Summary of our steps¶

Example: Scraping the Happening @ Michigan page¶

Example: Scraping the Happening @ Michigan page¶

Identifying <div>s¶

Parsing a single event, and then every event¶

Web data in practice¶

Summary, next time¶
