go back to the Portfolio Homework spec
Power Outages ๐
Table of Contents
This dataset has major power outage data in the continental U.S. from January 2000 to July 2016.
Getting the Data
The data is downloadable here.
Note: If you are having a hard time with the โThis datasetโ link, hold shift and click the link to open it into a new tab and then refresh that new tab.
A data dictionary is available at this article under Table 1. Variable descriptions.
Example Questions and Prediction Problems
Feel free to base your exploration into the dataset in Steps 1-2 around one of these questions, or come up with a question of your own.
- Where and when do major power outages tend to occur?
- What are the characteristics of major power outages with higher severity? Variables to consider include location, time, climate, land-use characteristics, electricity consumption patterns, economic characteristics, etc. What risk factors may an energy company want to look into when predicting the location and severity of its next major power outage?
- What characteristics are associated with each category of cause?
- How have characteristics of major power outages changed over time? Is there a clear trend?
Feel free to use one of the prompts below to build your predictive model in Steps 3-5, or come up with a prediction task of your own.
- Predict the severity (in terms of number of customers, duration, or demand loss) of a major power outage.
- Predict the cause of a major power outage.
- Predict the number and/or severity of major power outages in the year 2022.
- Predict the electricity consumption of an area.
Make sure to justify what information you would know at the โtime of predictionโ and to only train your model using those features.
Special Considerations
Step 2: Data Cleaning and Exploratory Data Analysis
- The data is given as an Excel file rather than a CSV. Open the data in Google Sheets or another spreadsheet application and determine which rows and columns of the sheet should be ignored when loading the data in
pandas
.- Note that
pandas
can load multiple filetypes:pd.read_csv
,pd.read_excel
,pd.read_html
,pd.read_json
, etc.
- Note that
- The power outage start date and time is given by
'OUTAGE.START.DATE'
and'OUTAGE.START.TIME'
. It would be preferable if these two columns were combined into onepd.Timestamp
column. Combine'OUTAGE.START.DATE'
and'OUTAGE.START.TIME'
into a newpd.Timestamp
column called'OUTAGE.START'
. Similarly, combine'OUTAGE.RESTORATION.DATE'
and'OUTAGE.RESTORATION.TIME'
into a newpd.Timestamp
column called'OUTAGE.RESTORATION'
.pd.to_datetime
andpd.to_timedelta
will be useful here.
- To visualize geospatial data, consider Folium or another geospatial plotting library. You can even embed Folium maps in your website! If
fig
is afolium.folium.Map
object, thenfig._repr_html_()
evaluates to a string containing your plot as HTML; useopen
andwrite
to write this string to an.html
file.