6. Gradients
The concepts explained in the first four guides are typically covered (or are related to what's covered) in a linear algebra course. In addition to all of that, though, we'll also need a few concepts that are traditionally covered in a multivariate calculus course, such as Math 215.
To start, let's remember the idea of derivatives from single variable calculus. Consider, for example, the function \(f(t) = 5t^4 - t^3 - 5t^2 + 2t - 9\). Then, we have:
- The derivative, \(\frac{df}{dt}(t) = 20t^3 - 3t^2 - 10t + 2\), is the slope of the tangent line to \(f\) at the point \((t, f(t))\). For example, at \(t = 0\), the derivative is 2, which means the slope of the tangent line at \((0, -9)\) is 2.
- The derivative at a point describes the instantaneous rate of change of the function at that point: the larger the derivative is at a point, the more quickly the function is increasing at that point (i.e. the steeper it is).
- The closer \(t\) is to a minimum, the shallower the slope of the tangent line is; at a minimum, the slope of the tangent line is 0!
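To see these facts numerically, here is a minimal Python sketch that compares the derivative formula above against a finite-difference estimate of the slope. The helper name `central_difference` and the step size `h` are our own illustrative choices, not something the guide prescribes.

```python
def f(t):
    # The example function from above.
    return 5 * t**4 - t**3 - 5 * t**2 + 2 * t - 9


def df_dt(t):
    # The derivative computed by hand above.
    return 20 * t**3 - 3 * t**2 - 10 * t + 2


def central_difference(func, t, h=1e-5):
    # Hypothetical helper: estimates the slope of func at t using
    # the two nearby points t - h and t + h.
    return (func(t + h) - func(t - h)) / (2 * h)


for t in [0.0, 1.0, -0.5]:
    # The two printed slopes should agree to several decimal places.
    print(t, df_dt(t), central_difference(f, t))
```

The estimate only approximates the true slope, so small differences in the last few digits are expected.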
Moving forward, we will encounter functions that take in multiple inputs, such as:
\[f(x, y) = (x - 2)^2 + 2x - (y - 3)^2\]
The graph of a function like this is a surface in 3 dimensions.
Now, since \(f\) has multiple input variables, it has multiple rates of change: one in the \(x\) direction and one in the \(y\) direction. So, instead of just a single derivative, \(f\) has two partial derivatives, \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\).
\(\frac{\partial f}{\partial x}\) describes the rate of change in the \(x\) direction, that is, the rate of change of \(f\) when \(x\) changes but \(y\) is held constant. To compute \(\frac{\partial f}{\partial x}\), we treat \(y\) as a constant and take the derivative with respect to \(x\). Since \(f(x, y) = (x - 2)^2 + 2x - (y - 3)^2\), we have that:
\[\frac{\partial f}{\partial x} = 2(x-2) + 2 \qquad \frac{\partial f}{\partial y} = -2(y-3)\]
We expand on this idea, and its connection to linear algebra, below, but it's advised that you first work through the following articles on Khan Academy:
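To make "hold one variable constant" concrete for \(f(x, y) = (x - 2)^2 + 2x - (y - 3)^2\), here is a small Python sketch that nudges only one input at a time and compares the resulting slope to the hand-computed partial derivatives \(2(x-2) + 2\) and \(-2(y-3)\). The helper names `partial_x` and `partial_y`, the step size `h`, and the test point \((1, 5)\) are assumptions chosen for illustration.

```python
def f(x, y):
    # The two-input example function.
    return (x - 2)**2 + 2 * x - (y - 3)**2


def df_dx(x, y):
    # Partial derivative with respect to x, computed by hand.
    return 2 * (x - 2) + 2


def df_dy(x, y):
    # Partial derivative with respect to y, computed by hand.
    return -2 * (y - 3)


def partial_x(func, x, y, h=1e-5):
    # Hypothetical helper: change x slightly while y stays fixed.
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-5):
    # Hypothetical helper: change y slightly while x stays fixed.
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


x0, y0 = 1.0, 5.0
print("df/dx:", df_dx(x0, y0), "estimate:", partial_x(f, x0, y0))
print("df/dy:", df_dy(x0, y0), "estimate:", partial_y(f, x0, y0))
```

At this point the formulas give \(\frac{\partial f}{\partial x} = 0\) and \(\frac{\partial f}{\partial y} = -4\), and the numerical estimates should land very close to those values.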
Practice Questions 9.1-9.2
Question 9.1
Suppose \(g: \mathbb{R}^2 \rightarrow \mathbb{R}\) is the function:
\[g(\vec x) = (x_1 - 3)^2 + (x_1^2 - x_2)^2\]
(a) Find \(\nabla g(\vec x)\), the gradient of \(g\), and use it to show that \(\nabla g \left( \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right) = \begin{bmatrix} -8 \\ 0 \end{bmatrix}\).
(b) Using \(\nabla g(\vec x)\), determine the vector \(\vec x^*\) that minimizes the output of \(g\). How can you tell, without the use of any second derivative tests, that \(g\) has a global minimum?
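If you'd like to sanity-check your hand-derived gradient in part (a), one option is to estimate the gradient numerically, one component at a time. The sketch below is only a checking tool under our own assumptions (the helper name `numerical_gradient` and the step size `h` are hypothetical); it does not replace the derivation the question asks for.

```python
def g(x):
    # The function from Question 9.1, taking a list [x1, x2].
    x1, x2 = x
    return (x1 - 3)**2 + (x1**2 - x2)**2


def numerical_gradient(func, x, h=1e-5):
    # Hypothetical helper: estimates each partial derivative by
    # nudging one coordinate at a time and holding the rest fixed.
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((func(forward) - func(backward)) / (2 * h))
    return grad


# Should be close to the vector stated in part (a).
print(numerical_gradient(g, [-1.0, 1.0]))
```

You can plug your own formula for \(\nabla g(\vec x)\) into a few other points and compare against `numerical_gradient` to catch algebra mistakes.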
Question 9.2
(a) Suppose \(\vec{a} \in \mathbb{R}^3\), and suppose \(f: \mathbb{R}^3 \rightarrow \mathbb{R}\) is the function:
\[f(\vec x) = \vec{a}^T \vec{x} = a_1x_1 + a_2x_2 + a_3x_3\]
What is the gradient of \(f\)?
(b) Suppose \(f: \mathbb{R}^n \rightarrow \mathbb{R}\) is the function:
\[f(\vec x) = \vec x \cdot \vec x = x_1^2 + x_2^2 + \ldots + x_n^2\]
What is the gradient of \(f\)?