Lectures on Convex Optimization

Second edition
Author

Yurii Nesterov

Published

2018

  • Presents a self-contained description of fast gradient methods
  • Offers the first description in the monographic literature of the modern second-order methods based on cubic regularization
  • Provides a comprehensive treatment of the smoothing technique
  • Develops a new theory of optimization in relative scale

Errata and comments

  • The misprints are highlighted in red and the corresponding suggestions for correction are highlighted in blue. Comments are in italics.
  • A negative line number means counting from the bottom to the top of the page.
Page Line Printed Corrected
15 15 good approximation to \({\mathcal{I}}\) good approximation to \({\mathcal{S}}\)
21 -9 optimal value of the function \(g\) optimal value of the function \(f\)
28 2 ready to sudy ready to study
41 3 local rateis different local ratesare different
46 2 solution of solution to
70 -7 \(\nabla f_k(x) = \textcolor{red}{A_kx - e_1} = 0\) \(\nabla f_k(x) = \textcolor{blue}{\frac{L}{4}(A_kx - e_1)} = 0\)
158 -8 \(\frac{1}{\alpha\beta}[f(x + \alpha\beta p) - f(\textcolor{red}{x_0})]\) \(\frac{1}{\alpha\beta}[f(x + \alpha\beta p) - f(\textcolor{blue}{x})]\)
160 -4 \(\left\langle \nabla \rho_{\textcolor{red}{Q}} (x^*), x_1 - x^* \right\rangle\) \(\left\langle \nabla \rho_{\textcolor{blue}{Q_2}} (x^*), x_1 - x^* \right\rangle\)
160 -2 \(\textcolor{red}{\stackrel{(2.2.41)}{\le}}\) \(\textcolor{blue}{\stackrel{(2.2.52)}{=}}\)
160 -1 The proof of Theorem 3.1.3 is not complete, as the final inequalities are not strict. The theorem states that \(Q_1\) and \(Q_2\) are strongly separable, which requires strict inequalities: \(\sup_{x\in Q_1} \left\langle g,x \right\rangle < \gamma < \inf_{y\in Q_2} \left\langle g,y \right\rangle\). To fix this, we can add one more line to the end of the current proof, as follows: Therefore, for any \(x_1\in Q_1\) and \(x_2\in Q_2\) we have: \[\left\langle g^*,x_2 \right\rangle < \left\langle g^*,x_2 \right\rangle + \frac{1}{2}(\rho^*)^2 \le \gamma^* - \frac{1}{2}(\rho^*)^2 \le \left\langle g^*,x_1 \right\rangle - \frac{1}{2}(\rho^*)^2 < \left\langle g^*,x_1 \right\rangle.\]
161 8 \(g\in \mathop{\mathrm{dom}}\textcolor{red}{\psi}_{Q_2}\) \(g\in \mathop{\mathrm{dom}}\textcolor{blue}{\xi}_{Q_2}\)
161 9 \(\xi_{Q_1}(g) = \xi_{Q_2}(\textcolor{red}{p})\) \(\xi_{Q_1}(g) = \xi_{Q_2}(\textcolor{blue}{g})\)
161 13 in view of Corollary 3.1.5 in view of Corollary 3.1.4
161 -5 In view of Corollary 3.1.5 In view of Lemma 2.2.7
169 2 \(f'(\bar{z}\textcolor{red}{,} h) = \dots + f'(\bar{z}\textcolor{red}{,} (0,h_y))\) \(f'(\bar{z}\textcolor{blue}{;} h) = \dots + f'(\bar{z}\textcolor{blue}{;} (0,h_y))\)
169 -3 \(\phi'(x_0\textcolor{red}{,} p)\) \(\phi'(x_0\textcolor{blue}{;} p)\)
173 -4 \(f'(0\textcolor{red}{,} x)\) \(f'(0\textcolor{blue}{;} x)\)
175 3 \(-\xi_{Q_2}(-\bar{y}) \textcolor{red}{=} \left\langle g_2,\bar{y} \right\rangle\) \(-\xi_{Q_2}(-\bar{y}) \textcolor{blue}{\le} \left\langle g_2,\bar{y} \right\rangle\)
175 3 \(\xi_{Q_1}(\bar{y}) \le -\xi_{Q_2}(-\bar{y})\) It is not clear to me why this holds.
178 -8 \(\bar{{\mathcal{K}}} \stackrel{\mathrm{def}}{=} \mathrm{cl} ({\mathcal{K}}(\textcolor{red}{\bar{x}}))\) \(\bar{{\mathcal{K}}} \stackrel{\mathrm{def}}{=} \mathrm{cl} ({\mathcal{K}}(\textcolor{blue}{Q - \bar{x}}))\)
183 -8 \(c = \textcolor{red}{\lambda^* Ax^*}\) \(c = \textcolor{blue}{2\lambda^* Ax^*}\)
183 -7 \(\lambda^* = \textcolor{red}{\langle c,A^{-1}c \rangle^{1/2}}, x^* = \textcolor{red}{\frac{1}{\lambda^*A^{-1}c}}\) \(\lambda^* = \textcolor{blue}{\frac{1}{2}\langle c,A^{-1}c \rangle^{1/2}}, x^* = \textcolor{blue}{\frac{1}{2\lambda^*A^{-1}c}}\)
Although the current results are technically correct, rescaling \(\lambda^*\) could make the notation more consistent with Theorem 3.1.26, because the derivative of \(\left\langle Ax,x \right\rangle\) is \(2Ax\).
186 -6 \(\left\langle \bar{y}, b-Ax_* \right\rangle = \left\| b- A\textcolor{red}{\bar{x}} \right\|\) \(\left\langle \bar{y}, b-Ax_* \right\rangle = \left\| b- A\textcolor{blue}{x_*} \right\|\)
186 -5 \(x_{\delta} \stackrel{\mathrm{def}}{=} \bar{x} + \delta \stackrel{\textcolor{red}{(3.1.70)}}{\in} Q\) \(x_{\delta} \stackrel{\mathrm{def}}{=} \bar{x} + \delta \stackrel{\textcolor{blue}{(3.1.69)}}{\in} Q\)
188 4 \(\stackrel{(3.1.74)}{\ge} \textcolor{red}{\langle A^Ty^*,x(u_2) -x^*\rangle}\) \(\stackrel{(3.1.74)}{\ge} \textcolor{blue}{f(x^*) + \langle A^Ty^*,x(u_2) -x^*\rangle}\)
191 1 let \(\bar{u} = \sum_{i=1}^{k} \textcolor{red}{u_i}\) let \(\bar{u} = \sum_{i=1}^{k} \textcolor{blue}{\lambda^{(i)} u_i}\)
194 3 \(\stackrel{(3.1.23)}{\ge} \textcolor{red}{\min_{x\in P}} \sum_{k=0}^{N}\dots\) \(\stackrel{(3.1.23)}{\ge} \textcolor{blue}{\max_{x\in P}} \sum_{k=0}^{N}\dots\)
195 -4 for any \(x,y\in B_2(\textcolor{red}{x^*},\rho)\) for any \(x,y\in B_2(\textcolor{blue}{x_k^*},\rho)\)
205 -4 then \(\textcolor{red}{{\mathcal{F}}}_A(N) \neq \emptyset\) then \(\textcolor{blue}{{\mathcal{I}}}_A(N) \neq \emptyset\)
205 -1 \(f(x_k) - f^* \textcolor{red}{\ge} \epsilon\) \(f(x_k) - f^* \textcolor{blue}{>} \epsilon\)
The contradiction of (3.2.27) should be a strict inequality. An alternative fix could be replacing “\(\le\)” in (3.2.27) with “\(<\)”.
206 2 \(\left\| x_k - \textcolor{red}{\frac{\epsilon}{\left\| g(x_k) \right\|^2}g(x_k)} \right\|^2\) \(\left\| x_k - \textcolor{blue}{\frac{\epsilon}{\left\| g(x_k) \right\|^2}g(x_k) -x^*} \right\|^2\)
206 5 \(\left\| x_k - \textcolor{red}{\frac{\bar{f}(x_k)}{\left\| \bar{g}(x_k) \right\|^2}\bar{g}(x_k)} \right\|^2\) \(\left\| x_k - \textcolor{blue}{\frac{\bar{f}(x_k)}{\left\| \bar{g}(x_k) \right\|^2}\bar{g}(x_k) - x^*} \right\|^2\)
212 \(R_{\textcolor{red}{k}},x^*_{\textcolor{red}{k}},f_{\textcolor{red}{k}}\) \(R_{\textcolor{blue}{k+1}},x^*_{\textcolor{blue}{k+1}},f_{\textcolor{blue}{k+1}}\)
In the proof of Theorem 3.2.5, after equation (3.2.44), all subscripts \(k\) should be replaced by \(k+1\), except in the last line of the proof.
213 8 \(\left\| x_k - \textcolor{red}{h_kg(x_k)} \right\|^2\) \(\left\| x_k - \textcolor{blue}{h_kg(x_k) - x^*} \right\|^2\)
242 -12 at point a \(x_k\) at a point \(x_k\)
244 5 \(x\in{\mathcal{L}}\textcolor{red}{(}f(x_0)\) \(x\in{\mathcal{L}}\textcolor{blue}{(}f(x_0)\textcolor{blue}{)}\)

Last updated: November 23, 2020.