Lectures on Convex Optimization

Second edition

Author

Yurii Nesterov

Published

2018

Presents a self-contained description of fast gradient methods
Offers the first description in the monographic literature of the modern second-order methods based on cubic regularization
Provides a comprehensive treatment of the smoothing technique
Develops a new theory of optimization in relative scale

Errata and comments

The misprints are highlighted in red and the corresponding suggestions for correction are highlighted in blue. Comments are in italics.
A negative line number means counting from the bottom to the top of the page.

Page	Line	Printed	Corrected
15	15	good approximation to \({\mathcal{I}}\)	good approximation to \({\mathcal{S}}\)
21	-9	optimal value of the function \(g\)	optimal value of the function \(f\)
28	2	ready to sudy	ready to study
41	3	local rate … is different	local rates … are different
46	2	solution of	solution to
70	-7	\(\nabla f_k(x) = \textcolor{red}{A_kx - e_1} = 0\)	\(\nabla f_k(x) = \textcolor{blue}{\frac{L}{4}(A_kx - e_1)} = 0\)
158	-8	\(\frac{1}{\alpha\beta}[f(x + \alpha\beta p) - f(\textcolor{red}{x_0})]\)	\(\frac{1}{\alpha\beta}[f(x + \alpha\beta p) - f(\textcolor{blue}{x})]\)
160	-4	\(\left\langle \nabla \rho_{\textcolor{red}{Q}} (x^), x_1 - x^ \right\rangle\)	\(\left\langle \nabla \rho_{\textcolor{blue}{Q_2}} (x^), x_1 - x^ \right\rangle\)
160	-2	\(\textcolor{red}{\stackrel{(2.2.41)}{\le}}\)	\(\textcolor{blue}{\stackrel{(2.2.52)}{=}}\)
160	-1	The proof of Theorem 3.1.3 is not complete, as the final inequalities are not strict. The theorem states that \(Q_1\) and \(Q_2\) are strongly separable, which requires strict inequalities: \(\sup_{x\in Q_1} \left\langle g,x \right\rangle < \gamma < \inf_{y\in Q_2} \left\langle g,y \right\rangle\). To fix this, we can add one more line to the end of the current proof, as follows: Therefore, for any \(x_1\in Q_1\) and \(x_2\in Q_2\) we have: \[\left\langle g^,x_2 \right\rangle < \left\langle g^,x_2 \right\rangle + \frac{1}{2}(\rho^)^2 \le \gamma^ - \frac{1}{2}(\rho^)^2 \le \left\langle g^,x_1 \right\rangle - \frac{1}{2}(\rho^)^2 < \left\langle g^,x_1 \right\rangle.\]
161	8	\(g\in \mathop{\mathrm{dom}}\textcolor{red}{\psi}_{Q_2}\)	\(g\in \mathop{\mathrm{dom}}\textcolor{blue}{\xi}_{Q_2}\)
161	9	\(\xi_{Q_1}(g) = \xi_{Q_2}(\textcolor{red}{p})\)	\(\xi_{Q_1}(g) = \xi_{Q_2}(\textcolor{blue}{g})\)
161	13	in view of Corollary 3.1.5	in view of Corollary 3.1.4
161	-5	In view of Corollary 3.1.5	In view of Lemma 2.2.7
169	2	\(f'(\bar{z}\textcolor{red}{,} h) = \dots + f'(\bar{z}\textcolor{red}{,} (0,h_y))\)	\(f'(\bar{z}\textcolor{blue}{;} h) = \dots + f'(\bar{z}\textcolor{blue}{;} (0,h_y))\)
169	-3	\(\phi'(x_0\textcolor{red}{,} p)\)	\(\phi'(x_0\textcolor{blue}{;} p)\)
173	-4	\(f'(0\textcolor{red}{,} x)\)	\(f'(0\textcolor{blue}{;} x)\)
175	3	\(-\xi_{Q_2}(-\bar{y}) \textcolor{red}{=} \left\langle g_2,\bar{y} \right\rangle\)	\(-\xi_{Q_2}(-\bar{y}) \textcolor{blue}{\le} \left\langle g_2,\bar{y} \right\rangle\)
175	3	\(\xi_{Q_1}(\bar{y}) \le -\xi_{Q_2}(-\bar{y})\)	It is not clear to me why this holds.
178	-8	\(\bar{{\mathcal{K}}} \stackrel{\mathrm{def}}{=} \mathrm{cl} ({\mathcal{K}}(\textcolor{red}{\bar{x}}))\)	\(\bar{{\mathcal{K}}} \stackrel{\mathrm{def}}{=} \mathrm{cl} ({\mathcal{K}}(\textcolor{blue}{Q - \bar{x}}))\)
183	-8	\(c = \textcolor{red}{\lambda^* Ax^*}\)	\(c = \textcolor{blue}{2\lambda^* Ax^*}\)
183	-7	\(\lambda^* = \textcolor{red}{\langle c,A^{-1}c \rangle^{1/2}}, x^* = \textcolor{red}{\frac{1}{\lambda^*A^{-1}c}}\)	\(\lambda^* = \textcolor{blue}{\frac{1}{2}\langle c,A^{-1}c \rangle^{1/2}}, x^* = \textcolor{blue}{\frac{1}{2\lambda^*A^{-1}c}}\)
		Although the current results are technically correct, rescaling \(\lambda^*\) could make the notation more consistent with Theorem 3.1.26, because the derivative of \(\left\langle Ax,x \right\rangle\) is \(2Ax\).
186	-6	\(\left\langle \bar{y}, b-Ax_* \right\rangle = \left\\| b- A\textcolor{red}{\bar{x}} \right\\|\)	\(\left\langle \bar{y}, b-Ax_* \right\rangle = \left\\| b- A\textcolor{blue}{x_*} \right\\|\)
186	-5	\(x_{\delta} \stackrel{\mathrm{def}}{=} \bar{x} + \delta \stackrel{\textcolor{red}{(3.1.70)}}{\in} Q\)	\(x_{\delta} \stackrel{\mathrm{def}}{=} \bar{x} + \delta \stackrel{\textcolor{blue}{(3.1.69)}}{\in} Q\)
188	4	\(\stackrel{(3.1.74)}{\ge} \textcolor{red}{\langle A^Ty^,x(u_2) -x^\rangle}\)	\(\stackrel{(3.1.74)}{\ge} \textcolor{blue}{f(x^) + \langle A^Ty^,x(u_2) -x^*\rangle}\)
191	1	let \(\bar{u} = \sum_{i=1}^{k} \textcolor{red}{u_i}\)	let \(\bar{u} = \sum_{i=1}^{k} \textcolor{blue}{\lambda^{(i)} u_i}\)
194	3	\(\stackrel{(3.1.23)}{\ge} \textcolor{red}{\min_{x\in P}} \sum_{k=0}^{N}\dots\)	\(\stackrel{(3.1.23)}{\ge} \textcolor{blue}{\max_{x\in P}} \sum_{k=0}^{N}\dots\)
195	-4	for any \(x,y\in B_2(\textcolor{red}{x^*},\rho)\)	for any \(x,y\in B_2(\textcolor{blue}{x_k^*},\rho)\)
205	-4	then \(\textcolor{red}{{\mathcal{F}}}_A(N) \neq \emptyset\)	then \(\textcolor{blue}{{\mathcal{I}}}_A(N) \neq \emptyset\)
205	-1	\(f(x_k) - f^* \textcolor{red}{\ge} \epsilon\)	\(f(x_k) - f^* \textcolor{blue}{>} \epsilon\)
		The contradiction of (3.2.27) should be a strict inequality. An alternative fix could be replacing “\(\le\)” in (3.2.27) with “\(<\)”.
206	2	\(\left\\| x_k - \textcolor{red}{\frac{\epsilon}{\left\\| g(x_k) \right\\|^2}g(x_k)} \right\\|^2\)	\(\left\\| x_k - \textcolor{blue}{\frac{\epsilon}{\left\\| g(x_k) \right\\|^2}g(x_k) -x^*} \right\\|^2\)
206	5	\(\left\\| x_k - \textcolor{red}{\frac{\bar{f}(x_k)}{\left\\| \bar{g}(x_k) \right\\|^2}\bar{g}(x_k)} \right\\|^2\)	\(\left\\| x_k - \textcolor{blue}{\frac{\bar{f}(x_k)}{\left\\| \bar{g}(x_k) \right\\|^2}\bar{g}(x_k) - x^*} \right\\|^2\)
212		\(R_{\textcolor{red}{k}},x^*_{\textcolor{red}{k}},f_{\textcolor{red}{k}}\)	\(R_{\textcolor{blue}{k+1}},x^*_{\textcolor{blue}{k+1}},f_{\textcolor{blue}{k+1}}\)
		In the proof of Theorem 3.2.5, after equation (3.2.44), all* subscripts \(k\) should be replaced by \(k+1\), except in the last line of the proof.*
213	8	\(\left\\| x_k - \textcolor{red}{h_kg(x_k)} \right\\|^2\)	\(\left\\| x_k - \textcolor{blue}{h_kg(x_k) - x^*} \right\\|^2\)
242	-12	at point a \(x_k\)	at a point \(x_k\)
244	5	\(x\in{\mathcal{L}}\textcolor{red}{(}f(x_0)\)	\(x\in{\mathcal{L}}\textcolor{blue}{(}f(x_0)\textcolor{blue}{)}\)

Last updated: November 23, 2020.