Optimization in Machine Learning, Chapter 1: Mathematics Background
1. Notation
\(||x||\): Euclidean norm (\(l_2\) norm),
\[||x||^2 = x^Tx= \sum_{i=1}^dx_i^2 \]
\(\mathbb{R}_+ = \{ x\in \mathbb{R}: x\geq 0 \}\): the set of non-negative reals.
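As a quick numerical sanity check (a minimal NumPy sketch; the example vector is arbitrary), the identity \(||x||^2 = x^Tx = \sum_i x_i^2\) can be verified directly:

```python
import numpy as np

x = np.array([3.0, -1.0, 2.0])            # arbitrary example vector
norm_sq = np.linalg.norm(x) ** 2          # Euclidean (l2) norm, squared
print(np.isclose(norm_sq, x @ x))         # ||x||^2 == x^T x   -> True
print(np.isclose(norm_sq, np.sum(x**2)))  # == sum_i x_i^2     -> True
```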
2. Cauchy-Schwarz inequality
Let \(u,v\in \mathbb{R}^d\), then
\[|u^Tv|\leq ||u||\cdot ||v|| \]
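A small NumPy sketch (random vectors, purely illustrative) confirming the inequality numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.normal(size=5), rng.normal(size=5)    # arbitrary random vectors
lhs = abs(u @ v)                                 # |u^T v|
rhs = np.linalg.norm(u) * np.linalg.norm(v)      # ||u|| * ||v||
print(lhs <= rhs + 1e-12)                        # Cauchy-Schwarz -> True
```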
3. Spectral Norm
Let \(A\) be a matrix, \(A\in \mathbb{R}^{m\times d}\), then
\[||A|| = \max_{||v||=1}||Av|| \]
where \(v\in \mathbb{R}^d\). Besides,
\[||Av|| \leq ||A||\cdot ||v|| \]
Furthermore, we have the triangle inequality:
\[||A+B||\leq ||A||+||B|| \]
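The sketch below (NumPy, with random matrices chosen purely for illustration) computes the spectral norm via `np.linalg.norm(A, 2)` (the largest singular value) and checks both properties:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(4, 3))
v = rng.normal(size=3)

spec = lambda M: np.linalg.norm(M, 2)   # spectral norm = largest singular value
print(np.linalg.norm(A @ v) <= spec(A) * np.linalg.norm(v) + 1e-12)  # ||Av|| <= ||A||*||v||
print(spec(A + B) <= spec(A) + spec(B) + 1e-12)                      # triangle inequality
```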
4. Mean Value Theorem
\(\large{\text{Theorem}\ 1.3}\): Let \(a<b\) be real numbers, and let \(h:[a,b]\rightarrow \mathbb{R}\) be a continuous function which is differentiable on \((a,b)\). Then there exists \(c\in (a,b)\) such that:
\[h'(c) = \frac{h(b)-h(a)}{b-a} \]
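For instance (a standard worked example, not from the original notes), take \(h(x)=x^2\) on \([a,b]\); the theorem then pins down \(c\) explicitly:
\[h'(c) = 2c = \frac{b^2-a^2}{b-a} = a+b \quad\Longrightarrow\quad c = \frac{a+b}{2}\in(a,b). \]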
5. Differentiability
\(\large{\text{Definition}\ 1.5}\): \(f: dom(f)\rightarrow \mathbb{R}^m,\ dom(f)\subseteq \mathbb{R}^d.\ f \text{ is called differentiable at } x \text{ in the interior of } dom(f) \text{ if there exists a matrix } A\in \mathbb{R}^{m\times d} \text{ and an error function } r:\mathbb{R}^d \rightarrow \mathbb{R}^m \text{ defined in some neighborhood of } 0\in \mathbb{R}^d \text{ such that for all } y \text{ in some neighborhood of } x,\)
\[f(y) = f(x)+A(y-x)+r(y-x) \]
where
\[\lim_{v\rightarrow 0} \frac{||r(v)||}{||v||}=0 \]
The matrix \(A\) is unique and is called the differential of \(f\) at \(x\), written \(Df(x)\), with entries \(Df(x)_{ij} = \frac{\partial f_i}{\partial x_j}(x)\).
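As an illustration (a minimal NumPy sketch with a made-up map \(f(x_1,x_2)=(x_1x_2,\ \sin x_1)\), not from the notes), the entries \(\partial f_i/\partial x_j\) can be compared against central finite differences:

```python
import numpy as np

def f(x):
    return np.array([x[0] * x[1], np.sin(x[0])])    # example map R^2 -> R^2

def jacobian(x):
    return np.array([[x[1], x[0]],                  # analytic Df(x)
                     [np.cos(x[0]), 0.0]])

x, eps = np.array([0.7, -1.3]), 1e-6
num = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                       for e in np.eye(2)])         # central finite differences
print(np.allclose(num, jacobian(x), atol=1e-6))     # -> True
```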
6. Convex Set
\(\large{\text{Theorem}\ 1.9}\): \(\text{Let } f:dom(f)\rightarrow \mathbb{R}^m \text{ be differentiable}, X\subseteq dom(f) \text{ a convex set}, B\in \mathbb{R}_+. \text{ If } X \text{ is non-empty and open, the following statements are equivalent:}\)
\[\begin{align} &(i)\ \large{||f(x)-||f(y)||\leq B||x-y||},\ \forall x,y\in X \quad(\large{\bf{\text{B-Lipschitz}}}) \\ &(ii)\ \large{||Df(x)||} \leq B,\ \forall x\in X \end{align} \]
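A quick numerical illustration (a NumPy sketch; the coordinate-wise \(\tanh\) map is my own example): for \(f(x)=\tanh(x)\) applied elementwise, \(Df(x)\) is diagonal with entries \(1-\tanh^2(x_i)\le 1\), so \(B=1\) works, and the Lipschitz bound holds empirically:

```python
import numpy as np

rng = np.random.default_rng(2)
f = np.tanh                         # coordinate-wise tanh, ||Df(x)|| <= 1 everywhere
B = 1.0
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    assert np.linalg.norm(f(x) - f(y)) <= B * np.linalg.norm(x - y) + 1e-12
print("B-Lipschitz bound held on all samples")
```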
7. Epigraph
\[\begin{equation} {\bf{epi}}(f)=\{(x,\alpha)\in \mathbb{R}^{d+1}:x\in dom(f), \alpha\geq f(x) \} \end{equation} \]
\(\large{\text{Lemma}\ 1.11}\): \(f\text{ is a convex function if and only if } {\bf epi}(f) \text{ is a convex set.}\)
\(\large{\text{Lemma}\ 1.15}\): \(\text{Suppose that } dom(f) \text{ is open and } f \text{ is differentiable. Then } f\text{ is convex if and only if } dom(f) \text{ is convex and}\):
\[\begin{equation} f(y)\geq f(x)+\nabla f(x)^T(y-x) \end{equation} \]
\(\text{holds for all }x,y\in dom(f)\).
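A small NumPy sketch (my own quadratic example \(f(x)=x^TQx\) with \(Q\succeq 0\) and gradient \(2Qx\)) checking the first-order inequality at random points:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(3, 3))
Q = M.T @ M                         # positive semidefinite, so f is convex
f = lambda x: x @ Q @ x
grad = lambda x: 2 * Q @ x

for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
print("first-order convexity inequality held on all samples")
```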
\(\large{\bf{\text{Lemma}}\ 1.16}\): \(\text{Suppose that }dom(f) \text{ is open and } f\text{ is differentiable. Then } f\text{ is convex if and only if } dom(f) \text{ is convex and}\)
\[\begin{equation} \large{{(\nabla f(y)-\nabla f(x))^T(y-x)\geq 0}} \end{equation} \]
\(\text{holds for all }x,y\in dom(f)\).
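A complementary NumPy sketch (my own example \(f(x)=\sum_i e^{x_i}\), whose gradient is \(e^{x}\) elementwise) checking this gradient monotonicity condition:

```python
import numpy as np

rng = np.random.default_rng(4)
grad = np.exp                        # gradient of f(x) = sum_i exp(x_i)

for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert (grad(y) - grad(x)) @ (y - x) >= -1e-12
print("gradient monotonicity held on all samples")
```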
\(\bf{Proof}\): \(\text{If }f \text{ is convex, then by the first-order characterization (Lemma 1.15) we have}\)
\[f(y)\geq f(x)+\nabla f(x)^T(y-x) \quad\text{and}\quad f(x)\geq f(y)+\nabla f(y)^T(x-y) \]
\(\text{for all } x,y\in dom(f). \text{ Adding up these two inequalities, we get:}\)
\[\nabla f(x)^T(y-x)+\nabla f(y)^T(x-y) = (\nabla f(y)-\nabla f(x))^T(x-y)\leq 0, \]
\(\text{which is equivalent to } (\nabla f(y)-\nabla f(x))^T(y-x)\geq 0.\)
8. Second-Order Characterization of Convexity
\(\large{\text{Lemma}\ 1.17}\): \(\text{Suppose that } dom(f) \text{ is open and } f \text{ is twice differentiable. Then } f\text{ is convex if and only if } dom(f) \text{ is convex and}\):
\[\begin{equation} \large\nabla^2 f(x)\geq 0 \end{equation} \]
\(\text{holds for all }x\in dom(f)\), \(\text{where } M\geq 0 \text{ (positive semidefinite) means } x^TMx\geq 0 \text{ for all }x\in\mathbb{R}^d\).
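As a numerical illustration (my own NumPy sketch; the logistic-loss term \(f(x)=\log(1+e^{w^Tx})\) and the vector \(w\) are made-up examples), one can estimate the Hessian by finite differences and confirm that its smallest eigenvalue is non-negative:

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])                       # arbitrary example weights
f = lambda x: np.log1p(np.exp(w @ x))                # f(x) = log(1 + e^{w^T x}), convex

def hessian_fd(f, x, eps=1e-4):
    # central finite-difference estimate of the Hessian
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return H

x = np.array([0.3, 0.1, -0.7])
print(np.linalg.eigvalsh(hessian_fd(f, x)).min() >= -1e-6)   # Hessian PSD -> True
```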
\(\large{\text{Lemma}\ 1.25}\): \(\text{Let } f:dom(f)\rightarrow \mathbb{R}\text{ be strictly convex. Then }f \text{ has at most one global minimum.}\)
\(\large{\text{Lemma}\ 1.27}\): \(\text{Suppose }f:dom(f)\rightarrow \mathbb{R}\text{ is convex and differentiable, and let } X\subseteq dom(f) \text{ be a convex set. A point } x^*\in X \text{ is a } {\bf minimizer} \text{ of } f \text{ over } X \text{ if and only if}\)
\[\large\nabla f(x^*)^T(x-x^*)\geq 0, \quad \forall x\in X \]
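A minimal NumPy sketch (my own example: minimizing \(f(x)=||x-z||^2\) over the box \(X=[0,1]^d\), whose minimizer is the coordinate-wise clipping of \(z\)) checking this variational inequality at random feasible points:

```python
import numpy as np

rng = np.random.default_rng(5)
z = np.array([1.8, -0.4, 0.6])                 # arbitrary target point
x_star = np.clip(z, 0.0, 1.0)                  # minimizer of ||x - z||^2 over [0,1]^d
grad = 2 * (x_star - z)                        # gradient of f at x*

ok = all(grad @ (x - x_star) >= -1e-12
         for x in rng.uniform(0.0, 1.0, size=(1000, 3)))   # random points in X
print(ok)                                      # optimality condition -> True
```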
9. log-sum-exp function
\({\bf \large Statement}: \text{the log-sum-exp function is a convex function.}\)
\(\bf Proof\): \(f(x) = \log(\sum_i e^{x_i})\)
\(\text{By Hölder's inequality}:\)
\[\begin{align} \sum_i x_iy_i\leq (\sum_i |x_i|^p)^{1/p}(\sum_i |y_i|^q)^{1/q} \end{align} \]
\(\text{where }1/p+1/q=1. \text{ Fix } \theta\in(0,1), \text{ set } u_i = e^{x_i},\ v_i = e^{y_i}, \text{ and apply the inequality with } p=1/\theta,\ q=1/(1-\theta):\)
\[\begin{align} \log(\sum_i u_i^{\theta}v_{i}^{(1-\theta)}) &\leq \log[(\sum_i |u_i|^{\theta\cdot 1/\theta})^{\theta}(\sum_i |v_i|^{(1-\theta)\cdot(1/(1-\theta))})^{1-\theta}]\\ &= \theta\log(\sum_i u_i)+(1-\theta)\log(\sum_i v_i) \end{align} \]
\(\text{Since } u_i^{\theta}v_i^{1-\theta} = e^{\theta x_i+(1-\theta)y_i}, \text{ the left-hand side equals } f(\theta x+(1-\theta)y), \text{ so } f(\theta x+(1-\theta)y)\leq \theta f(x)+(1-\theta)f(y), \text{ i.e. } f \text{ is convex.}\)
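Finally, a NumPy sketch (my own check; it uses a numerically stable log-sum-exp obtained by shifting by the maximum) probing the convexity inequality at random points:

```python
import numpy as np

def logsumexp(x):
    m = x.max()                                 # shift for numerical stability
    return m + np.log(np.sum(np.exp(x - m)))    # f(x) = log(sum_i e^{x_i})

rng = np.random.default_rng(6)
ok = True
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    t = rng.uniform()
    ok &= logsumexp(t * x + (1 - t) * y) <= t * logsumexp(x) + (1 - t) * logsumexp(y) + 1e-12
print(ok)                                       # convexity inequality -> True
```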