Deep Learning Notes, Chapter 3: Probability and Information Theory
1. Marginal Probability
\[\begin{equation} P(x=a) = \sum_b P(x=a,y=b) \end{equation} \]
\(\text{For continuous variables, we have:}\)
\[\begin{equation} p(x) = \int p(x,y)\,dy \end{equation} \]
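A minimal NumPy sketch of the discrete case (the 2×3 joint table `P_xy` below is made up for illustration): summing the joint over the y axis yields the marginal P(x).

```python
import numpy as np

# Made-up joint distribution P(x, y): 2 values of x (rows), 3 values of y (columns).
P_xy = np.array([[0.10, 0.25, 0.15],
                 [0.20, 0.05, 0.25]])
assert np.isclose(P_xy.sum(), 1.0)

# Marginal P(x): sum the joint over all values of y (axis 1).
P_x = P_xy.sum(axis=1)
print(P_x)  # [0.5 0.5]
```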
2. Chain Rule of Conditional Probability
\[\begin{equation} P(x^{(1)},...,x^{(n)}) = P(x^{(1)})\prod_{i=2}^n P(x^{(i)}|x^{(1)},...,x^{(i-1)}) \end{equation} \]
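A quick numerical check of the chain rule (a sketch, assuming a randomly generated joint `P` over three binary variables): factoring the joint into P(x1) P(x2|x1) P(x3|x1,x2) and multiplying the factors back together recovers the joint exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up joint distribution over three binary variables x1, x2, x3.
P = rng.random((2, 2, 2))
P /= P.sum()

# Chain-rule factors: P(x1), P(x2|x1), P(x3|x1,x2).
P_x1 = P.sum(axis=(1, 2))
P_x2_given_x1 = P.sum(axis=2) / P_x1[:, None]
P_x3_given_x1x2 = P / P.sum(axis=2, keepdims=True)

# Multiplying the factors reconstructs the joint.
reconstructed = (P_x1[:, None, None]
                 * P_x2_given_x1[:, :, None]
                 * P_x3_given_x1x2)
assert np.allclose(reconstructed, P)
```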
3. Conditional Independence
\(\text{Two random variables } x, y \text{ are conditionally independent given } z \text{ if:}\)
\[\begin{equation} p(x,y|z) = p(x|z)p(y|z) \end{equation} \]
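A small sketch with made-up conditionals p(z), p(x|z), p(y|z): a joint built as p(z)p(x|z)p(y|z) satisfies p(x,y|z) = p(x|z)p(y|z) by construction, yet x and y are not independent marginally.

```python
import numpy as np

# Made-up conditionals over binary variables: p(z), p(x|z), p(y|z).
p_z = np.array([0.4, 0.6])
p_x_given_z = np.array([[0.7, 0.3],   # row z=0: p(x=0|z), p(x=1|z)
                        [0.2, 0.8]])  # row z=1
p_y_given_z = np.array([[0.5, 0.5],
                        [0.9, 0.1]])

# Build the joint p(x, y, z) = p(z) p(x|z) p(y|z); index order is (x, y, z).
p_xyz = (p_z[None, None, :]
         * p_x_given_z.T[:, None, :]
         * p_y_given_z.T[None, :, :])

# The defining identity holds for every z: p(x, y | z) = p(x|z) p(y|z).
p_xy_given_z = p_xyz / p_z[None, None, :]
assert np.allclose(p_xy_given_z,
                   p_x_given_z.T[:, None, :] * p_y_given_z.T[None, :, :])

# Marginally, x and y are NOT independent here: p(x, y) != p(x) p(y).
p_xy = p_xyz.sum(axis=2)
print(np.allclose(p_xy, np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))))  # False
```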
4. Covariance
\[\begin{equation} \mathrm{Cov}(f(x),g(y)) = \mathbb{E}\big[(f(x)-\mathbb{E}[f(x)])(g(y)-\mathbb{E}[g(y)])\big] \end{equation} \]
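A Monte Carlo sanity check (a sketch with made-up data where y = 2x + noise and f, g are identity maps): the sample average of (x − E[x])(y − E[y]) comes out close to 2, the true covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: y depends linearly on x, so Cov(x, y) = 2 * Var(x) = 2.
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)

# Covariance from the definition, E[(x - E[x])(y - E[y])], estimated by sample means.
cov_est = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_est)             # close to 2.0
print(np.cov(x, y)[0, 1])  # NumPy's estimate agrees
```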
5. KL Divergence and Cross Entropy
\[\begin{align} D_{KL}(P||Q) &= \sum_i P(i)\log{\frac{P(i)}{Q(i)}}\\ H(P,Q) &= -\sum_i P(i)\log{Q(i)} \end{align} \]
\(\text{Therefore, we have:}\)
\[H(P,Q) = H(P) + D_{KL}(P||Q) \]
\(\text{where } H(P) = -\sum_i P(i)\log{P(i)}\)
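A numerical check of this identity (a sketch with two made-up distributions P and Q over four outcomes):

```python
import numpy as np

# Two made-up distributions over the same four outcomes.
P = np.array([0.1, 0.4, 0.3, 0.2])
Q = np.array([0.25, 0.25, 0.25, 0.25])

H_P  = -np.sum(P * np.log(P))        # entropy H(P)
H_PQ = -np.sum(P * np.log(Q))        # cross entropy H(P, Q)
D_KL =  np.sum(P * np.log(P / Q))    # KL divergence D_KL(P||Q)

assert np.isclose(H_PQ, H_P + D_KL)  # H(P, Q) = H(P) + D_KL(P||Q)
```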