Lecture 8 Neural Network: Representation
Non-linear hypotheses
Neural Network
\(a_i^{(j)}\) = "activation" of unit i in layer j
\(\Theta^{(j)}\) = matrix of weights controlling function mapping from layer \(j\) to layer \(j+1\)
\[\begin{align} a_1^{(2)} & = g(\Theta_{10}^{(1)}x_0+\Theta_{11}^{(1)}x_1+\Theta_{12}^{(1)}x_2+\Theta_{13}^{(1)}x_3) \\ a_2^{(2)} & = g(\Theta_{20}^{(1)}x_0+\Theta_{21}^{(1)}x_1+\Theta_{22}^{(1)}x_2+\Theta_{23}^{(1)}x_3) \\ a_3^{(2)} & = g(\Theta_{30}^{(1)}x_0+\Theta_{31}^{(1)}x_1+\Theta_{32}^{(1)}x_2+\Theta_{33}^{(1)}x_3) \\ h_\Theta(x) & = a_1^{(3)}=g(\Theta_{10}^{(2)}a_0^{(2)}+\Theta_{11}^{(2)}a_1^{(2)}+\Theta_{12}^{(2)}a_2^{(2)}+\Theta_{13}^{(2)}a_3^{(2)}) \end{align} \]
If a network has \(s_j\) units in layer \(j\) and \(s_{j+1}\) units in layer \(j+1\), then \(\Theta^{(j)}\) will be of dimension \(s_{j+1} \times (s_j+1)\); the extra column accounts for the bias unit.
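To make the dimension rule concrete, here is a minimal NumPy sketch of the network above (3 inputs, 3 hidden units, 1 output); the weight values are random placeholders, not from the notes.

```python
import numpy as np

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Dimension rule: Theta^(j) is s_{j+1} x (s_j + 1).
Theta1 = np.random.randn(3, 4)  # layer 1 (3 units + bias) -> layer 2 (3 units)
Theta2 = np.random.randn(1, 4)  # layer 2 (3 units + bias) -> layer 3 (1 unit)

x  = np.array([0.5, -1.2, 2.0])  # x_1, x_2, x_3
a1 = np.insert(x, 0, 1.0)        # prepend bias unit x_0 = 1
a2 = g(Theta1 @ a1)              # a^(2), shape (3,)
a2 = np.insert(a2, 0, 1.0)       # prepend bias unit a_0^(2) = 1
h  = g(Theta2 @ a2)              # h_Theta(x) = a^(3), shape (1,)
```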
Vectorized implementation
\[x = \left[\begin{matrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{matrix}\right] \quad z^{(2)} = \left[\begin{matrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{matrix}\right] \\ a^{(1)} = x \\ z^{(2)} = \Theta^{(1)}a^{(1)} \\ a^{(2)} = g(z^{(2)}) \\ \text{Add } a_0^{(2)} = 1 \Rightarrow a^{(2)} \in \mathbb{R}^4 \\ z^{(3)} = \Theta^{(2)}a^{(2)} \\ h_\Theta(x) = a^{(3)} = g(z^{(3)}) \]
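The loop below expresses the same computation with matrix products; a minimal sketch reusing `g`, `Theta1`, and `Theta2` from the sketch above.

```python
def forward(x, Thetas):
    """Vectorized forward propagation: a^(j+1) = g(Theta^(j) a^(j))."""
    a = x
    for Theta in Thetas:
        a = np.insert(a, 0, 1.0)  # add bias unit a_0 = 1
        a = g(Theta @ a)          # z^(j+1) = Theta^(j) a^(j), then activate
    return a                      # h_Theta(x) = a^(L)

h = forward(x, [Theta1, Theta2])  # shape (1,)
```

Examples and intuitions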
Non-linear classification example: XOR/XNOR
Simple example: AND
\(x_1, x_2 \in \{0,1\}\)
\(y = x_1 \text{ AND } x_2\)
\[\Theta_{10} = -30 \\ \Theta_{11} = 20 \\ \Theta_{12} = 20 \\ h_\Theta(x) = g(-30 + 20x_1+20x_2) \]
\(x_1\) | \(x_2\) | \(h_\Theta(x)\) |
---|---|---|
0 | 0 | \(g(-30)\approx0\) |
0 | 1 | \(g(-10)\approx0\) |
1 | 0 | \(g(-10)\approx0\) |
1 | 1 | \(g(10)\approx1\) |
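The table can be checked numerically; a small sketch reusing `g` from above with the weights \(\Theta = (-30, 20, 20)\).

```python
def AND(x1, x2):
    return g(-30 + 20 * x1 + 20 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(AND(x1, x2), 4))
# prints ~0 for every row except (1, 1), which is ~1
```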
Example: OR function
\[h_\Theta(x) = g(-10+20x_1+20x_2) \]
\(x_1\) | \(x_2\) | \(h_\Theta(x)\) |
---|---|---|
0 | 0 | \(g(-10)\approx0\) |
0 | 1 | \(g(10)\approx1\) |
1 | 0 | \(g(10)\approx1\) |
1 | 1 | \(g(30)\approx1\) |
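The XOR/XNOR case from the section heading is obtained by composing these gates in a two-layer network. The (NOT \(x_1\)) AND (NOT \(x_2\)) weights \((10, -20, -20)\) are the standard choice from the full lecture, not shown in these notes; a sketch reusing `g` and `AND` from above.

```python
def OR(x1, x2):
    return g(-10 + 20 * x1 + 20 * x2)

def NOT_AND_NOT(x1, x2):
    # (NOT x1) AND (NOT x2); weights assumed from the lecture
    return g(10 - 20 * x1 - 20 * x2)

def XNOR(x1, x2):
    # hidden layer: AND and (NOT x1) AND (NOT x2); output layer: OR
    return OR(AND(x1, x2), NOT_AND_NOT(x1, x2))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(XNOR(x1, x2), 3))
# ~1 for (0,0) and (1,1), ~0 otherwise
```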