ce_amtic's ML Notes

  1. Logistic Regression
  2. Simple RNN

Logistic Regression

Loss Function: Cross Entropy

$$L(\mathbb w, \mathbb X, \mathbb Y)= -\dfrac {1} {m}\sum\limits_{i=1}^m \left[ y^{(i)}\log (\hat y^{(i)})+(1-y^{(i)})\log{(1-\hat y^{(i)})} \right] $$

This is equivalent to maximizing the likelihood. Why not use MSE? Because composing MSE with the sigmoid yields a non-convex loss, so gradient descent can get stuck in poor local minima.
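As a quick illustration, a minimal NumPy sketch of this loss (the function name and the clipping epsilon are my own choices, not from the notes):

```python
import numpy as np

def cross_entropy(Y_hat, Y, eps=1e-12):
    """Mean binary cross-entropy over m examples.

    Y_hat, Y: arrays of shape (m,); eps guards against log(0).
    """
    Y_hat = np.clip(Y_hat, eps, 1 - eps)
    return -np.mean(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
```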

Basics:

$$\mathbb x=\begin{pmatrix} x_1\newline x_2 \end{pmatrix},\quad\mathbb X=\begin{bmatrix}\mathbb x^{(1)}&\dots&\mathbb x^{(m)}\end{bmatrix},\quad\mathbb w=\begin{pmatrix}w_1\newline w_2\end{pmatrix},\quad\mathbb {\hat Y}=\begin{bmatrix}\hat y^{(1)}&\dots&\hat y^{(m)} \end{bmatrix}$$

where the prediction is $\hat y^{(i)}=h(\mathbb x^{(i)})=\sigma(\mathbb w^T\mathbb x^{(i)}+b)$ with $\sigma(z)=\dfrac 1 {1+e^{-z}}$.

Gradient, using $\sigma'(z)=\sigma(z)\left(1-\sigma(z)\right)$:

$$\begin{aligned}\dfrac {\partial}{\partial w_j}J(\mathbb w)&=-\dfrac 1 m \sum\limits_{i=1}^m \left[ y^{(i)}\dfrac{\sigma(\mathbb w^T\mathbb x^{(i)}+b)\left(1-\sigma(\mathbb w^T\mathbb x^{(i)}+b)\right)}{h(\mathbb x^{(i)})}x^{(i)}_j+(1-y^{(i)})\dfrac{-h(\mathbb x^{(i)})\left(1-h(\mathbb x^{(i)})\right)}{1-h(\mathbb x^{(i)})}x^{(i)}_j \right]\newline &=-\dfrac 1 m\sum_i \left(y^{(i)}\dfrac {\hat y^{(i)}(1-\hat y^{(i)})}{\hat y^{(i)}}+(1-y^{(i)})\dfrac {-\hat y^{(i)}(1-\hat y^{(i)})}{1-\hat y^{(i)}}\right)x^{(i)}_j\newline&=-\dfrac 1 m \sum_i x^{(i)}_j(y^{(i)}-\hat y^{(i)})\newline&= \dfrac 1 m \sum_i x^{(i)}_j(\hat y^{(i)}-y^{(i)})\end{aligned}$$

i.e. $$\dfrac \partial {\partial \mathbb w}J(\mathbb w)=\dfrac 1 m \mathbb X(\hat {\mathbb Y}-\mathbb Y)^T$$
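A minimal gradient-descent sketch of this vectorized update, assuming $\mathbb X$ has shape (n, m) with one example per column as defined above (the learning rate, iteration count, and function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, Y, lr=0.1, iters=1000):
    """X: (n, m) with examples as columns; Y: (1, m) labels in {0, 1}."""
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(iters):
        Y_hat = sigmoid(w.T @ X + b)      # (1, m) predictions
        dw = (X @ (Y_hat - Y).T) / m      # (n, 1), i.e. (1/m) X (Y_hat - Y)^T
        db = np.sum(Y_hat - Y) / m
        w -= lr * dw
        b -= lr * db
    return w, b
```

For example, `w, b = train_logistic(X, Y)` with `X` of shape `(2, m)` recovers the two weights $w_1, w_2$ above.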

Simple RNN