Intro to Neural Networks
Overall Process:
Data-> Feature Extraction -> Classifier
Neural Networks: learn features directly from the data, with no manual feature extraction.
That is, "turn feature extraction by hand into an ML task."
A simple classification problem:
get a linear model: $h(x)=w^Tx+b$
get a linear model + sigmoid activation: $h(x)=\sigma (w^Tx+b)$
set a threshold to classify:
$$
y_{\text{predicted}}=\begin{cases}0 & \text{if } h(x)<0.5\\ 1 & \text{if } h(x)\ge 0.5\end{cases}
$$
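The thresholded sigmoid classifier above can be sketched in a few lines of NumPy; the weights here are hypothetical, chosen only to illustrate the computation:

```python
import numpy as np

def sigmoid(z):
    # squash a raw score into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b, threshold=0.5):
    # h(x) = sigma(w^T x + b), thresholded at 0.5
    h = sigmoid(w @ x + b)
    return 1 if h >= threshold else 0

# hypothetical weights for a 2-feature input
w = np.array([1.0, -2.0])
b = 0.5
print(predict(np.array([3.0, 1.0]), w, b))  # score sigmoid(1.5) ≈ 0.82 -> 1
```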
For a multi-class problem, we use softmax instead.
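A minimal softmax sketch: it turns a vector of class scores (from a linear layer $W^Tx+b$; the scores below are made up) into a probability distribution, and the predicted class is the argmax:

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

# hypothetical scores for 3 classes
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs.argmax())  # -> 0, the class with the largest score
```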
Simple overview of neural network models:
Raw Data -> a bunch of experts (classifiers) -> Decisions
Each unit is like an expert performing a simple task. What is that task? Computing $w^Tx+b$: a simple computational step.
Feed these decisions to the next level…
First layer:
x -> $h(x)=\sigma (w_1^Tx+b_1)$ (Unit 1, bias $b_1$)
x -> $h(x)=\sigma (w_2^Tx+b_2)$ (Unit 2, bias $b_2$)
x -> $h(x)=\sigma (w_3^Tx+b_3)$ (Unit 3, bias $b_3$)
…
Merging the first layer's computations into one expression:
x -> $h(x)=\sigma(W^Tx+b)$, where W is the weight matrix and b is the bias vector.
The second, third, fourth layers follow in the same way.
Note the weight-matrix notation: $W_{ij}$ denotes the weight on the connection from unit i to unit j, so a single unit in a layer sees not the whole weight matrix but one column of it.
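The merged per-layer computation can be sketched as a matrix product; the shapes and random weights below are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, W, b):
    # each column of W holds one unit's weights,
    # so W.T @ x computes all units' scores at once
    return sigmoid(W.T @ x + b)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # 4 inputs -> 3 units
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 2))  # 3 inputs -> 2 units
b2 = np.zeros(2)

x = rng.normal(size=4)
h1 = layer(x, W1, b1)   # first layer
h2 = layer(h1, W2, b2)  # second layer, and so on
print(h2.shape)  # (2,)
```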
The above is the famous feed-forward deep neural network.
Some intuition
- Better classification and better feature (data) encoding come at the cost of more compute (resources).
- NNs embody a divide-and-conquer idea: feed simple decisions into the next stage to make more complex decisions.
- They learn to encode along the way, learning both feature extraction and better classification.
NN architecture
Note: units are also known as "neurons".
Neurons do more than a linear model:
- Each unit performs a task such as linear classification/detection, linear transformation of its input, scaling, or normalization.
Jargon in NN
ReLU activation function
The ReLU activation function is also called a rectifier, and a unit that uses this activation is called a Rectified Linear Unit.
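A one-line sketch of the rectifier:

```python
import numpy as np

def relu(z):
    # the rectifier: pass positives through, clamp negatives to zero
    return np.maximum(0.0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 3.0])))  # [0. 0. 0. 3.]
```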
Decision surface
NNs always require a trade-off between compute resources and the number of neurons.
Hyperparameter search is always a key point, and it also implies sophisticated architecture design.
Nonlinear separation
Representation power of NN
How powerful are deep models?
Any (continuous) function can be represented by a NN with just one hidden layer:
Universal approximation theorem.
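An illustrative sketch of the theorem's spirit (not a proof, and not a training recipe): one hidden ReLU layer with hinge units fixed on a grid, fitting only the output weights by least squares, already tracks a smooth target like sin closely. All choices below (30 units, the grid, the target) are assumptions for the demo:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.linspace(-np.pi, np.pi, 200)
knots = np.linspace(-np.pi, np.pi, 30)    # 30 hidden ReLU units
H = relu(x[:, None] - knots[None, :])     # hidden-layer activations
H = np.hstack([H, np.ones((len(x), 1))])  # plus an output bias term
c, *_ = np.linalg.lstsq(H, np.sin(x), rcond=None)  # output weights
err = np.max(np.abs(H @ c - np.sin(x)))   # worst-case error on the grid
print(err)
```

Adding more hidden units shrinks the error further: width, not depth, buys representation power here.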
If one hidden layer suffices, why go deep instead of big (wide)?
There is a size <-> power trade-off:
when the size (width) grows too large, compute limits cost us efficiency.