Intro to Neural Networks
Overall Process:
Data -> Feature Extraction -> Classifier
Neural Networks: learn features directly from the data; no hand-crafted feature extraction step is needed.
That is, "turn hand-crafted feature extraction into a machine-learning task."
A simple classification problem:
Get a linear model: h(x) = w^T x + b
Get a linear model + sigmoid activation: h(x) = σ(w^T x + b)
Set a threshold to classify:
y_predicted = 0 if h(x) < 0.5, and 1 if h(x) ≥ 0.5
For a multi-class problem, we use softmax instead of the sigmoid.
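A minimal NumPy sketch of both cases; the input, weights, and biases below are made-up illustrative values, not anything from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

x = np.array([1.0, 2.0])       # toy input

# binary case: h(x) = sigma(w^T x + b), threshold at 0.5
w, b = np.array([0.5, -0.25]), 0.1
h = sigmoid(w @ x + b)
y_pred = 0 if h < 0.5 else 1

# multi-class case: one score per class, softmax turns scores into probabilities
W = np.array([[ 0.5, -0.25],   # one row of weights per class (3 classes here)
              [ 0.1,  0.30],
              [-0.4,  0.20]])
b_vec = np.array([0.1, 0.0, -0.2])
probs = softmax(W @ x + b_vec)
y_pred_multi = int(np.argmax(probs))

print(h, y_pred)
print(probs, y_pred_multi)
```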
Simple Overview of neural network models:
Raw Data -> a bunch of experts (classifiers) -> Decisions
Each unit is like an expert performing a simple task; that task is just computing w^T x + b, a simple computational step.
Feed these decisions to the next level…
First layer:
x -> h_1(x) = σ(w_1^T x + b_1) (Unit 1, bias b_1)
x -> h_2(x) = σ(w_2^T x + b_2) (Unit 2, bias b_2)
x -> h_3(x) = σ(w_3^T x + b_3) (Unit 3, bias b_3)
…
Combining the first layer's per-unit computations into one expression:
x -> h(x) = σ(W^T x + b), where W is the weight matrix and b is the bias vector.
The second, third, fourth layers then follow the same pattern.
Note the weight-matrix notation: W_ij denotes the weight on the connection from unit i to unit j, so an individual unit in a layer works with just one column of the weight matrix, not the whole matrix.
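A small sketch of this point, with three hypothetical units over a 3-dimensional input (all values illustrative): computing each unit separately and computing the whole layer as σ(W^T x + b) give the same result, and unit i only ever touches column i of W.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, -2.0, 0.5])                 # toy input

# three units, each with its own weight vector and bias (illustrative values)
w1, b1 = np.array([ 0.2,  0.5, -0.1]), 0.0
w2, b2 = np.array([-0.3,  0.8,  0.4]), 0.1
w3, b3 = np.array([ 0.7, -0.6,  0.2]), -0.2

per_unit = np.array([sigmoid(w1 @ x + b1),
                     sigmoid(w2 @ x + b2),
                     sigmoid(w3 @ x + b3)])

# matrix form: stack the unit weight vectors as the columns of W,
# so unit i corresponds exactly to column i of W
W = np.stack([w1, w2, w3], axis=1)             # shape (3 inputs, 3 units)
b = np.array([b1, b2, b3])
layer_out = sigmoid(W.T @ x + b)               # h(x) = sigma(W^T x + b)

print(np.allclose(per_unit, layer_out))        # True
```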
Stacking layers like this gives the famous feed-forward deep neural network.
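And a minimal forward-pass sketch of that stacking; the layer widths and random weights here are hypothetical placeholders, not a trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Feed-forward pass: at each layer compute h = sigma(W^T h + b)."""
    h = x
    for W, b in layers:
        h = sigmoid(W.T @ h + b)
    return h

rng = np.random.default_rng(0)
widths = [4, 5, 3, 2]                          # hypothetical widths: 4 inputs -> ... -> 2 outputs
layers = [(rng.normal(size=(m, n)), np.zeros(n))
          for m, n in zip(widths[:-1], widths[1:])]

x = rng.normal(size=4)                         # toy input
print(forward(x, layers))                      # output of the last layer (2-dimensional)
```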
Some intuition
- Not only better classification but also better feature (data) encoding; this also means more compute (resources) consumed.
- NNs embody a divide-and-conquer idea: feed simple decisions into the next stage to make more complex decisions.
- Learning to encode along the way is learning to do feature extraction and better classification.
NN architecture
Note: units are also known as "neurons".
Neurons can do more than just a linear model.
- Each unit performs a simple task, such as linear classification/detection, a linear transformation of its input, scaling, or normalization.
Jargon in NN
ReLU activation function
The ReLU activation function is also called a rectifier, and a unit that uses this activation is called a Rectified Linear Unit.
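A one-line sketch of the rectifier, applied elementwise:

```python
import numpy as np

def relu(z):
    # rectifier: keep positive values, clamp negative values to zero
    return np.maximum(0.0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]
```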
Decision surface
NNs always require a trade-off between compute resources and the number of neurons (units).
Hyperparameter search is therefore always a key concern, and it also drives increasingly sophisticated architecture design.
Nonlinear separation
Representation power of NN
How powerful are deep models?
Any continuous function can be approximated arbitrarily well by an NN with only one hidden layer (given enough hidden units):
Universal approximation theorem.
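One common informal statement of the theorem (there are several versions; this is the single-hidden-layer form with a sigmoid-type activation σ):

```latex
% Universal approximation (informal): for any continuous f on a compact set K
% and any tolerance eps > 0, some single-hidden-layer network gets within eps of f.
\[
\forall \varepsilon > 0 \;\; \exists\, N,\ \{v_i, w_i, b_i\}_{i=1}^{N} :\quad
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} v_i\, \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon
\]
```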
If a single (wide enough) hidden layer can already represent anything, why go deep instead of big?
There is a size <-> power trade-off:
when the size gets too large, limited compute makes the network inefficient, whereas a deeper network can often reach the same power with fewer units.