- What is the learning slowdown problem? What general idea can we learn from the cross-entropy and softmax to solve the learning slowdown problem?
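Description of the learning slowdown problem:
During learning, a neuron does not learn well from the cases where it is badly wrong: its weights and biases change very little, so learning is slow.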
The learning slowdown problem is that neurons learn poorly when they make large errors, especially at the beginning of training: their weights and biases change very slowly exactly when the output is badly wrong. Learning only speeds up once the error has already become smaller. This makes training very inefficient. We would expect neurons to learn the way humans do, that is, the bigger the error, the faster they learn from it.
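The learning slowdown problem arises from the partial derivative of the cost function the model uses. For instance, for a sigmoid neuron with the MSE (quadratic) cost, the partial derivative with respect to a weight is ∂C/∂w = (a − y)·σ′(z)·x, where x is the input, z = wx + b, a = σ(z) is the output, and y is the target. When the neuron is badly wrong and saturated (output close to 0 or 1), σ′(z) is close to zero. So the bigger the error, the smaller the derivative, the smaller the gradient descent update, and the slower the neuron learns from its mistake.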
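The MSE example can be checked numerically. The following is a minimal sketch, assuming a single sigmoid neuron with one input and the quadratic cost C = (a − y)²/2; the function names and the starting weights and biases are illustrative assumptions, not part of the notes above.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad_w(w, b, x, y):
    """dC/dw for C = (a - y)^2 / 2 with a = sigmoid(w*x + b).

    By the chain rule: dC/dw = (a - y) * sigmoid'(z) * x,
    and sigmoid'(z) = a * (1 - a).
    """
    a = sigmoid(w * x + b)
    return (a - y) * a * (1.0 - a) * x

# Hypothetical setup: input 1, target output 0.
x, y = 1.0, 0.0

# Badly wrong start: output is about 0.98 (target 0), neuron is saturated.
grad_bad = mse_grad_w(2.0, 2.0, x, y)

# Moderately wrong start: output is about 0.82.
grad_mild = mse_grad_w(0.6, 0.9, x, y)

print(f"badly wrong:      dC/dw = {grad_bad:.4f}")
print(f"moderately wrong: dC/dw = {grad_mild:.4f}")
```

Under these assumptions the badly wrong neuron gets the smaller gradient (roughly 0.017 versus roughly 0.12), which is the slowdown described above.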
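Therefore, the key to solving the learning slowdown problem, and the general idea behind cross-entropy and softmax, is to choose the cost function so that neurons learn the way humans do. With the cross-entropy cost for a sigmoid output (or the log-likelihood cost for a softmax output), the σ′(z) factor cancels and the gradient becomes proportional to the error a − y. In other words, the larger the error, the stronger the impression it makes on the neuron, and the faster the neuron learns.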
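To see how cross-entropy changes the picture, here is a minimal sketch, again assuming a single sigmoid neuron with one input. For the cross-entropy cost C = −[y ln a + (1 − y) ln(1 − a)], the derivative with respect to the weight simplifies to (a − y)·x. The function names and starting values are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy_cost(w, b, x, y):
    """C = -[y ln a + (1 - y) ln(1 - a)] with a = sigmoid(w*x + b)."""
    a = sigmoid(w * x + b)
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

def cross_entropy_grad_w(w, b, x, y):
    """dC/dw for the cross-entropy cost: the sigmoid'(z) factor
    cancels, leaving (a - y) * x."""
    a = sigmoid(w * x + b)
    return (a - y) * x

# Hypothetical setup: input 1, target output 0.
x, y = 1.0, 0.0

ce_bad = cross_entropy_grad_w(2.0, 2.0, x, y)   # output about 0.98
ce_mild = cross_entropy_grad_w(0.6, 0.9, x, y)  # output about 0.82

print(f"badly wrong:      dC/dw = {ce_bad:.4f}")
print(f"moderately wrong: dC/dw = {ce_mild:.4f}")
```

Here the gradient grows with the error, so the badly wrong neuron takes the larger step.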
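In short: a small derivative means a small partial derivative, a small gradient step, and slow learning.
What causes this problem? Gradient descent uses the partial derivative of the loss function with respect to the weights. For a loss function that suffers from learning slowdown (such as MSE), that partial derivative behaves so that the larger the error, the smaller the derivative, and the more slowly the weights change.
- So, in my own understanding, solving the learning slowdown problem requires adjusting the mathematical model behind the learning so that the machine behaves like a human: the larger the error, the deeper the impression, and the faster it learns.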