Take vague input data.
Lay a good foundation.
Talk is cheap. Show me the code.
Background:
- Make decisions/predictions based on data.
- Data visualization and representation.
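The focus of this machine learning fundamentals course is still supervised and unsupervised learning.
Build models that work for unseen data, i.e. make predictions on data that has not been observed yet.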
Terminology:
Types:
Supervised learning: learn from training data examples -> ML model -> predict on unseen data.
Data + label.
Self-supervised learning / transfer learning
Key: supervised learning.
Training example: input data and its target value.
Goal: find a function that takes x to y.
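A minimal sketch of "find a function that takes x to y", assuming the candidate functions are straight lines y = w*x + b and fitting them by ordinary least squares. The helper `fit_line` and the data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Fit y ≈ w*x + b by ordinary least squares (closed form, 1-D input)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

# Hypothetical training examples: each input x is paired with its label y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated by y = 2x + 1
w, b = fit_line(xs, ys)

# Apply the learned function to an input that was not in the training set.
prediction = w * 5.0 + b
```

Here the learned function recovers w = 2, b = 1 and predicts 11 for the unseen input 5; real data is noisy, so the fit would only be approximate.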
Training Data
Are the features informative?
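One rough check of whether a feature is informative is its correlation with the label. The sketch below uses a hand-written Pearson correlation (`pearson`) on hypothetical columns; it is only a heuristic, since it detects linear relationships and can miss features that matter nonlinearly or only in combination.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

labels = [0, 0, 1, 1, 1, 0]
informative = [0.1, 0.2, 0.9, 0.8, 0.95, 0.15]  # hypothetical: tracks the label
uninformative = [5, 3, 5, 3, 4, 4]              # hypothetical: unrelated to the label

scores = {
    "informative": pearson(informative, labels),
    "uninformative": pearson(uninformative, labels),
}
```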
Invariances: understanding, modeling, and handling invariances in the data is a very important topic in machine learning.
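One common way to handle an invariance is data augmentation: if the label should not change under some transformation, add transformed copies of the training examples. This sketch assumes images stored as lists of pixel rows and a label that is invariant to horizontal flips; the helpers `hflip` and `augment` are hypothetical.

```python
def hflip(image):
    """Mirror a 2-D image (list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]

def augment(dataset):
    """Add a horizontally flipped copy of every (image, label) pair.

    Assumes the label is invariant to horizontal flips: reasonable for
    "orange vs. not orange", wrong for letters such as 'b' vs. 'd'.
    """
    out = []
    for image, label in dataset:
        out.append((image, label))
        out.append((hflip(image), label))
    return out

tiny = [([[0, 1], [2, 3]], "orange")]
augmented = augment(tiny)
```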
Errors on small groups; label imbalance.
Some of the labels in the training data are ones you are not interested in.
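Why label imbalance matters: overall accuracy can look good while a small group is always wrong. A minimal sketch with hypothetical labels and a hand-written helper `per_class_accuracy`:

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical imbalanced data: 9 "common" examples and 1 "rare" example.
y_true = ["common"] * 9 + ["rare"]
y_pred = ["common"] * 10  # a model that always predicts the majority label

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
by_class = per_class_accuracy(y_true, y_pred)
```

The overall accuracy is 0.9, yet the rare class is never predicted correctly, which is exactly the "error on small groups" problem.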
Outliers: regression
Outliers are not necessarily bad; sometimes they can even make the model better.
Signal or noise?
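To see how much a single outlier can move a least-squares regression, compare the fitted slope with and without it. The data and the helper `ols_slope` are hypothetical; the code only measures the effect and cannot tell you whether the point is signal or noise.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]          # exactly y = 2x
with_outlier = [2, 4, 6, 8, 30]   # hypothetical outlier at x = 5

slope_clean = ols_slope(xs, clean)
slope_outlier = ols_slope(xs, with_outlier)
```

One point triples the slope (2.0 to 6.0). If that point is a measurement error, dropping it helps; if it reflects a real effect, dropping it throws away information.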
Representation of data
numerical
categorical
binary
structured
Well worth practicing as an exercise.
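A minimal sketch of turning one record with numerical, categorical, and binary fields into a numeric feature vector, assuming one-hot encoding for the categorical field. The record schema, the `colors` vocabulary, and the helpers `one_hot` and `encode` are hypothetical; structured data (text, graphs, sequences) needs its own representation and is not covered here.

```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed category list."""
    return [1 if value == c else 0 for c in categories]

colors = ["red", "green", "blue"]  # hypothetical category vocabulary

def encode(record):
    """Turn one hypothetical record into a flat numeric feature vector."""
    features = [float(record["weight"])]                 # numerical: keep as a number
    features += one_hot(record["color"], colors)         # categorical: one-hot
    features.append(1.0 if record["is_ripe"] else 0.0)   # binary: 0/1
    return features

vec = encode({"weight": 150, "color": "green", "is_ripe": True})
```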
How do you represent a raw image?
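The simplest representation, assuming a grayscale image stored as rows of 0-255 intensities, is to scale the pixels and flatten them into one vector. `image_to_vector` is a hypothetical helper; flattening discards the 2-D neighborhood structure, and a color image would add a channel dimension.

```python
def image_to_vector(image):
    """Flatten a grayscale image (rows of 0-255 ints) into a feature vector.

    Pixels are scaled to [0, 1] and read in row-major order.
    """
    return [pixel / 255 for row in image for pixel in row]

raw = [
    [0, 255],
    [51, 102],
]
vec = image_to_vector(raw)
```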
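If training data and test data have the same underlying distribution, your model can do well on the test set (small test error).
In other words, if you want to find oranges in a pile of images, both your training data and your test data should contain oranges.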
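One way to keep the label mix the same in training and test data is a stratified split: split each label group separately. This is a hand-written sketch (`stratified_split` and the image names are hypothetical); scikit-learn's `train_test_split` offers the same idea through its `stratify` argument.

```python
import random
from collections import defaultdict

def stratified_split(examples, test_fraction=0.25, seed=0):
    """Split (x, label) pairs so each label keeps the same share in train and test."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, label in examples:
        by_label[label].append((x, label))
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)                    # random order within each label
        n_test = int(len(group) * test_fraction)
        test += group[:n_test]
        train += group[n_test:]
    return train, test

# Hypothetical labeled images: half contain oranges, half do not.
data = [(f"img{i}", "orange" if i % 2 == 0 else "other") for i in range(20)]
train, test = stratified_split(data)
```

Both parts now contain oranges in the same proportion (50%), so the test set resembles the training distribution.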
Zero error?
Data-hungry model: a complex model can memorize the training data.
Overfitting and robustness
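An extreme case of memorization: a "model" that stores the training set in a lookup table reaches zero training error but has learned nothing that transfers. The class `LookupModel`, the helper `error_rate`, and the even/odd task are all hypothetical.

```python
class LookupModel:
    """A hypothetical 'model' that memorizes its training set verbatim."""

    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
        # Fallback for unseen inputs: the most common training label.
        self.default = max(set(ys), key=ys.count)
        return self

    def predict(self, x):
        return self.table.get(x, self.default)

def error_rate(model, xs, ys):
    """Fraction of examples the model gets wrong."""
    return sum(model.predict(x) != y for x, y in zip(xs, ys)) / len(xs)

# Hypothetical task: the label is 1 when x is even.
train_x, train_y = [0, 1, 2, 3, 5], [1, 0, 1, 0, 0]
test_x, test_y = [4, 6, 7, 8], [1, 1, 0, 1]

model = LookupModel().fit(train_x, train_y)
train_err = error_rate(model, train_x, train_y)
test_err = error_rate(model, test_x, test_y)
```

Training error is 0 while test error is 0.75, so zero training error by itself says nothing about generalization.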
Discuss the robustness of classification.
Optimization: minimize the loss; convex models.
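A sketch of "minimize the loss" for a convex case: plain gradient descent on the mean squared error of a 1-D linear model. The squared-error loss is convex in (w, b), so with a small enough step size the iterates approach the single global minimum. The function name, step size, and step count are assumptions chosen for this toy data.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize mean squared error of y ≈ w*x + b by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum (w*x + b - y)^2 with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = gradient_descent([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

The result converges to w ≈ 2, b ≈ 1, the same line the closed-form least-squares solution gives; for non-convex losses gradient descent can instead stop at a local minimum.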
Model Selection
How do I know my model makes sense?
Which features are influential?
- Simple vs. complex: generalization
- Simple vs. complex: interpretability
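A minimal model-selection sketch: fit a simple model (a least-squares line) and a complex one (the polynomial through every training point), then compare them on held-out validation data. All names and data are hypothetical; the training data is y = 2x + 1 with ±1 noise, and the validation points are noise-free.

```python
def fit_line(xs, ys):
    """Simple model: least-squares line y ≈ w*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - w * mx
    return lambda x: w * x + b

def fit_interpolant(xs, ys):
    """Complex model: the Lagrange polynomial passing through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = [0, 1, 2, 3], [2, 2, 6, 6]
val_x, val_y = [0.5, 4], [2.0, 9.0]

candidates = {"line": fit_line(train_x, train_y),
              "interpolant": fit_interpolant(train_x, train_y)}
# (training MSE, validation MSE) for each candidate
scores = {name: (mse(m, train_x, train_y), mse(m, val_x, val_y))
          for name, m in candidates.items()}
best = min(scores, key=lambda name: scores[name][1])
```

The interpolant has zero training error but a validation MSE of about 113, while the line has training MSE 0.8 and validation MSE 0.58, so validation picks the simpler model. The line is also easier to interpret: one slope and one intercept.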