Neural Network
Activation function: typically a simple non-linear function that applies a non-linear transformation.
Swish function: smooth (differentiable everywhere), whereas ReLU is continuous but has a kink at 0.
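A minimal sketch comparing the two activations near 0. The helper names (`relu`, `swish`, `one_sided_slopes`) and the choice `beta = 1.0` are assumptions made for illustration; the notes do not specify them.

```python
import math


def relu(x: float) -> float:
    """ReLU: max(0, x). Continuous, but its slope jumps from 0 to 1 at x = 0."""
    return max(0.0, x)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x). Smooth everywhere."""
    return x / (1.0 + math.exp(-beta * x))


def one_sided_slopes(f, x: float = 0.0, h: float = 1e-6):
    """Return the (left, right) finite-difference slopes of f at x."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right


relu_left, relu_right = one_sided_slopes(relu)
swish_left, swish_right = one_sided_slopes(swish)
print("ReLU slopes at 0:", relu_left, relu_right)     # the two sides disagree
print("Swish slopes at 0:", swish_left, swish_right)  # both sides are close to 0.5
```

The two one-sided slopes of ReLU at 0 differ, while those of Swish agree, which is the sense in which Swish is the smoother of the two.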
The logistic model is essentially linear.
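One way to see the linearity is that the log-odds of a logistic model are a linear function of the inputs. The sketch below uses made-up weights and a made-up input purely for illustration.

```python
import math


def logistic_prob(x, weights, bias):
    """Probability from a logistic model: sigmoid of a linear score."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Inverse of the sigmoid: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Hypothetical parameters and input, for illustration only.
weights = [0.8, -0.5]
bias = 0.1
x = [1.0, 2.0]

p = logistic_prob(x, weights, bias)
linear_score = bias + sum(w * xi for w, xi in zip(weights, x))
print(log_odds(p), linear_score)  # the two values coincide up to rounding
```

The only non-linearity is the final squashing into a probability, so the decision boundary (where the score crosses a fixed value) is a straight line or hyperplane in the inputs.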
Bring in the Economics
How do we make the engineering solutions speak to the economics?
Like bankruptcy, some of the outcomes you want to predict are at times as negligible as noise.
Are ML Models Better?
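In fact, if these ML models were good, the visualization would show a diagonal pattern. The results, however, still show many errors in the bottom-left (false positives) and the top-right.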
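A minimal sketch of the counts behind such a visualization, assuming binary labels where 1 is the positive class. The labels and predictions are toy values, and which corner of a plot holds which error depends on the plotting convention.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["tp"] += 1
        elif actual == 0 and predicted == 1:
            counts["fp"] += 1  # predicted positive, actually negative
        elif actual == 1 and predicted == 0:
            counts["fn"] += 1  # predicted negative, actually positive
        else:
            counts["tn"] += 1
    return counts


# Toy labels, for illustration only.
y_true = [1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1]

counts = confusion_counts(y_true, y_pred)
on_diagonal = counts["tp"] + counts["tn"]   # correct predictions
off_diagonal = counts["fp"] + counts["fn"]  # errors
print(counts, on_diagonal, off_diagonal)
```

A perfect model would put every observation on the diagonal; the off-diagonal counts are the two kinds of error.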
Sift through all the information to find a useful piece. Average seller response time is one typical example.
Null hypothesis: a good borrower. (Stating that everyone is a good borrower is a null hypothesis.)
Compare the model against the null hypothesis.
From a hypothesis-testing perspective, the ML model can be combined with statistical hypothesis testing.
Type-I and Type-II errors
False positives / false negatives.
What is the person's actual type, compared with the hypothesized type?
Which is more costly, a Type-I or a Type-II error?
Is a false positive more costly?
Drive down the false positives.
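A minimal sketch of the two error rates, taking "good borrower" as the null hypothesis. It assumes the model outputs a probability of being a bad borrower and rejects the null when that probability reaches a threshold; the data and the threshold of 0.5 are made up for illustration.

```python
def error_rates(is_bad, p_bad, threshold):
    """Type-I and Type-II error rates under the null 'good borrower'.

    Type I:  a good borrower is flagged as bad (null wrongly rejected).
    Type II: a bad borrower is not flagged (null wrongly kept).
    """
    flagged = [p >= threshold for p in p_bad]
    n_good = sum(1 for b in is_bad if b == 0)
    n_bad = sum(1 for b in is_bad if b == 1)
    type1 = sum(1 for b, f in zip(is_bad, flagged) if b == 0 and f)
    type2 = sum(1 for b, f in zip(is_bad, flagged) if b == 1 and not f)
    return type1 / n_good, type2 / n_bad


# Toy data: 1 = actually a bad borrower, 0 = actually a good borrower.
is_bad = [0, 0, 0, 0, 1, 1]
p_bad = [0.1, 0.3, 0.6, 0.2, 0.7, 0.4]

type1_rate, type2_rate = error_rates(is_bad, p_bad, threshold=0.5)
print(type1_rate, type2_rate)
```

Raising the threshold flags fewer borrowers, which lowers the Type-I rate at the expense of the Type-II rate.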
Economic Loss Function
There is a cost associated with each of these two types of errors.
The threshold pbar controls the trade-off between these two types of errors.
That is, you need to apply domain knowledge rather than CS methods alone.
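A minimal sketch of how an economic loss could pick the threshold pbar: total cost = (cost per Type-I error) x (number of Type-I errors) + (cost per Type-II error) x (number of Type-II errors), minimized over a grid of thresholds. The cost values, the data, and the grid are all hypothetical; in practice the costs would come from domain knowledge.

```python
def total_cost(is_bad, p_bad, threshold, cost_type1, cost_type2):
    """Economic loss at a given threshold (flag as bad when p >= threshold)."""
    n_type1 = sum(1 for b, p in zip(is_bad, p_bad) if b == 0 and p >= threshold)
    n_type2 = sum(1 for b, p in zip(is_bad, p_bad) if b == 1 and p < threshold)
    return cost_type1 * n_type1 + cost_type2 * n_type2


def best_threshold(is_bad, p_bad, cost_type1, cost_type2):
    """Scan a grid of thresholds and return (threshold, cost) with lowest cost."""
    grid = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    costs = [(total_cost(is_bad, p_bad, t, cost_type1, cost_type2), t) for t in grid]
    cost, threshold = min(costs)  # ties go to the lower threshold
    return threshold, cost


# Toy data: 1 = actually a bad borrower, 0 = actually a good borrower.
is_bad = [0, 0, 0, 0, 1, 1]
p_bad = [0.1, 0.3, 0.6, 0.2, 0.7, 0.4]

# Lending to a bad borrower is assumed to be the expensive mistake here.
lenient = best_threshold(is_bad, p_bad, cost_type1=1.0, cost_type2=5.0)
# Rejecting a good borrower is assumed to be the expensive mistake here.
strict = best_threshold(is_bad, p_bad, cost_type1=5.0, cost_type2=1.0)
print(lenient, strict)
```

The same model scores lead to different thresholds once the relative costs change, which is why the choice of pbar is an economic decision and not only a statistical one.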
- The fragility of ML models.
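It’s like finding a needle in a haystack.
Too much information:
Valuable information grows more slowly than noise.
Data may be fake.
Suppose an insurance company wants to track the health of its policyholders, taking step counting as an example. In the past, before people had step counters, real step counts were used; now that step counters have gone viral on Douyin and people are rushing to buy them, the data the insurer collects is no longer genuine.
So, unlike the invariance in CV, in a market driven by economic interest the statistical relationships a model relies on are not fixed (cf. MIT+FinTech Q&A blog).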
Robust ML model
Variables that are easily manipulated but not easily verified should not be given too much weight.
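One possible way to encode this idea, offered only as an illustration and not as the method from the lecture, is ridge-style shrinkage with a larger penalty on variables that are easy to manipulate. The sketch below fits a single-feature least-squares weight without an intercept; the data and penalty values are hypothetical.

```python
def ridge_weight(xs, ys, penalty):
    """Single-feature ridge weight without intercept: sum(x*y) / (sum(x*x) + penalty)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Toy feature and outcome, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 4.0]

# A verifiable variable gets no extra penalty (hypothetical choice).
w_verified = ridge_weight(xs, ys, penalty=0.0)
# An easily manipulated, hard-to-verify variable gets a large penalty (hypothetical choice).
w_manipulable = ridge_weight(xs, ys, penalty=30.0)
print(w_verified, w_manipulable)
```

With the larger penalty the fitted weight shrinks toward zero, so faking that variable moves the model's score less.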
The old data. The new data.
A game between the borrower and the lender.
A slightly smarter algorithm.