Modeling Dynamic User Interests: A Neural Matrix Factorization Approach
Research Context: Boston Globe
- What do they really care about?
- Content Recommendation: be interesting to readers and thereby retain customers.
- Dynamic User Segmentation:
- What is Dynamic User Segmentation?
- Person A: sports / Person B: politics
- Create profiles, and use these profiles for targeting ads and offers.
- Companies want to figure out your interests, target you, and hope you buy their product.
- Optimize user experience: how to place articles on the front page?
Our broad contribution: A novel neural network for modeling the dynamics of user interests
Desired Output: A sample user’s meaningful content consumption journey
Identify these latent user interests
Important Modeling Considerations
- Input data is high dimensional: text data is unstructured.
- Note that interests are customized to each user.
- Note the dynamics over time: take time into account. Last year you were interested in politics, but this year your interest has changed.
- Interpretability: explain why we are recommending this to this person.
Why NN?
Advantages of NNs
- Flexibly model nonlinearities:
- cf. splines or kernels. The nonlinearity is estimated from the data itself; it is not decided in advance.
- Computationally efficient to estimate
- PyTorch, GPUs, and similar tools are easy to use.
Disadvantages of NNs
- Black-box nature, lack of interpretability. As discussed in the previous lecture, managers cannot follow them.
Our model: NNs + interpretability
Our model: Interpretable decomposition into user and content factors: Matrix factorization
First step: dimensionality reduction
One-hot embedding: sparse and giant
GloVe
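$E_a$: the user embedding after dimensionality reduction; it is specific to this data and does not generalize. $E_x$, by contrast, is downloaded (pre-trained) and is general.
In $x_i^{t-1}$, the superscript $t-1$ denotes time: for example, $t-1$ is the previous week and $t-2$ is two weeks ago.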
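To illustrate the contrast between a one-hot vector and a dense embedding, here is a minimal sketch in plain Python. The four-word vocabulary and the numbers in `E_x` are made up for illustration; a real setup would load a pre-trained table such as GloVe. `E_x` is stored here with one row per word, so the lookup is a weighted sum of rows.

```python
# Toy vocabulary and a hypothetical 3-dimensional embedding table
# (made-up values standing in for a pre-trained table such as GloVe).
vocab = ["sports", "politics", "weather", "markets"]
E_x = [
    [0.9, 0.1, 0.0],  # sports
    [0.1, 0.8, 0.2],  # politics
    [0.0, 0.2, 0.7],  # weather
    [0.2, 0.6, 0.5],  # markets
]


def one_hot(word):
    """Sparse representation: one slot per vocabulary word."""
    return [1.0 if w == word else 0.0 for w in vocab]


def embed(x):
    """Dense representation: a weighted sum of the rows of E_x."""
    dim = len(E_x[0])
    return [sum(x[k] * E_x[k][d] for k in range(len(x))) for d in range(dim)]


x = one_hot("politics")  # length grows with the vocabulary
dense = embed(x)         # length stays at the embedding dimension
```

With a realistic vocabulary of tens of thousands of words, `x` would be almost entirely zeros, while `dense` keeps a fixed, small length.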
First 2: Estimate a nonlinear hidden state for each user. $l_i^t$: just a linear combination
ReLU: the simplest nonlinearity used by NNs
- Pre-trained embeddings + hidden state of a user at that time step: combined nonlinearly
$$
l_i^t=\sigma_1\left(W_l \cdot [E_x x_i^t; E_a a_i]\right)
$$
Here the activation function $\sigma_1$ is ReLU.
$W_l$: model parameters estimated from the data.
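The hidden-state formula can be sketched in a few lines of plain Python: concatenate the content and user embeddings, apply a linear map, then ReLU. All numbers below (the two embedding vectors and the entries of `W_l`) are made-up illustrations, not estimated parameters, and the helper names are hypothetical.

```python
def relu(v):
    """Elementwise ReLU: negative entries become zero."""
    return [max(0.0, z) for z in v]


def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * z for w, z in zip(row, v)) for row in W]


def hidden_state(W_l, content_vec, user_vec):
    """l = ReLU(W_l . [content_vec ; user_vec])."""
    concat = content_vec + user_vec  # list concatenation, i.e. [E_x x ; E_a a]
    return relu(matvec(W_l, concat))


content_vec = [0.1, 0.8, 0.2]  # stands in for E_x x_i^t (made-up values)
user_vec = [0.5, -0.3]         # stands in for E_a a_i (made-up values)
W_l = [                        # hypothetical weights; in the model these are learned
    [1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, -1.0, 0.0, 0.0, 1.0],
]

l = hidden_state(W_l, content_vec, user_vec)
```

The second unit receives a negative pre-activation and is clipped to zero, which is the only nonlinearity ReLU introduces.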
First 3: Incorporate dynamics in the hidden state. $U_i^t$: current hidden state.
The nonlinearity in this case is just the softmax function.
The softmax function converts numbers into probabilities.
At time $t$: 30% sports, 40% politics…
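As a small illustration of how softmax turns arbitrary scores into interest shares, here is a sketch in plain Python. The three input scores are made up; in the model they would come from the user's hidden state.

```python
import math


def softmax(scores):
    """Map real-valued scores to positive numbers that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three topics, e.g. sports, politics, weather.
probs = softmax([1.0, 2.0, 0.5])
```

Larger scores receive larger shares, and the shares can be read as the fraction of a user's interest assigned to each topic at that time step.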
Step 3: Combine the user and content factors
Step 4: Minimize the loss function
The reconstructed estimate should be close to the actual values.
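The idea of combining user and content factors and then fitting them to observed data can be sketched generically. The notes do not specify the combination rule or the loss, so this sketch assumes a plain dot product and a mean squared error, with toy factors and observations that are made up; only the user factors are updated, to keep the example short.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def squared_error_loss(user_factors, content_factors, observed):
    """Mean squared gap between reconstructed and observed entries (assumed loss)."""
    total = 0.0
    for (i, j), y in observed.items():
        total += (dot(user_factors[i], content_factors[j]) - y) ** 2
    return total / len(observed)


def gradient_step(user_factors, content_factors, observed, lr=0.1):
    """One gradient-descent update of the user factors; content factors stay fixed."""
    n = len(observed)
    grads = [[0.0] * len(u) for u in user_factors]
    for (i, j), y in observed.items():
        err = dot(user_factors[i], content_factors[j]) - y
        for d, c in enumerate(content_factors[j]):
            grads[i][d] += 2.0 * err * c / n
    return [[u_d - lr * g_d for u_d, g_d in zip(u, g)]
            for u, g in zip(user_factors, grads)]


# Toy data: 2 users, 2 content items, observed[(user, item)] = actual value.
user_factors = [[0.5, 0.5], [0.5, 0.5]]
content_factors = [[1.0, 0.0], [0.0, 1.0]]
observed = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}

loss_before = squared_error_loss(user_factors, content_factors, observed)
for _ in range(50):
    user_factors = gradient_step(user_factors, content_factors, observed)
loss_after = squared_error_loss(user_factors, content_factors, observed)
```

After the updates, each user's factor vector moves toward the content it actually consumed, and the reconstruction error shrinks accordingly.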
Data
This 630 is really high.
Empirical Results: Roadmap
AUC
ROC
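For reference, AUC (the area under the ROC curve) equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the labels and scores are made up and are not results from the talk.

```python
def auc(labels, scores):
    """AUC via pairwise comparison; ties between a positive and a negative count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]            # 1 = article was read, 0 = not read (toy data)
scores = [0.9, 0.2, 0.4, 0.6]    # hypothetical model scores
value = auc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.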
Ablation Studies
Test out the relative contribution of different elements of our model.
Economic Significance of Results
Data-driven content categorizations.
Future work: adaptive personalization
Adaptive: see what you do and react to that.
Conclusion
Q&A
A separate model for each person?
That would require lots of data. We don't observe people most of the time.
Share information across similar people.