Neural Factorization Machines for Sparse Predictive Analytics

  • Main content
  • Code

He X. and Chua T. Neural factorization machines for sparse predictive analytics. In International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2017.

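NFM introduces a Bi-Interaction layer to model second-order feature interactions, then applies an MLP on top of it to extract high-order information. Is the difference from DeepFM simply serial versus parallel composition (NFM stacks the MLP on the interaction output, whereas DeepFM runs its FM and deep components side by side)?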


  1. Start from a sparse feature vector \(\bm{x}\);
  2. An embedding layer yields:

\[\mathcal{V}_x = \{x_1 \bm{v}_1, x_2 \bm{v}_2, \cdots, x_n \bm{v}_n\}; \]

  3. The Bi-Interaction layer produces the crossed features:

\[f_{BI}(\mathcal{V}_x) = \sum_{i=1}^n \sum_{j = i + 1}^n x_i \bm{v}_i \odot x_j \bm{v}_j, \]

where \(\odot\) denotes element-wise multiplication;
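
The double sum above costs \(O(n^2 k)\) if evaluated directly, but the paper notes that it can be rewritten as \(\frac{1}{2}\big[(\sum_i x_i \bm{v}_i)^2 - \sum_i (x_i \bm{v}_i)^2\big]\) (squares taken element-wise), which is linear in \(n\). Below is a minimal NumPy sketch that checks the two forms against each other. The function names, the toy input, and the random embedding matrix `V` are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def bi_interaction_naive(x, V):
    """Direct evaluation of the pairwise sum over i < j, O(n^2 k)."""
    n, k = V.shape
    out = np.zeros(k)
    for i in range(n):
        for j in range(i + 1, n):
            out += (x[i] * V[i]) * (x[j] * V[j])  # element-wise product
    return out

def bi_interaction(x, V):
    """Same quantity via 0.5 * ((sum)^2 - sum of squares), O(n k)."""
    xv = x[:, None] * V  # row i holds x_i * v_i
    return 0.5 * (xv.sum(axis=0) ** 2 - (xv ** 2).sum(axis=0))

# Toy example (assumed values): n = 5 features, embedding size k = 3.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0, 1.0, 0.5])
V = rng.normal(size=(5, 3))
assert np.allclose(bi_interaction_naive(x, V), bi_interaction(x, V))
```

Since zero entries of \(\bm{x}\) contribute nothing, a practical implementation would only touch the embeddings of the non-zero features.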
  4. An MLP extracts the high-order information:

\[\bm{z}_1 = \sigma_1(W_1 f_{BI}(\mathcal{V}_x) + \bm{b}_1), \\ \bm{z}_2 = \sigma_2(W_2 \bm{z}_1 + \bm{b}_2), \\ \vdots \\ \bm{z}_L = \sigma_L(W_L \bm{z}_{L-1} + \bm{b}_L). \]

  5. NFM:

\[\hat{y}_{NFM}(\bm{x}) = w_0 + \bm{w}^T\bm{x} + \bm{h}^T \bm{z}_L. \]
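
Putting the pieces together, the forward pass for a single example might look like the sketch below. It is only an illustration under stated assumptions: ReLU stands in for every \(\sigma_l\), the parameters are randomly initialised, and `init_params`, `nfm_forward`, and the dictionary layout are hypothetical names chosen here rather than the authors' implementation.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def init_params(n, k, hidden, seed=0):
    """Random parameters for n features, embedding size k, given hidden widths."""
    rng = np.random.default_rng(seed)
    sizes = [k] + list(hidden)
    layers = [
        (rng.normal(scale=0.1, size=(d_out, d_in)), np.zeros(d_out))  # (W_l, b_l)
        for d_in, d_out in zip(sizes[:-1], sizes[1:])
    ]
    return {
        "w0": 0.0,                                    # global bias w_0
        "w": rng.normal(scale=0.1, size=n),           # linear weights w
        "V": rng.normal(scale=0.1, size=(n, k)),      # embeddings v_i as rows
        "layers": layers,
        "h": rng.normal(scale=0.1, size=sizes[-1]),   # prediction weights h
    }

def nfm_forward(x, params):
    """y_hat = w0 + w^T x + h^T z_L for one input vector x."""
    xv = x[:, None] * params["V"]                     # embedding layer: x_i * v_i
    z = 0.5 * (xv.sum(axis=0) ** 2 - (xv ** 2).sum(axis=0))  # Bi-Interaction
    for W, b in params["layers"]:                     # hidden layers (ReLU assumed)
        z = relu(W @ z + b)
    return params["w0"] + params["w"] @ x + params["h"] @ z

params = init_params(n=6, k=4, hidden=[8, 8])
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.3])
y_hat = nfm_forward(x, params)
```

The MLP consumes the Bi-Interaction output directly, which is the serial structure mentioned at the top; the linear term \(w_0 + \bm{w}^T\bm{x}\) is simply added to the result.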

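  6. For score prediction (regression), the model can be trained by minimizing the squared loss

\[L_{reg} = \sum_{\bm{x} \in \mathcal{X}} (\hat{y}(\bm{x}) - y(\bm{x}))^2 \]

whereas for classification a log loss can be used instead ...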
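
As a rough illustration of the two objectives, the sketch below computes the squared loss from the formula above and a binary log loss. The log-loss form is an assumption made here: it treats \(\hat{y}\) as a raw score passed through a sigmoid and takes labels in \(\{0, 1\}\); the function names are likewise hypothetical.

```python
import numpy as np

def squared_loss(y_hat, y):
    """L_reg: sum of squared errors over the batch."""
    return np.sum((y_hat - y) ** 2)

def log_loss(y_hat, y):
    """Binary log loss on raw scores y_hat with labels y in {0, 1} (assumed setup).

    Uses -[y log s + (1 - y) log(1 - s)] with s = sigmoid(y_hat), rewritten as
    log(1 + exp(y_hat)) - y * y_hat for numerical stability.
    """
    return np.sum(np.logaddexp(0.0, y_hat) - y * y_hat)

scores = np.array([0.5, -1.0, 2.0])   # toy model outputs
targets = np.array([1.0, 0.0, 1.0])   # toy labels
reg = squared_loss(scores, targets)
clf = log_loss(scores, targets)
```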

