局部加权回归LOWESS
1. LOWESS
用kNN做平均回归:
\[\hat{f(x)} = Ave(y_i | x_i \in N_k(x)) \]其中,\(N_k(x)\)为距离点x最近k个点组成的邻域集合(neighborhood set)。这种邻域平均回归存在很多缺点:
- 没有考虑到不同距离的邻近点应有不同的权重;
- 拟合的曲线不连续(discontinuous),如下图。
因此引入kernel加权平滑:
\[\hat{f(x_0)} = \frac{ \sum_{i=1}^{N} K_{\lambda}(x_0, x_i)y_i }{\sum_{i=1}^{N} K_{\lambda}(x_0, x_i)} \]比如,Epanechnikov 二次kernel:
\[K_{\lambda}(x_0, x_i) = D(\frac{|x_0 - x_i|}{\lambda}) \]\[D(t) = \left \{ { \matrix { {\frac{3}{4} (1-t^2) } & {for |t| < 1} \cr { 0} & {otherwise} \cr } } \right. \]其中,\(\lambda\)为kernel的参数,称之为window width。对于kNN,只考虑最近的k个点影响;基于此,
\[\lambda = |x_0 - x_{[k]}| \]其中,\(x_{[k]}\)为距离\(x_0\)第k近的点。如上图,经kernel加权平滑后,回归拟合的曲线为连续的了。但是,这种kernel回归同样存在着边界(boundary)问题,如下图:
对于x序列的开始与结束区段的点,其左右邻域是不对称的,导致了平滑后的值偏大或偏小。因此,需要对权值做再修正,假定对\(x_0\)的估计值:_
\[\hat{f(x_0)} = \sum_{j=0}^d \beta_j x_0^{j} \]定义目标函数:
\[\min_{\beta} \sum_{i=1}^N K_{\lambda}(x_0, x_i) [y_i - \sum_{j=0}^d \beta_j x_i^j]^2 \] \[B = \begin{pmatrix} 1 & x_1 & \cdots & x_1^d \\ 1 & x_2 & \cdots & x_2^d \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_N & \cdots & x_N^d \\ \end{pmatrix} \]\[W_{x_0} = \begin{pmatrix} K_{\lambda}(x_0, x_1) & 0 & \cdots & 0 \\ 0 & K_{\lambda}(x_0, x_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & K_{\lambda}(x_0, x_N) \\ \end{pmatrix} \]\[\Delta = \begin{pmatrix} \beta_0, \beta_1, \cdots, \beta_d \end{pmatrix}^T \]\[Y = \begin{pmatrix} y_1, y_2, \cdots, y_N \end{pmatrix}^T \]那么,目标函数可改写为
\[\min_{\Delta} (Y-B\Delta)^T W_{x_0} (Y-B\Delta) \]求偏导,可得到
\[\Delta = (B^T W_{x_0} B)^{-1} (B^T W_{x_0} Y) \]那么,估计值
\[\begin{aligned} \hat{f(x_0)} &= e(x_0) (B^T W_{x_0} B)^{-1} (B^T W_{x_0} Y) \\ & = \sum_i w_i (x_0) y_i \end{aligned} \]其中,\(e(x_0) = \begin{pmatrix} 1, x_0, \cdots, x_0^d \end{pmatrix}\)。上述回归方法称之为LOWESS (LOcal Weighted regrESSion)。
2. Robust LOWESS
Robust LOWESS是Cleveland [1] 在LOWESS基础上提出来的robust回归方法,能避免outlier对回归的影响。在计算完估计值后,计算残差:
\[e_i = y_i - \hat{f(x_i)} \]根据残差计算robustnest weight:
\[\delta_i = B(e_i/6s) \]其中,\(s\)为残差绝对值序列\(|e_i|\)d的中位值(median),\(B\)函数为bisquare函数:
\[B(u) = \left \{ { \matrix { {(1-u^2)^2 } & {for \quad 0 \le u < 1} \cr { 0 } & {for \quad u \ge 1} \cr } } \right. \]然后,用robustness weight乘以kernel weight作为\(W_{x_0}\)的新weight。如此,便剔除了残差较大的异常点对于回归的影响。这里有Python版实现。
3. 参考资料
[1] Trevor Hastie, Robert Tibshirani, Jerome H. Friedman. The elements of statistical learning. Springer, Berlin: Springer series in statistics, 2009.
[2] Cleveland, William S. "Robust locally weighted regression and smoothing scatterplots." Journal of the American statistical association 74.368 (1979): 829-836.
[3] peterf, The Local Polynomial Regression Estimator.