Basics of Vector (Matrix) Differentiation in Deep Learning


Basic Operations on Function Matrices

1. Function matrices

\[ A(x)=\left[ \begin{matrix} a_{11}(x) & a_{12}(x) & \cdots & a_{1n}(x) \\ a_{21}(x) & a_{22}(x) & \cdots & a_{2n}(x) \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}(x) & a_{m2}(x) & \cdots & a_{mn}(x) \end{matrix} \right] \tag{1} \]

  2. Derivative of a function matrix

\[ A^{'}(x)=\frac{dA(x)}{dx}= \left[ \begin{matrix} a^{'}_{11}(x) & a^{'}_{12}(x) & \cdots & a^{'}_{1n}(x) \\ a^{'}_{21}(x) & a^{'}_{22}(x) & \cdots & a^{'}_{2n}(x) \\ \vdots & \vdots & \ddots & \vdots \\ a^{'}_{m1}(x) & a^{'}_{m2}(x) & \cdots & a^{'}_{mn}(x) \end{matrix} \right] \tag{2} \]
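As a quick sanity check of equation (2), an entrywise derivative can be compared against a central finite-difference approximation. The specific matrix \(A(x)\) below is a made-up example, not one from the text:

```python
import math

def A(x):
    # A hypothetical 2x2 function matrix: every entry is a scalar function of x
    return [[x, x**2],
            [math.sin(x), math.exp(x)]]

def A_prime(x):
    # Entrywise derivatives, as in equation (2)
    return [[1.0, 2 * x],
            [math.cos(x), math.exp(x)]]

def finite_diff(f, x, h=1e-6):
    # Central-difference approximation, applied entry by entry
    Ap, Am = f(x + h), f(x - h)
    return [[(Ap[i][j] - Am[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

x0 = 0.7
num = finite_diff(A, x0)
ana = A_prime(x0)
```

The two results agree up to the truncation error of the difference quotient, which illustrates that differentiating a function matrix is nothing more than differentiating each entry.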

  3. Operational properties
  • Addition, scalar multiplication, matrix-matrix multiplication, and transposition of function matrices follow the same rules as the corresponding operations on constant matrices.
  • \[\frac{d}{dx}[A(x)+B(x)]=\frac{dA(x)}{dx}+\frac{dB(x)}{dx}\tag{3} \]

  • If \(k(x)\) is a scalar function of \(x\) and \(A(x)\) is a function matrix, then

\[\frac{d}{dx}[k(x)A(x)]=\frac{dk(x)}{dx}A(x)+k(x)\frac{dA(x)}{dx}\tag{4} \]

  • If \(A(x)\) and \(B(x)\) are both differentiable and their dimensions allow the product \(A(x)B(x)\), then

\[\frac{d}{dx}[A(x)B(x)]=\frac{dA(x)}{dx}B(x)+A(x)\frac{dB(x)}{dx} \tag{5} \]

  • Let \(A(x)\) be a function matrix and \(x=f(t)\) a scalar function of \(t\); if \(A(x)\) and \(f(t)\) are both differentiable, then

\[\frac{d}{dt}A(x)=f^{'}(t)\frac{dA(x)}{dx} \tag{6} \]

  • Higher-order derivatives of a function matrix are defined recursively:

\[\frac{d^2 A(x)}{dx^2}=\frac{d}{dx}\left(\frac{dA(x)}{dx}\right) \tag{7} \]

\[\frac{d^3 A(x)}{dx^3}=\frac{d}{dx}\left(\frac{d^2A(x)}{dx^2}\right) \tag{8} \]

\[\vdots \]

\[\frac{d^k A(x)}{dx^k}=\frac{d}{dx}\left(\frac{d^{k-1}A(x)}{dx^{k-1}}\right) \tag{9} \]
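The product rule (5) can be verified numerically; note that the order of the factors matters, since matrices do not commute in general. The concrete matrices below are a hypothetical example chosen for illustration:

```python
import numpy as np

# Two hypothetical 2x2 function matrices of a scalar x
def A(x):  return np.array([[x, 1.0], [0.0, x**2]])
def B(x):  return np.array([[x**2, x], [1.0, x**3]])

# Their entrywise derivatives, per equation (2)
def dA(x): return np.array([[1.0, 0.0], [0.0, 2 * x]])
def dB(x): return np.array([[2 * x, 1.0], [0.0, 3 * x**2]])

x0, h = 1.3, 1e-6
# Left side of (5): central finite difference of the product A(x)B(x)
lhs = (A(x0 + h) @ B(x0 + h) - A(x0 - h) @ B(x0 - h)) / (2 * h)
# Right side of (5): A'B + AB', keeping the factor order
rhs = dA(x0) @ B(x0) + A(x0) @ dB(x0)
```

Swapping the factors in either term of `rhs` would break the equality, which is why (5) is stated with \(A'\) on the left of \(B\) and \(B'\) on the right of \(A\).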

  4. PyTorch autograd example

```python
import torch

# create a tensor that tracks gradients
x = torch.tensor([[1., 2., 3.], [2., 3., 4.]], requires_grad=True)
# y = x^2, applied elementwise
y = torch.pow(x, 2)
# backpropagate through the scalar sum; dy/dx = 2x
y.sum().backward()
print(x.grad)
# tensor([[2., 4., 6.],
#         [4., 6., 8.]])
```
  5. Derivative of a function matrix with respect to a matrix
  • Definition
    Let \(A=(a_{ij})_{m \times n}\) and \(B=(b_{kl})_{p \times q}\). If every entry \(a_{ij}\) of \(A\) is a function of \(B\), i.e. \(a_{ij}=a_{ij}(B)\), then \(A\) is called a function of \(B\), written \(A(B)\).
  • The derivative \(\frac{DA}{DB}\) of the function matrix \(A\) with respect to the matrix \(B\) is

    \[ \frac{DA}{DB}= \left[ \begin{matrix} \frac{\partial A}{\partial b_{11}} & \frac{\partial A}{\partial b_{12}} & \cdots & \frac{\partial A}{\partial b_{1q}} \\ \frac{\partial A}{\partial b_{21}} & \frac{\partial A}{\partial b_{22}} & \cdots & \frac{\partial A}{\partial b_{2q}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial A}{\partial b_{p1}} & \frac{\partial A}{\partial b_{p2}} & \cdots & \frac{\partial A}{\partial b_{pq}} \end{matrix} \right]=\left(\frac{\partial A}{\partial b_{kl}} \right) \tag{10} \]

    where:

    \[ \frac{\partial A}{\partial b_{kl}}= \left[ \begin{matrix} \frac{\partial a_{11}}{\partial b_{kl}} & \frac{\partial a_{12}}{\partial b_{kl}} & \cdots & \frac{\partial a_{1n}}{\partial b_{kl}} \\ \frac{\partial a_{21}}{\partial b_{kl}} & \frac{\partial a_{22}}{\partial b_{kl}} & \cdots & \frac{\partial a_{2n}}{\partial b_{kl}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial a_{m1}}{\partial b_{kl}} & \frac{\partial a_{m2}}{\partial b_{kl}} & \cdots & \frac{\partial a_{mn}}{\partial b_{kl}} \end{matrix} \right] \tag{11} \]

  • Properties

    \[\frac{DA}{DA^T}=\frac{DA^T}{DA}=E \tag{12} \]

    \[\frac{D(A+B)}{DC}=\frac{DA}{DC}+\frac{DB}{DC} \tag{13} \]
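Equations (10)-(11) say that \(\frac{DA}{DB}\) is an \((mp) \times (nq)\) block matrix whose \((k,l)\)-th block is \(\frac{\partial A}{\partial b_{kl}}\). A minimal numerical sketch of that layout, using central differences and a made-up example \(A(B)=B^2\) (the function `block_derivative` and its name are illustrative, not from the text):

```python
import numpy as np

def block_derivative(A_of_B, B, h=1e-6):
    """Assemble DA/DB numerically as the (m*p) x (n*q) block matrix of
    equations (10)-(11): block (k, l) is the m x n matrix dA / db_kl."""
    p, q = B.shape
    m, n = A_of_B(B).shape
    out = np.zeros((m * p, n * q))
    for k in range(p):
        for l in range(q):
            Bp, Bm = B.copy(), B.copy()
            Bp[k, l] += h
            Bm[k, l] -= h
            # central difference in the single entry b_kl, per equation (11)
            out[k * m:(k + 1) * m, l * n:(l + 1) * n] = (A_of_B(Bp) - A_of_B(Bm)) / (2 * h)
    return out

# Hypothetical example: A(B) = B @ B, so dA/db_kl = E_kl @ B + B @ E_kl,
# where E_kl is the single-entry matrix with a 1 in position (k, l)
B = np.array([[1.0, 2.0], [3.0, 4.0]])
D = block_derivative(lambda M: M @ M, B)  # 4 x 4 block matrix
```

For this choice of \(B\), the top-left block equals \(E_{11}B + BE_{11} = \begin{bmatrix}2 & 2\\ 3 & 0\end{bmatrix}\), matching the analytic derivative.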