自动微分与区间算术:原理、技术与性能分析
立即解锁
发布时间: 2025-10-20 01:01:09 阅读量: 21 订阅数: 36 AIGC 

迈向百亿亿次科学计算
### 自动微分与区间算术:原理、技术与性能分析
#### 1. 自动微分算法
自动微分是一种在科学计算和工程领域中广泛应用的技术,用于高效准确地计算函数的导数。这里主要介绍前向模式和反向模式的梯度计算,以及前向模式的海森矩阵计算。
##### 1.1 前向模式 - 梯度
考虑Kantorovich图中的基本函数 $f_{curr}$,它依赖于一个或两个直接前导函数。对于包含二元运算符的 $f_{curr}$:
\[
f_{curr}(f_{left}, f_{right}) = f_{left} \text{ op } f_{right}
\]
其对输入变量 $x_i$ 的偏导数为:
\[
\frac{\partial f_{curr}}{\partial x_i} = \frac{\partial f_{curr}}{\partial f_{left}} \cdot \frac{\partial f_{left}}{\partial x_i} + \frac{\partial f_{curr}}{\partial f_{right}} \cdot \frac{\partial f_{right}}{\partial x_i}
\]
对于包含一元运算符的 $f_{curr}$:
\[
f_{curr}(f_{left}) = \text{op } f_{left}
\]
其对输入变量 $x_i$ 的偏导数为:
\[
\frac{\partial f_{curr}}{\partial x_i} = \frac{\partial f_{curr}}{\partial f_{left}} \cdot \frac{\partial f_{left}}{\partial x_i}
\]
并且有 $\frac{\partial x_i}{\partial x_i} = 1$。前向模式算法通过执行上述操作,并存储每个节点的中间结果,从独立变量开始,经过中间函数,最终得到目标函数的偏导数。
##### 1.2 前向模式 - 海森矩阵
海森矩阵的计算更为复杂,但基本原理与前向模式的梯度计算类似。对表达式 (4) 关于第二个变量 $x_j$ 求导,得到二阶导数:
\[
\frac{\partial^2 f_{curr}}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_j} \left( \frac{\partial f_{curr}}{\partial f_{left}} \cdot \frac{\partial f_{left}}{\partial x_i} + \frac{\partial f_{curr}}{\partial f_{right}} \cdot \frac{\partial f_{right}}{\partial x_i} \right)
\]
\[
= \frac{\partial^2 f_{curr}}{\partial f_{left} \partial x_j} \cdot \frac{\partial f_{left}}{\partial x_i} + \frac{\partial f_{curr}}{\partial f_{left}} \cdot \frac{\partial^2 f_{left}}{\partial x_i \partial x_j} + \frac{\partial^2 f_{curr}}{\partial f_{right} \partial x_j} \cdot \frac{\partial f_{right}}{\partial x_i} + \frac{\partial f_{curr}}{\partial f_{right}} \cdot \frac{\partial^2 f_{right}}{\partial x_i \partial x_j}
\]
初始值为 $\frac{\partial^2 x_i}{\partial x_i \partial x_j} = 0$ 和 $\frac{\partial^2 x_j}{\partial x_i \partial x_j} = 0$。计算海森矩阵需要用到一些二阶偏导数的值,为了简化和提高计算效率,可使用一些缩写。以下是根据算术运算符得到的二阶偏导数方程:
| 操作 | 左操作数 | 右操作数 |
| ---- | ---- | ---- |
| 加法 | $\frac{\partial^2 f_{curr}}{\partial f_{left} \partial x_i} = 0$ | $\frac{\partial^2 f_{curr}}{\partial f_{right} \partial x_i} = 0$ |
| 减法 | $\frac{\partial^2 f_{curr}}{\partial f_{left} \partial x_i} = 0$ | $\frac{\partial^2 f_{curr}}{\partial f_{right} \partial x_i} = 0$ |
| 乘法 | $\frac{\partial^2 f_{curr}}{\partial f_{left} \partial x_i} = \frac{\partial f_{right}}{\partial x_i}$ | $\frac{\partial^2 f_{curr}}{\partial f_{right} \partial x_i} = \frac{\partial f_{left}}{\partial x_i}$ |
| 除法 | $\frac{\partial^2 f_{curr}}{\partial f_{left} \partial x_i} = \frac{\partial}{\partial x_i} \left( \frac{1}{\partial f_{right}} \right) = - \left( \frac{1}{f_{right}} \right)^2 \cdot \frac{\partial f_{right}}{\partial x_i}$ | $\frac{\partial^2 f_{curr}}{\partial f_{right} \partial x_i} = 0$ |
| 一元操作 | $\frac{\partial^2 f_{curr}}{\partial f_{left} \partial x_i} = \frac{\partial^2 f_{curr}}{\partial^2 f_{left}} \cdot \frac{\partial f_{left}}{\partial x_i}$ | - |
##### 1.3 反向模式 - 梯度
反向模式算法是自动微分的另一种方法,其关键在于导数传播是反向进行的,这通常更适合输入变量较多的问题。考虑关系 $\overline{f}_i = \frac{\partial f_{curr}}{\partial f_i} \cdot \overline{f}_{curr}$,其中 $\overline{f}_i = \frac{\partial f_{n + k}}{\partial f_i}$,$\overline{f}_{curr} = \frac{\partial f_{n + k}}{\partial f_{curr}}$。对于 $curr = n + k$,假设 $\overline{f}_{curr} = \frac{\partial f_{curr}}{\partial f_{curr}} = 1$。根据此关系,可以推导出Kantorovich图中关于前导函数的偏导数公式:
\[
\frac{\partial f_{n + k}}{\partial f_{left}} = \sum \frac{\partial f_{curr}}{\partial f_{left}} \cdot \frac{\par
0
0
复制全文


