Gaussian process

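In probability theory and statistics, a Gaussian process is a stochastic process whose observations occur in a continuous domain, such as time or space. In a Gaussian process, every point in the continuous input space is associated with a normally distributed random variable. Moreover, every finite collection of these random variables has a multivariate normal distribution; equivalently, every finite linear combination of them is normally distributed. The distribution of a Gaussian process is the joint distribution of all of those (infinitely many) random variables, and as such it is a distribution over functions with a continuous domain, such as time or space.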

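As a machine-learning algorithm, a Gaussian process uses lazy learning and a measure of the similarity between points (the kernel function) to predict the value at an unseen point from training data. The prediction is not just an estimate for that point but also carries uncertainty information: it is a one-dimensional Gaussian distribution (the marginal distribution at that point).^[1]^[2]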

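For some kernel functions, predictions can be computed with matrix algebra (see the article on kriging). When a parameterised kernel is used, optimisation software is typically used to fit the Gaussian-process model.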
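To make the matrix-algebra route concrete, here is a minimal sketch of Gaussian-process regression in plain Python. It assumes a zero prior mean, a squared-exponential kernel with length-scale `ell`, and Gaussian observation noise with variance `noise`; the helper names (`se_kernel`, `cholesky`, `solve_lower`, `solve_upper_T`, `gp_predict`) are hypothetical and not a library API. For a test point $x_*$ it returns the standard predictive mean $\mathbf{k}_*^\top(K+\sigma_n^2 I)^{-1}\mathbf{y}$ and variance $k(x_*,x_*)-\mathbf{k}_*^\top(K+\sigma_n^2 I)^{-1}\mathbf{k}_*$, that is, the one-dimensional Gaussian described above. Fitting `ell` or `noise` is not shown.

```python
import math

def se_kernel(x, xp, ell=1.0):
    """Squared-exponential kernel on scalars (assumed for illustration)."""
    return math.exp(-((x - xp) ** 2) / (2.0 * ell ** 2))

def cholesky(a):
    """Return lower-triangular L with a = L L^T (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_upper_T(L, z):
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(xs, ys, x_star, ell=1.0, noise=1e-6):
    """Posterior mean and variance at x_star for a zero-mean GP."""
    n = len(xs)
    K = [[se_kernel(xs[i], xs[j], ell) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_T(L, solve_lower(L, ys))   # (K + noise*I)^{-1} y
    k_star = [se_kernel(x, x_star, ell) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)                     # L^{-1} k_*
    var = se_kernel(x_star, x_star, ell) - sum(vi * vi for vi in v)
    return mean, var

xs = [-2.0, -1.0, 0.0, 1.5]
ys = [math.sin(x) for x in xs]
mu, var = gp_predict(xs, ys, 0.5)
print(f"mean={mu:.3f}, variance={var:.3e}")
```

The Cholesky factorisation is used instead of an explicit matrix inverse because it is the usual numerically stable way to apply $(K+\sigma_n^2 I)^{-1}$; far from the training inputs the predictive variance returns to the prior variance $k(x_*,x_*)=1$.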

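Gaussian processes are named after Carl Friedrich Gauss because they are based on the notion of the Gaussian (normal) distribution. A Gaussian process can be seen as an infinite-dimensional generalization of the multivariate normal distribution.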

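Gaussian processes are widely used in statistical modelling, and models built on them inherit the properties of Gaussian processes. For example, if a random process is modelled as a Gaussian process, the distributions of various derived quantities can be obtained explicitly. Such quantities include the average value of the process over a range of times and the error in estimating that average from sample values at a small set of times.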
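As an illustration of such a derived quantity, the average of the process over a finite set of sample times, $\bar X = \frac{1}{n}\sum_i X_{t_i}$, is a linear combination of jointly Gaussian variables, so it is itself normal with mean $\frac{1}{n}\sum_i m(t_i)$ and variance $\frac{1}{n^2}\sum_{i,j}K(t_i,t_j)$. The sketch below computes these two numbers; the mean function `mean_fn`, the Ornstein–Uhlenbeck-type kernel and the sample times are hypothetical choices made only for illustration.

```python
import math

def ou_kernel(s, t, ell=1.0):
    """Ornstein–Uhlenbeck covariance exp(-|s - t| / ell) (assumed kernel)."""
    return math.exp(-abs(s - t) / ell)

def mean_fn(t):
    """Hypothetical mean function m(t)."""
    return 0.5 * t

def average_distribution(times, m, k):
    """Mean and variance of the sample average (1/n) * sum_i X(t_i)."""
    n = len(times)
    mu = sum(m(t) for t in times) / n
    var = sum(k(s, t) for s in times for t in times) / n ** 2
    return mu, var

times = [0.0, 0.5, 1.0, 1.5, 2.0]
mu, var = average_distribution(times, mean_fn, ou_kernel)
print(f"average ~ N({mu:.3f}, {var:.3f})")
```

Because nearby values are positively correlated, the variance of the average lies between $K(t,t)/n$ (independent samples) and $K(t,t)$ (perfectly correlated samples).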

Definition

A stochastic process $\{X_t;\, t\in T\}$ is a Gaussian process if and only if, for every finite set of indices $t_1,\ldots,t_k$ in the index set $T$,

$X_{t_{1},\ldots ,t_{k}}=(X_{t_{1}},\ldots ,X_{t_{k}})$

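is a multivariate normal random variable. This is equivalent to saying that every linear combination of $(X_{t_{1}},\ldots ,X_{t_{k}})$ has a univariate normal distribution. More precisely, every linear functional applied to the sample function $X_t$ yields a normally distributed result. One writes $X\sim\mathcal{GP}(m,K)$ to mean that the random function $X$ is distributed as a Gaussian process with mean function $m$ and covariance function $K$.^[3] When the input vector $t$ is two- or multi-dimensional, a Gaussian process may also be called a Gaussian random field.^[4]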
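By the definition, on any finite grid $t_1,\ldots,t_k$ the values of the process are jointly normal with mean vector $(m(t_1),\ldots,m(t_k))$ and covariance matrix with entries $K(t_i,t_j)$. A common way to draw such a vector, sketched below assuming a zero mean and a squared-exponential covariance by default, is to compute a Cholesky factor $L$ with $LL^\top = K$ and set $X = m + Lz$ with $z$ a vector of independent standard normals. The small diagonal `jitter` is an assumption added for numerical stability, since squared-exponential Gram matrices can be close to singular; `sample_gp` is a hypothetical helper name.

```python
import math
import random

def se_kernel(s, t, ell=1.0):
    """Squared-exponential covariance (assumed default kernel)."""
    return math.exp(-((s - t) ** 2) / (2.0 * ell ** 2))

def cholesky(a):
    """Lower-triangular L with a = L L^T."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sample_gp(ts, mean=lambda t: 0.0, kernel=se_kernel, jitter=1e-8, rng=random):
    """Draw one sample of (X_t1, ..., X_tk) from GP(mean, kernel)."""
    k = len(ts)
    K = [[kernel(ts[i], ts[j]) + (jitter if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    L = cholesky(K)
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]       # independent N(0, 1)
    return [mean(ts[i]) + sum(L[i][j] * z[j] for j in range(i + 1))
            for i in range(k)]

rng = random.Random(0)
ts = [0.5 * i for i in range(9)]
print([round(v, 3) for v in sample_gp(ts, rng=rng)])
```

Repeating the draw gives different sample functions evaluated on the grid; their empirical covariance approaches $K(t_i,t_j)$ as the number of draws grows.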

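Some authors^[5] assume that the random variables $X_t$ have mean zero; this simplifies calculations without loss of generality, and the mean-square properties of the process are then determined entirely by the covariance function $K$.^[6]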

Covariance function

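A key fact about Gaussian processes is that they can be completely defined by their second-order statistics.^[4] Thus, if a Gaussian process is assumed to have mean zero, defining the covariance function completely defines the behaviour of the process. Importantly, the non-negative definiteness of this function enables its spectral decomposition using the Karhunen–Loève expansion.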
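Non-negative definiteness means that every Gram matrix $[K(x_i,x_j)]_{i,j}$ built from the covariance function has no negative eigenvalues. A cheap numerical probe, sketched below, is to attempt a Cholesky factorisation after adding a small jitter (an assumed tolerance). A failure shows the function is not a valid covariance function; a success on one set of points does not prove validity for all inputs. The squared-exponential kernel passes, whereas the hypothetical function $(x-x')^2$ fails: on two points it gives the matrix $\begin{pmatrix}0&1\\1&0\end{pmatrix}$ (up to scaling), whose eigenvalues are $\pm 1$.

```python
import math

def try_cholesky(a, jitter=1e-10):
    """Return True if a + jitter*I admits a Cholesky factorisation."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] + (jitter if i == j else 0.0)
            s -= sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False        # non-positive pivot: not positive definite
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

def gram(kernel, xs):
    """Gram matrix [kernel(x_i, x_j)] for a list of inputs."""
    return [[kernel(a, b) for b in xs] for a in xs]

se = lambda x, xp: math.exp(-((x - xp) ** 2) / 2.0)
bad = lambda x, xp: (x - xp) ** 2       # symmetric, but not a covariance function

xs = [0.0, 0.7, 1.9, 3.2]
print(try_cholesky(gram(se, xs)))       # True
print(try_cholesky(gram(bad, xs)))      # False
```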

Basic aspects of a process that can be defined through the covariance function are its stationarity, isotropy, smoothness and periodicity.^[7]^[8]

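Stationarity refers to the behaviour of the process with respect to the separation of any two points $x$ and $x'$. If the process is stationary, the covariance depends only on their separation $x-x'$; if it is non-stationary, it depends on the actual positions of $x$ and $x'$. For example, the Ornstein–Uhlenbeck process is stationary, whereas Brownian motion, whose covariance function is $\min(x,x')$, is not.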
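Stationarity can be probed directly from the covariance function: a stationary kernel is unchanged when both inputs are shifted by the same amount $h$. The sketch below compares the Ornstein–Uhlenbeck covariance $\exp(-|x-x'|/\ell)$ with the Brownian-motion covariance $\min(x,x')$ for $x,x'\ge 0$. The helper `is_shift_invariant` and the test points are hypothetical and only check a few shifts, so a `True` result is evidence rather than proof.

```python
import math

def ou_kernel(x, xp, ell=1.0):
    """Ornstein–Uhlenbeck covariance: depends only on x - x'."""
    return math.exp(-abs(x - xp) / ell)

def brownian_kernel(x, xp):
    """Cov(W_x, W_x') = min(x, x') for standard Brownian motion, x, x' >= 0."""
    return min(x, xp)

def is_shift_invariant(kernel, pairs, shifts, tol=1e-12):
    """True if kernel(x + h, x' + h) == kernel(x, x') on all tested pairs/shifts."""
    return all(abs(kernel(x + h, xp + h) - kernel(x, xp)) <= tol
               for x, xp in pairs for h in shifts)

pairs = [(0.5, 1.0), (1.0, 3.0), (2.0, 2.0)]
shifts = [0.25, 1.0, 5.0]
print(is_shift_invariant(ou_kernel, pairs, shifts))        # True
print(is_shift_invariant(brownian_kernel, pairs, shifts))  # False
```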

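If the process depends only on $|x-x'|$, the Euclidean distance (not the direction) between $x$ and $x'$, then the process is considered isotropic. A process that is both stationary and isotropic is called homogeneous;^[9] in practice these properties reflect the differences (or rather the lack of them) in the behaviour of the process given the location of the observer.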
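Isotropy is a stronger requirement than stationarity: the covariance may depend only on the length of $x-x'$, not on its direction. The sketch below, for two-dimensional inputs, contrasts an isotropic squared-exponential kernel with an anisotropic variant that uses a separate length-scale per coordinate (often called automatic relevance determination, ARD). Both kernels and their length-scales are illustrative assumptions. Rotating the difference vector by 90 degrees leaves the isotropic kernel unchanged but not the anisotropic one, even though both are stationary.

```python
import math

def se_iso(x, xp, ell=1.0):
    """Isotropic SE: depends only on the Euclidean distance |x - x'|."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-d2 / (2.0 * ell ** 2))

def se_ard(x, xp, ells=(1.0, 3.0)):
    """Anisotropic SE with one length-scale per coordinate (stationary, not isotropic)."""
    d2 = sum(((a - b) / l) ** 2 for a, b, l in zip(x, xp, ells))
    return math.exp(-d2 / 2.0)

origin = (0.0, 0.0)
east = (2.0, 0.0)    # difference vector of length 2 along the first axis
north = (0.0, 2.0)   # same length, rotated by 90 degrees

print(se_iso(origin, east), se_iso(origin, north))   # equal
print(se_ard(origin, east), se_ard(origin, north))   # different
```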

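Ultimately, a Gaussian process amounts to placing a prior over functions, and the smoothness of this prior can be induced by the covariance function. If we expect that for "nearby" input points $x$ and $x'$ the corresponding outputs $y$ and $y'$ are also "nearby", then we are assuming continuity. If we wish to allow for significant displacement, we might choose a rougher covariance function. Extreme examples are the Ornstein–Uhlenbeck covariance function, whose sample paths are nowhere differentiable, and the squared-exponential covariance function, whose sample paths are infinitely differentiable.

Periodicity refers to inducing periodic patterns in the behaviour of the process. Formally, this is achieved by mapping the input $x$ to the two-dimensional vector $u(x)=(\cos(x),\sin(x))$.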
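The mapping $u(x)=(\cos x,\sin x)$ explains where the periodic covariance function $K_{\operatorname{P}}$ among the common covariance functions comes from: $\|u(x)-u(x')\|^2 = 2-2\cos(x-x') = 4\sin^2\!\big(\tfrac{x-x'}{2}\big)$, so applying the squared-exponential kernel to $u(x)$ gives $\exp\!\big(-2\sin^2(d/2)/\ell^2\big)$ with $d=x-x'$, a kernel with period $2\pi$. The sketch below checks this identity numerically; the function names and the length-scale `0.8` are hypothetical.

```python
import math

def se_kernel_vec(u, v, ell=1.0):
    """Squared-exponential kernel on vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * ell ** 2))

def warp(x):
    """Map a scalar input onto the unit circle: u(x) = (cos x, sin x)."""
    return (math.cos(x), math.sin(x))

def periodic_kernel(x, xp, ell=1.0):
    """K_P(x, x') = exp(-2 sin^2(d/2) / ell^2), with d = x - x'."""
    d = x - xp
    return math.exp(-2.0 * math.sin(d / 2.0) ** 2 / ell ** 2)

for x, xp in [(0.0, 1.0), (0.3, 4.0), (1.0, 1.0 + 2 * math.pi)]:
    via_warp = se_kernel_vec(warp(x), warp(xp), ell=0.8)
    direct = periodic_kernel(x, xp, ell=0.8)
    print(f"{via_warp:.12f}  {direct:.12f}")
```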

Common covariance functions

Figure: the effect of choosing different kernels on the prior distribution over functions of a Gaussian process. Left: squared exponential kernel. Middle: Brownian. Right: quadratic.

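Some common covariance functions are listed below.^[8] Here $d=x-x'$, $\ell$ is the characteristic length-scale, $\sigma$ is the standard deviation of the noise, $\delta_{x,x'}$ is the Kronecker delta, $\Gamma(\nu)$ is the gamma function and $K_\nu$ is the modified Bessel function of the second kind of order $\nu$: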

Constant: $K_{\operatorname {C} }(x,x')=C$
Linear: $K_{\operatorname {L} }(x,x')=x^{T}x'$
Gaussian noise: $K_{\operatorname {GN} }(x,x')=\sigma ^{2}\delta _{x,x'}$
Squared exponential: $K_{\operatorname {SE} }(x,x')=\exp {\Big (}-{\frac {\|d\|^{2}}{2\ell ^{2}}}{\Big )}$
Ornstein–Uhlenbeck: $K_{\operatorname {OU} }(x,x')=\exp \left(-{\frac {|d|}{\ell }}\right)$
Matérn: $K_{\operatorname {Matern} }(x,x')={\frac {2^{1-\nu }}{\Gamma (\nu )}}{\Big (}{\frac {{\sqrt {2\nu }}|d|}{\ell }}{\Big )}^{\nu }K_{\nu }{\Big (}{\frac {{\sqrt {2\nu }}|d|}{\ell }}{\Big )}$
Periodic: $K_{\operatorname {P} }(x,x')=\exp \left(-{\frac {2\sin ^{2}\left({\frac {d}{2}}\right)}{\ell ^{2}}}\right)$
Rational quadratic: $K_{\operatorname {RQ} }(x,x')=(1+|d|^{2})^{-\alpha },\quad \alpha \geq 0$
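Most of these covariance functions are one-liners for scalar inputs. The sketch below assumes scalar $x$ and a single length-scale `ell`, with arbitrary default parameters. For the Matérn family it implements only the known closed form for $\nu=3/2$, $(1+\sqrt{3}|d|/\ell)\exp(-\sqrt{3}|d|/\ell)$, because the general case needs the modified Bessel function $K_\nu$, which the Python standard library does not provide; for $\nu=1/2$ the Matérn kernel reduces to the Ornstein–Uhlenbeck kernel. The function names are hypothetical.

```python
import math

def k_constant(x, xp, c=1.0):
    return c

def k_linear(x, xp):
    return x * xp                            # x^T x' for scalar inputs

def k_gaussian_noise(x, xp, sigma=0.1):
    return sigma ** 2 if x == xp else 0.0    # sigma^2 times the Kronecker delta

def k_squared_exponential(x, xp, ell=1.0):
    return math.exp(-((x - xp) ** 2) / (2.0 * ell ** 2))

def k_ornstein_uhlenbeck(x, xp, ell=1.0):
    return math.exp(-abs(x - xp) / ell)

def k_matern32(x, xp, ell=1.0):
    r = math.sqrt(3.0) * abs(x - xp) / ell   # Matérn with nu = 3/2 only
    return (1.0 + r) * math.exp(-r)

def k_periodic(x, xp, ell=1.0):
    return math.exp(-2.0 * math.sin((x - xp) / 2.0) ** 2 / ell ** 2)

def k_rational_quadratic(x, xp, alpha=1.0):
    return (1.0 + (x - xp) ** 2) ** (-alpha)

kernels = [k_constant, k_linear, k_gaussian_noise, k_squared_exponential,
           k_ornstein_uhlenbeck, k_matern32, k_periodic, k_rational_quadratic]
for k in kernels:
    print(f"{k.__name__:>22}: K(0.5, 1.5) = {k(0.5, 1.5):.4f}")
```

Evaluating all kernels at the same pair of inputs makes their different decay rates easy to compare; in practice the length-scale and other parameters would be fitted to data.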

References

[1] Platypus Innovation: A Simple Intro to Gaussian Processes (a great data modelling tool). [2016-11-02]. (Archived from the original on 2018-05-01.)
[2] Chen, Zexun; Wang, Bo; Gorban, Alexander N. Multivariate Gaussian and Student-t process regression for multi-output prediction. Neural Computing and Applications. 2019-12-31. ISSN 0941-0643. doi:10.1007/s00521-019-04687-8.
[3] Rasmussen, C. E. Gaussian Processes in Machine Learning. Advanced Lectures on Machine Learning. Lecture Notes in Computer Science 3176. 2004: 63–71. ISBN 978-3-540-23122-6. doi:10.1007/978-3-540-28650-9_4.
[4] Bishop, C.M. Pattern Recognition and Machine Learning. Springer. 2006. ISBN 0-387-31073-8.
[5] Simon, Barry. Functional Integration and Quantum Physics. Academic Press. 1979.
[6] Seeger, Matthias. Gaussian Processes for Machine Learning. International Journal of Neural Systems. 2004, 14 (2): 69–104. doi:10.1142/s0129065704001899.
[7] Barber, David. Bayesian Reasoning and Machine Learning. Cambridge University Press. 2012 [2018-06-26]. ISBN 978-0-521-51814-7. (Archived from the original on 2020-11-11.)
[8] Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning. MIT Press. 2006 [2018-06-26]. ISBN 0-262-18253-X. (Archived from the original on 2021-05-22.)
[9] Grimmett, Geoffrey; David Stirzaker. Probability and Random Processes. Oxford University Press. 2001. ISBN 0198572220.
