LossFunction

class LossFunction

In least squares problems, outliers in the input can easily dominate the optimization and derail its convergence. A loss function can be used to reduce their influence.

Consider a structure from motion problem. The 3D points and camera parameters are the unknown variables to be optimized, and the measurements are image coordinates describing the expected reprojected position of a point. For example, suppose we want to model the geometry of a street scene with fire hydrants and cars, observed by a moving camera with unknown parameters (both intrinsics and extrinsics), and the only 3D point we care about is the tip of a fire hydrant. Our image processing algorithm, which produces the measurements fed to Ceres, has found and matched the tip in all image frames, except that in one frame it mistook a car headlight for the hydrant. If we do nothing special, the residual of the erroneous measurement will pull the solution away from the optimum in order to reduce the large error produced by the bad measurement.

Using a robust loss function reduces the influence of large residuals. In the example above, this down-weights the outlier term so that it does not overly affect the final solution.
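For example, here is a minimal sketch of attaching a robust loss when adding a residual block (assuming #include <ceres/ceres.h> and using namespace ceres; ReprojectionError, observed_x, observed_y, camera and point are hypothetical placeholders, while AutoDiffCostFunction, CauchyLoss and AddResidualBlock are the standard Ceres API):

// Sketch: down-weighting outlier reprojection errors with a robust loss.
Problem problem;
CostFunction* cost_function =
    new AutoDiffCostFunction<ReprojectionError, 2, 9, 3>(
        new ReprojectionError(observed_x, observed_y));
// Passing nullptr as the loss gives the plain squared error; CauchyLoss(0.5)
// suppresses residuals whose norm is much larger than 0.5.
problem.AddResidualBlock(cost_function, new CauchyLoss(0.5), camera, point);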

class LossFunction {
 public:
  virtual void Evaluate(double s, double out[3]) const = 0;
};

The key method is LossFunction::Evaluate, which, given a non-negative scalar $s$, computes the value of the robust loss function together with its first and second derivatives:

$$\text{out} = [\tau(s),\ \tau'(s),\ \tau''(s)]$$

With a loss function in place, each residual term contributes $\frac{1}{2}\tau(s)$ to the cost function, where $s = \|f_i\|^2$. The most sensible choices of $\tau$ satisfy

$$\begin{aligned} \tau(0) &= 0 \\ \tau'(0) &= 1 \\ \tau'(s) &< 1 \ \text{in the outlier region} \\ \tau''(s) &< 0 \ \text{in the outlier region} \end{aligned}$$
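As an illustration of the interface, here is a minimal hand-written subclass (a sketch; the class name is hypothetical, and the implementation simply reproduces the TrivialLoss listed below):

class MyTrivialLoss : public LossFunction {
 public:
  void Evaluate(double s, double out[3]) const override {
    out[0] = s;    // tau(s)
    out[1] = 1.0;  // tau'(s)
    out[2] = 0.0;  // tau''(s)
  }
};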

Given a robust loss $\tau(s)$, a scale factor $a > 0$ changes the residual magnitude at which robustification takes effect: $\tau(s, a) = a^2 \tau(s/a^2)$, whose first and second derivatives with respect to $s$ are $\tau'(s/a^2)$ and $(1/a^2)\tau''(s/a^2)$ respectively. The reason for the appearance of squaring is that $a$ is in the units of the residual vector norm, whereas $s$ is a squared norm. For applications it is more convenient to specify $a$ than its square.
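A quick way to see the scale parameter in action is to probe a predefined loss through Evaluate (a sketch; the constructor argument of HuberLoss is exactly this $a$):

HuberLoss loss(2.0);      // a = 2.0, so a^2 = 4.0
double out[3];            // out = [tau(s), tau'(s), tau''(s)]
loss.Evaluate(1.0, out);  // s <= a^2: quadratic region, out[0] == s
loss.Evaluate(9.0, out);  // s > a^2: linear region, out[1] = a/sqrt(s) < 1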

Instances

Ceres ships with a number of predefined loss functions. For simplicity, only their unscaled versions are described here; their exact definitions, and more details, can be found in include/ceres/loss_function.h.

class TrivialLoss
$$\tau(s) = s$$
class HuberLoss
$$\tau(s) = \begin{cases} s & s \le 1 \\ 2\sqrt{s} - 1 & s > 1 \end{cases}$$
class SoftLOneLoss
$$\tau(s) = 2(\sqrt{1 + s} - 1)$$
class CauchyLoss
$$\tau(s) = \log(1 + s)$$
class ArctanLoss
$$\tau(s) = \arctan(s)$$
class TolerantLoss
$$\tau(s, a, b) = b \log\left(1 + e^{(s - a)/b}\right) - b \log\left(1 + e^{-a/b}\right)$$
class TukeyLoss
$$\tau(s) = \begin{cases} \frac{1}{3}\left(1 - (1 - s)^3\right) & s \le 1 \\ \frac{1}{3} & s > 1 \end{cases}$$
class ComposedLoss

Given two loss functions $f$ and $g$, ComposedLoss implements their composition $h(s) = f(g(s))$; see the sketch after ScaledLoss below.

class ScaledLoss

Sometimes you may simply want to scale the output of a loss function, for example to weight different error terms differently (e.g., weighting pixel reprojection errors differently). Given a loss function $\tau(s)$ and a scale factor $a$, ScaledLoss implements $a\,\tau(s)$.
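A sketch combining the two (reusing the hypothetical problem, cost_function, camera and point from the earlier snippet; the ScaledLoss and ComposedLoss constructors are the ones declared in include/ceres/loss_function.h):

// Weight a Cauchy loss by 0.2, then feed its output through a Huber loss.
LossFunction* scaled =
    new ScaledLoss(new CauchyLoss(1.0), 0.2, TAKE_OWNERSHIP);
// ComposedLoss evaluates h(s) = f(g(s)), here Huber(Scaled(s)).
LossFunction* composed = new ComposedLoss(
    new HuberLoss(1.0), TAKE_OWNERSHIP, scaled, TAKE_OWNERSHIP);
problem.AddResidualBlock(cost_function, composed, camera, point);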

class LossFunctionWrapper

Sometimes, after the optimization problem has been constructed, we wish to mutate the scale of the loss function. For example, when performing estimation on data with many outliers, one can start with a large scale, optimize the problem, and then reduce the scale; this can converge better than using a small-scale loss function from the start. The LossFunctionWrapper class allows the user to change the loss function's scale after the problem has been constructed, e.g.

Problem problem;

// Add parameter blocks.

CostFunction* cost_function =
    new AutoDiffCostFunction<UW_Camera_Mapper, 2, 9, 3>(
        new UW_Camera_Mapper(feature_x, feature_y));

// Wrap the loss so it can be swapped out after the problem is built.
LossFunctionWrapper* loss_function =
    new LossFunctionWrapper(new HuberLoss(1.0), TAKE_OWNERSHIP);
problem.AddResidualBlock(cost_function, loss_function, parameters);

Solver::Options options;
Solver::Summary summary;
Solve(options, &problem, &summary);

// Swap in a new loss (typically with a different scale) and re-solve
// without rebuilding the problem.
loss_function->Reset(new HuberLoss(1.0), TAKE_OWNERSHIP);
Solve(options, &problem, &summary);

Theory

Let us consider an optimization problem with a single residual block:

$$\min_x \frac{1}{2}\tau\left(f^2(x)\right)$$

Then the gradient $g(x)$ and the Gauss-Newton Hessian $H(x)$ of the robustified problem are

$$\begin{aligned} g(x) &= \tau' J^\top(x) f(x) \\ H(x) &= J^\top(x)\left(\tau' + 2\tau'' f(x) f^\top(x)\right) J(x) \end{aligned}$$

where the terms involving the second derivatives of $f(x)$ have been ignored. Note that $H(x)$ is indefinite if $\tau'' f(x)^\top f(x) + \frac{1}{2}\tau' < 0$. If this is not the case, then it is possible to re-weight the residual and the Jacobian matrix such that the robustified Gauss-Newton step corresponds to an ordinary linear least squares problem.

α\alpha 为下面方程的根

$$\frac{1}{2}\alpha^2 - \alpha - \frac{\tau''}{\tau'}\|f(x)\|^2 = 0.$$
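Solving this quadratic explicitly (a step spelled out here for clarity) and keeping the root with $\alpha \le 1$ gives

$$\alpha = 1 - \sqrt{1 + 2\frac{\tau''}{\tau'}\|f(x)\|^2},$$

so $\alpha = 0$ (no correction) when $\tau'' = 0$, and $\alpha$ approaches $1$ as the loss flattens out.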

Then, define the rescaled residual and Jacobian as

$$\begin{aligned} \tilde{f}(x) &= \frac{\sqrt{\tau'}}{1 - \alpha} f(x) \\ \tilde{J}(x) &= \sqrt{\tau'}\left(1 - \alpha \frac{f(x) f^\top(x)}{\|f(x)\|^2}\right) J(x) \end{aligned}$$

In the case $2\tau''\|f(x)\|^2 + \tau' \lesssim 0$, we limit $\alpha \le 1 - \epsilon$ for some small $\epsilon$. For more details see Triggs et al.

With this simple rescaling, one can apply any Jacobian based non-linear least squares algorithm to robustified non-linear least squares problems.
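A minimal sketch of that rescaling for one residual block, mirroring the logic of Ceres' corrector.cc (assuming Eigen; the function name Correct and the in-place signature are choices of this sketch, not the Ceres API):

#include <cmath>
#include <Eigen/Dense>

// rho = [tau(s), tau'(s), tau''(s)] as returned by LossFunction::Evaluate,
// with s = f.squaredNorm(). Rescales f and J in place so that an ordinary
// linear least squares step on (J, f) is the robustified Gauss-Newton step.
void Correct(const double rho[3], Eigen::VectorXd& f, Eigen::MatrixXd& J) {
  const double s = f.squaredNorm();
  const double sqrt_rho1 = std::sqrt(rho[1]);
  if (rho[2] <= 0.0 || s == 0.0) {
    // Curvature term vanishes (or is clamped): plain sqrt(tau') scaling.
    f *= sqrt_rho1;
    J *= sqrt_rho1;
    return;
  }
  const double D = 1.0 + 2.0 * s * rho[2] / rho[1];
  const double alpha = 1.0 - std::sqrt(D);  // the root with alpha <= 1
  const Eigen::RowVectorXd fTJ = f.transpose() * J;  // f^T J
  J = sqrt_rho1 * (J - (alpha / s) * f * fTJ);       // J_tilde
  f *= sqrt_rho1 / (1.0 - alpha);                    // f_tilde
}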

While the theory described above is elegant, in practice we observe that using the Triggs correction when $\tau'' < 0$ leads to poor performance, so we upper bound $\tau''$ by zero. For more details see corrector.cc.
