进行浮动和双重比较的最有效方法是什么?

时间:2020-03-05 18:40:58  来源:igfitidea点击:

比较两个" double"或者两个" float"值的最有效方法是什么?

简单地这样做是不正确的:

bool CompareDoubles1 (double A, double B)
{
   return A == B;
}

但是类似:

bool CompareDoubles2 (double A, double B) 
{
   diff = A - B;
   return (diff < EPSILON) && (-diff < EPSILON);
}

似乎是废物处理。

有人知道更聪明的浮点比较器吗?

解决方案

回答

与大多数人所做的事情(甚至在游戏编程中)相比,与ε值进行比较。

我们应该对实现进行一些更改:

bool AreSame(double a, double b)
{
    return fabs(a - b) < EPSILON;
}

编辑:克里斯特(Christer)在最近的博客文章中添加了有关此主题的大量信息。享受。

回答

我们编写的代码存在错误:

return (diff < EPSILON) && (-diff > EPSILON);

正确的代码是:

return (diff < EPSILON) && (diff > -EPSILON);

(...是的,这是不同的)

我不知道晶圆厂在某些情况下是否不会让我们失去懒惰的评估。我会说这取决于编译器。我们可能要同时尝试两者。如果它们平均相等,则采用fab实施。

如果我们有关于两个浮点数中哪个比另一个浮点数更大的信息,则可以按比较的顺序进行操作,以更好地利用延迟评估。

最后,通过内联此函数可能会得到更好的结果。虽然不太可能改善...

编辑:OJ,感谢我们更正代码。我因此删除了我的评论

回答

有关更深入的方法,请阅读比较浮点数。这是该链接的代码片段:

// Usable AlmostEqual function    
bool AlmostEqual2sComplement(float A, float B, int maxUlps)    
{    
    // Make sure maxUlps is non-negative and small enough that the    
    // default NAN won't compare as equal to anything.    
    assert(maxUlps > 0 && maxUlps < 4 * 1024 * 1024);    
    int aInt = *(int*)&A;    
    // Make aInt lexicographically ordered as a twos-complement int    
    if (aInt < 0)    
        aInt = 0x80000000 - aInt;    
    // Make bInt lexicographically ordered as a twos-complement int    
    int bInt = *(int*)&B;    
    if (bInt < 0)    
        bInt = 0x80000000 - bInt;    
    int intDiff = abs(aInt - bInt);    
    if (intDiff <= maxUlps)    
        return true;    
    return false;    
}

回答

这取决于我们希望比较的精确度。如果要比较完全相同的数字,则只需使用==。 (除非我们实际上确实想要完全相同的数字,否则我们几乎从不希望这样做。)在任何体面的平台上,我们还可以执行以下操作:

diff= a - b; return fabs(diff)<EPSILON;

因为fabs往往很快。相当快,我的意思是它基本上是按位与的,所以最好快一点。

用于比较双精度和浮点数的整数技巧很不错,但往往使各种CPU管道更难以有效处理。而且,由于将堆栈用作经常使用的值的临时存储区域,因此如今在某些有序体系结构中,这绝对不会更快。 (专为那些关心的人而设。)

回答

浮点数的通用比较通常是没有意义的。如何进行比较实际上取决于眼前的问题。在许多问题中,数字被充分离散以允许在给定的公差范围内进行比较。不幸的是,同样有很多问题,这种技巧并没有真正起作用。例如,当观察值非常接近障碍时,可以考虑使用有问题的数字的Heaviside(步进)功能(想到数字股票期权)。进行基于公差的比较不会有多大好处,因为它将有效地将问题从原来的障碍转移到两个新的障碍。再次,对于此类问题,没有通用解决方案,并且特定解决方案可能需要更改数值方法才能达到稳定性。

回答

`return fabs(a - b) < EPSILON;

如果满足以下条件,那就很好了:

  • 输入的数量级变化不大
  • 极少数相反的符号可以视为相等

否则,它将导致我们遇到麻烦。双精度数字的分辨率约为16位小数。如果我们要比较的两个数字的大小大于EPSILON * 1.0E16,那么我们可能会说:

return a==b;

我将研究一种不同的方法,该方法假定我们需要担心第一个问题,并假设第二个可以解决应用程序问题。一个解决方案是这样的:

#define VERYSMALL  (1.0E-150)
#define EPSILON    (1.0E-8)
bool AreSame(double a, double b)
{
    double absDiff = fabs(a - b);
    if (absDiff < VERYSMALL)
    {
        return true;
    }

    double maxAbs  = max(fabs(a) - fabs(b));
    return (absDiff/maxAbs) < EPSILON;
}

这在计算上是昂贵的,但有时正是所需要的。这是我们公司必须要做的,因为我们要处理一个工程库,并且输入可能相差几十个数量级。

无论如何,重点是(并适用于几乎所有编程问题):评估需求,然后提出解决方案来满足需求-不要以为简单的答案就能满足需求。如果经过评估,我们发现fabs(a-b)<EPSILON`就足够了,那就完美了-使用它!但是也要意识到它的缺点和其他可能的解决方案。

回答

在C ++中获取epsilon的可移植方式是

#include <limits>
std::numeric_limits<double>::epsilon()

然后比较功能变为

#include <cmath>
#include <limits>

bool AreSame(double a, double b) {
    return std::fabs(a - b) < std::numeric_limits<double>::epsilon();
}

回答

使用其他建议时要格外小心。这完全取决于上下文。

我花了很长的时间来追踪系统中的错误,这些错误假定是| a-b | &lt;epsilona == b。潜在的问题是:

  • 算法中的隐含推定,即如果a == b和b == c则a == c。
  • 对于以英寸为单位的线和以密耳(.001英寸)为单位的线,请使用相同的epsilon。那是a == b,但是1000a!= 1000b。 (这就是为什么AlmostEqual2sComplement要求提供epsilon或者最大ULPS的原因)。
  • 角度的余弦和线的长度都使用相同的epsilon!
  • 使用这种比较功能对集合中的项目进行排序。 (在这种情况下,将内置的C ++运算符==用于双精度会产生正确的结果。)

就像我说的:这一切都取决于上下文和ab的预期大小。

顺便说一句,std :: numeric_limits &lt;double> :: epsilon()是"机器epsilon"。它是1.0与下一个用double表示的值之间的差。我猜想它可以用于比较功能,但前提是期望值小于1. (这是对@cdv的回答...)

同样,如果我们在doubles中基本上使用了int算术(在某些情况下,在这里我们使用doubles来保存int值),则算术将是正确的。例如4.0 / 2.0将与1.0 + 1.0相同。只要我们不做会导致分数(4.0 / 3.0)或者不超出int大小的事情即可。

回答

比较浮点数是否取决于上下文。由于即使更改操作顺序也会产生不同的结果,因此重要的是要知道数字的期望值"相等"。

在查看浮点比较时,Bruce Dawson比较浮点数是一个不错的起点。

以下定义来自于Knuth的《计算机编程的艺术》:

bool approximatelyEqual(float a, float b, float epsilon)
{
    return fabs(a - b) <= ( (fabs(a) < fabs(b) ? fabs(b) : fabs(a)) * epsilon);
}

bool essentiallyEqual(float a, float b, float epsilon)
{
    return fabs(a - b) <= ( (fabs(a) > fabs(b) ? fabs(b) : fabs(a)) * epsilon);
}

bool definitelyGreaterThan(float a, float b, float epsilon)
{
    return (a - b) > ( (fabs(a) < fabs(b) ? fabs(b) : fabs(a)) * epsilon);
}

bool definitelyLessThan(float a, float b, float epsilon)
{
    return (b - a) > ( (fabs(a) < fabs(b) ? fabs(b) : fabs(a)) * epsilon);
}

当然,选择epsilon取决于上下文,并确定我们希望数字相等的程度。

比较浮点数的另一种方法是查看数字的ULP(最后一个单位)。尽管没有专门讨论比较问题,但是每位计算机科学家都应该知道的有关浮点数的文章是了解浮点如何工作以及陷阱(包括ULP是什么)的一个很好的资源。

回答

我发现Google C ++测试框架包含一个很好的基于跨平台模板的AlmostEqual2sComplement实现,该实现可用于double和float。鉴于它是根据BSD许可发布的,因此只要保留许可,就可以在自己的代码中使用它了。我从http://code.google.com/p/googletest/source/browse/trunk/include/gtest/internal/gtest-internal.h https://github.com/google/googletest/blob中提取了以下代码/master/googletest/include/gtest/internal/gtest-internal.h并在顶部添加了许可证。

确保将GTEST_OS_WINDOWS定义为某个值(或者将使用的代码更改为适合代码库的代码,毕竟它是BSD许可的)。

用法示例:

double left  = // something
double right = // something
const FloatingPoint<double> lhs(left), rhs(right);

if (lhs.AlmostEquals(rhs)) {
  //they're equal!
}

这是代码:

// Copyright 2005, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
//     * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//     * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
//     * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Authors: [email protected] (Zhanyong Wan), [email protected] (Sean Mcafee)
//
// The Google C++ Testing Framework (Google Test)

// This template class serves as a compile-time function from size to
// type.  It maps a size in bytes to a primitive type with that
// size. e.g.
//
//   TypeWithSize<4>::UInt
//
// is typedef-ed to be unsigned int (unsigned integer made up of 4
// bytes).
//
// Such functionality should belong to STL, but I cannot find it
// there.
//
// Google Test uses this class in the implementation of floating-point
// comparison.
//
// For now it only handles UInt (unsigned int) as that's all Google Test
// needs.  Other types can be easily added in the future if need
// arises.
template <size_t size>
class TypeWithSize {
 public:
  // This prevents the user from using TypeWithSize<N> with incorrect
  // values of N.
  typedef void UInt;
};

// The specialization for size 4.
template <>
class TypeWithSize<4> {
 public:
  // unsigned int has size 4 in both gcc and MSVC.
  //
  // As base/basictypes.h doesn't compile on Windows, we cannot use
  // uint32, uint64, and etc here.
  typedef int Int;
  typedef unsigned int UInt;
};

// The specialization for size 8.
template <>
class TypeWithSize<8> {
 public:
#if GTEST_OS_WINDOWS
  typedef __int64 Int;
  typedef unsigned __int64 UInt;
#else
  typedef long long Int;  // NOLINT
  typedef unsigned long long UInt;  // NOLINT
#endif  // GTEST_OS_WINDOWS
};

// This template class represents an IEEE floating-point number
// (either single-precision or double-precision, depending on the
// template parameters).
//
// The purpose of this class is to do more sophisticated number
// comparison.  (Due to round-off error, etc, it's very unlikely that
// two floating-points will be equal exactly.  Hence a naive
// comparison by the == operation often doesn't work.)
//
// Format of IEEE floating-point:
//
//   The most-significant bit being the leftmost, an IEEE
//   floating-point looks like
//
//     sign_bit exponent_bits fraction_bits
//
//   Here, sign_bit is a single bit that designates the sign of the
//   number.
//
//   For float, there are 8 exponent bits and 23 fraction bits.
//
//   For double, there are 11 exponent bits and 52 fraction bits.
//
//   More details can be found at
//   http://en.wikipedia.org/wiki/IEEE_floating-point_standard.
//
// Template parameter:
//
//   RawType: the raw floating-point type (either float or double)
template <typename RawType>
class FloatingPoint {
 public:
  // Defines the unsigned integer type that has the same size as the
  // floating point number.
  typedef typename TypeWithSize<sizeof(RawType)>::UInt Bits;

  // Constants.

  // # of bits in a number.
  static const size_t kBitCount = 8*sizeof(RawType);

  // # of fraction bits in a number.
  static const size_t kFractionBitCount =
    std::numeric_limits<RawType>::digits - 1;

  // # of exponent bits in a number.
  static const size_t kExponentBitCount = kBitCount - 1 - kFractionBitCount;

  // The mask for the sign bit.
  static const Bits kSignBitMask = static_cast<Bits>(1) << (kBitCount - 1);

  // The mask for the fraction bits.
  static const Bits kFractionBitMask =
    ~static_cast<Bits>(0) >> (kExponentBitCount + 1);

  // The mask for the exponent bits.
  static const Bits kExponentBitMask = ~(kSignBitMask | kFractionBitMask);

  // How many ULP's (Units in the Last Place) we want to tolerate when
  // comparing two numbers.  The larger the value, the more error we
  // allow.  A 0 value means that two numbers must be exactly the same
  // to be considered equal.
  //
  // The maximum error of a single floating-point operation is 0.5
  // units in the last place.  On Intel CPU's, all floating-point
  // calculations are done with 80-bit precision, while double has 64
  // bits.  Therefore, 4 should be enough for ordinary use.
  //
  // See the following article for more details on ULP:
  // http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm.
  static const size_t kMaxUlps = 4;

  // Constructs a FloatingPoint from a raw floating-point number.
  //
  // On an Intel CPU, passing a non-normalized NAN (Not a Number)
  // around may change its bits, although the new value is guaranteed
  // to be also a NAN.  Therefore, don't expect this constructor to
  // preserve the bits in x when x is a NAN.
  explicit FloatingPoint(const RawType& x) { u_.value_ = x; }

  // Static methods

  // Reinterprets a bit pattern as a floating-point number.
  //
  // This function is needed to test the AlmostEquals() method.
  static RawType ReinterpretBits(const Bits bits) {
    FloatingPoint fp(0);
    fp.u_.bits_ = bits;
    return fp.u_.value_;
  }

  // Returns the floating-point number that represent positive infinity.
  static RawType Infinity() {
    return ReinterpretBits(kExponentBitMask);
  }

  // Non-static methods

  // Returns the bits that represents this number.
  const Bits &bits() const { return u_.bits_; }

  // Returns the exponent bits of this number.
  Bits exponent_bits() const { return kExponentBitMask & u_.bits_; }

  // Returns the fraction bits of this number.
  Bits fraction_bits() const { return kFractionBitMask & u_.bits_; }

  // Returns the sign bit of this number.
  Bits sign_bit() const { return kSignBitMask & u_.bits_; }

  // Returns true iff this is NAN (not a number).
  bool is_nan() const {
    // It's a NAN if the exponent bits are all ones and the fraction
    // bits are not entirely zeros.
    return (exponent_bits() == kExponentBitMask) && (fraction_bits() != 0);
  }

  // Returns true iff this number is at most kMaxUlps ULP's away from
  // rhs.  In particular, this function:
  //
  //   - returns false if either number is (or both are) NAN.
  //   - treats really large numbers as almost equal to infinity.
  //   - thinks +0.0 and -0.0 are 0 DLP's apart.
  bool AlmostEquals(const FloatingPoint& rhs) const {
    // The IEEE standard says that any comparison operation involving
    // a NAN must return false.
    if (is_nan() || rhs.is_nan()) return false;

    return DistanceBetweenSignAndMagnitudeNumbers(u_.bits_, rhs.u_.bits_)
        <= kMaxUlps;
  }

 private:
  // The data type used to store the actual floating-point number.
  union FloatingPointUnion {
    RawType value_;  // The raw floating-point number.
    Bits bits_;      // The bits that represent the number.
  };

  // Converts an integer from the sign-and-magnitude representation to
  // the biased representation.  More precisely, let N be 2 to the
  // power of (kBitCount - 1), an integer x is represented by the
  // unsigned number x + N.
  //
  // For instance,
  //
  //   -N + 1 (the most negative number representable using
  //          sign-and-magnitude) is represented by 1;
  //   0      is represented by N; and
  //   N - 1  (the biggest number representable using
  //          sign-and-magnitude) is represented by 2N - 1.
  //
  // Read http://en.wikipedia.org/wiki/Signed_number_representations
  // for more details on signed number representations.
  static Bits SignAndMagnitudeToBiased(const Bits &sam) {
    if (kSignBitMask & sam) {
      // sam represents a negative number.
      return ~sam + 1;
    } else {
      // sam represents a positive number.
      return kSignBitMask | sam;
    }
  }

  // Given two numbers in the sign-and-magnitude representation,
  // returns the distance between them as an unsigned number.
  static Bits DistanceBetweenSignAndMagnitudeNumbers(const Bits &sam1,
                                                     const Bits &sam2) {
    const Bits biased1 = SignAndMagnitudeToBiased(sam1);
    const Bits biased2 = SignAndMagnitudeToBiased(sam2);
    return (biased1 >= biased2) ? (biased1 - biased2) : (biased2 - biased1);
  }

  FloatingPointUnion u_;
};

编辑:这篇文章是4岁。它可能仍然有效,并且代码很好,但是有些人发现了改进。最好从Google Test源代码中获取最新版本的AlmostEquals,而不是我在此处粘贴的最新版本。

回答

对于任何涉及浮点减法的答案,我都会非常警惕(例如fabs(a-b)<epsilon)。首先,浮点数在较大的幅度和足够高的幅度(稀疏间隔大于epsilon)处变得更稀疏,我们也可以只执行a == b。其次,减去两个非常接近的浮点数(鉴于我们正在寻找接近相等的情况,这将趋于如此)正是我们获得灾难性抵消的方式。

虽然不是便携式的,但我认为grom的答案在避免这些问题方面做得最好。