What should the epsilon value be when comparing double values for equality in Java?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3728246/

Date: 2020-08-14 03:59:36  Source: igfitidea

java

Asked by Cheok Yan Cheng

Here is the output for the below program.

value is : 2.7755575615628914E-17
Double.compare with zero : 1
isEqual with zero : true

My question is: what should the epsilon value be? Is there any robust way to obtain the value, instead of picking a number out of thin air?

package sandbox;

/**
 *
 * @author yccheok
 */
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        double zero = 1.0/5.0 + 1.0/5.0 - 1.0/10.0 - 1.0/10.0 - 1.0/10.0 - 1.0/10.0;
        System.out.println("value is : " + zero);
        System.out.println("Double.compare with zero : " + Double.compare(zero, 0.0));
        System.out.println("isEqual with zero : " + isEqual(zero, 0.0));
    }

    public static boolean isEqual(double d0, double d1) {
        final double epsilon = 0.0000001;
        return d0 == d1 ? true : Math.abs(d0 - d1) < epsilon;
    }
}

Accepted answer by mob

The answer to your second question is no. The magnitude of the error caused by finite machine precision can be arbitrarily large:

public static void main(String[] args) {
    double z = 0.0;
    double x = 0.23;
    double y = 1.0 / x;
    int N = 50000;
    for (int i = 0; i < N; i++) {
        z += x * y - 1.0;
    }
    System.out.println("z should be zero, is " + z);
}

This gives ~5.55E-12, but if you increase N you can get just about any level of error you desire.

There is a vast amount of past and current research on how to write numerically stable algorithms. It is a hard problem.

Answer by Jerry Coffin

There is no one right value. You need to compute it relative to the magnitude of the numbers involved. What you're basically dealing with is a number of significant digits, not a specific magnitude. If, for example, your numbers are both in the range of 1e-100, and your calculations should maintain roughly 8 significant digits, then your epsilon should be around 1e-108. If you did the same calculations on numbers in the range of 1e+200, then your epsilon would be around 1e+192 (i.e., in terms of decimal exponents, exponent of epsilon ~= exponent of the magnitude - number of significant digits).
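As a rough sketch of that rule in Java (the names epsilonFor and isNearlyEqual are made up for illustration, and the digit count is something you have to supply from knowledge of your own calculation), the tolerance can be derived from the larger magnitude of the two operands:

```java
final class MagnitudeEpsilon {
    // Hypothetical helper: tolerance for values of roughly the given magnitude
    // when about `significantDigits` decimal digits are expected to be reliable.
    static double epsilonFor(double magnitude, int significantDigits) {
        return Math.abs(magnitude) * Math.pow(10.0, -significantDigits);
    }

    // Hypothetical helper: compares a and b relative to the larger magnitude.
    static boolean isNearlyEqual(double a, double b, int significantDigits) {
        if (a == b) {
            return true; // exact match, including both values being zero
        }
        double magnitude = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= epsilonFor(magnitude, significantDigits);
    }

    public static void main(String[] args) {
        System.out.println(epsilonFor(1e-100, 8)); // on the order of 1e-108
        System.out.println(epsilonFor(1e+200, 8)); // on the order of 1e+192
        System.out.println(isNearlyEqual(1e+200, 1.000000001e+200, 8)); // differ in the 10th digit
        System.out.println(isNearlyEqual(1e+200, 1.0001e+200, 8));      // differ in the 5th digit
    }
}
```

Note that with this approach a nonzero value is never "nearly equal" to exactly 0.0, so comparisons against zero may still need a separately chosen absolute tolerance.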

I'd also note that isEqual is a poor name; you want something like isNearlyEqual. For one reason, people quite reasonably expect "equal" to be transitive. At the very least, you need to convey the idea that the result is no longer transitive: with your definition of isEqual, isEqual(a, c) can be false even though isEqual(a, b) and isEqual(b, c) are both true.
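To make the loss of transitivity concrete, here is a small sketch that reuses the question's isEqual with its fixed epsilon of 0.0000001. The three sample values are picked purely for illustration:

```java
final class TransitivityDemo {
    // Same absolute-epsilon check as the isEqual method in the question.
    static boolean isEqual(double d0, double d1) {
        final double epsilon = 0.0000001;
        return d0 == d1 || Math.abs(d0 - d1) < epsilon;
    }

    public static void main(String[] args) {
        double a = 0.0;
        double b = 0.00000006; // within epsilon of a
        double c = 0.00000012; // within epsilon of b, but not of a
        System.out.println(isEqual(a, b)); // true
        System.out.println(isEqual(b, c)); // true
        System.out.println(isEqual(a, c)); // false
    }
}
```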

Edit (responding to comments): I said "If [...] your calculations should maintain roughly 8 significant digits, then your epsilon should be...". Basically, it comes down to looking at what calculations you're doing, and how much precision you're likely to lose in the process, to provide a reasonable guess at how big a difference has to be before it's significant. Without knowing the calculation you're doing, I can't guess that.

As far as the magnitude of epsilon goes: no, it does not make sense for it to always be less than or equal to 1. A floating point number can only maintain limited precision. In the case of an IEEE double precision floating point number, the maximum precision that can be represented is about 16 decimal digits. That means if you start with 1e+200, the absolute smallest difference from that number that the machine can represent at all is about 1e+184 (and a double can represent numbers up to ~1e+308, at which point the smallest difference that can be represented is ~1e+292).
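The spacing between neighbouring doubles at a given magnitude does not have to be estimated by hand; Math.ulp reports it. The following is only a small sketch (the helper name spacingAt is invented here) that prints the spacing at a few magnitudes and shows that adding 1.0 to a very large value changes nothing:

```java
final class UlpSpacing {
    // Hypothetical helper: the gap between d and the next double of larger magnitude.
    static double spacingAt(double d) {
        return Math.ulp(d);
    }

    public static void main(String[] args) {
        System.out.println(spacingAt(1.0));              // spacing near 1.0
        System.out.println(spacingAt(1e+200));           // spacing near 1e+200
        System.out.println(spacingAt(Double.MAX_VALUE)); // spacing at the top of the range
        // 1.0 is far below the spacing near 1e+200, so the sum rounds back to 1e+200.
        System.out.println(1e+200 + 1.0 == 1e+200);
    }
}
```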

Answer by Alexandre C.

I like this (pseudocode; I don't write Java):

bool fuzzyEquals(double a, double b)
{
    return abs(a - b) < eps * max(abs(a), abs(b));
}

with epsilon being a few times the machine epsilon. Take 10^-12 if you don't know what to use.

This is quite problem dependent, however. If the computations giving a and b are prone to roundoff error, or involve many operations, or are themselves within some (known) accuracy, you want to take a bigger epsilon.

The point is to use relative precision, not absolute.
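A possible Java rendering of the fuzzyEquals pseudocode is sketched below. The exact-equality shortcut and the DEFAULT_EPS constant are additions made for this sketch, not part of the pseudocode; the 1e-12 value is just the fallback suggested above.

```java
final class FuzzyCompare {
    // Fallback suggested in the answer when nothing better is known.
    static final double DEFAULT_EPS = 1e-12;

    static boolean fuzzyEquals(double a, double b, double eps) {
        if (a == b) {
            return true; // added shortcut: covers a == b == 0, where the relative bound is 0
        }
        return Math.abs(a - b) < eps * Math.max(Math.abs(a), Math.abs(b));
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                        // false
        System.out.println(fuzzyEquals(sum, 0.3, DEFAULT_EPS)); // true
        System.out.println(fuzzyEquals(1e+200, 1.0001e+200, DEFAULT_EPS)); // false
        // A purely relative test never treats a nonzero value as equal to 0.0.
        System.out.println(fuzzyEquals(2.7755575615628914E-17, 0.0, DEFAULT_EPS)); // false
    }
}
```

The last line is worth keeping in mind for the question's example: with a relative tolerance, the tiny residue 2.7755575615628914E-17 is not considered equal to zero, so comparisons against zero may need a separate absolute threshold.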

Answer by Geoff Reedy

You should definitely read https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ first.

It discusses various ways of comparing floating point numbers: absolute tolerance, relative tolerance, and ulp distance. It makes a fairly good argument that ulp checking is the way to go. The case hinges on the argument that if you want to check whether two floating point numbers are the same, you have to take into account the distance between representable floats. In other words, you should check whether the two numbers are within e representable floats of each other.

The algorithms are given in C, but can be translated to Java using java.lang.Double#doubleToLongBits and java.lang.Float#floatToIntBits to implement the casting from floating point to integer types. Also, since Java 1.5 there are the methods Math.ulp(double) and Math.ulp(float), and since Java 1.6 Math.nextUp(double), Math.nextUp(float), Math.nextAfter(double, double) and Math.nextAfter(float, double), which are useful for quantifying the difference between two floating point numbers.
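As a hedged illustration of what such a translation might look like (this is a sketch in the spirit of the article, not a port of its code; the names orderedBits and almostEqualUlps and the sign-handling policy are choices made here), the ULP distance can be computed from the raw bit patterns:

```java
final class UlpDistance {
    // Maps the IEEE 754 bit pattern to a long that increases with the double's value,
    // so that adjacent doubles map to adjacent longs (and -0.0 maps to the same value as +0.0).
    static long orderedBits(double d) {
        long bits = Double.doubleToLongBits(d);
        return bits < 0 ? Long.MIN_VALUE - bits : bits;
    }

    // True when a and b are at most maxUlps representable doubles apart.
    static boolean almostEqualUlps(double a, double b, long maxUlps) {
        if (Double.isNaN(a) || Double.isNaN(b)) {
            return false; // NaN is never close to anything
        }
        if (a == b) {
            return true; // also handles +0.0 vs -0.0
        }
        if ((a < 0) != (b < 0)) {
            return false; // opposite signs are treated as not close (and this avoids overflow below)
        }
        return Math.abs(orderedBits(a) - orderedBits(b)) <= maxUlps;
    }

    public static void main(String[] args) {
        System.out.println(almostEqualUlps(0.1 + 0.2, 0.3, 1));          // true: one ULP apart
        System.out.println(almostEqualUlps(1.0, Math.nextUp(1.0), 1));   // true: adjacent doubles
        System.out.println(almostEqualUlps(1.0, 1.0000001, 4));          // false: far more than 4 ULPs
    }
}
```

Like any relative measure, ULP distance is not meaningful very close to zero, where a tiny absolute difference can span an enormous number of representable values.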

Answer by Ahmed Fasih

In isEqual, use something like:

double epsilon = Math.max(Math.ulp(d0), Math.ulp(d1));

An ulp of a double value is the positive distance between this floating-point value and the double value next larger in magnitude. [1]

[1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#ulp%28double%29
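Plugged into the question's isEqual, this suggestion might look like the sketch below. One assumption is made here: the comparison uses <= rather than the question's <, because two adjacent doubles differ by exactly one ulp and a strict < would reject them.

```java
final class UlpEpsilon {
    static boolean isEqual(double d0, double d1) {
        // Tolerance of one ulp, taken at the larger of the two magnitudes.
        double epsilon = Math.max(Math.ulp(d0), Math.ulp(d1));
        return d0 == d1 || Math.abs(d0 - d1) <= epsilon;
    }

    public static void main(String[] args) {
        double zero = 1.0/5.0 + 1.0/5.0 - 1.0/10.0 - 1.0/10.0 - 1.0/10.0 - 1.0/10.0;
        System.out.println(isEqual(0.1 + 0.2, 0.3)); // true: the values are one ulp apart
        System.out.println(isEqual(1.0, 1.0 + 1e-9)); // false: millions of ulps apart
        System.out.println(isEqual(zero, 0.0));       // false: the residue is far larger than its own ulp
    }
}
```

A one-ulp tolerance is very strict: it absorbs a single rounding step but not the accumulated error of a longer calculation such as the one in the question, so in practice the epsilon is often taken as some multiple of the ulp.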

Answer by Richard Gomes

There are two concepts involved here:

  1. The machine precision unit (machine epsilon): Math.ulp(1.0)
  2. The machine precision for a given double d: Math.ulp(d)

If you call Math.ulp(1.0) you will obtain the machine precision unit, which is the precision you can expect from a certain hardware platform... whatever this definition might be!

If you call Math.ulp(d), you will get the machine precision for double d. In other words, every double d has its own specific precision. This is more useful than the single value from the previous paragraph.

You must pay particular attention to detail when you are performing iterations which involve calculations in cascade, i.e., when results from the previous calculations are employed in the current calculation. This is because errors accumulate in these situations and may end up, in certain circumstances, delivering results which are way off the true value they should deliver. In certain circumstances, the size of the accumulated error may even be greater than the true value. See some disastrous examples here.

In certain business domains, numerical computation errors are simply not acceptable. Depending on the business domain, its regulations, requirements and characteristics, you must adopt alternatives to the simplistic choice of employing floating point arithmetic (i.e., doubles or floats).

In the case of finance, for example, never ever use floating point arithmetic. Never ever use doubles or floats when you are dealing with money. Never. Period. You can employ BigDecimal or fixed point arithmetic, depending on circumstances.
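A minimal sketch of the BigDecimal route (the class and method names are invented for illustration): decimal amounts built from strings add exactly, whereas the same sum in double does not compare equal to the expected literal.

```java
import java.math.BigDecimal;

final class MoneyExample {
    // Hypothetical helper: adds two decimal amounts given as strings, without binary rounding.
    static BigDecimal add(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2 == 0.3); // false with doubles

        BigDecimal sum = add("0.10", "0.20");
        System.out.println(sum);                                       // 0.30
        System.out.println(sum.compareTo(new BigDecimal("0.30")) == 0); // true
    }
}
```

The amounts are constructed from strings on purpose: new BigDecimal(0.1) would carry over the binary approximation of the double literal.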

In the specific case of processing stock prices, you know that prices always have 5 digits of precision and, in this case, fixed point arithmetic is more than enough and also delivers the maximum performance you can possibly obtain, which is a very strong and common requirement in this business domain.
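One way to sketch fixed point prices in Java is to store each price as a long count of the smallest unit. The assumptions here are that "5 digits of precision" means 5 decimal places and that the helper names parse and format are purely illustrative:

```java
import java.math.BigDecimal;

final class FixedPointPrice {
    static final int DECIMALS = 5; // assumption: prices carry exactly 5 decimal places

    // Converts a decimal string to a whole number of 0.00001 units;
    // throws ArithmeticException if the text has more than 5 decimal places.
    static long parse(String text) {
        return new BigDecimal(text).movePointRight(DECIMALS).longValueExact();
    }

    // Converts a number of 0.00001 units back to a decimal string.
    static String format(long units) {
        return BigDecimal.valueOf(units, DECIMALS).toPlainString();
    }

    public static void main(String[] args) {
        long bid = parse("12.34567");
        long tick = parse("0.00001");
        long ask = bid + tick;                 // plain integer addition, exact
        System.out.println(format(ask));       // 12.34568
        System.out.println(ask - bid == tick); // true: equality is exact, no epsilon needed
    }
}
```

BigDecimal is used only at the boundary for parsing and printing; the arithmetic itself runs on long values, which is where the performance benefit would come from.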

If the business domain really requires numerical computations, you must in this case make sure you keep error propagation under your strict and very careful control. This is a long subject with a number of techniques, and very frequently developers overlook the problem, simply believing that there's a single magic method call which does all the hard work for them. No, there isn't. You have to do your research, do your homework and do all the hard work necessary in order to make sure you keep errors under control. You need to understand exactly what is going on with the numerical algorithms you've implemented.
