Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow, original address: http://stackoverflow.com/questions/295579/

Time: 2020-08-11 12:41:43  Source: igfitidea

Fastest way to determine if an integer's square root is an integer

java math optimization perfect-square

Asked by Kip

I'm looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer):

  1. I've done it the easy way, by using the built-in Math.sqrt() function, but I'm wondering if there is a way to do it faster by restricting yourself to the integer-only domain.
  2. Maintaining a lookup table is impractical (since there are about 2^31.5 integers whose square is less than 2^63).

Here is the very simple and straightforward way I'm doing it now:

public final static boolean isPerfectSquare(long n)
{
  if (n < 0)
    return false;

  long tst = (long)(Math.sqrt(n) + 0.5);
  return tst*tst == n;
}

Note: I'm using this function in many Project Euler problems. So no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to do every algorithm in less than a minute, and this function will need to be called millions of times in some problems.



I've tried the different solutions to the problem:

  • After exhaustive testing, I found that adding 0.5 to the result of Math.sqrt() is not necessary, at least not on my machine.
  • The fast inverse square root was faster, but it gave incorrect results for n >= 410881. However, as suggested by BobbyShaftoe, we can use the FISR hack for n < 410881.
  • Newton's method was a good bit slower than Math.sqrt(). This is probably because Math.sqrt() uses something similar to Newton's method, but implemented in hardware, so it's much faster than in Java. Also, Newton's method still required use of doubles.
  • A modified Newton's method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than Math.sqrt().
  • Binary chop was even slower. This makes sense because the binary chop will on average require 16 passes to find the square root of a 64-bit number.
  • According to John's tests, using or statements is faster in C++ than using a switch, but in Java and C# there appears to be no difference between or and switch.
  • I also tried making a lookup table (as a private static array of 64 boolean values). Then instead of either a switch or an or statement, I would just say if(lookup[(int)(n&0x3F)]) { test } else return false;. To my surprise, this was (just slightly) slower. This is because array bounds are checked in Java.
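
The lookup-table variant from the last bullet can be sketched roughly as follows. This is only an illustration under assumptions: the class name LowBitsFilter and the table name MAYBE_SQUARE are made up here, and the table is filled by squaring every residue mod 64 instead of being typed in by hand. Whether it beats a switch on a given JVM is something to measure, not assume.

```java
// Hypothetical sketch: reject on the low 6 bits first, then fall back to Math.sqrt().
final class LowBitsFilter {
    // MAYBE_SQUARE[r] is true when some integer squared leaves remainder r modulo 64.
    private static final boolean[] MAYBE_SQUARE = new boolean[64];

    static {
        for (int i = 0; i < 64; i++) {
            MAYBE_SQUARE[(i * i) & 0x3F] = true;
        }
    }

    static boolean isPerfectSquare(long n) {
        if (n < 0) {
            return false;
        }
        // Cheap filter: only 12 of the 64 possible low-bit patterns can belong to a square.
        if (!MAYBE_SQUARE[(int) (n & 0x3F)]) {
            return false;
        }
        // Same floating-point check as in the question.
        long tst = (long) (Math.sqrt(n) + 0.5);
        return tst * tst == n;
    }
}
```

A call such as LowBitsFilter.isPerfectSquare(49) passes the filter and then runs the square-root check, while an input like 50 is rejected by the table alone.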

Accepted answer by A. Rex

I figured out a method that works ~35% faster than your 6bits+Carmack+sqrt code, at least with my CPU (x86) and programming language (C/C++). Your results may vary, especially because I don't know how the Java factor will play out.

My approach is threefold:

  1. First, filter out obvious answers. This includes negative numbers and looking at the last 4 bits. (I found looking at the last six didn't help.) I also answer yes for 0. (In reading the code below, note that my input is int64 x.)
    if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
        return false;
    if( x == 0 )
        return true;
  2. Next, check if it's a square modulo 255 = 3 * 5 * 17. Because that's a product of three distinct primes, only about 1/8 of the residues mod 255 are squares. However, in my experience, calling the modulo operator (%) costs more than the benefit one gets, so I use bit tricks involving 255 = 2^8-1 to compute the residue. (For better or worse, I am not using the trick of reading individual bytes out of a word, only bitwise-and and shifts.)
    int64 y = x;
    y = (y & 4294967295LL) + (y >> 32); 
    y = (y & 65535) + (y >> 16);
    y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
    // At this point, y is between 0 and 511.  More code can reduce it farther.
    
    To actually check if the residue is a square, I look up the answer in a precomputed table.
    if( bad255[y] )
        return false;
    // However, I just use a table of size 512
    
  3. Finally, try to compute the square root using a method similar to Hensel's lemma. (I don't think it's applicable directly, but it works with some modifications.) Before doing that, I divide out all powers of 4 with a binary search:
    if((x & 4294967295LL) == 0)
        x >>= 32;
    if((x & 65535) == 0)
        x >>= 16;
    if((x & 255) == 0)
        x >>= 8;
    if((x & 15) == 0)
        x >>= 4;
    if((x & 3) == 0)
        x >>= 2;
    At this point, for our number to be a square, it must be 1 mod 8.
    if((x & 7) != 1)
        return false;
    The basic structure of Hensel's lemma is the following. (Note: untested code; if it doesn't work, try t=2 or 8.)
    int64 t = 4, r = 1;
    t <<= 1; r += ((x - r * r) & t) >> 1;
    t <<= 1; r += ((x - r * r) & t) >> 1;
    t <<= 1; r += ((x - r * r) & t) >> 1;
    // Repeat until t is 2^33 or so.  Use a loop if you want.
    The idea is that at each iteration, you add one bit onto r, the "current" square root of x; each square root is accurate modulo a larger and larger power of 2, namely t/2. At the end, r and t/2-r will be square roots of x modulo t/2. (Note that if r is a square root of x, then so is -r. This is true even modulo numbers, but beware, modulo some numbers, things can have even more than 2 square roots; notably, this includes powers of 2.) Because our actual square root is less than 2^32, at that point we can actually just check if r or t/2-r are real square roots. In my actual code, I use the following modified loop:
    int64 r, t, z;
    r = start[(x >> 3) & 1023];
    do {
        z = x - r * r;
        if( z == 0 )
            return true;
        if( z < 0 )
            return false;
        t = z & (-z);
        r += (z & t) >> 1;
        if( r > (t >> 1) )
            r = t - r;
    } while( t <= (1LL << 33) );
    The speedup here is obtained in three ways: precomputed start value (equivalent to ~10 iterations of the loop), earlier exit of the loop, and skipping some t values. For the last part, I look at z = x - r * r, and set t to be the largest power of 2 dividing z with a bit trick. This allows me to skip t values that wouldn't have affected the value of r anyway. The precomputed start value in my case picks out the "smallest positive" square root modulo 8192.
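
For Java readers, the mod-255 filter from step 2 might look roughly like the sketch below. The names Mod255Filter, fold255 and mayBeSquare are invented for this illustration, and the table is generated at class-load time rather than copied from the answer. The folding works because 2^8, 2^16 and 2^32 all leave remainder 1 when divided by 255, so adding the pieces of the number together preserves its residue.

```java
// Hypothetical sketch of the "square modulo 255" filter, assuming non-negative input.
final class Mod255Filter {
    // BAD255[y] is true when y mod 255 is NOT a square modulo 255 (512 entries, as in the answer).
    static final boolean[] BAD255 = new boolean[512];

    static {
        boolean[] square = new boolean[255];
        for (int r = 0; r < 255; r++) {
            square[(r * r) % 255] = true;
        }
        for (int y = 0; y < 512; y++) {
            BAD255[y] = !square[y % 255];
        }
    }

    // Folds a non-negative long into the range 0..511 without changing its residue mod 255.
    static int fold255(long x) {
        long y = x;
        y = (y & 0xFFFFFFFFL) + (y >> 32);
        y = (y & 0xFFFF) + (y >> 16);
        y = (y & 0xFF) + ((y >> 8) & 0xFF) + (y >> 16);
        return (int) y;
    }

    // False means "certainly not a perfect square"; true means "still a candidate".
    static boolean mayBeSquare(long x) {
        return x >= 0 && !BAD255[fold255(x)];
    }
}
```

The filter only rules inputs out; anything it lets through still needs the exact check from step 3 (or a plain Math.sqrt() test).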

Even if this code doesn't work faster for you, I hope you enjoy some of the ideas it contains. Complete, tested code follows, including the precomputed tables.

typedef signed long long int int64;

int start[1024] =
{1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11,
1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203,
129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395,
1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587,
257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779,
1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971,
385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163,
1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355,
513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547,
1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739,
641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931,
1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973,
769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781,
1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589,
897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397,
1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205,
1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013,
959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821,
1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629,
831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437,
1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245,
703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53,
1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139,
575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331,
1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523,
447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715,
1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907,
319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099,
1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291,
191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483,
1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675,
63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867,
2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037,
65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845,
1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653,
193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461,
1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269,
321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077,
1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885,
449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693,
1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501,
577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309,
1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117,
705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75,
1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267,
833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459,
1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651,
961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843,
1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035,
1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227,
895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419,
1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611,
767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803,
1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995,
639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909,
1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717,
511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525,
1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333,
383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141,
1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949,
255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757,
1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565,
127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373,
1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181};

bool bad255[512] =
{0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
 0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
 0,0};

inline bool square( int64 x ) {
    // Quickfail
    if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
        return false;
    if( x == 0 )
        return true;

    // Check mod 255 = 3 * 5 * 17, for fun
    int64 y = x;
    y = (y & 4294967295LL) + (y >> 32);
    y = (y & 65535) + (y >> 16);
    y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
    if( bad255[y] )
        return false;

    // Divide out powers of 4 using binary search
    if((x & 4294967295LL) == 0)
        x >>= 32;
    if((x & 65535) == 0)
        x >>= 16;
    if((x & 255) == 0)
        x >>= 8;
    if((x & 15) == 0)
        x >>= 4;
    if((x & 3) == 0)
        x >>= 2;

    if((x & 7) != 1)
        return false;

    // Compute sqrt using something like Hensel's lemma
    int64 r, t, z;
    r = start[(x >> 3) & 1023];
    do {
        z = x - r * r;
        if( z == 0 )
            return true;
        if( z < 0 )
            return false;
        t = z & (-z);
        r += (z & t) >> 1;
        if( r > (t  >> 1) )
            r = t - r;
    } while( t <= (1LL << 33) );

    return false;
}
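
As a table-free illustration of step 3, the bit-by-bit lifting can be written in Java roughly as follows. This is a sketch, not a port of the tuned code above: it leaves out the start and bad255 tables and the early exits, the class name HenselSketch is invented here, and it relies on Java's wrapping long arithmetic, since only the low bits of x - r * r are ever inspected.

```java
// Hypothetical sketch: lift a square root of x modulo increasing powers of 2, then verify it.
final class HenselSketch {
    static boolean isPerfectSquare(long x) {
        if (x < 0) {
            return false;
        }
        if (x == 0) {
            return true;
        }
        // Divide out powers of 4; this does not change whether x is a square.
        while ((x & 3) == 0) {
            x >>= 2;
        }
        // What remains must be odd, and odd squares are always 1 mod 8.
        if ((x & 7) != 1) {
            return false;
        }
        long r = 1; // r * r == x modulo 8
        for (int k = 3; k <= 33; k++) {
            // Invariant: r * r == x modulo 2^k. If bit k of the difference is set,
            // adding 2^(k-1) to r makes the congruence hold modulo 2^(k+1).
            if (((x - r * r) & (1L << k)) != 0) {
                r += 1L << (k - 1);
            }
        }
        // Now r * r == x modulo 2^34 and r < 2^33. A true root is below 2^32,
        // so it must be either r or 2^33 - r.
        if (r > (1L << 32)) {
            r = (1L << 33) - r;
        }
        return r * r == x;
    }
}
```

The loop always runs all 31 lifting steps, so this version mainly shows why the method works; the speed of the answer's code comes from the precomputed start value and from skipping steps.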

Answer by Celestial M Weasel

If you want speed, given that your integers are of finite size, I suspect that the quickest way would involve (a) partitioning the parameters by size (e.g. into categories by largest bit set), then (b) checking the value against an array of perfect squares within that range.

Answer by chakrit

I was thinking about the horrible times I spent in my Numerical Analysis course.

And then I remembered there was this function from the Quake source code circulating around the 'net:

float Q_rsqrt( float number )
{
  long i;
  float x2, y;
  const float threehalfs = 1.5F;

  x2 = number * 0.5F;
  y  = number;
  i  = * ( long * ) &y;  // evil floating point bit level hacking
  i  = 0x5f3759df - ( i >> 1 ); // wtf?
  y  = * ( float * ) &i;
  y  = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
  // y  = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed

  #ifndef Q3_VM
  #ifdef __linux__
    assert( !isnan(y) ); // bk010122 - FPE?
  #endif
  #endif
  return y;
}

This basically calculates an inverse square root (1/sqrt(x)), using Newton's approximation function (can't remember the exact name).

It should be usable and might even be faster; it comes from one of the phenomenal id Software games!

It's written in C, but it should not be too hard to reuse the same technique in Java once you get the idea.

I originally found it at: http://www.codemaestro.com/reviews/9

Newton's method is explained on Wikipedia: http://en.wikipedia.org/wiki/Newton%27s_method

You can follow the link for more explanation of how it works, but if you don't care much, then this is roughly what I remember from reading the blog and from taking the Numerical Analysis course:

  • the * (long*) &y is basically a fast convert-to-long trick: it reinterprets the float's raw bits as an integer, so integer operations can be applied to the raw bytes.
  • the 0x5f3759df - (i >> 1); line produces a seed value for the approximation function from a pre-calculated constant.
  • the * (float*) &i converts the value back to floating point.
  • the y = y * ( threehalfs - ( x2 * y * y ) ) line basically iterates the value over the function again.

The approximation function gives more precise values the more you iterate the function over the result. In Quake's case, one iteration is "good enough", but if it isn't for you... then you can add as many iterations as you need.

This should be faster because it reduces the number of division operations done in naive square rooting down to a simple divide by 2 (actually a * 0.5F multiply operation) and replaces them with a small, fixed number of multiplication operations.
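
A Java rendering of the same trick might look like the sketch below. The class and method names are made up for illustration; Float.floatToRawIntBits and Float.intBitsToFloat are the standard library calls that stand in for the pointer casts. Note that the result is an approximation of 1/sqrt(number), so it would still have to be multiplied by number and verified exactly before being used in a perfect-square test.

```java
// Hypothetical Java sketch of the Quake-style fast inverse square root.
final class FastInvSqrt {
    static float invSqrt(float number) {
        float x2 = number * 0.5f;
        int i = Float.floatToRawIntBits(number); // reinterpret the float's bits as an int
        i = 0x5f3759df - (i >> 1);               // magic-constant seed for the estimate
        float y = Float.intBitsToFloat(i);       // reinterpret the bits back as a float
        y = y * (1.5f - x2 * y * y);             // one Newton iteration
        return y;
    }

    // Approximate square root derived from the inverse square root.
    static float approxSqrt(float number) {
        return number * invSqrt(number);
    }
}
```

With a single Newton step the estimate is only good to a few decimal digits, which matches the question's observation that the hack stops giving correct answers for larger n.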

Answer by Jon Skeet

If you do a binary chop to try to find the "right" square root, you can fairly easily detect if the value you've got is close enough to tell:

(n+1)^2 = n^2 + 2n + 1
(n-1)^2 = n^2 - 2n + 1

So having calculated n^2, the options are:

  • n^2 = target: done, return true
  • n^2 + 2n + 1 > target > n^2: you're close, but it's not perfect: return false
  • n^2 - 2n + 1 < target < n^2: ditto
  • target < n^2 - 2n + 1: binary chop on a lower n
  • target > n^2 + 2n + 1: binary chop on a higher n

(Sorry, this uses n as your current guess, and target for the parameter. Apologies for the confusion!)

I don't know whether this will be faster or not, but it's worth a try.

EDIT: The binary chop doesn't have to take in the whole range of integers, either: (2^x)^2 = 2^(2x), so once you've found the top set bit in your target (which can be done with a bit-twiddling trick; I forget exactly how) you can quickly get a range of potential answers. Mind you, a naive binary chop is still only going to take up to 31 or 32 iterations.
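
The binary chop with the early "close but not perfect" exits could be sketched like this. The class name and the MAX_ROOT constant are assumptions of this example; the search simply starts from the full range of possible roots rather than narrowing it with the top set bit, and it compares differences instead of computing (n+1)^2 directly so that nothing overflows.

```java
// Hypothetical sketch of a binary chop over candidate roots with early exits.
final class BinaryChopSquare {
    // floor(sqrt(Long.MAX_VALUE)); squaring anything up to this value fits in a long.
    private static final long MAX_ROOT = 3037000499L;

    static boolean isPerfectSquare(long target) {
        if (target < 0) {
            return false;
        }
        long lo = 0;
        long hi = MAX_ROOT;
        while (lo <= hi) {
            long n = (lo + hi) >>> 1;
            long diff = target - n * n;
            if (diff == 0) {
                return true;                  // n^2 = target
            }
            if (diff > 0) {
                if (diff <= 2 * n) {
                    return false;             // n^2 < target < (n+1)^2
                }
                lo = n + 1;                   // chop on a higher n
            } else {
                if (-diff <= 2 * n - 2) {
                    return false;             // (n-1)^2 < target < n^2
                }
                hi = n - 1;                   // chop on a lower n
            }
        }
        return false;
    }
}
```

The early exits only save the last step or two of the search, so this is mostly useful as a reference implementation to test faster variants against.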

Answer by Kibbee

I'm not sure if it would be faster, or even accurate, but you could use John Carmack's Magical Square Root algorithm to solve the square root faster. You could probably easily test this for all possible 32-bit integers and validate that you actually get correct results, as it's only an approximation. However, now that I think about it, using doubles is approximating too, so I'm not sure how that would come into play.

Answer by Joel Coehoorn

Don't know about fastest, but the simplest is to take the square root in the normal fashion, multiply the result by itself, and see if it matches your original value.

Since we're talking integers here, the fastest would probably involve a collection where you can just do a lookup.

Answer by John D. Cook

You'll have to do some benchmarking. The best algorithm will depend on the distribution of your inputs.

Your algorithm may be nearly optimal, but you might want to do a quick check to rule out some possibilities before calling your square root routine. For example, look at the last digit of your number in hex by doing a bit-wise "and". Perfect squares can only end in 0, 1, 4, or 9 in base 16, so for 75% of your inputs (assuming they are uniformly distributed) you can avoid a call to the square root in exchange for some very fast bit twiddling.

Kip benchmarked the following code implementing the hex trick. When testing numbers 1 through 100,000,000, this code ran twice as fast as the original.

public final static boolean isPerfectSquare(long n)
{
    if (n < 0)
        return false;

    switch((int)(n & 0xF))
    {
    case 0: case 1: case 4: case 9:
        long tst = (long)Math.sqrt(n);
        return tst*tst == n;

    default:
        return false;
    }
}

When I tested the analogous code in C++, it actually ran slower than the original. However, when I eliminated the switch statement, the hex trick once again made the code twice as fast.

#include <math.h>  // floor, sqrt

int isPerfectSquare(int n)
{
    int h = n & 0xF;  // h is the last hex "digit"
    if (h > 9)
        return 0;
    // Use lazy evaluation to jump out of the if statement as soon as possible
    if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8)
    {
        int t = (int) floor( sqrt((double) n) + 0.5 );
        return t*t == n;
    }
    return 0;
}

Eliminating the switch statement had little effect on the C# code.

Answer by Bill the Lizard

It should be much faster to use Newton's method to calculate the integer square root, then square this number and check, as you do in your current solution. Newton's method is the basis for the Carmack solution mentioned in some other answers. You should be able to get a faster answer since you're only interested in the integer part of the root, allowing you to stop the approximation algorithm sooner.

Another optimization that you can try: if the digital root of a number is not 1, 4, 7, or 9, the number is not a perfect square. This can be used as a quick way to eliminate roughly 55% of your inputs (5 of the 9 possible digital roots) before applying the slower square root algorithm.
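
Both suggestions can be combined in a short Java sketch. The names below are invented for illustration, the starting guess (a power of two at or above the root) is one common choice rather than anything prescribed by the answer, and no speed claim is implied. The digital-root test is expressed through n % 9, since a digital root of 9 corresponds to remainder 0.

```java
// Hypothetical sketch: digital-root filter plus an integer-only Newton square root.
final class NewtonSquareCheck {
    // Returns floor(sqrt(n)) for n >= 0 using Newton's iteration on longs.
    static long isqrt(long n) {
        if (n == 0) {
            return 0;
        }
        int bits = 64 - Long.numberOfLeadingZeros(n);
        long x = 1L << ((bits + 1) / 2); // power of two that is >= sqrt(n)
        while (true) {
            long y = (x + n / x) >> 1;   // Newton step, rounded down
            if (y >= x) {
                return x;                // the estimates stopped decreasing
            }
            x = y;
        }
    }

    // Squares leave remainder 0, 1, 4 or 7 modulo 9 (digital roots 9, 1, 4, 7).
    static boolean digitalRootAllowsSquare(long n) {
        long r = n % 9;
        return r == 0 || r == 1 || r == 4 || r == 7;
    }

    static boolean isPerfectSquare(long n) {
        if (n < 0 || !digitalRootAllowsSquare(n)) {
            return false;
        }
        long root = isqrt(n);
        return root * root == n;
    }
}
```

Because the iteration approaches the root from above and stops as soon as the estimate no longer decreases, it never needs floating point, which sidesteps the precision limits of doubles for large inputs.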

Answer by Elijah

If speed is a concern, why not partition off the most commonly used set of inputs and their values to a lookup table and then do whatever optimized magic algorithm you have come up with for the exceptional cases?

Answer by mrzl

I want this function to work with all positive 64-bit signed integers

Math.sqrt() works with doubles as input parameters, so you won't get accurate results for integers bigger than 2^53.
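
One defensive way to live with that limit, sketched here under the assumption that the rounded root is never off by more than one, is to treat Math.sqrt() as a guess and test its neighbours with exact long arithmetic. The class name and the MAX_ROOT constant are made up for this example.

```java
// Hypothetical sketch: use Math.sqrt() only as a guess and verify with exact long math.
final class SqrtPrecision {
    // floor(sqrt(Long.MAX_VALUE)); larger candidates would overflow when squared.
    private static final long MAX_ROOT = 3037000499L;

    // Shows the precision loss: converting to double and back can change values above 2^53.
    static boolean survivesDoubleRoundTrip(long n) {
        return (long) (double) n == n;
    }

    static boolean isPerfectSquare(long n) {
        if (n < 0) {
            return false;
        }
        long guess = (long) Math.sqrt((double) n);
        // Check the guess and its two neighbours in case rounding shifted it.
        for (long c = guess - 1; c <= guess + 1; c++) {
            if (c >= 0 && c <= MAX_ROOT && c * c == n) {
                return true;
            }
        }
        return false;
    }
}
```

For example, survivesDoubleRoundTrip((1L << 53) + 1) is false, because 2^53 + 1 has no exact double representation, while the neighbour check still classifies the largest square below 2^63 correctly.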

Answer by Hugh Allen

It's been pointed out that the last d digits of a perfect square can only take on certain values. The last d digits (in base b) of a number n are the same as the remainder when n is divided by b^d, i.e. in C notation n % pow(b, d).

This can be generalized to any modulus m, ie. n % mcan be used to rule out some percentage of numbers from being perfect squares. The modulus you are currently using is 64, which allows 12, ie. 19% of remainders, as possible squares. With a little coding I found the modulus 110880, which allows only 2016, ie. 1.8% of remainders as possible squares. So depending on the cost of a modulus operation (ie. division) and a table lookup versus a square root on your machine, using this modulus might be faster.

By the way if Java has a way to store a packed array of bits for the lookup table, don't use it. 110880 32-bit words is not much RAM these days and fetching a machine word is going to be faster than fetching a single bit.

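
The residue counts quoted above can be reproduced with a small Java sketch. The class and method names are hypothetical, and the table is a plain boolean[] built by brute force; how its lookup cost compares with a square root on a particular machine is left to benchmarking.

```java
// Hypothetical sketch: build a "possible square" table for an arbitrary modulus m.
final class ModulusFilter {
    // table[r] is true when r is the remainder of some perfect square modulo m.
    static boolean[] squareResidues(int m) {
        boolean[] table = new boolean[m];
        for (long r = 0; r < m; r++) {
            table[(int) ((r * r) % m)] = true;
        }
        return table;
    }

    // Number of remainders that a perfect square can leave.
    static int countResidues(boolean[] table) {
        int count = 0;
        for (boolean possible : table) {
            if (possible) {
                count++;
            }
        }
        return count;
    }

    // False means n cannot be a perfect square; true means it still has to be checked.
    static boolean mayBeSquare(long n, boolean[] table) {
        return n >= 0 && table[(int) (n % table.length)];
    }
}
```

Calling countResidues(squareResidues(64)) gives 12 and countResidues(squareResidues(110880)) gives 2016, matching the figures in the answer.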