C++ 纳秒到毫秒 - 快速除以 1000000

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1269994/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 19:24:07  来源:igfitidea点击:

nanoseconds to milliseconds - fast division by 1000000

c++gccsolarissparc

提问by Matt

I'm wanting to convert the output from gethrtime to milliseconds.

我想将输出从 gethrtime 转换为毫秒。

The obvious way to do this is to divide by 1000000. However, I'm doing this quite often and wonder if it could become a bottleneck.

显而易见的方法是除以 1000000。但是,我经常这样做,想知道它是否会成为瓶颈。

Is there an optimized divide operation when dealing with numbers like 1000000?

处理像 1000000 这样的数字时是否有优化的除法运算?

Note: Any code must be portable. I'm using gcc and this is generally on Sparc hardware

注意:任何代码都必须是可移植的。我正在使用 gcc,这通常在 Sparc 硬件上

Some quick testing using the code below... hope that is right.

使用下面的代码进行一些快速测试...希望是对的。

#include <sys/time.h>
#include <iostream>

using namespace std;

const double NANOSECONDS_TO_MILLISECONDS = 1.0 / 1000000.0;

int main()
{
    hrtime_t start;
    hrtime_t tmp;
    hrtime_t fin;

    start = gethrtime();
    tmp = (hrtime_t)(start * NANOSECONDS_TO_MILLISECONDS);
    fin = gethrtime();

    cout << "Method 1"
    cout << "Original val: " << start << endl;
    cout << "Computed: " << tmp << endl;
    cout << "Time:" << fin - start << endl;

    start = gethrtime();
    tmp = (start / 1000000);
    fin = gethrtime();

    cout "Method 2"    
    cout << "Original val: " << start << endl;
    cout << "Computed: " << tmp << endl;
    cout << "Time:" << fin - start << endl;

    return 0;
}  

Example outputs:

示例输出:

Original val: 3048161553965997
Computed: 3048161553
Time:82082
Original val: 3048161556359586
Computed: 3048161556
Time:31230

Original val: 3048239663018915
Computed: 3048239663
Time:79381
Original val: 3048239665393873
Computed: 3048239665
Time:31321

Original val: 3048249874282285
Computed: 3048249874
Time:81812
Original val: 3048249876664084
Computed: 3048249876
Time:34830

If this is correct, then the multiple by reciprocal is actually slower in this case. It's probably due to using floating point math instead of fixed point math. I will just stick to integer division then which still takes hardly any time at all.

如果这是正确的,那么在这种情况下,倒数的倍数实际上更慢。这可能是由于使用浮点数学而不是定点数学。我将坚持使用整数除法,这仍然几乎不需要任何时间。

回答by Josh Haberman

Let your compiler figure it out!

让你的编译器弄清楚!

Seriously, if you're really concerned about optimizations at this level (and you shouldn't be unless it shows up in a profile), you should get used to looking at your compiler's assembly language output. You will be amazed what the compiler is doing on your behalf.

说真的,如果您真的关心这个级别的优化(除非它出现在配置文件中,否则您不应该关心),您应该习惯于查看编译器的汇编语言输出。您会惊讶于编译器为您所做的事情。

All the people who are recommending math tricks either have bad compilers or they are underestimating their compilers. For example, try compiling this function:

所有推荐数学技巧的人要么编译器不好,要么低估了他们的编译器。例如,尝试编译这个函数:

unsigned long div1000000(unsigned long n) {
  return n / 1000000UL;
}

Compiled with gcc 4.3.3 on x86 (-O3, -fomit-frame-pointer), I get:

在 x86(-O3,-fomit-frame-pointer)上用 gcc 4.3.3 编译,我得到:

$ objdump -d div.o -M intel

test2.o:     file format elf32-i386


Disassembly of section .text:

00000000 <div1000000>:
   0:   b8 83 de 1b 43          mov    eax,0x431bde83
   5:   f7 64 24 04             mul    DWORD PTR [esp+0x4]
   9:   c1 ea 12                shr    edx,0x12
   c:   89 d0                   mov    eax,edx
   e:   c3                      ret    

In other words, the compiler took n / 1000000ULand turned it into (unsigned long long)(n * 0x431bde83) >> (0x12 + 32). Why does that work? Off the top of my head, I have no idea! But the compiler decided that it would be faster than issuing a native divide.

换句话说,编译器n / 1000000UL将其转换为(unsigned long long)(n * 0x431bde83) >> (0x12 + 32). 为什么这样做?在我的头顶上,我不知道!但是编译器决定它比发布本地除法要快。

Moral of the story:

故事的道德启示:

  • don't optimize this unless you're sure it's a bottleneck.
  • don't do fancy arithmetic (multiplying by the reciprocal, shifts, etc) unless you already know what your compiler is doing and you think you can beat it.
  • benchmark the result -- only leave in a wart like fancy bitmath if you have demonstrated that you've outdone your compiler.
  • 除非你确定这是一个瓶颈,否则不要优化它。
  • 不要做花哨的算术(乘以倒数、移位等),除非你已经知道你的编译器在做什么并且你认为你可以打败它。
  • 对结果进行基准测试——如果你已经证明你已经超越了你的编译器,那么只留下像花哨的位数学这样的缺陷。

回答by Robert Cartaino

Division is notan expensive operation. I doubt very much if a divide-by-1000000 operation will be anywhere near the main bottleneck in your application. Floating-point processors will be way faster than any sort of "tricks" you can come up with than just doing the single operation.

除法不是一项昂贵的操作。我非常怀疑除以 1000000 的操作是否会接近应用程序中的主要瓶颈。浮点处理器将比您想出的任何“技巧”都要快,而不仅仅是执行单个操作。

回答by Potatoswatter

I'm surprised nobody has gotten this yet…

我很惊讶还没有人得到这个…

  • division is the same as multiplication by a fraction
  • multiplying by a fractional power of 2 is fast: just bit-shift
  • integral division involves rounding down
  • rounding down is like multiplying by a slightly smaller fraction (up to a certain point, you need to be aware of your ranges)
  • 除法与乘以分数相同
  • 乘以 2 的分数幂很快:只是位移
  • 积分除法涉及四舍五入
  • 四舍五入就像乘以一个稍小的分数(直到某个点,你需要知道你的范围)

So,

所以,

const uint64_t numerator = (1LL<<32)/1000000;

...

...

millionths = ( number * numerator ) >> 32;

Supah fast!

苏帕快!

回答by TheJacobTaylor

Multiply by 1/1,000,000. It should be faster. My Google search was saying to speed up divisions, multiply be the reciprocal. So I would pre-calculate the reciprocal or a list of reciprocals if there is a relatively known set of possible values, and then multiply.

乘以 1/1,000,000。它应该更快。我的谷歌搜索说要加快除法,乘以互惠。因此,如果有一组相对已知的可能值,我会预先计算倒数或倒数列表,然后相乘。

Jacob

雅各布

回答by e.James

However, I'm doing this quite often and wonder if it could become a bottleneck.

但是,我经常这样做,想知道它是否会成为瓶颈。

First things first. If you think this will be a bottleneck, profilethe code in question and find out for sure.

先说第一件事。如果您认为这将是一个瓶颈,请分析有问题的代码并确定。

If, (and only if) this is your bottleneck, then work on improving it.

如果(且仅当)这是您的瓶颈,则努力改进它。

Now, on to your improvement options:

现在,谈谈您的改进选项:

1.You may not need to convert to milliseconds right away. If you are simply collecting data, just store the full 64-bit number returned from gethrtime()and be done with it. Anything that a human needs to read can be post-processed at a later time, or on a much less agressive update frequency.

1.您可能不需要立即转换为毫秒。如果您只是收集数据,只需存储从中返回的完整 64 位数字gethrtime()并完成。人类需要阅读的任何内容都可以在以后进行后期处理,或者以不太激进的更新频率进行后期处理。

2.If you are timing some repetitive event, you can try performing the division on the differencebetween two calls, which should be verysmall if you are calling gethrtime()often enough to have a bottleneck:

2.如果您正在计时一些重复事件,您可以尝试对两个调用之间的差异执行除法,如果您调用的频率足够高而出现瓶颈,则该差异应该非常gethrtime()

static hrtime_t oldtime;
hrtime_t newtime = gethrtime();
int milliseconds = fastDivByOneMillion((UI32)(newtime - oldtime));
oldtime = newtime;

3.You can implement fastDivByOneMillion()as a multiplication and a division by a power of 2:

3.您可以fastDivByOneMillion()将乘法和除法实现为 2 的幂:

int fastDivByOneMillion(UI32 nanoseconds)
{
    return (int)((UI64)nanoseconds * 4295 >> 32);
}

Notes:

笔记:

  • Your compiler can figure out the best way to do >> 32on your hardware. Most of the time, this will be only one or two clock cylces.
  • 您的编译器可以找出>> 32在您的硬件上执行的最佳方法。大多数情况下,这只会是一两个时钟周期。
  • I used UI32and UI64to represent 32 and 64-bit unsigned numbers.
  • 我使用UI32andUI64来表示 32 位和 64 位无符号数。
  • All of this will require more profiling to be sure that it is actually producing a measurable improvement.

  • 所有这些都需要更多的分析,以确保它确实产生了可衡量的改进。

  • 回答by Michael Burr

    As Joshua Haberman mentioned, your compiler will probably already convert the division by a constant 1000000 to a multiply by a 'magic number' followed by a shift (if the division is an integer operation). You can get more details of what's going on in Henry Warren's "Hacker's Delight" book and at the companion website:

    正如Joshua Haberman 提到的,您的编译器可能已经将除法除以常数 1000000 转换为乘以“幻数”,然后是移位(如果除法是整数运算)。您可以在 Henry Warren 的“Hacker's Delight”一书中和配套网站上获得更多关于正在发生的事情的详细信息:

    He even has a page that has a Javascript calculator for the magic numbers:

    他甚至有一个页面,上面有一个用于计算幻数的 Javascript 计算器:

    回答by jalf

    First, the obvious disclaimer: Unless you perform the division a couple of million times per second at least, it won't be a bottleneck, and you should just leave it. Premature optimization and all that.

    首先,明显的免责声明:除非您至少每秒执行几百万次除法,否则它不会成为瓶颈,您应该离开它。过早的优化等等。

    Second, how accurate do you need the result to be? A handy rule of thumb for converting between binary and decimal is that 2^10 ~= 10^3.

    其次,你需要结果有多准确?在二进制和十进制之间转换的一个方便的经验法则是 2^10 ~= 10^3。

    In other words, a million is roughly equal to 2^20. So you could just right shift 20. The compiler won't do this for you automatically, of course, because it changes the result. But if you're willing to live with the slight accuracy, andthe division is actually a realperformance problem, this would be my suggestion.

    换句话说,一百万大致等于 2^20。所以你可以右移 20。当然,编译器不会自动为你做这件事,因为它改变了结果。但是,如果您愿意接受轻微的准确性,并且除法实际上是一个真正的性能问题,那么这就是我的建议。

    回答by teambob

    1/1000000 is 0.000000000000000000 0100 0011 0001 1011 1101 1110 1000 0010 1101 0111 1011 0110 0011 01 binary - which is 0x431BDE82 * 2^-18

    1/1000000 是 0.00000000000000000 0100 0011 0001 1011 1101 1110 1000 0010 1101 0111 1011 0110 0031 0110 0031 0*18 二进制 *12 *12

    Therefore n/1000000 is equivalent to (n*0x431BDE82)>>18

    因此 n/1000000 等价于 (n*0x431BDE82)>>18

    Also n/1000000 is equivalent to (n*0x8637BD04)>>19

    同样 n/1000000 相当于 (n*0x8637BD04)>>19

    Note that this is a "fixed point" calculation and you should be aware that precision could be lost.

    请注意,这是一个“定点”计算,您应该知道精度可能会丢失。

    回答by Greg Smith

    It's possible to transform integer division into a series of simpler operations. A generic method to do that popularized by Terje Mathisen is outlined on page 136 of Optimizing subroutines in assembly language. If you know in advance the width of your data types and what you're dividing by, that will lead you through how to turn that into a serious of simpler operations that in theory might be faster than a more generic division operation that has to handle any divisor. There will still be some platform issues to be concerned about if you're worried about differently sized integers on some of them.

    可以将整数除法转换为一系列更简单的运算。Terje Mathisen 推广的一种通用方法在Optimizing subroutines in assembly language 的第 136 页中概述 。如果你事先知道你的数据类型的宽度和你除以什么,这将引导你了解如何将它变成一系列更简单的操作,理论上这些操作可能比必须处理的更通用的除法操作更快任何除数。如果您担心某些平台上的整数大小不同,仍然会有一些平台问题需要关注。

    Unless you're actually programming this in assembly language, I'd bet against you actually improving anything in the process over the SPARC divide implementation. Maybe if you're using a positively ancient SPARC V7 processor, from before division was implemented in hardware, you might get some improvement, but even then I'd bet on the built-in division being faster.

    除非你真的用汇编语言编程,否则我打赌你不会在 SPARC 除法实现的过程中改进任何东西。也许如果您使用的是非常古老的 SPARC V7 处理器,从在硬件实现除法之前开始,您可能会得到一些改进,但即便如此,我还是打赌内置除法会更快。

    Regardless, I suspect you've launched into a bit of premature optimization here though. You should start here by profiling the app you've got before presuming this division has any significant impact on its runtime, and you should similarly profile any change to the division to prove it works as expected. It's quite easy to get code that you think will execute faster but actually doesn't nowadays, given how complicated things like CPU caches have gotten.

    无论如何,我怀疑您在这里进行了一些过早的优化。您应该首先分析您拥有的应用程序,然后再假设该部门对其运行时间有任何重大影响,并且您应该类似地分析对该部门的任何更改,以证明它按预期工作。考虑到 CPU 缓存之类的事情变得多么复杂,很容易获得您认为会执行得更快但实际上现在不会的代码。

    回答by Marcin

    If you can get around with that, here's my solution.

    如果你能解决这个问题,这是我的解决方案。

    • use integers instead of floats (they arefaster)
    • divide by 1048576 by shifting bits to the right (that is cheaperthan anything on floats)
    • 使用整数,而不是浮点数(它们更快)
    • 通过向右移动位除以 1048576(这比浮点数上的任何东西都便宜

    and convince yourself that milliseconds should be base2 and not base10. ;-)

    并说服自己毫秒应该是base2而不是base10。;-)