Timer function to provide time in nanoseconds using C++

Question

Asked by gagneet

I wish to calculate the time it took for an API to return a value. The time taken for such an action is in the range of nanoseconds. As the API is a C++ class/function, I am using clock() from <ctime> (time.h) to calculate it:

  #include <ctime>
  #include <iostream>   // cout lives here, not in <cstdio>

  using namespace std;

  int main(int argc, char** argv) {

      clock_t start;
      double diff;
      start = clock();
      // ... call the API being timed here ...
      diff = ( clock() - start ) / (double)CLOCKS_PER_SEC;
      cout << "printf: " << diff << '\n';

      return 0;
  }

The above code gives the time in seconds. How do I get the same in nanoseconds, and with more precision?

Answer 1

Accepted answer by grieve

What others have posted about running the function repeatedly in a loop is correct.

For Linux (and BSD) you want to use clock_gettime().

#include <time.h>   // clock_gettime() and timespec are declared here, not in <sys/time.h>

int main()
{
   timespec ts;
   // clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD
   clock_gettime(CLOCK_REALTIME, &ts); // Works on Linux
}
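
The snippet above only takes a single reading. As a minimal sketch of turning two readings into an elapsed time in nanoseconds, something like the following could work. It assumes a POSIX system where CLOCK_MONOTONIC is available; the names elapsed_ns and time_once_ns are made up for this illustration, and `work` stands in for whatever API you are timing.

```cpp
#include <time.h>    // clock_gettime, timespec (POSIX)
#include <stdint.h>  // int64_t

// Nanoseconds elapsed between two timespec readings.
inline int64_t elapsed_ns(const timespec& start, const timespec& end)
{
    return static_cast<int64_t>(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}

// Time a single call of `work` (hypothetical placeholder for the API under test).
template <class F>
int64_t time_once_ns(F work)
{
    timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);  // monotonic: not affected by wall-clock adjustments
    work();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(t0, t1);
}
```

CLOCK_MONOTONIC is used here rather than CLOCK_REALTIME because an interval should not change if the wall clock is adjusted in the middle of a measurement. On older glibc versions you may need to link with -lrt.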

For Windows you want to use QueryPerformanceCounter (QPC).

Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have one of those chipsets. Additionally, some dual-core AMD CPUs may also cause a problem. See the second post by sebbbi, where he states:

QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example in Windows XP, all AMD Athlon X2 dual core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install AMD dual core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad).

EDIT 2013/07/16:

It looks like there is some controversy on the efficacy of QPC under certain circumstances as stated in http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx

...While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another...

However, this StackOverflow answer (https://stackoverflow.com/a/4588605/34329) states that QPC should work fine on any MS OS after Win XP Service Pack 2.

This article shows that Windows 7 can determine whether the processor(s) have an invariant TSC, and falls back to an external timer if they don't (http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html). Synchronizing across processors is still an issue.

Answer 2

Answered by Howard Hinnant

This new answer uses C++11's <chrono> facility. While there are other answers that show how to use <chrono>, none of them shows how to use <chrono> with the RDTSC facility mentioned in several of the other answers here. So I thought I would show how to use RDTSC with <chrono>. Additionally I'll demonstrate how you can templatize the testing code on the clock so that you can rapidly switch between RDTSC and your system's built-in clock facilities (which will likely be based on clock(), clock_gettime() and/or QueryPerformanceCounter).

Note that the RDTSC instruction is x86-specific. QueryPerformanceCounter is Windows only. And clock_gettime() is POSIX only. Below I introduce two new clocks: std::chrono::high_resolution_clock and std::chrono::system_clock, which, if you can assume C++11, are now cross-platform.

First, here is how you create a C++11-compatible clock out of the Intel rdtsc assembly instruction. I'll call it x::clock:

#include <chrono>

namespace x
{

struct clock
{
    typedef unsigned long long                 rep;
    typedef std::ratio<1, 2800000000>          period; // My machine is 2.8 GHz (no C++14 digit separators, so this builds as C++11)
    typedef std::chrono::duration<rep, period> duration;
    typedef std::chrono::time_point<clock>     time_point;
    static const bool is_steady =              true;

    static time_point now() noexcept
    {
        unsigned lo, hi;
        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return time_point(duration(static_cast<rep>(hi) << 32 | lo));
    }
};

}  // x

All this clock does is count CPU cycles and store the count in an unsigned 64-bit integer. You may need to tweak the assembly language syntax for your compiler. Or your compiler may offer an intrinsic you can use instead (e.g. __rdtsc(), wrapped as time_point(duration(__rdtsc())) inside now()).

To build a clock you have to give it the representation (storage type). You must also supply the clock period, which must be a compile time constant, even though your machine may change clock speed in different power modes. And from those you can easily define your clock's "native" time duration and time point in terms of these fundamentals.

If all you want to do is output the number of clock ticks, it doesn't really matter what number you give for the clock period. This constant only comes into play if you want to convert the number of clock ticks into some real-time unit such as nanoseconds. And in that case, the more accurately you can supply the clock speed, the more accurate the conversion to nanoseconds (or milliseconds, or whatever) will be.

Below is example code which shows how to use x::clock. Actually I've templated the code on the clock, as I'd like to show how you can use many different clocks with the exact same syntax. This particular test shows what the looping overhead is when you run what you want to time in a loop:

#include <iostream>

template <class clock>
void
test_empty_loop()
{
    // Define real time units
    typedef std::chrono::duration<unsigned long long, std::pico> picoseconds;
    // or:
    // typedef std::chrono::nanoseconds nanoseconds;
    // Define double-based unit of clock tick
    typedef std::chrono::duration<double, typename clock::period> Cycle;
    using std::chrono::duration_cast;
    const int N = 100000000;
    // Do it
    auto t0 = clock::now();
    for (int j = 0; j < N; ++j)
        asm volatile("");
    auto t1 = clock::now();
    // Get the clock ticks per iteration
    auto ticks_per_iter = Cycle(t1-t0)/N;
    std::cout << ticks_per_iter.count() << " clock ticks per iteration\n";
    // Convert to real time units
    std::cout << duration_cast<picoseconds>(ticks_per_iter).count()
              << "ps per iteration\n";
}

The first thing this code does is create a "real time" unit to display the results in. I've chosen picoseconds, but you can choose any units you like, either integral or floating point based. As an example, there is a pre-made std::chrono::nanoseconds unit I could have used.

As another example, I want to print out the average number of clock cycles per iteration as a floating point value, so I create another duration, based on double, that has the same units as the clock's tick does (called Cycle in the code).

The loop is timed with calls to clock::now() on either side. If you want to name the type returned from this function it is:

typename clock::time_point t0 = clock::now();

(as clearly shown in the x::clock example, and is also true of the system-supplied clocks).

To get a duration in terms of floating point clock ticks one merely subtracts the two time points, and to get the per iteration value, divide that duration by the number of iterations.

You can get the count in any duration by using the count() member function. This returns the internal representation. Finally I use std::chrono::duration_cast to convert the duration Cycle to the duration picoseconds and print that out.
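
One detail worth seeing in isolation is that duration_cast to an integral unit truncates, while a floating-point duration keeps the fraction. The sketch below is only an illustration: whole_ns and fractional_ns are hypothetical helper names, and picoseconds is declared here with a signed long long representation rather than the unsigned one used in the test code above.

```cpp
#include <chrono>

typedef std::chrono::duration<long long, std::pico> picoseconds;  // integral picoseconds

// Integral target unit: duration_cast truncates toward zero.
inline long long whole_ns(picoseconds d)
{
    return std::chrono::duration_cast<std::chrono::nanoseconds>(d).count();
}

// Floating-point target unit: the fraction survives and no cast is required.
inline double fractional_ns(picoseconds d)
{
    return std::chrono::duration<double, std::nano>(d).count();
}
```

For a 1500 ps duration, whole_ns gives 1 and fractional_ns gives 1.5. This is the reason the test code reports sub-nanosecond loop times in picoseconds: casting them to std::chrono::nanoseconds would print 0.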

To use this code is simple:

int main()
{
    std::cout << "\nUsing rdtsc:\n";
    test_empty_loop<x::clock>();

    std::cout << "\nUsing std::chrono::high_resolution_clock:\n";
    test_empty_loop<std::chrono::high_resolution_clock>();

    std::cout << "\nUsing std::chrono::system_clock:\n";
    test_empty_loop<std::chrono::system_clock>();
}

Above I exercise the test using our home-made x::clock, and compare those results with two of the system-supplied clocks: std::chrono::high_resolution_clock and std::chrono::system_clock. For me this prints out:

Using rdtsc:
1.72632 clock ticks per iteration
616ps per iteration

Using std::chrono::high_resolution_clock:
0.620105 clock ticks per iteration
620ps per iteration

Using std::chrono::system_clock:
0.00062457 clock ticks per iteration
624ps per iteration

This shows that each of these clocks has a different tick period, as the ticks per iteration is vastly different for each clock. However when converted to a known unit of time (e.g. picoseconds), I get approximately the same result for each clock (your mileage may vary).
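
If you are curious which tick period your own implementation declares for each clock, a sketch along these lines can print it. The helper names tick_period_seconds and describe_clock are invented for this example. Note that the period is only the unit in which the clock reports time; it is not a guarantee about the actual resolution of the underlying timer.

```cpp
#include <chrono>
#include <iostream>

// Seconds per tick, taken from the clock's compile-time period (a std::ratio).
template <class Clock>
double tick_period_seconds()
{
    typedef typename Clock::period P;
    return static_cast<double>(P::num) / static_cast<double>(P::den);
}

// Print the declared tick period and whether the clock is steady.
template <class Clock>
void describe_clock(const char* name)
{
    std::cout << name << ": " << tick_period_seconds<Clock>() * 1e9
              << " ns per tick, is_steady = " << std::boolalpha
              << Clock::is_steady << '\n';
}
```

Calling describe_clock<std::chrono::system_clock>("system_clock") and the same for high_resolution_clock and steady_clock shows the values for your platform; they differ between standard library implementations.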

Note how my code is completely free of "magic conversion constants". Indeed, there are only two magic numbers in the entire example:

1. The clock speed of my machine, in order to define x::clock.
2. The number of iterations to test over. If changing this number makes your results vary greatly, then you should probably make the number of iterations higher, or empty your computer of competing processes while testing.

Answer 3

Answered by VonC

With that level of accuracy, it would be better to reason in CPU ticks rather than in terms of a system call like clock(). And do not forget that if it takes more than one nanosecond to execute an instruction... having nanosecond accuracy is pretty much impossible.

Still, something like that is a start:

Here's the actual code to retrieve the number of 80x86 CPU clock ticks passed since the CPU was last started. It will work on Pentium and above (386/486 not supported). This code is actually MS Visual C++ specific, but can probably be ported very easily to anything else, as long as it supports inline assembly.

inline __int64 GetCpuClocks()
{

    // Counter
    struct { __int32 low, high; } counter;  // __int32 is MSVC's built-in 32-bit integer type

    // Use RDTSC instruction to get clocks count
    __asm push EAX
    __asm push EDX
    __asm __emit 0fh __asm __emit 031h // RDTSC
    __asm mov counter.low, EAX
    __asm mov counter.high, EDX
    __asm pop EDX
    __asm pop EAX

    // Return result
    return *(__int64 *)(&counter);

}

This function has also the advantage of being extremely fast - it usually takes no more than 50 cpu cycles to execute.

Using the Timing Figures:
If you need to translate the clock counts into true elapsed time, divide the results by your chip's clock speed. Remember that the "rated" GHz is likely to be slightly different from the actual speed of your chip. To check your chip's true speed, you can use several very good utilities or the Win32 call, QueryPerformanceFrequency().

Answer 4

Answered by Marius

To do this correctly you can use one of two ways: either go with RDTSC or with clock_gettime(). The second is about 2 times faster and has the advantage of giving the right absolute time. Note that for RDTSC to work correctly you need to use it as indicated (other comments on this page have errors, and may yield incorrect timing values on certain processors):

#include <stdint.h>

inline uint64_t rdtsc()
{
    uint32_t lo, hi;
    __asm__ __volatile__ (
      "xorl %%eax, %%eax\n"
      "cpuid\n"
      "rdtsc\n"
      : "=a" (lo), "=d" (hi)
      :
      : "%ebx", "%ecx" );
    return (uint64_t)hi << 32 | lo;
}

And for clock_gettime (I chose microsecond resolution arbitrarily):

#include <stdint.h>
#include <time.h>
// may need -lrt (real-time lib; required by glibc before 2.17)
// 1970-01-01 epoch UTC time, 1 microsecond resolution (divide by 1M to get time_t)
uint64_t ClockGetTime()
{
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000LL + (uint64_t)ts.tv_nsec / 1000LL;
}

The timing and values produced:

Absolute values:
rdtsc           = 4571567254267600
clock_gettime   = 1278605535506855

Processing time: (10000000 runs)
rdtsc           = 2292547353
clock_gettime   = 1031119636

Answer 5

Answered by gagneet

I am using the following to get the desired results:

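#include <time.h>
#include <iostream>
using namespace std;

int main (int argc, char** argv)
{
    // Take a reading before and after and subtract. Many systems (Linux
    // included) do not allow clock_settime() to reset the per-process
    // CPU-time clock, so it cannot be zeroed reliably.
    timespec tStart, tEnd;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tStart);
    // ... <code to check for the time to be put here> ...
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tEnd);

    long long ns = (tEnd.tv_sec - tStart.tv_sec) * 1000000000LL
                 + (tEnd.tv_nsec - tStart.tv_nsec);
    cout << "Time taken is: " << ns << " ns" << endl;

    return 0;
}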

Answer 6

Answered by gongzhitaao

For C++11, here is a simple wrapper:

#include <iostream>
#include <chrono>

class Timer
{
public:
    Timer() : beg_(clock_::now()) {}
    void reset() { beg_ = clock_::now(); }
    double elapsed() const {
        return std::chrono::duration_cast<second_>
            (clock_::now() - beg_).count(); }

private:
    typedef std::chrono::high_resolution_clock clock_;
    typedef std::chrono::duration<double, std::ratio<1> > second_;
    std::chrono::time_point<clock_> beg_;
};

Or, for C++03 on *nix:

#include <time.h>

class Timer
{
public:
    Timer() { clock_gettime(CLOCK_REALTIME, &beg_); }

    double elapsed() {
        clock_gettime(CLOCK_REALTIME, &end_);
        return end_.tv_sec - beg_.tv_sec +
            (end_.tv_nsec - beg_.tv_nsec) / 1000000000.;
    }

    void reset() { clock_gettime(CLOCK_REALTIME, &beg_); }

private:
    timespec beg_, end_;
};

Example of usage:

int main()
{
    Timer tmr;
    double t = tmr.elapsed();
    std::cout << t << std::endl;

    tmr.reset();
    t = tmr.elapsed();
    std::cout << t << std::endl;
    return 0;
}

From https://gist.github.com/gongzhitaao/7062087

Answer 7

Answered by Walter Bright

You can use the following function with gcc on x86 processors:

unsigned long long rdtsc()
{
  unsigned int low, high;
  // RDTSC returns the low 32 bits of the counter in EAX and the high 32 bits in EDX
  __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high));
  return ((unsigned long long)high << 32) | low;
}

With Digital Mars C++:

unsigned long long rdtsc()
{
   _asm
   {
        rdtsc
   }
}

Both versions read the high performance timer on the chip (the time-stamp counter). I use this when doing profiling.

Answer 8

Answered by Greg Hewgill

In general, for timing how long it takes to call a function, you want to do it many more times than just once. If you call your function only once and it takes a very short time to run, you still have the overhead of actually calling the timer functions and you don't know how long that takes.
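
One way to get a feel for that overhead is to time the timer itself. The following is a rough sketch, not a rigorous benchmark: now_overhead_ns is a name made up for this example, and it simply averages the cost of back-to-back Clock::now() calls for any <chrono>-style clock.

```cpp
#include <chrono>

// Average cost, in nanoseconds, of one Clock::now() call,
// estimated from `calls` back-to-back readings.
template <class Clock>
double now_overhead_ns(int calls)
{
    typename Clock::time_point first = Clock::now();
    typename Clock::time_point last = first;
    for (int i = 0; i < calls; ++i)
        last = Clock::now();
    // Implicit conversion to a double-based duration keeps fractional nanoseconds.
    std::chrono::duration<double, std::nano> total = last - first;
    return total.count() / calls;
}
```

For instance, now_overhead_ns<std::chrono::steady_clock>(1000000) gives an estimate for the steady clock on your machine. If the result is comparable to the duration you are trying to measure, a single timed call will tell you very little.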

For example, if you estimate your function might take 800 ns to run, call it in a loop ten million times (which will then take about 8 seconds). Divide the total time by ten million to get the time per call.
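
The loop-and-divide approach could look roughly like the sketch below. It assumes C++11 and std::chrono::steady_clock; average_ns_per_call is an invented helper name, and sample_work is a hypothetical stand-in for the function you actually want to time.

```cpp
#include <chrono>

// Average nanoseconds per call of `f`, measured over `iterations` calls.
template <class F>
double average_ns_per_call(F f, long iterations)
{
    typedef std::chrono::steady_clock clock;
    clock::time_point t0 = clock::now();
    for (long i = 0; i < iterations; ++i)
        f();
    clock::time_point t1 = clock::now();
    std::chrono::duration<double, std::nano> total = t1 - t0;
    return total.count() / iterations;  // total time divided by the call count
}

// Hypothetical stand-in for the API being measured.
inline void sample_work()
{
    static volatile unsigned sink = 0;  // volatile write keeps the call from being optimised away
    sink = sink + 1;
}
```

A call such as average_ns_per_call(sample_work, 10000000) returns the per-call average. The figure includes the loop overhead, so for very cheap functions it is worth timing an empty loop as well and comparing the two.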

Answer 9

Answered by Raymond Martineau

If you need subsecond precision, you need to use system-specific extensions, and will have to check the documentation for your operating system. With gettimeofday, POSIX supports resolution down to microseconds; that interface offers nothing more precise, since it dates from when computers did not have frequencies above 1 GHz. (clock_gettime(), shown in other answers, reports nanoseconds.)

If you are using Boost, you can check boost::posix_time.

Answer 10

Answered by Paul J Moesman

I'm using Borland C++; here is the code. ti_hund sometimes gives me a negative number, but the timing is fairly good.

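#include <dos.h>
#include <stdio.h>
#include <conio.h>

int main()
{
    struct time t;
    int Hour, Min, Sec, Hun;

    gettime(&t);
    Hour = t.ti_hour;
    Min = t.ti_min;
    Sec = t.ti_sec;
    Hun = t.ti_hund;
    printf("Start time is: %2d:%02d:%02d.%02d\n",
           t.ti_hour, t.ti_min, t.ti_sec, t.ti_hund);

    // ... your code to time ...

    // Read the time again and print the field-by-field difference; drop the
    // hour and minute fields if the run only takes seconds.
    // A field can come out negative when it wraps between readings
    // (e.g. 12.90 s -> 13.10 s prints Sec:1 Hundreds:-80).
    gettime(&t);
    printf("\nTime Hour:%d Min:%d Sec:%d  Hundreds:%d\n", t.ti_hour - Hour,
           t.ti_min - Min, t.ti_sec - Sec, t.ti_hund - Hun);
    printf("\n\nAll done. Press a key\n\n");
    getch();
    return 0;
} // end main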
