How to Calculate Execution Time of a Code Snippet in C++

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/1861294/

How to Calculate Execution Time of a Code Snippet in C++

Tags: c++, benchmarking

Asked by AhmetB - Google

I have to compute the execution time of a C++ code snippet in seconds. It must work on both Windows and Unix machines.

I use the following code to do this:

#include <ctime>
#include <iostream>
using namespace std;

clock_t startTime = clock();
// some code here
// to compute its execution duration in runtime
cout << double( clock() - startTime ) / (double) CLOCKS_PER_SEC << " seconds." << endl;

However, for small inputs or short statements such as a = a + 1, I get a "0 seconds" result. I think it must be something like 0.0000001 seconds or so.

I remember that System.nanoTime() in Java works pretty well in this case. However, I can't get the same functionality from the clock() function in C++.

Do you have a solution?

Accepted answer by Thomas Bonini

You can use this function I wrote. You call GetTimeMs64(), and it returns the number of milliseconds elapsed since the Unix epoch using the system clock, just like time(NULL), except in milliseconds.

It works on both Windows and Linux; it is thread-safe.

Note that the granularity is 15 ms on Windows; on Linux it is implementation-dependent, but it is usually around 15 ms as well.

#ifdef _WIN32
#include <Windows.h>
#else
#include <sys/time.h>
#include <ctime>
#endif

/* Remove if already defined */
typedef long long int64;
typedef unsigned long long uint64;

/* Returns the amount of milliseconds elapsed since the UNIX epoch. Works on both
 * windows and linux. */

uint64 GetTimeMs64()
{
#ifdef _WIN32
 /* Windows */
 FILETIME ft;
 LARGE_INTEGER li;

 /* Get the amount of 100 nano seconds intervals elapsed since January 1, 1601 (UTC) and copy it
  * to a LARGE_INTEGER structure. */
 GetSystemTimeAsFileTime(&ft);
 li.LowPart = ft.dwLowDateTime;
 li.HighPart = ft.dwHighDateTime;

 uint64 ret = li.QuadPart;
 ret -= 116444736000000000LL; /* Convert from file time to UNIX epoch time. */
 ret /= 10000; /* From 100 nano seconds (10^-7) to 1 millisecond (10^-3) intervals */

 return ret;
#else
 /* Linux */
 struct timeval tv;

 gettimeofday(&tv, NULL);

 uint64 ret = tv.tv_usec;
 /* Convert from micro seconds (10^-6) to milliseconds (10^-3) */
 ret /= 1000;

 /* Adds the seconds (10^0) after converting them to milliseconds (10^-3) */
 ret += ((uint64)tv.tv_sec * 1000); /* cast first so a 32-bit time_t cannot overflow */

 return ret;
#endif
}
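
For illustration, here is a minimal usage sketch (assuming the GetTimeMs64() function and the uint64 typedef above are in scope; the summing loop is just placeholder work):

#include <iostream>

int main()
{
    uint64 start = GetTimeMs64();

    volatile long long sum = 0; /* volatile so the work is not optimized away */
    for (long long i = 0; i < 100000000; ++i)
        sum = sum + i;

    std::cout << (GetTimeMs64() - start) << " ms" << std::endl;
    return 0;
}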

Answer by arhuaco

I have another working example that uses microseconds (UNIX, POSIX, etc.).

#include <sys/time.h>
typedef unsigned long long timestamp_t;

static timestamp_t
get_timestamp ()
{
  struct timeval now;
  gettimeofday (&now, NULL);
  return now.tv_usec + (timestamp_t)now.tv_sec * 1000000;
}

...
timestamp_t t0 = get_timestamp();
// Process
timestamp_t t1 = get_timestamp();

double secs = (t1 - t0) / 1000000.0L;

Here's the file where we coded this:

https://github.com/arhuaco/junkcode/blob/master/emqbit-bench/bench.c

Answer by gongzhitaao

Here is a simple solution in C++11 which gives you satisfying resolution.

#include <iostream>
#include <chrono>

class Timer
{
public:
    Timer() : beg_(clock_::now()) {}
    void reset() { beg_ = clock_::now(); }
    double elapsed() const { 
        return std::chrono::duration_cast<second_>
            (clock_::now() - beg_).count(); }

private:
    typedef std::chrono::high_resolution_clock clock_;
    typedef std::chrono::duration<double, std::ratio<1> > second_;
    std::chrono::time_point<clock_> beg_;
};

Or, on *nix, for C++03:

#include <iostream>
#include <ctime>

class Timer
{
public:
    Timer() { clock_gettime(CLOCK_REALTIME, &beg_); }

    double elapsed() {
        clock_gettime(CLOCK_REALTIME, &end_);
        return end_.tv_sec - beg_.tv_sec +
            (end_.tv_nsec - beg_.tv_nsec) / 1000000000.;
    }

    void reset() { clock_gettime(CLOCK_REALTIME, &beg_); }

private:
    timespec beg_, end_;
};

Here is an example usage:

int main()
{
    Timer tmr;
    double t = tmr.elapsed();
    std::cout << t << std::endl;

    tmr.reset();
    t = tmr.elapsed();
    std::cout << t << std::endl;

    return 0;
}

From https://gist.github.com/gongzhitaao/7062087

Answer by Tomas Andrle

#include <boost/progress.hpp>

using namespace boost;

int main (int argc, const char * argv[])
{
  progress_timer timer;

  // do stuff, preferably in a 100x loop to make it take longer.

  return 0;
}

When progress_timer goes out of scope, it will print out the time elapsed since its creation.

UPDATE: Here's a version that works without Boost (tested on macOS/iOS):

#include <chrono>
#include <string>
#include <iostream>
#include <math.h>
#include <unistd.h>

class NLTimerScoped {
private:
    const std::chrono::steady_clock::time_point start;
    const std::string name;

public:
    NLTimerScoped( const std::string & name ) : start( std::chrono::steady_clock::now() ), name( name ) {
    }


    ~NLTimerScoped() {
        const auto end(std::chrono::steady_clock::now());
        const auto duration_ms = std::chrono::duration_cast<std::chrono::milliseconds>( end - start ).count();

        std::cout << name << " duration: " << duration_ms << "ms" << std::endl;
    }

};

int main(int argc, const char * argv[]) {

    {
        NLTimerScoped timer( "sin sum" );

        float a = 0.0f;

        for ( int i=0; i < 1000000; i++ ) {
            a += sin( (float) i / 100 );
        }

        std::cout << "sin sum = " << a << std::endl;
    }



    {
        NLTimerScoped timer( "sleep( 4 )" );

        sleep( 4 );
    }



    return 0;
}

Answer by Captain Comic

Windows provides the QueryPerformanceCounter() function, and Unix has gettimeofday(). Both functions can measure at least a 1-microsecond difference.

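As a rough sketch of how those two calls can be wrapped behind one interface (now_us is a hypothetical helper name, not part of either API):

#ifdef _WIN32
#include <windows.h>

// Microseconds derived from QueryPerformanceCounter.
// Note: the multiply can overflow for very long uptimes; fine for a sketch.
long long now_us()
{
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq); // ticks per second
    QueryPerformanceCounter(&count);  // ticks since some fixed point
    return count.QuadPart * 1000000LL / freq.QuadPart;
}
#else
#include <sys/time.h>

// Microseconds since the Unix epoch from gettimeofday.
long long now_us()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}
#endif
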
Answer by kriss

In some programs I wrote, I used RDTSC for this purpose. RDTSC is not about time but about the number of cycles since processor start. You have to calibrate it on your system to get a result in seconds, but it's really handy when you want to evaluate performance; it's even better to use the number of cycles directly, without trying to convert them back to seconds.

(The link above is to a French Wikipedia page, but it has C++ code samples; the English version is here.)

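For reference, a minimal sketch using the __rdtsc() compiler intrinsic (available on MSVC via <intrin.h> and on gcc/clang via <x86intrin.h>), which reads the cycle counter directly:

#include <cstdint>
#include <iostream>
#ifdef _MSC_VER
#include <intrin.h>     // __rdtsc on MSVC
#else
#include <x86intrin.h>  // __rdtsc on gcc/clang
#endif

int main()
{
    uint64_t start = __rdtsc();
    // ... code to measure ...
    uint64_t cycles = __rdtsc() - start; // elapsed CPU cycles, not seconds
    std::cout << cycles << " cycles\n";
    return 0;
}
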
Answer by Thomas Matthews

I suggest using the standard library functions for obtaining time information from the system.

If you want finer resolution, perform more iterations. Instead of running the program once and obtaining a single sample, run it 1000 times or more.

Answer by Adisak

It is better to run the inner loop many times with the performance timing taken only once around it, averaging by dividing by the number of inner-loop repetitions, than to run the whole thing (loop + performance timing) several times and average. This reduces the overhead of the performance-timing code relative to the section actually being profiled.

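For instance, a minimal sketch of this pattern using std::chrono (timing once around the whole loop, then dividing by the repetition count):

#include <chrono>
#include <iostream>

int main()
{
    const int N = 1000000;
    volatile int x = 0; // volatile so the work is not optimized away

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) {
        x = x + 1; // code under test
    }
    auto t1 = std::chrono::steady_clock::now();

    double ns_per_iter =
        std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() /
        static_cast<double>(N);
    std::cout << ns_per_iter << " ns per iteration\n";
    return 0;
}
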
Wrap your timer calls for the appropriate system. For Windows, QueryPerformanceCounter is pretty fast and "safe" to use.

You can use "rdtsc" on any modern X86 PC as well but there may be issues on some multicore machines (core hopping may change timer) or if you have speed-step of some sort turned on.

Answer by Hyman Giffin

A complete, unfailing solution to the thread-scheduling problem, one that should yield exactly the same time for every test run, is to compile your program to be OS-independent and boot your computer into it, so the program runs in an OS-free environment. However, this is largely impractical and would be difficult at best.

A good substitute for going OS-free is to set the affinity of the current thread to a single core and its priority to the highest level. This alternative should provide consistent-enough results.

Also, you should turn off optimizations that would interfere with debugging, which for g++ or gcc means adding -Og to the command line, to prevent the code being tested from being optimized out. The -O0 flag should not be used, because it introduces extra unneeded overhead that would be included in the timing results, skewing the measured speed of the code.

On the other hand, assuming that you use -Ofast (or, at the very least, -O3) for the final production build, and ignoring the issue of "dead" code elimination, -Og performs very few optimizations compared to -Ofast; thus -Og can misrepresent the real speed of the code in the final product.

Further, all speed tests are (to some extent) misleading: in the final production product compiled with -Ofast, each snippet/section/function of code is not isolated; rather, each snippet of code continuously flows into the next, allowing the compiler to join, merge, and optimize pieces of code from all over the place.

At the same time, if you are benchmarking a snippet of code that makes heavy use of realloc(), then that snippet might run slower in a production product with high enough memory fragmentation. Hence, the expression "the whole is more than the sum of its parts" applies here, because code in the final production build might run noticeably faster or slower than the individual snippet you are speed-testing.

A partial solution that may lessen the discrepancy is to use -Ofast for speed testing, with the addition of asm volatile("" :: "r"(var)) on the variables involved in the test to prevent dead-code/loop elimination.

Here is an example of how to benchmark square root functions on a Windows computer.

// set USE_ASM_TO_PREVENT_ELIMINATION to 0 to omit `asm volatile("" :: "r"(var))`
// set USE_ASM_TO_PREVENT_ELIMINATION to 1 to emit `asm volatile("" :: "r"(var))`
#define USE_ASM_TO_PREVENT_ELIMINATION 1

#include <iostream>
#include <iomanip>
#include <cstdio>
#include <chrono>
#include <cmath>
#include <windows.h>
#include <intrin.h>
#pragma intrinsic(__rdtsc)
#include <cstdint>

class Timer {
public:
    Timer() : beg_(clock_::now()) {}
    void reset() { beg_ = clock_::now(); }
    double elapsed() const { 
        return std::chrono::duration_cast<second_>
            (clock_::now() - beg_).count(); }
private:
    typedef std::chrono::high_resolution_clock clock_;
    typedef std::chrono::duration<double, std::ratio<1> > second_;
    std::chrono::time_point<clock_> beg_;
};

unsigned int guess_sqrt32(unsigned int n) {
    unsigned int g = 0;
    /* Binary-search the square root one bit at a time, from the highest
     * candidate bit (0x8000) down: set each bit, and clear it again if
     * g*g overshoots n. Behaves identically to the fully unrolled original. */
    for (unsigned int bit = 0x8000; bit != 0; bit >>= 1) {
        g |= bit;
        if (g*g > n) {
            g ^= bit;
        }
    }
    return g;
}

unsigned int empty_function( unsigned int _input ) {
    return _input;
}

unsigned long long empty_ticks=0;
double empty_seconds=0;
Timer my_time;

template<unsigned int benchmark_repetitions>
void benchmark( const char* function_name, unsigned int (*function_to_do)( unsigned int ) ) {
    unsigned int i = benchmark_repetitions;
    unsigned long long start = 0;
    my_time.reset();
    start=__rdtsc();
    while ( i-- ) {
        auto result = (*function_to_do)( i << 7 );
        #if USE_ASM_TO_PREVENT_ELIMINATION == 1
            asm volatile("" :: "r"(
                // There is no data type in C++ that is smaller than a char, so it will
                //  not throw a segmentation fault error to reinterpret any arbitrary
                //  data type as a char. Although, the compiler might not like it.
                result
            ));
        #endif
    }
    if ( function_name == nullptr ) {
        empty_ticks = (__rdtsc()-start);
        empty_seconds = my_time.elapsed();
        std::cout<< "Empty:\n" << empty_ticks
              << " ticks\n" << benchmark_repetitions << " repetitions\n"
               << std::setprecision(15) << empty_seconds
                << " seconds\n\n";
    } else {
        std::cout<< function_name<<":\n" << (__rdtsc()-start-empty_ticks)
              << " ticks\n" << benchmark_repetitions << " repetitions\n"
               << std::setprecision(15) << (my_time.elapsed()-empty_seconds)
                << " seconds\n\n";
    }
}


int main( void ) {
    void* Cur_Thread=   GetCurrentThread();
    void* Cur_Process=  GetCurrentProcess();
    unsigned long long  Current_Affinity;
    unsigned long long  System_Affinity;
    unsigned long long furthest_affinity;
    unsigned long long nearest_affinity;

    if( ! SetThreadPriority(Cur_Thread,THREAD_PRIORITY_TIME_CRITICAL) ) {
        SetThreadPriority( Cur_Thread, THREAD_PRIORITY_HIGHEST );
    }
    if( ! SetPriorityClass(Cur_Process,REALTIME_PRIORITY_CLASS) ) {
        SetPriorityClass( Cur_Process, HIGH_PRIORITY_CLASS );
    }
    GetProcessAffinityMask( Cur_Process, &Current_Affinity, &System_Affinity );
    furthest_affinity = 0x8000000000000000ULL>>__builtin_clzll(Current_Affinity);
    nearest_affinity  = 0x0000000000000001ULL<<__builtin_ctzll(Current_Affinity);
    SetProcessAffinityMask( Cur_Process, furthest_affinity );
    SetThreadAffinityMask( Cur_Thread, furthest_affinity );

    const int repetitions=524288;

    benchmark<repetitions>( nullptr, empty_function );
    // Only guess_sqrt32 is defined in this snippet; the original answer also
    // benchmarked standard_sqrt and a revised guess function at this point.
    benchmark<repetitions>( "Guess Square Root", guess_sqrt32 );


    SetThreadPriority( Cur_Thread, THREAD_PRIORITY_IDLE );
    SetPriorityClass( Cur_Process, IDLE_PRIORITY_CLASS );
    SetProcessAffinityMask( Cur_Process, nearest_affinity );
    SetThreadAffinityMask( Cur_Thread, nearest_affinity );
    for (;;) { getchar(); }

    return 0;
}

Also, credit to Mike Jarvis for his Timer.

Please note (this is very important) that if you are going to run bigger code snippets, then you really must turn down the number of iterations to prevent your computer from freezing up.

Answer by Hyman Giffin

(Windows-specific solution) The current (circa 2017) way to get accurate timings under Windows is to use QueryPerformanceCounter. This approach has the benefit of giving very accurate results and is recommended by MS. Just plop the code blob into a new console app to get a working sample. There is a lengthy discussion here: Acquiring high-resolution time stamps.

#include <iostream>
#include <tchar.h>
#include <windows.h>

int main()
{
    constexpr int MAX_ITER{ 10000 };
    constexpr __int64 us_per_hour{ 3600000000ull }; // 3.6e+09
    constexpr __int64 us_per_min{ 60000000ull };
    constexpr __int64 us_per_sec{ 1000000ull };
    constexpr __int64 us_per_ms{ 1000ull };

    // easy to work with
    __int64 startTick, endTick, ticksPerSecond, totalTicks = 0ull;

    QueryPerformanceFrequency((LARGE_INTEGER *)&ticksPerSecond);

    for (int iter = 0; iter < MAX_ITER; ++iter) { // start looping
        QueryPerformanceCounter((LARGE_INTEGER *)&startTick); // Get start tick
        // code to be timed
        std::cout << "cur_tick = " << iter << "\n";
        QueryPerformanceCounter((LARGE_INTEGER *)&endTick); // Get end tick
        totalTicks += endTick - startTick; // accumulate time taken
    }

    // convert to elapsed microseconds
    __int64 totalMicroSeconds = (totalTicks * 1000000ull) / ticksPerSecond;

    __int64 hours = totalMicroSeconds / us_per_hour;
    totalMicroSeconds %= us_per_hour;
    __int64 minutes = totalMicroSeconds / us_per_min;
    totalMicroSeconds %= us_per_min;
    __int64 seconds = totalMicroSeconds / us_per_sec;
    totalMicroSeconds %= us_per_sec;
    __int64 milliseconds = totalMicroSeconds / us_per_ms;
    totalMicroSeconds %= us_per_ms;

    std::cout << "Total time: " << hours << "h ";
    std::cout << minutes << "m " << seconds << "s " << milliseconds << "ms ";
    std::cout << totalMicroSeconds << "us\n";

    return 0;
}