How to create a high resolution timer in Linux to measure program performance?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/6749621/
Asked by sj755
I'm trying to compare GPU to CPU performance. For the NVIDIA GPU I've been using the cudaEvent_t types to get very precise timing.
For the CPU I've been using the following code:
// Timers
clock_t start, stop;
float elapsedTime = 0;
// Capture the start time
start = clock();
// Do something here
.......
// Capture the stop time
stop = clock();
// Retrieve time elapsed in milliseconds
elapsedTime = (float)(stop - start) / (float)CLOCKS_PER_SEC * 1000.0f;
Apparently, that piece of code is only good if you're counting in seconds. Also, the results sometimes come out quite strange.
Does anyone know of some way to create a high resolution timer in Linux?
Accepted answer by NPE
Check out clock_gettime, which is a POSIX interface to high-resolution timers.
If, having read the manpage, you're left wondering about the difference between CLOCK_REALTIME and CLOCK_MONOTONIC, see "Difference between CLOCK_REALTIME and CLOCK_MONOTONIC?"
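For interval measurements, CLOCK_MONOTONIC is usually the safer of the two, since CLOCK_REALTIME follows the system wall clock and can jump when the time is adjusted. The following is only a minimal sketch of reading it as a single nanosecond count; the helper names monotonic_now_ns and elapsed_ns are made up for illustration and are not part of any library.

```cpp
#include <time.h>    // clock_gettime, CLOCK_MONOTONIC (POSIX)
#include <cstdint>

// Hypothetical helper: read CLOCK_MONOTONIC as nanoseconds.
// Returns -1 if clock_gettime reports an error.
std::int64_t monotonic_now_ns() {
    timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return -1;
    }
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical helper: run a callable and return the elapsed time in
// nanoseconds, measured on the monotonic clock.
template <typename F>
std::int64_t elapsed_ns(F&& work) {
    const std::int64_t start = monotonic_now_ns();
    work();
    return monotonic_now_ns() - start;
}
```

Calling elapsed_ns with a lambda that wraps the code under test gives the elapsed time for that code. The starting point of CLOCK_MONOTONIC is unspecified, so only differences between two readings are meaningful.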
See the following page for a complete example: http://www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime/
#include <iostream>
#include <time.h>
using namespace std;

timespec diff(timespec start, timespec end);

int main()
{
    timespec time1, time2;
    int temp = 0;  // initialized so the loop below has defined behavior
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time1);
    for (int i = 0; i < 242000000; i++)
        temp += temp;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time2);
    cout << diff(time1, time2).tv_sec << ":" << diff(time1, time2).tv_nsec << endl;
    return 0;
}

// Returns end - start, borrowing one second when the nanosecond field underflows.
timespec diff(timespec start, timespec end)
{
    timespec temp;
    if ((end.tv_nsec - start.tv_nsec) < 0) {
        temp.tv_sec = end.tv_sec - start.tv_sec - 1;
        temp.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
    } else {
        temp.tv_sec = end.tv_sec - start.tv_sec;
        temp.tv_nsec = end.tv_nsec - start.tv_nsec;
    }
    return temp;
}
Answer by Nikolai Fetissov
Answer by Karoly Horvath
struct timespec t;
clock_gettime(CLOCK_REALTIME, &t);
There is also CLOCK_REALTIME_HR, but I'm not sure whether it makes any difference.
Answer by Foo Bah
Are you interested in wall time (how much time actually elapses) or cycle count (how many cycles)? In the first case, you should use something like gettimeofday.
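As a rough illustration of the wall-time case, the sketch below reads gettimeofday as a single microsecond count. The helper name wall_now_us is hypothetical. Note that gettimeofday follows the system wall clock, so a reading can jump if the clock is adjusted between two calls, and POSIX.1-2008 marks it obsolescent in favor of clock_gettime.

```cpp
#include <sys/time.h>  // gettimeofday, timeval (POSIX)
#include <cstdint>

// Hypothetical helper: current wall-clock time in microseconds since the epoch.
// Returns -1 if gettimeofday reports an error.
std::int64_t wall_now_us() {
    timeval tv;
    if (gettimeofday(&tv, nullptr) != 0) {
        return -1;
    }
    return static_cast<std::int64_t>(tv.tv_sec) * 1000000LL + tv.tv_usec;
}
```

Taking one reading before and one after the code under test, then subtracting, gives the elapsed wall time in microseconds.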
The highest resolution timer uses the RDTSC x86 assembly instruction. However, this measures clock ticks, so you should be sure that power saving mode is disabled.
The wiki page for TSC gives a few examples: http://en.wikipedia.org/wiki/Time_Stamp_Counter
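One possible way to read the counter from C++ is the __rdtsc intrinsic that GCC and Clang expose through x86intrin.h. This is only a sketch under that assumption: the helper names read_tsc and elapsed_ticks are hypothetical, and the non-x86 branch is a placeholder that simply returns 0.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc (GCC/Clang on x86)
#endif

// Hypothetical helper: raw time stamp counter value.
// On non-x86 builds this is a placeholder that returns 0.
std::uint64_t read_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;
#endif
}

// Hypothetical helper: run a callable and return the number of TSC ticks
// that passed. The result is in ticks, not seconds.
template <typename F>
std::uint64_t elapsed_ticks(F&& work) {
    const std::uint64_t start = read_tsc();
    work();
    return read_tsc() - start;
}
```

Converting ticks to time requires knowing the counter frequency, and the readings may be unreliable if the thread migrates between cores whose counters are not synchronized.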
Answer by Alex
To summarise information presented so far, these are the two functions required for typical applications.
#include <time.h>

// call this function to start a nanosecond-resolution timer
struct timespec timer_start(){
    struct timespec start_time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start_time);
    return start_time;
}

// call this function to end a timer, returning nanoseconds elapsed as a long
long timer_end(struct timespec start_time){
    struct timespec end_time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end_time);
    long diffInNanos = (end_time.tv_sec - start_time.tv_sec) * (long)1e9 + (end_time.tv_nsec - start_time.tv_nsec);
    return diffInNanos;
}
Here is an example of how to use them to time how long it takes to calculate the variance of a list of inputs.
struct timespec vartime = timer_start(); // begin a timer called 'vartime'
double variance = var(input, MAXLEN); // perform the task we want to time
long time_elapsed_nanos = timer_end(vartime);
printf("Variance = %f, Time taken (nanoseconds): %ld\n", variance, time_elapsed_nanos);
Answer by Kevin Lee
epoll implementation: https://github.com/ielife/simple-timer-for-c-language
Use it like this:
timer_server_handle_t *timer_handle = timer_server_init(1024);
if (NULL == timer_handle) {
    fprintf(stderr, "timer_server_init failed\n");
    return -1;
}
ctimer timer1;
timer1.count_ = 3;
timer1.timer_internal_ = 0.5;
timer1.timer_cb_ = timer_cb1;
int *user_data1 = (int *)malloc(sizeof(int));
*user_data1 = 100;
timer1.user_data_ = user_data1;
timer_server_addtimer(timer_handle, &timer1);
ctimer timer2;
timer2.count_ = -1;
timer2.timer_internal_ = 0.5;
timer2.timer_cb_ = timer_cb2;
int *user_data2 = (int *)malloc(sizeof(int));
*user_data2 = 10;
timer2.user_data_ = user_data2;
timer_server_addtimer(timer_handle, &timer2);
sleep(10);
timer_server_deltimer(timer_handle, timer1.fd);
timer_server_deltimer(timer_handle, timer2.fd);
timer_server_uninit(timer_handle);
Answer by radato
After reading this thread I started testing the code for clock_gettime against C++11's chrono and they don't seem to match.
There is a huge gap between them!
A std::chrono::seconds(1) sleep seems to be equivalent to only ~30,000 ns according to clock_gettime:
#include <ctime>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <thread>
#include <chrono>
#include <iomanip>
#include <vector>
timespec diff(timespec start, timespec end);
timespec get_cpu_now_time();
std::vector<timespec> get_start_end_pairs();
void output_deltas(const std::vector<timespec> &start_end_pairs);
//=============================================================
int main()
{
    std::cout << "Hello waiter" << std::endl; // flush is intentional
    std::vector<timespec> start_end_pairs = get_start_end_pairs();
    output_deltas(start_end_pairs);
    return EXIT_SUCCESS;
}
//=============================================================
std::vector<timespec> get_start_end_pairs()
{
    std::vector<timespec> start_end_pairs;
    for (int i = 0; i < 20; ++i)
    {
        start_end_pairs.push_back(get_cpu_now_time());
        std::this_thread::sleep_for(std::chrono::seconds(1));
        start_end_pairs.push_back(get_cpu_now_time());
    }
    return start_end_pairs;
}
//=============================================================
void output_deltas(const std::vector<timespec> &start_end_pairs)
{
    for (auto it_start = start_end_pairs.begin(); it_start != start_end_pairs.end(); it_start += 2)
    {
        auto it_end = it_start + 1;
        auto delta = diff(*it_start, *it_end);
        std::cout
            << "Waited ("
            << delta.tv_sec
            << "\ts\t"
            << std::setw(9)
            << std::setfill('0')
            << delta.tv_nsec
            << "\tns)"
            << std::endl;
    }
}
//=============================================================
timespec diff(timespec start, timespec end)
{
    timespec temp;
    temp.tv_sec = end.tv_sec - start.tv_sec;
    temp.tv_nsec = end.tv_nsec - start.tv_nsec;
    if (temp.tv_nsec < 0) {
        // borrow one second when the nanosecond field underflows
        --temp.tv_sec;
        temp.tv_nsec += 1000000000;
    }
    return temp;
}
//=============================================================
timespec get_cpu_now_time()
{
    timespec now_time;
    memset(&now_time, 0, sizeof(timespec));
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &now_time);
    return now_time;
}
Output:
Waited (0 s 000064802 ns)
Waited (0 s 000028512 ns)
Waited (0 s 000030664 ns)
Waited (0 s 000041233 ns)
Waited (0 s 000013458 ns)
Waited (0 s 000024068 ns)
Waited (0 s 000027591 ns)
Waited (0 s 000028148 ns)
Waited (0 s 000033783 ns)
Waited (0 s 000022382 ns)
Waited (0 s 000027866 ns)
Waited (0 s 000028085 ns)
Waited (0 s 000028012 ns)
Waited (0 s 000028172 ns)
Waited (0 s 000022121 ns)
Waited (0 s 000052940 ns)
Waited (0 s 000032138 ns)
Waited (0 s 000028082 ns)
Waited (0 s 000034486 ns)
Waited (0 s 000018875 ns)
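A likely explanation for the gap is that CLOCK_PROCESS_CPUTIME_ID counts CPU time consumed by the process, and a sleeping thread consumes almost none, whereas std::chrono::seconds(1) is an amount of elapsed time. The sketch below measures the same sleep on both CLOCK_PROCESS_CPUTIME_ID and CLOCK_MONOTONIC to make the difference visible; the names clock_now_ns, SleepTimings and time_sleep are hypothetical helpers introduced for this illustration.

```cpp
#include <time.h>    // clock_gettime, clockid_t (POSIX)
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper: read the given POSIX clock as nanoseconds.
// Returns -1 if clock_gettime reports an error.
std::int64_t clock_now_ns(clockid_t id) {
    timespec ts;
    if (clock_gettime(id, &ts) != 0) {
        return -1;
    }
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// CPU time and elapsed time observed across one sleep.
struct SleepTimings {
    std::int64_t cpu_ns;   // from CLOCK_PROCESS_CPUTIME_ID
    std::int64_t wall_ns;  // from CLOCK_MONOTONIC
};

// Hypothetical helper: sleep for the given duration and report how much
// time passed according to each clock.
SleepTimings time_sleep(std::chrono::milliseconds duration) {
    const std::int64_t cpu_start = clock_now_ns(CLOCK_PROCESS_CPUTIME_ID);
    const std::int64_t wall_start = clock_now_ns(CLOCK_MONOTONIC);
    std::this_thread::sleep_for(duration);
    const std::int64_t cpu_end = clock_now_ns(CLOCK_PROCESS_CPUTIME_ID);
    const std::int64_t wall_end = clock_now_ns(CLOCK_MONOTONIC);
    return {cpu_end - cpu_start, wall_end - wall_start};
}
```

With a one second sleep, wall_ns should come out at about one second while cpu_ns stays in the microsecond range, which matches the output above. For timing that includes waiting, CLOCK_MONOTONIC (or std::chrono::steady_clock) is the more appropriate choice.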