Faster equivalent of gettimeofday in C

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow.

Original question: http://stackoverflow.com/questions/6498972/

Asked by Humble Debugger
In trying to build a very latency-sensitive application that needs to send hundreds of messages a second, each message carrying a time field, we wanted to consider optimizing gettimeofday.

Our first thought was an rdtsc-based optimization. Any thoughts? Any other pointers?

The required accuracy of the time value returned is in milliseconds, but it isn't a big deal if the value is occasionally out of sync with the receiver by 1-2 milliseconds.

Trying to do better than the 62 nanoseconds gettimeofday takes.
Answered by David Terei
POSIX Clocks

I wrote a benchmark for POSIX clock sources:
- time (s) => 3 cycles
- ftime (ms) => 54 cycles
- gettimeofday (us) => 42 cycles
- clock_gettime (ns) => 9 cycles (CLOCK_MONOTONIC_COARSE)
- clock_gettime (ns) => 9 cycles (CLOCK_REALTIME_COARSE)
- clock_gettime (ns) => 42 cycles (CLOCK_MONOTONIC)
- clock_gettime (ns) => 42 cycles (CLOCK_REALTIME)
- clock_gettime (ns) => 173 cycles (CLOCK_MONOTONIC_RAW)
- clock_gettime (ns) => 179 cycles (CLOCK_BOOTTIME)
- clock_gettime (ns) => 349 cycles (CLOCK_THREAD_CPUTIME_ID)
- clock_gettime (ns) => 370 cycles (CLOCK_PROCESS_CPUTIME_ID)
- rdtsc (cycles) => 24 cycles
These numbers are from an Intel Core i7-4771 CPU @ 3.50GHz on Linux 4.0. The measurements were taken using the TSC register, running each clock method thousands of times and taking the minimum cost value.
You'll want to test on the machines you intend to run on, though, as how these are implemented varies with hardware and kernel version. The code can be found here. It relies on the TSC register for cycle counting, which is in the same repo (tsc.h).
TSC
Accessing the TSC (processor time-stamp counter) is the most accurate and cheapest way to time things. Generally, this is what the kernel itself uses. It's also quite straightforward on modern Intel chips, as the TSC is synchronized across cores and unaffected by frequency scaling, so it provides a simple, global time source. You can see an example of using it here, with a walkthrough of the assembly code here.
The main issue with this (other than portability) is that there doesn't seem to be a good way to go from cycles to nanoseconds. The Intel docs, as far as I can find, state that the TSC runs at a fixed frequency, but that this frequency may differ from the processor's stated frequency. Intel doesn't appear to provide a reliable way to figure out the TSC frequency. The Linux kernel appears to solve this by testing how many TSC cycles occur between two hardware timers (see here).
Memcached
Memcached bothers to do the cache method. It may simply be to make sure performance is more predictable across platforms, or to scale better with multiple cores. It may also not be a worthwhile optimization.
Answered by bdonlan
Have you actually benchmarked, and found gettimeofday to be unacceptably slow?
At a rate of 100 messages a second, you have 10 ms of CPU time per message. If you have multiple cores, and assuming the work can be fully parallelized, you can easily increase that by 4-6x: that's 40-60 ms per message! The cost of gettimeofday is unlikely to be anywhere near 10 ms; I'd suspect it to be more like 1-10 microseconds (on my system, microbenchmarking gives about 1 microsecond per call; try it for yourself). Your optimization efforts would be better spent elsewhere.
While using the TSC is a reasonable idea, modern Linux already has a userspace TSC-based gettimeofday: where possible, the vDSO will pull in an implementation of gettimeofday that applies an offset (read from a shared kernel-user memory segment) to rdtsc's value, thus computing the time of day without entering the kernel. However, some CPU models don't have a TSC synchronized between different cores or different packages, so this can end up being disabled. If you want high-performance timing, you might first want to consider finding a CPU model that does have a synchronized TSC.
That said, if you're willing to sacrifice a significant amount of resolution (your timing will only be accurate to the last tick, meaning it could be off by tens of milliseconds), you could use CLOCK_MONOTONIC_COARSE or CLOCK_REALTIME_COARSE with clock_gettime. This is also implemented with the vDSO and guaranteed not to call into the kernel (for recent kernels and glibc).
Answered by Humble Debugger
As bdonlan says, if you're only sending a few hundred messages per second, gettimeofday is going to be fast enough.
However, if you were sending millions of messages per second, it might be different (but you should still measure that it is a bottleneck). In that case, you might want to consider something like this:
- have a global variable giving the current timestamp at your desired accuracy
- have a dedicated background thread that does nothing except update the timestamp (if the timestamp should be updated every T units of time, have the thread sleep some fraction of T and then update the timestamp; use real-time features if you need to)
- all other threads (or the main process, if you don't otherwise use threads) just read the global variable
The C language does not guarantee that you can atomically read the timestamp value if it is larger than sig_atomic_t. You could use locking to deal with that, but locking is heavy. Instead, you could use a volatile sig_atomic_t-typed variable to index an array of timestamps: the background thread updates the next element in the array and then updates the index. The other threads read the index and then the array: they might get a slightly out-of-date timestamp (but they get the right one next time), but they never read the timestamp while it is being updated and end up with some bytes of the old value and some of the new.
But all this is much overkill for just hundreds of messages per second.
Answered by edW
Below is a benchmark. I see about 30 ns per call. printTime() is from rashad's answer to "How to get current time and date in C++?".
#include <string>
#include <iostream>
#include <sys/time.h>

using namespace std;

void printTime(time_t now)
{
    struct tm tstruct;
    char buf[80];
    tstruct = *localtime(&now);
    strftime(buf, sizeof(buf), "%Y-%m-%d.%X", &tstruct);
    cout << buf << endl;
}

int main()
{
    timeval tv;
    time_t tm;

    gettimeofday(&tv, NULL);
    printTime((time_t)tv.tv_sec);
    for (int i = 0; i < 100000000; i++)
        gettimeofday(&tv, NULL);
    gettimeofday(&tv, NULL);
    printTime((time_t)tv.tv_sec);

    printTime(time(NULL));
    for (int i = 0; i < 100000000; i++)
        tm = time(NULL);
    printTime(time(NULL));
    return 0;
}
3 seconds for 100,000,000 calls, i.e. about 30 ns per call:
2014-03-20.09:23:35
2014-03-20.09:23:38
2014-03-20.09:23:38
2014-03-20.09:23:41
Answered by Vinicius Kamakura
Do you need the millisecond precision? If not, you could simply use time() and deal with the Unix timestamp.

