How does the x86 pause instruction work in a spinlock *and* can it be used in other scenarios?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/4725676/

Time: 2020-09-10 01:12:25  Source: igfitidea

multithreading, x86, spinlock

Asked by Infinite

The pause instruction is commonly used in the loop that tests a spinlock while some other thread owns it, to mitigate the tight loop. It's said to be equivalent to some NOP instructions. Could somebody tell me exactly how it works for spinlock optimization? It seems to me that even NOP instructions are a waste of CPU time. Will they decrease CPU usage?

Another question: could I use the pause instruction for other, similar purposes? For example, I have a busy thread that keeps scanning some places (e.g. a queue) to retrieve new nodes; however, sometimes the queue is empty and the thread is just wasting CPU time. Putting the thread to sleep and having other threads wake it up may be an option, but the thread is critical, so I don't want to make it sleep. Could the pause instruction reduce the CPU usage for my purpose? Currently the thread uses 100% of a physical core.

Thanks.

Accepted answer by blaze

PAUSE notifies the CPU that this is a spinlock wait loop, so memory and cache accesses may be optimized. See also "pause instruction in x86" for more details about avoiding memory-order mis-speculation when leaving the spin loop.

PAUSE may actually stop the CPU for some time to save power. Older CPUs decode it as REP NOP, so you don't have to check whether it's supported: older CPUs will simply do nothing (NOP) as fast as possible.
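
A minimal sketch of why no feature check is needed, assuming GCC or Clang on x86 (the inline-assembly syntax is a compiler extension). The `rep; nop` spelling assembles to the bytes F3 90, which is the PAUSE encoding, so pre-PAUSE CPUs just see a prefixed NOP. The helper names `cpu_relax` and `spin_until_set` are hypothetical, chosen for illustration only.

```c
#include <stdatomic.h>

/* Hypothetical helper. With GCC/Clang on x86, "rep; nop" assembles to the
   bytes F3 90, exactly the PAUSE encoding. CPUs that predate PAUSE execute
   it as a plain NOP, so no CPUID feature check is required.
   On non-x86 targets this compiles to nothing. */
static inline void cpu_relax(void) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

/* Hypothetical helper: spin until *flag becomes non-zero and return how
   many times the loop backed off with PAUSE. */
static unsigned long spin_until_set(atomic_int *flag) {
    unsigned long spins = 0;
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        cpu_relax(); /* hint to the CPU: this is a spin-wait loop */
        spins++;
    }
    return spins;
}
```

If the flag is already set, the loop body never runs and no PAUSE is executed.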

See also https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops

Update: I don't think it's a good idea to use PAUSE in queue checking unless you are going to make your queue spinlock-like (and there is no obvious way to do it).

Spinning for a very long time is still very bad, even with PAUSE.

Answer by Nitin Kunal

When executing a spin-wait loop, a processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops. An additional function of the PAUSE instruction is to reduce the power consumed by Intel processors.
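
A minimal sketch of a spin-wait loop with PAUSE in it, assuming C11 atomics and an x86 target (on other targets the hint compiles away). It uses the common "test and test-and-set" pattern: one atomic exchange to try for the lock, then read-only polling with PAUSE until the lock looks free. The type `ttas_lock` and its functions are hypothetical names for illustration, not an existing API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()   /* emits the PAUSE instruction */
#else
#define CPU_RELAX() ((void)0)     /* no-op fallback on non-x86 targets */
#endif

/* Hypothetical test-and-test-and-set spinlock. */
typedef struct {
    atomic_bool locked; /* false = free, true = held */
} ttas_lock;

static void ttas_init(ttas_lock *l) { atomic_init(&l->locked, false); }

/* One attempt: the exchange returns the previous value, so false means
   the lock was free and is now ours. */
static bool ttas_try_lock(ttas_lock *l) {
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void ttas_acquire(ttas_lock *l) {
    for (;;) {
        if (ttas_try_lock(l))
            return;
        /* Spin on plain loads (the cache line stays shared) and tell the
           CPU this is a spin-wait loop, which also softens the exit. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            CPU_RELAX();
    }
}

static void ttas_release(ttas_lock *l) {
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Polling with loads instead of repeated exchanges keeps the waiting core from hammering the cache line; PAUSE is placed in exactly that inner polling loop.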

[source: Intel manual]

Answer by Maxim Masiutin

Intel recommends using the PAUSE instruction only when the spin loop is very short.

As I understand from your question, the waits in your case are very long. In that case, spin loops are not recommended.

You wrote that you have a "thread which keeps scanning some places (e.g. a queue) to retrieve new nodes".

In such a case, Intel recommends using the synchronization API functions of your operating system. For example, you can create an event that is signaled when a new node appears in the queue, and just wait for this event using WaitForSingleObject(Handle, INFINITE). The queue sets this event whenever a new node appears.
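
WaitForSingleObject is Windows-only. As a sketch of the same idea on POSIX systems (a swapped-in analogue, not what the answer describes), a condition variable can play the role of the event: the consumer sleeps in the kernel while the queue is empty and is woken when a producer pushes a node. The names `node_queue`, `nq_push`, `nq_pop` and the capacity `QCAP` are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

#define QCAP 64 /* hypothetical fixed capacity for this sketch */

typedef struct {
    int items[QCAP];
    int head, count;
    pthread_mutex_t mu;
    pthread_cond_t not_empty; /* plays the role of the Windows event */
} node_queue;

static void nq_init(node_queue *q) {
    q->head = q->count = 0;
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Producer: enqueue and wake one waiting consumer. False if full. */
static bool nq_push(node_queue *q, int v) {
    pthread_mutex_lock(&q->mu);
    if (q->count == QCAP) {
        pthread_mutex_unlock(&q->mu);
        return false;
    }
    q->items[(q->head + q->count) % QCAP] = v;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
    return true;
}

/* Consumer: blocks without burning CPU while the queue is empty. */
static int nq_pop(node_queue *q) {
    pthread_mutex_lock(&q->mu);
    while (q->count == 0) /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->not_empty, &q->mu);
    int v = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_mutex_unlock(&q->mu);
    return v;
}
```

The trade-off is the one the answer describes: a wake-up costs a trip through the OS, but an idle consumer uses no CPU at all.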

According to the Intel Optimization Manual, the PAUSE instruction is typically used with software threads executing on two logical processors located in the same processor core, waiting for a lock to be released. Such short wait loops tend to last between tens and a few hundreds of cycles (i.e. 20-500 CPU cycles), so performance-wise it is more beneficial to wait while occupying the CPU than to yield to the OS.

500 CPU cycles on a 4500 MHz Core i7-7700K processor take about 0.00000011 seconds, i.e. roughly a nine-millionth of a second: the CPU can run this 500-cycle loop about 9 million times per second.
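
The arithmetic behind these figures, taking the 4.5 GHz clock from the text as given:

$$
t = \frac{500\ \text{cycles}}{4.5\times 10^{9}\ \text{cycles/s}} \approx 1.1\times 10^{-7}\ \text{s} \approx 0.11\ \mu\text{s},
\qquad
\frac{4.5\times 10^{9}\ \text{cycles/s}}{500\ \text{cycles}} = 9\times 10^{6}\ \text{loops/s}.
$$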

As you can see, the PAUSE instruction is meant for really short periods of time.

On the other hand, each call to an API function like Sleep() incurs the expensive cost of a context switch, which can be 10000+ cycles; it also pays for the ring 3 to ring 0 transitions, which can be 1000+ cycles.

If there are more threads than available processor cores (multiplied by the number of hardware threads per core, if Hyper-Threading is present), a thread may get switched out in the middle of a critical section. Waiting for a critical section held by such a descheduled thread may then take very long, at least 10000+ cycles, so the PAUSE instruction will be futile.

Please see these articles for more information:

When the wait loop is expected to last for thousands of cycles or more, it is preferable to yield to the operating system by calling one of the OS synchronization API functions, such as WaitForSingleObject on Windows OS.
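
One way to combine the two regimes is a "spin briefly, then block" wait. The sketch below is an illustration under assumptions, not a pattern taken from the Intel articles: it spins with PAUSE for a caller-chosen number of iterations and then falls back to a POSIX condition variable (standing in for an OS wait such as WaitForSingleObject). `hybrid_event`, `he_wait`, and the `spin_limit` parameter are hypothetical; a sensible limit depends on the CPU, since PAUSE latency differs a lot between microarchitectures.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)
#endif

/* Hypothetical one-shot event: spin briefly, then block in the OS. */
typedef struct {
    atomic_int ready;
    pthread_mutex_t mu;
    pthread_cond_t cv;
} hybrid_event;

static void he_init(hybrid_event *e) {
    atomic_init(&e->ready, 0);
    pthread_mutex_init(&e->mu, NULL);
    pthread_cond_init(&e->cv, NULL);
}

/* Setting the flag under the mutex means a waiter that checked it (also
   under the mutex) before sleeping cannot miss the broadcast. */
static void he_set(hybrid_event *e) {
    pthread_mutex_lock(&e->mu);
    atomic_store_explicit(&e->ready, 1, memory_order_release);
    pthread_cond_broadcast(&e->cv);
    pthread_mutex_unlock(&e->mu);
}

/* Returns true if the event was seen during the short PAUSE spin phase,
   false if the thread went through the blocking phase. */
static bool he_wait(hybrid_event *e, int spin_limit) {
    for (int i = 0; i < spin_limit; i++) {
        if (atomic_load_explicit(&e->ready, memory_order_acquire))
            return true;
        CPU_RELAX(); /* short wait: stay on the CPU */
    }
    pthread_mutex_lock(&e->mu);
    while (!atomic_load_explicit(&e->ready, memory_order_acquire))
        pthread_cond_wait(&e->cv, &e->mu); /* long wait: sleep in the OS */
    pthread_mutex_unlock(&e->mu);
    return false;
}
```

Short waits finish inside the spin phase without a context switch; long waits stop consuming the core once the spin budget runs out.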

In conclusion: in your scenario, the PAUSE instruction won't be the best choice, since your waiting time is long while PAUSE is intended for very short loops. PAUSE takes about 131 cycles on Skylake and later processors; for example, that is just 31.19 ns on an Intel Core i7-7700K CPU @ 4.20GHz (Kaby Lake).

On earlier processors, like Haswell, it takes about 9 cycles: 2.81 ns on an Intel Core i5-4430 @ 3GHz. So, for long loops, it's better to relinquish control to other threads using the OS synchronization API functions than to occupy the CPU with a PAUSE loop.
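
To get a rough feel for PAUSE latency on your own machine, a sketch like the one below (assuming GCC or Clang on x86; `pause_tsc_ticks` is a hypothetical name) times a run of PAUSEs with the time-stamp counter. Treat the result only as an estimate: on modern Intel CPUs the TSC ticks at a constant reference rate rather than the current core clock, and the loop overhead is included, so it will not reproduce the cycle counts above exactly.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>  /* _mm_pause */
#include <x86intrin.h>  /* __rdtsc (GCC/Clang) */
#endif

/* Hypothetical helper: average TSC ticks per PAUSE over `iterations`
   executions. Returns -1.0 on non-x86 targets. */
static double pause_tsc_ticks(unsigned iterations) {
#if defined(__x86_64__) || defined(__i386__)
    if (iterations == 0)
        return 0.0;
    uint64_t start = __rdtsc();
    for (unsigned i = 0; i < iterations; i++)
        _mm_pause();
    uint64_t end = __rdtsc();
    return (double)(end - start) / (double)iterations;
#else
    (void)iterations;
    return -1.0;
#endif
}
```

For example, calling `pause_tsc_ticks(1000000)` and printing the result should show a much larger figure on Skylake-class cores than on Haswell-class ones, in line with the numbers quoted in this answer.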

Answer by egbit

The PAUSE instruction also appears to be used in hyper-threading processors to mitigate performance impact on other hyper threads, presumably by relinquishing more CPU time to them.

The following Intel article outlines this, and not surprisingly recommends avoiding busy wait loops on such processors: https://software.intel.com/en-us/articles/long-duration-spin-wait-loops-on-hyper-threading-technology-enabled-intel-processors
