How to perform atomic operations on Linux that work on x86, ARM, GCC and icc?

Source: Stack Overflow, http://stackoverflow.com/questions/2287451/ (licensed under CC BY-SA 4.0; attribute it to the original authors).

Tags: c++, c, linux, atomic

Asked by Artyom

Every modern OS today provides some atomic operations:

  • Windows has the Interlocked* API
  • FreeBSD has <machine/atomic.h>
  • Solaris has <atomic.h>
  • Mac OS X has <libkern/OSAtomic.h>

Anything like that for Linux?

  • I need it to work on most platforms supported by Linux, including x86, x86_64 and ARM.
  • I need it to work with at least GCC and the Intel compiler.
  • I must not use a third-party library such as glib or Qt.
  • I need it to work in C++ (C is not required).

Issues:

  • GCC atomic builtins __sync_* are not supported on all platforms (ARM) and are not supported by the Intel compiler.
  • AFAIK, <asm/atomic.h> should not be used in user space, and I haven't managed to use it successfully at all. Also, I'm not sure whether it would work with the Intel compiler.

Any suggestions?

I know that there are many related questions, but some of them point to __sync*, which is not feasible for me (ARM), and some point to <asm/atomic.h>.

Maybe there is an inline assembly library that does this for GCC (ICC supports gcc assembly)?

Edit:

There is a very partial solution for add operations only (it allows implementing an atomic counter, but not lock-free structures that require CAS):

If you use libstdc++ (the Intel compiler uses libstdc++), then you can use __gnu_cxx::__exchange_and_add, which is defined in <ext/atomicity.h> or <bits/atomicity.h>, depending on the compiler version.
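
A minimal sketch of that add-only approach, assuming libstdc++ and the <ext/atomicity.h> header location. The RefCount class and its member names are hypothetical; only __gnu_cxx::__exchange_and_add and __gnu_cxx::__atomic_add come from libstdc++:

```cpp
#include <cassert>
#include <ext/atomicity.h>  // libstdc++ extension; older releases shipped <bits/atomicity.h>

// Hypothetical reference counter built only on the libstdc++ add primitives.
class RefCount {
public:
    explicit RefCount(int initial = 1) : count_(initial) { assert(initial >= 0); }

    // Atomically increments the counter.
    void acquire() { __gnu_cxx::__atomic_add(&count_, 1); }

    // Atomically decrements the counter. __exchange_and_add returns the value
    // held before the addition, so 1 means this call dropped the last reference.
    bool release() { return __gnu_cxx::__exchange_and_add(&count_, -1) == 1; }

    // Plain read, intended for diagnostics only.
    int value() const { return count_; }

private:
    volatile _Atomic_word count_;
};
```

release() returns true only for the caller that drops the last reference, which is enough for a counter but still gives you nothing like CAS.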

However I'd still like to see something that supports CAS.

Accepted answer by Noah Watkins

Projects are using this:

http://packages.debian.org/source/sid/libatomic-ops

If you want simple operations such as CAS, can't you just use the arch-specific implementations out of the kernel, and do arch checks in user space with autotools/cmake? As far as licensing goes, although the kernel is GPL, I think it's arguable that the inline assembly for these operations is provided by Intel/AMD, not that the kernel has a license on them. They just happen to be in an easily accessible form in the kernel source.

Answer by kevinarpe

Recent standards (from 2011) of C and C++ now specify atomic operations: <stdatomic.h> in C11 and <atomic> in C++11.

Regardless, your platform or compiler may not support these newer headers & features.
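
Where a C++11 toolchain is available, CAS comes directly from std::atomic. A minimal sketch; atomic_multiply and example are hypothetical names used only for illustration:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically multiplies `target` by `factor` using a
// compare-and-swap retry loop and returns the value seen before the update.
inline int atomic_multiply(std::atomic<int>& target, int factor) {
    int expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the current value into
    // `expected`, so the next iteration retries with fresh data.
    while (!target.compare_exchange_weak(expected, expected * factor)) {
    }
    return expected;
}

// Single-threaded usage example.
inline void example() {
    std::atomic<int> value(6);
    int previous = atomic_multiply(value, 7);
    assert(previous == 6);
    assert(value.load() == 42);
}
```

The same loop shape works for any read-modify-write operation that the standard does not provide directly.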

Answer by asveikau

Darn. I was going to suggest the GCC primitives, then you said they were off limits. :-)

In that case, I would do an #ifdef for each architecture/compiler combination you care about and code up the inline asm. And maybe check for __GNUC__ or some similar macro and use the GCC primitives if they are available, because it feels so much more right to use those. :-)

You are going to have a lot of duplication and it might be difficult to verify correctness, but this seems to be the way a lot of projects do this, and I've had good results with it.

Some gotchas that have bitten me in the past: when using GCC, don't forget "asm volatile" and the clobbers for "memory" and "cc", etc.
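
A rough sketch of that #ifdef approach for a single primitive, assuming GCC-style inline assembly on x86/x86_64 and falling back to the GCC builtin elsewhere. The name cas_int is hypothetical, and a real project would need one branch per architecture it supports:

```cpp
#include <cassert>

// Hypothetical wrapper: compare-and-swap on an int.
// Returns true if *ptr equalled `expected` and was replaced by `desired`.
inline bool cas_int(volatile int* ptr, int expected, int desired) {
    assert(ptr != 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // x86 branch: lock cmpxchg compares EAX with *ptr; if they are equal it
    // stores `desired`, otherwise it loads the current *ptr into EAX.
    int previous;
    __asm__ __volatile__(
        "lock; cmpxchgl %2, %1"
        : "=a"(previous), "+m"(*ptr)
        : "r"(desired), "0"(expected)
        : "memory", "cc");  // the clobbers mentioned in the gotchas above
    return previous == expected;
#elif defined(__GNUC__)
    // Any other target where the compiler provides the builtin.
    return __sync_bool_compare_and_swap(ptr, expected, desired);
#else
#error "no compare-and-swap implementation for this compiler/architecture"
#endif
}
```

The "memory" clobber keeps the compiler from caching values across the operation, and "cc" tells it the flags register is modified.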

Answer by Luc Hermitte

Boost, which has a non-intrusive license, and other frameworks already offer portable atomic counters -- as long as they are supported on the target platform.

Third-party libraries are good for us. And if, for strange reasons, your company forbids you from using them, you can still have a look at how they proceed (as long as the licence permits it for your use) to implement what you are looking for.

Answer by Jens Gustedt

I recently did an implementation of such a thing and I was confronted with the same difficulties as you are. My solution was basically the following:

  • try to detect the gcc builtins with the feature macro
  • if not available, just implement something like cmpxchg with __asm__ for the other architectures (ARM is a bit more complicated than that). Just do that for one possible size, e.g. sizeof(int).
  • implement all other functionality on top of those one or two primitives with inline functions
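
The layering in the last step might look like this. It is only a sketch that assumes the GCC feature macro __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 and a 4-byte int; the sk_* names are hypothetical, and the #else branch is where an __asm__ implementation would go:

```cpp
#include <cassert>

// The single primitive: returns the value found in *p before the attempt.
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
inline int sk_cas(volatile int* p, int expected, int desired) {
    return __sync_val_compare_and_swap(p, expected, desired);
}
#else
#error "plug in an __asm__ based compare-and-swap for this target"
#endif

// Everything below is built only on sk_cas.

// Adds `delta` to *p and returns the previous value.
inline int sk_fetch_add(volatile int* p, int delta) {
    int old;
    do {
        old = *p;
    } while (sk_cas(p, old, old + delta) != old);
    return old;
}

// Stores `value` into *p and returns the previous value.
inline int sk_exchange(volatile int* p, int value) {
    int old;
    do {
        old = *p;
    } while (sk_cas(p, old, value) != old);
    return old;
}

// A CAS that swaps 0 for 0 never changes *p but returns its current value.
inline int sk_load(volatile int* p) {
    return sk_cas(p, 0, 0);
}

// Single-threaded usage example.
inline void sk_example() {
    volatile int counter = 10;
    assert(sk_fetch_add(&counter, 5) == 10);
    assert(sk_exchange(&counter, 3) == 15);
    assert(sk_load(&counter) == 3);
}
```

Only sk_cas has to be ported per architecture; the other functions stay unchanged.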

Answer by Justin Cormack

There is a patch for GCC here to support ARM atomic operations. It will not help you on Intel, but you could examine the code: there is recent kernel support for older ARM architectures, and newer ones have the instructions built in, so you should be able to build something that works.

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00050.html

Answer by Mecki

__sync* certainly is (and has been) supported by the Intel compiler, because GCC adopted these built-ins from there. Read the first paragraph on this page. Also see "Intel C++ Compiler for Linux* Intrinsics Reference", page 198. It's from 2006 and describes exactly those built-ins.

Regarding ARM support, for older ARM CPUs: it cannot be done entirely in user space, but it can be done in kernel space (by disabling interrupts during the operation), and I think I read somewhere that it has been supported for quite a while now.

According to this PHP bug, dated 2011-10-08, __sync_* will only fail on:

  • PA-RISC with anything other than Linux
  • SPARCv7 and lower
  • ARM with GCC < 4.3
  • ARMv5 and lower with anything other than Linux
  • MIPS1

So with GCC > 4.3 (and 4.7 is the current one), you shouldn't have a problem with ARMv6 and newer. You shouldn't have a problem with ARMv5 either, as long as you are compiling for Linux.
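
For illustration, a small sketch that uses only the __sync_* built-ins, so it should build wherever the list above says they work. SpinLock and next_ticket are hypothetical names, not part of any library:

```cpp
#include <cassert>

// Hypothetical spinlock built on the legacy __sync_* built-ins.
class SpinLock {
public:
    SpinLock() : state_(0) {}

    // Atomically writes 1 and returns the old value; 0 means we got the lock.
    bool try_lock() { return __sync_lock_test_and_set(&state_, 1) == 0; }

    void lock() {
        while (!try_lock()) {
            // spin until the holder releases the lock
        }
    }

    // Writes 0 with release semantics.
    void unlock() { __sync_lock_release(&state_); }

private:
    volatile int state_;
};

// Hypothetical ticket dispenser: returns the current value, then increments it.
inline int next_ticket(volatile int* counter) {
    return __sync_fetch_and_add(counter, 1);
}

// Single-threaded usage example.
inline void sync_example() {
    SpinLock guard;
    guard.lock();
    assert(!guard.try_lock());  // already held
    guard.unlock();
    volatile int tickets = 0;
    assert(next_ticket(&tickets) == 0);
    assert(next_ticket(&tickets) == 1);
}
```

__sync_bool_compare_and_swap and __sync_val_compare_and_swap from the same family cover the CAS case the question asks about.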

Answer by Mecki

On Debian/Ubuntu, I recommend:

sudo apt-get install libatomic-ops-dev

Examples: http://www.hpl.hp.com/research/linux/atomic_ops/example.php4

GCC & ICC compatible.

Compared to Intel Threading Building Blocks (TBB) using atomic<T>, libatomic-ops-dev is over twice as fast (with the Intel compiler).

Tested on an Ubuntu i7 machine with producer-consumer threads piping 10 million ints through a ring buffer: 0.5 s, as opposed to 1.2 s for TBB.

And it is easy to use, e.g.:

#include <atomic_ops.h>

volatile AO_t head;
AO_fetch_and_add1(&head);

Answer by artless noise

See kernel_user_helpers.txt or entry-arm.c and look for __kuser_cmpxchg. As seen in the comments of other ARM Linux versions:

kuser_cmpxchg

Location:       0xffff0fc0

Reference prototype:

  int __kuser_cmpxchg(int32_t oldval, int32_t newval, volatile int32_t *ptr);

Input:

  r0 = oldval
  r1 = newval
  r2 = ptr
  lr = return address

Output:

  r0 = success code (zero or non-zero)
  C flag = set if r0 == 0, clear if r0 != 0

Clobbered registers:

  r3, ip, flags

Definition:

  Atomically store newval in *ptr only if *ptr is equal to oldval.
  Return zero if *ptr was changed or non-zero if no exchange happened.
  The C flag is also set if *ptr was changed to allow for assembly
  optimization in the calling code.

Usage example:
 typedef int (__kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
 #define __kuser_cmpxchg (*(__kuser_cmpxchg_t *)0xffff0fc0)

 int atomic_add(volatile int *ptr, int val)
 {
        int old, new;

        do {
                old = *ptr;
                new = old + val;
        } while(__kuser_cmpxchg(old, new, ptr));

        return new;
 }

Notes:

  • This routine already includes memory barriers as needed.
  • Valid only if __kuser_helper_version >= 2 (from kernel version 2.6.12).

This is for use on Linux with ARMv3, using the swp primitive. You must have a very ancient ARM for it not to support this. Only a data abort or an interrupt can cause the spinning to fail, so the kernel monitors for this address (~0xffff0fc0) and performs a user-space PC fix-up when either a data abort or an interrupt occurs. All user-space libraries that support ARMv5 and lower will use this facility.

For instance, QtConcurrent uses this.