C++ 中循环移位(旋转)操作的最佳实践
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/776508/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Best practices for circular shift (rotate) operations in C++
提问by Elroy
Left and right shift operators (<< and >>) are already available in C++. However, I couldn't find out how I could perform circular shift or rotate operations.
左移和右移运算符(<< 和 >>)在 C++ 中已经可用。但是,我不知道如何执行循环移位或旋转操作。
How can operations like "Rotate Left" and "Rotate Right" be performed?
如何执行“向左旋转”和“向右旋转”等操作?
Rotating right twice here
在这里向右旋转两次
Initial --> 1000 0011 0100 0010
should result in:
应该导致:
Final --> 1010 0000 1101 0000
An example would be helpful.
一个例子会有所帮助。
(editor's note: Many common ways of expressing rotates in C suffer from undefined behaviour if the rotate count is zero, or compile to more than just a single rotate machine instruction. This question's answer should document best practices.)
(编者注:如果旋转计数为零,或者编译为多个旋转机器指令,则在 C 中表达旋转的许多常见方法都会遭受未定义的行为。这个问题的答案应该记录最佳实践。)
回答by AndreasT
See also an earlier version of this answer on another rotate questionwith some more details about what asm gcc/clang produce for x86.
另请参阅有关另一个旋转问题的此答案的早期版本,其中包含有关 asm gcc/clang 为 x86 生成的内容的更多详细信息。
The most compiler-friendly way to express a rotate in C and C++ that avoids any Undefined Behaviour seems to be John Regehr's implementation. I've adapted it to rotate by the width of the type (using fixed-width types like uint32_t
).
在 C 和 C++ 中表达旋转以避免任何未定义行为的最编译器友好的方法似乎是John Regehr 的实现。我已将其调整为按类型的宽度旋转(使用固定宽度类型,如uint32_t
)。
#include <stdint.h> // for uint32_t
#include <limits.h> // for CHAR_BIT
// #define NDEBUG
#include <assert.h>
static inline uint32_t rotl32 (uint32_t n, unsigned int c)
{
const unsigned int mask = (CHAR_BIT*sizeof(n) - 1); // assumes width is a power of 2.
// assert ( (c<=mask) &&"rotate by type width or more");
c &= mask;
return (n<<c) | (n>>( (-c)&mask ));
}
static inline uint32_t rotr32 (uint32_t n, unsigned int c)
{
const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);
// assert ( (c<=mask) &&"rotate by type width or more");
c &= mask;
return (n>>c) | (n<<( (-c)&mask ));
}
Works for any unsigned integer type, not just uint32_t
, so you could make versions for other sizes.
适用于任何无符号整数类型,而不仅仅是uint32_t
,因此您可以制作其他大小的版本。
See also a C++11 template versionwith lots of safety checks (including a static_assert
that the type width is a power of 2), which isn't the case on some 24-bit DSPs or 36-bit mainframes, for example.
另请参阅带有大量安全检查的 C++11 模板版本(包括static_assert
类型宽度是 2 的幂),例如,在某些 24 位 DSP 或 36 位大型机上并非如此。
I'd recommend only using the template as a back-end for wrappers with names that include the rotate width explicitly. Integer-promotion rules mean that rotl_template(u16 & 0x11UL, 7)
would do a 32 or 64-bit rotate, not 16(depending on the width of unsigned long
). Even uint16_t & uint16_t
is promoted to signed int
by C++'s integer-promotion rules, except on platforms where int
is no wider than uint16_t
.
我建议仅将模板用作名称显式包含旋转宽度的包装器的后端。 整数提升规则意味着rotl_template(u16 & 0x11UL, 7)
将进行 32 位或 64 位轮换,而不是 16 位(取决于 的宽度unsigned long
)。Evenuint16_t & uint16_t
被signed int
C++ 的整数提升规则提升,除了在int
不超过uint16_t
.
On x86, this version inlines to a single rol r32, cl
(or rol r32, imm8
) with compilers that grok it, because the compiler knows that x86 rotate and shift instructionsmask the shift-count the same way the C source does.
在 x86 上,此版本内联到单个rol r32, cl
(或rol r32, imm8
)与理解它的编译器,因为编译器知道x86 旋转和移位指令以与 C 源代码相同的方式屏蔽移位计数。
Compiler support for this UB-avoiding idiom on x86, for uint32_t x
and unsigned int n
for variable-count shifts:
编译器在 x86 上支持这种避免 UB 的习惯用法,用于uint32_t x
和unsigned int n
用于可变计数移位:
- clang: recognized for variable-count rotates since clang3.5, multiple shifts+or insns before that.
- gcc: recognized for variable-count rotates since gcc4.9, multiple shifts+or insns before that. gcc5 and later optimize away the branch and mask in the wikipedia version, too, using just a
ror
orrol
instruction for variable counts. - icc: supported for variable-count rotates since ICC13 or earlier. Constant-count rotates use
shld edi,edi,7
which is slower and takes more bytes thanrol edi,7
on some CPUs (especially AMD, but also some Intel), when BMI2 isn't available forrorx eax,edi,25
to save a MOV. - MSVC: x86-64 CL19: Only recognized for constant-count rotates. (The wikipedia idiom is recognized, but the branch and AND aren't optimized away). Use the
_rotl
/_rotr
intrinsics from<intrin.h>
on x86 (including x86-64).
- clang:从 clang3.5 开始识别可变计数轮换,在此之前多次轮班+或 insns。
- gcc:自 gcc4.9 以来识别可变计数轮换,在此之前多次轮班+或insns。gcc5 和更高版本也优化了维基百科版本中的分支和掩码,仅使用
ror
orrol
指令进行变量计数。 - icc:自 ICC13 或更早版本起支持可变计数轮换。当 BMI2 不可用于保存 MOV时,恒定计数轮换使用
shld edi,edi,7
比rol edi,7
某些 CPU(尤其是 AMD,但也包括某些英特尔)慢且占用更多字节rorx eax,edi,25
。 - MSVC:x86-64 CL19:仅识别恒定计数旋转。(维基百科习语被识别,但分支和 AND 没有被优化掉)。使用x86(包括 x86-64)上的
_rotl
/_rotr
内在函数<intrin.h>
。
gcc for ARM uses an and r1, r1, #31
for variable-count rotates, but still does the actual rotate with a single instruction: ror r0, r0, r1
. So gcc doesn't realize that rotate-counts are inherently modular. As the ARM docs say, "ROR with shift length, n
, more than 32 is the same as ROR with shift length n-32
". I think gcc gets confused here because left/right shifts on ARM saturate the count, so a shift by 32 or more will clear the register. (Unlike x86, where shifts mask the count the same as rotates). It probably decides it needs an AND instruction before recognizing the rotate idiom, because of how non-circular shifts work on that target.
gcc for ARM 使用and r1, r1, #31
for 可变计数轮换,但仍然使用单个指令进行实际轮换: ror r0, r0, r1
。所以 gcc 没有意识到旋转计数本质上是模块化的。正如 ARM 文档所说,“具有移位长度的 ROR n
,超过 32 与具有移位长度的 ROR 相同n-32
”。我认为 gcc 在这里会感到困惑,因为 ARM 上的左/右移位会使计数饱和,因此移位 32 或更多将清除寄存器。(与 x86 不同,在 x86 中,移位屏蔽与旋转相同的计数)。它可能决定在识别旋转习语之前需要一个 AND 指令,因为非循环移位在该目标上是如何工作的。
Current x86 compilers still use an extra instruction to mask a variable count for 8 and 16-bit rotates, probably for the same reason they don't avoid the AND on ARM. This is a missed optimization, because performance doesn't depend on the rotate count on any x86-64 CPU. (Masking of counts was introduced with 286 for performance reasons because it handled shifts iteratively, not with constant-latency like modern CPUs.)
当前的 x86 编译器仍然使用额外的指令来屏蔽 8 位和 16 位循环的变量计数,这可能与它们不避免 ARM 上的 AND 的原因相同。这是一个错过的优化,因为性能不依赖于任何 x86-64 CPU 上的旋转计数。(出于性能原因,使用 286 引入了计数屏蔽,因为它以迭代方式处理移位,而不是像现代 CPU 那样具有恒定延迟。)
BTW, prefer rotate-right for variable-count rotates, to avoid making the compiler do 32-n
to implement a left rotate on architectures like ARM and MIPS that only provide a rotate-right. (This optimizes away with compile-time-constant counts.)
顺便说一句,对于可变计数旋转,更喜欢右旋转,以避免让编译器32-n
在仅提供右旋转的 ARM 和 MIPS 等架构上实现左旋转。(这通过编译时常量计数进行了优化。)
Fun fact: ARM doesn't really have dedicated shift/rotate instructions, it's just MOV with the source operand going through the barrel-shifter in ROR mode: mov r0, r0, ror r1
. So a rotate can fold into a register-source operand for an EOR instruction or something.
有趣的事实:ARM 并没有真正的专用移位/旋转指令,它只是 MOV,源操作数在 ROR 模式下通过桶式移位器:mov r0, r0, ror r1
。因此,旋转可以折叠为 EOR 指令或其他指令的寄存器源操作数。
Make sure you use unsigned types for n
and the return value, or else it won't be a rotate. (gcc for x86 targets does arithmetic right shifts, shifting in copies of the sign-bit rather than zeroes, leading to a problem when you OR
the two shifted values together. Right-shifts of negative signed integers is implementation-defined behaviour in C.)
确保您使用无符号类型 forn
和返回值,否则它不会是 rotate。(用于 x86 目标的 gcc 进行算术右移,在符号位的副本中移位而不是零,当您OR
将两个移位值放在一起时会导致问题。负有符号整数的右移是 C 中实现定义的行为。)
Also, make sure the shift count is an unsigned type, because (-n)&31
with a signed type could be one's complement or sign/magnitude, and not the same as the modular 2^n you get with unsigned or two's complement. (See comments on Regehr's blog post). unsigned int
does well on every compiler I've looked at, for every width of x
. Some other types actually defeat the idiom-recognition for some compilers, so don't just use the same type as x
.
此外,请确保移位计数是无符号类型,因为(-n)&31
有符号类型可能是一个补码或符号/数量级,与您使用无符号或二进制补码得到的模块化 2^n 不同。(请参阅 Regehr 博客文章的评论)。 unsigned int
在我看过的每个编译器上都做得很好,对于x
. 其他一些类型实际上会破坏某些编译器的习惯用法识别,所以不要只使用与x
.
Some compilers provide intrinsics for rotates, which is far better than inline-asm if the portable version doesn't generate good code on the compiler you're targeting. There aren't cross-platform intrinsics for any compilers that I know of. These are some of the x86 options:
一些编译器为 rotates 提供了内在函数,如果可移植版本不能在您所针对的编译器上生成好的代码,则它比 inline-asm 好得多。我所知道的任何编译器都没有跨平台的内在函数。这些是一些 x86 选项:
- Intel documents that
<immintrin.h>
provides_rotl
and_rotl64
intrinsics, and same for right shift. MSVC requires<intrin.h>
, while gcc require<x86intrin.h>
. An#ifdef
takes care of gcc vs. icc, but clang doesn't seem to provide them anywhere, except in MSVC compatibility mode with-fms-extensions -fms-compatibility -fms-compatibility-version=17.00
. And the asm it emits for them sucks (extra masking and a CMOV). - MSVC:
_rotr8
and_rotr16
. - gcc and icc (not clang):
<x86intrin.h>
also provides__rolb
/__rorb
for 8-bit rotate left/right,__rolw
/__rorw
(16-bit),__rold
/__rord
(32-bit),__rolq
/__rorq
(64-bit, only defined for 64-bit targets). For narrow rotates, the implementation uses__builtin_ia32_rolhi
or...qi
, but the 32 and 64-bit rotates are defined using shift/or (with no protection against UB, because the code inia32intrin.h
only has to work on gcc for x86). GNU C appears not to have any cross-platform__builtin_rotate
functions the way it does for__builtin_popcount
(which expands to whatever's optimal on the target platform, even if it's not a single instruction). Most of the time you get good code from idiom-recognition.
- 英特尔文档
<immintrin.h>
提供_rotl
和_rotl64
内在函数,右移也一样。MSVC 需要<intrin.h>
,而 gcc 需要<x86intrin.h>
. An#ifdef
负责 gcc 与 icc,但 clang 似乎没有在任何地方提供它们,除了在 MSVC 兼容模式下与-fms-extensions -fms-compatibility -fms-compatibility-version=17.00
. 它为他们发出的 asm 很糟糕(额外的掩蔽和 CMOV)。 - MSVC:
_rotr8
和_rotr16
。 - gcc 和 icc(不是 clang):
<x86intrin.h>
还提供__rolb
/__rorb
用于 8 位左/右旋转、__rolw
/__rorw
(16 位)、__rold
/__rord
(32 位)、__rolq
/__rorq
(64 位,仅针对 64 位目标定义)。对于窄轮换,实现使用__builtin_ia32_rolhi
or...qi
,但 32 位和 64 位轮换是使用 shift/or 定义的(没有针对 UB 的保护,因为 中的代码ia32intrin.h
只需要在 x86 的 gcc 上工作)。GNU C 似乎没有__builtin_rotate
像它那样具有任何跨平台功能__builtin_popcount
(即使它不是单个指令,它也可以扩展到目标平台上的最佳功能)。大多数情况下,您可以从 idiom-recognition 中获得好的代码。
// For real use, probably use a rotate intrinsic for MSVC, or this idiom for other compilers. This pattern of #ifdefs may be helpful
#if defined(__x86_64__) || defined(__i386__)
#ifdef _MSC_VER
#include <intrin.h>
#else
#include <x86intrin.h> // Not just <immintrin.h> for compilers other than icc
#endif
uint32_t rotl32_x86_intrinsic(rotwidth_t x, unsigned n) {
//return __builtin_ia32_rorhi(x, 7); // 16-bit rotate, GNU C
return _rotl(x, n); // gcc, icc, msvc. Intel-defined.
//return __rold(x, n); // gcc, icc.
// can't find anything for clang
}
#endif
Presumably some non-x86 compilers have intrinsics, too, but let's not expand this community-wiki answer to include them all. (Maybe do that in the existing answer about intrinsics).
据推测,一些非 x86 编译器也具有内在函数,但我们不要扩展此社区 wiki 答案以将它们全部包含在内。(也许在关于内在函数的现有答案中这样做)。
(The old version of this answer suggested MSVC-specific inline asm (which only works for 32bit x86 code), or http://www.devx.com/tips/Tip/14043for a C version. The comments are replying to that.)
(此答案的旧版本建议特定于 MSVC 的内联 asm(仅适用于 32 位 x86 代码),或http://www.devx.com/tips/Tip/14043用于 C 版本。评论正在回复.)
Inline asm defeats many optimizations, especially MSVC-style because it forces inputs to be stored/reloaded. A carefully-written GNU C inline-asm rotate would allow the count to be an immediate operand for compile-time-constant shift counts, but it still couldn't optimize away entirely if the value to be shifted is also a compile-time constant after inlining. https://gcc.gnu.org/wiki/DontUseInlineAsm.
内联汇编失败了许多优化,尤其是 MSVC 风格的优化,因为它强制存储/重新加载输入。精心编写的 GNU C inline-asm 旋转将允许计数成为编译时常量移位计数的立即操作数,但如果要移位的值也是编译时常量,它仍然无法完全优化掉内联后。 https://gcc.gnu.org/wiki/DontUseInlineAsm。
回答by MSalters
Since it's C++, use an inline function:
因为是 C++,所以使用内联函数:
template <typename INT>
INT rol(INT val) {
return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1));
}
C++11 variant:
C++11 变体:
template <typename INT>
constexpr INT rol(INT val) {
static_assert(std::is_unsigned<INT>::value,
"Rotate Left only makes sense for unsigned types");
return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1));
}
回答by VolkerK
Most compilers have intrinsics for that. Visual Studio for example _rotr8, _rotr16
大多数编译器都有内在函数。Visual Studio 例如_rotr8、_rotr16
回答by Didac Perez Parera
Definitively:
明确地说:
template<class T>
T ror(T x, unsigned int moves)
{
return (x >> moves) | (x << sizeof(T)*8 - moves);
}
回答by Farhadix
If x is an 8 bit value, you can use this:
如果 x 是 8 位值,则可以使用以下命令:
x=(x>>1 | x<<7);
回答by Abhay
How abt something like this, using the standard bitset ...
使用标准位集,这样的事情如何......
#include <bitset>
#include <iostream>
template <std::size_t N>
inline void
rotate(std::bitset<N>& b, unsigned m)
{
b = b << m | b >> (N-m);
}
int main()
{
std::bitset<8> b(15);
std::cout << b << '\n';
rotate(b, 2);
std::cout << b << '\n';
return 0;
}
HTH,
哈,
回答by S M Kamran
In details you can apply the following logic.
具体来说,您可以应用以下逻辑。
If Bit Pattern is 33602 in Integer
如果位模式是整数 33602
1000 0011 0100 0010
and you need to Roll over with 2 right shifs then: first make a copy of bit pattern and then left shift it: Length - RightShift i.e. length is 16 right shift value is 2 16 - 2 = 14
并且您需要使用 2 个右移进行翻转,然后:首先复制位模式,然后将其左移: Length - RightShift 即长度为 16 右移值是 2 16 - 2 = 14
After 14 times left shifting you get.
左移 14 次后你得到。
1000 0000 0000 0000
Now right shift the value 33602, 2 times as required. You get
现在根据需要将值 33602 右移 2 次。你得到
0010 0000 1101 0000
Now take an OR between 14 time left shifted value and 2 times right shifted value.
现在在 14 次左移值和 2 次右移值之间取一个 OR。
1000 0000 0000 0000 0010 0000 1101 0000 =================== 1010 0000 1101 0000 ===================
And you get your shifted rollover value. Remember bit wise operations are faster and this don't even required any loop.
你会得到你转移的翻转价值。请记住,按位操作更快,这甚至不需要任何循环。
回答by nimrodm
Assuming you want to shift right by L
bits, and the input x
is a number with N
bits:
假设您要按L
位右移,并且输入x
是一个带N
位的数字:
unsigned ror(unsigned x, int L, int N)
{
unsigned lsbs = x & ((1 << L) - 1);
return (x >> L) | (lsbs << (N-L));
}
回答by user3102555
The correct answer is following:
正确答案如下:
#define BitsCount( val ) ( sizeof( val ) * CHAR_BIT )
#define Shift( val, steps ) ( steps % BitsCount( val ) )
#define ROL( val, steps ) ( ( val << Shift( val, steps ) ) | ( val >> ( BitsCount( val ) - Shift( val, steps ) ) ) )
#define ROR( val, steps ) ( ( val >> Shift( val, steps ) ) | ( val << ( BitsCount( val ) - Shift( val, steps ) ) ) )
回答by Andrew
Below is a slightly improved version of Dídac Pérez's answer, with both directions implemented, along with a demo of these functions' usages using unsigned char and unsigned long long values. Several notes:
下面是Dídac Pérez 答案的略微改进版本,实现了两个方向,以及使用 unsigned char 和 unsigned long long 值的这些函数用法的演示。几个注意事项:
- The functions are inlined for compiler optimizations
- I used a
cout << +value
trick for tersely outputting an unsigned char numerically that I found here: https://stackoverflow.com/a/28414758/1599699 - I recommend using the explicit
<put the type here>
syntax for clarity and safety. - I used unsigned char for the shiftNum parameter because of what I found in the Additional Details section here:
- 这些函数被内联用于编译器优化
- 我使用了一个
cout << +value
技巧来简洁地输出我在这里找到的无符号字符:https: //stackoverflow.com/a/28414758/1599699 <put the type here>
为了清晰和安全,我建议使用显式语法。- 我在 shiftNum 参数中使用了无符号字符,因为我在此处的“附加详细信息”部分中找到了以下内容:
The result of a shift operation is undefined if additive-expressionis negative or if additive-expressionis greater than or equal to the number of bits in the (promoted) shift-expression.
如果加法表达式为负或加法表达式大于或等于(提升的)移位表达式中的位数,则移位运算的结果未定义。
Here's the code I'm using:
这是我正在使用的代码:
#include <iostream>
using namespace std;
template <typename T>
inline T rotateAndCarryLeft(T rotateMe, unsigned char shiftNum)
{
static const unsigned char TBitCount = sizeof(T) * 8U;
return (rotateMe << shiftNum) | (rotateMe >> (TBitCount - shiftNum));
}
template <typename T>
inline T rotateAndCarryRight(T rotateMe, unsigned char shiftNum)
{
static const unsigned char TBitCount = sizeof(T) * 8U;
return (rotateMe >> shiftNum) | (rotateMe << (TBitCount - shiftNum));
}
void main()
{
//00010100 == (unsigned char)20U
//00000101 == (unsigned char)5U == rotateAndCarryLeft(20U, 6U)
//01010000 == (unsigned char)80U == rotateAndCarryRight(20U, 6U)
cout << "unsigned char " << 20U << " rotated left by 6 bits == " << +rotateAndCarryLeft<unsigned char>(20U, 6U) << "\n";
cout << "unsigned char " << 20U << " rotated right by 6 bits == " << +rotateAndCarryRight<unsigned char>(20U, 6U) << "\n";
cout << "\n";
for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum)
{
cout << "unsigned char " << 21U << " rotated left by " << +shiftNum << " bit(s) == " << +rotateAndCarryLeft<unsigned char>(21U, shiftNum) << "\n";
}
cout << "\n";
for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum)
{
cout << "unsigned char " << 21U << " rotated right by " << +shiftNum << " bit(s) == " << +rotateAndCarryRight<unsigned char>(21U, shiftNum) << "\n";
}
cout << "\n";
for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned long long) * 8U; ++shiftNum)
{
cout << "unsigned long long " << 3457347ULL << " rotated left by " << +shiftNum << " bit(s) == " << rotateAndCarryLeft<unsigned long long>(3457347ULL, shiftNum) << "\n";
}
cout << "\n";
for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned long long) * 8U; ++shiftNum)
{
cout << "unsigned long long " << 3457347ULL << " rotated right by " << +shiftNum << " bit(s) == " << rotateAndCarryRight<unsigned long long>(3457347ULL, shiftNum) << "\n";
}
cout << "\n\n";
system("pause");
}