What is the performance of C++ std::bitset?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow.

Original question: http://stackoverflow.com/questions/30295174/

What is the performance of std::bitset?

Asked by quant
I recently asked a question on Programmers regarding reasons to use manual bit manipulation of primitive types over std::bitset.
From that discussion I have concluded that the main reason is its comparatively poorer performance, although I'm not aware of any measured basis for this opinion. So the next question is:
What is the performance hit, if any, likely to be incurred by using std::bitset over bit-manipulation of a primitive?
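To make the comparison concrete, here is a minimal, hypothetical sketch of the two styles being contrasted (the function names and flag layout are illustrative, not from any real codebase):

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Style 1: std::bitset -- self-documenting and bounds-aware.
bool has_flag_bitset(const std::bitset<64>& flags, std::size_t i)
{
    return flags.test(i); // range-checked: throws std::out_of_range on a bad index
}

// Style 2: manual bit manipulation on a primitive.
bool has_flag_manual(std::uint64_t flags, unsigned i)
{
    return (flags >> i) & 1u; // caller must guarantee i < 64
}
```

Both answer the same question; the debate is whether they compile to comparably fast code.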
The question is intentionally broad, because after looking online I haven't been able to find anything, so I'll take what I can get. Basically I'm after a resource that provides some profiling of std::bitset vs 'pre-bitset' alternatives to the same problems on some common machine architecture using GCC, Clang and/or VC++. There is a very comprehensive paper which attempts to answer this question for bit vectors:

http://www.cs.up.ac.za/cs/vpieterse/pub/PieterseEtAl_SAICSIT2010.pdf
Unfortunately, it either predates std::bitset or considers it out of scope, so it focuses on vector/dynamic-array implementations instead.
I really just want to know whether std::bitset is better than the alternatives for the use cases it is intended to solve. I already know that it is easier and clearer than bit-fiddling on an integer, but is it as fast?
Accepted answer by Dragon Energy

Update
It's been ages since I posted this one, but:
I already know that it is easier and clearer than bit-fiddling on an integer, but is it as fast?
If you are using bitset in a way that does actually make it clearer and cleaner than bit-fiddling, like checking for one bit at a time instead of using a bit mask, then inevitably you lose all those benefits that bitwise operations provide, like being able to check whether 64 bits are set at one time against a mask, or using FFS instructions to quickly determine which bit is set among 64 bits.
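As a sketch of those two bitwise idioms on a raw 64-bit word (assuming GCC/Clang, where __builtin_ctzll maps to an FFS-style bit-scan instruction; the helper names are mine):

```cpp
#include <cstdint>

// Check 64 flags at once: one AND and one compare, instead of a
// 64-iteration loop over individual bits.
bool all_set_under_mask(std::uint64_t bits, std::uint64_t mask)
{
    return (bits & mask) == mask;
}

// "Find first set": index of the lowest set bit, a single instruction
// on most targets. Precondition: bits != 0.
int first_set_bit(std::uint64_t bits)
{
    return __builtin_ctzll(bits);
}
```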
I'm not sure that bitset incurs a penalty to use in all ways possible (ex: using its bitwise operator&), but if you use it like a fixed-size boolean array, which is pretty much the way I always see people using it, then you generally lose all those benefits described above. We unfortunately can't get that level of expressiveness of just accessing one bit at a time with operator[] and have the optimizer figure out all the bitwise manipulations and FFS and FFZ and so forth going on for us, at least not since the last time I checked (otherwise bitset would be one of my favorite structures).
Now if you are going to use bitset<N> bits interchangeably with, say, uint64_t bits[N/64], as in accessing both the same way using bitwise operations, it might be on par (haven't checked since this ancient post). But then you lose many of the benefits of using bitset in the first place.
for_each method
In the past I got into some misunderstandings, I think, when I proposed a for_each method to iterate through things like vector<bool>, deque, and bitset. The point of such a method is to utilize the internal knowledge of the container to iterate through elements more efficiently while invoking a functor, just as some associative containers offer a find method of their own instead of using std::find to do a better-than-linear-time search.
For example, you could iterate through all set bits of a vector<bool> or bitset, if you had internal knowledge of these containers, by checking for 64 elements at a time using a 64-bit mask when 64 contiguous indices are occupied, and likewise use FFS instructions when that's not the case.
But an iterator design having to do this type of scalar logic in operator++ would inevitably have to do something considerably more expensive, just by the nature in which iterators are designed in these peculiar cases. bitset lacks iterators outright, and that often makes people who want to use it to avoid dealing with bitwise logic fall back on operator[] to check each bit individually in a sequential loop that just wants to find out which bits are set. That too is not nearly as efficient as what a for_each method implementation could do.
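For illustration, here is a sketch of what such a for_each-style loop could do internally on a raw word array (a hypothetical helper, not part of std::bitset; assumes GCC/Clang for __builtin_ctzll):

```cpp
#include <cstddef>
#include <cstdint>

// Invokes f(bit_index) for every set bit, skipping zero words entirely
// and jumping straight to each set bit with an FFS instruction.
template <typename Func>
void for_each_set_bit(const std::uint64_t* words, std::size_t num_words, Func f)
{
    for (std::size_t w = 0; w < num_words; ++w)
    {
        std::uint64_t word = words[w];
        while (word)
        {
            int bit = __builtin_ctzll(word); // lowest set bit
            f(w * 64 + bit);
            word &= word - 1;                // clear that bit
        }
    }
}
```

A bit-by-bit operator[] loop over the same data would instead pay the indexing and branching cost once per bit, set or not.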
Double/Nested Iterators
Another alternative to the for_each container-specific method proposed above would be to use double/nested iterators: that is, an outer iterator which points to a sub-range of a different type of iterator. Client code example:
for (auto outer_it = bitset.nbegin(); outer_it != bitset.nend(); ++outer_it)
{
    for (auto inner_it = outer_it->first; inner_it != outer_it->last; ++inner_it)
    {
        // do something with *inner_it (bit index)
    }
}
While not conforming to the flat type of iterator design available now in standard containers, this can allow some very interesting optimizations. As an example, imagine a case like this:
bitset<64> bits = 0x1fbf; // 0b1111110111111;
In that case, the outer iterator can, with just a few bitwise operations (FFZ/or/complement), deduce that the first range of bits to process would be bits [0, 6), at which point we can iterate through that sub-range very cheaply through the inner/nested iterator (it would just increment an integer, making ++inner_it equivalent to just ++int). Then when we increment the outer iterator, it can very quickly, again with a few bitwise instructions, determine that the next range would be [7, 13). After we iterate through that sub-range, we're done. Take this as another example:
bitset<16> bits = 0xffff;
In such a case, the first and last sub-range would be [0, 16), and the bitset could determine that with a single bitwise instruction, at which point we can iterate through all set bits and then we're done.
This type of nested iterator design would map particularly well to vector<bool>, deque, and bitset, as well as other data structures people might create like unrolled lists.
I say that in a way that goes beyond just armchair speculation, since I have a set of data structures which resemble the likes of deque and which are actually on par with sequential iteration of vector (still noticeably slower for random access, especially if we're just storing a bunch of primitives and doing trivial processing). However, to achieve comparable times to vector for sequential iteration, I had to use these types of techniques (for_each method and double/nested iterators) to reduce the amount of processing and branching going on in each iteration. I could not rival the times otherwise using just the flat iterator design and/or operator[]. And I'm certainly not smarter than the standard library implementers, but I came up with a deque-like container which can be sequentially iterated much faster, and that strongly suggests to me that it's an issue with the standard interface design of iterators in this case, which comes with some overhead in these peculiar cases that the optimizer cannot optimize away.
Old Answer
I'm one of those who would give you a similar performance answer, but I'll try to give you something a bit more in-depth than "just because". It is something I came across through actual profiling and timing, not merely distrust and paranoia.
One of the biggest problems with bitset and vector<bool> is that their interface design is "too convenient" if you want to use them like an array of booleans. Optimizers are great at obliterating all that structure you establish to provide safety, reduce maintenance cost, make changes less intrusive, etc. They do an especially fine job with selecting instructions and allocating the minimal number of registers to make such code run as fast as the not-so-safe, not-so-easy-to-maintain/change alternatives.
The part that makes the bitset interface "too convenient" at the cost of efficiency is the random-access operator[] as well as the iterator design for vector<bool>. When you access one of these at index n, the code has to first figure out which byte the nth bit belongs to, and then the sub-index of the bit within it. That first phase typically involves a division/right-shift along with a modulo/bitwise AND, and it is more costly than the actual bit operation you're trying to perform.
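For a power-of-two word size the division and modulo reduce to a shift and a mask, but the addressing step still happens before the one-bit operation itself; a sketch for a 64-bit word layout:

```cpp
#include <cstddef>
#include <cstdint>

// Reading bit n of a packed bit array: locate the word, then the bit
// within it -- this indexing work precedes the actual one-bit test.
bool test_bit(const std::uint64_t* words, std::size_t n)
{
    std::size_t word_index = n >> 6; // n / 64
    std::size_t bit_index  = n & 63; // n % 64
    return (words[word_index] >> bit_index) & 1u;
}
```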
The iterator design for vector<bool> faces a similar awkward dilemma where it either has to branch into different code every 8+ times you iterate through it or pay that kind of indexing cost described above. If the former is done, it makes the logic asymmetrical across iterations, and iterator designs tend to take a performance hit in those rare cases. To exemplify, if vector had a for_each method of its own, you could iterate through, say, a range of 64 elements at once by just masking the bits against a 64-bit mask for vector<bool> if all the bits are set, without checking each bit individually. It could even use FFS to figure out the range all at once. An iterator design would tend to inevitably have to do it in a scalar fashion or store more state which has to be redundantly checked every iteration.
For random access, optimizers can't seem to optimize away this indexing overhead to figure out which byte and relative bit to access (perhaps a bit too runtime-dependent) when it's not needed, and you tend to see significant performance gains with more manual code that processes bits sequentially with advance knowledge of which byte/word/dword/qword it's working on. It's somewhat of an unfair comparison, but the difficulty with std::bitset is that there's no way to make a fair comparison in such cases where the code knows which byte it wants to access in advance, and more often than not, you tend to have this info in advance. It's an apples-to-oranges comparison in the random-access case, but you often only need oranges.
Perhaps that wouldn't be the case if the interface design involved a bitset where operator[] returned a proxy, requiring a two-index access pattern to use. For example, in such a case, you would set bits 6 and 7 of word 0 by writing bitset[0][6] = true; bitset[0][7] = true; with a template parameter to indicate the size of the proxy (64 bits, e.g.). A good optimizer may be able to take such a design and make it rival the manual, old-school way of doing the bit manipulation by hand by translating that into: bitset |= 0xC0;
Another design that might help is if bitsets provided a for_each_bit kind of method, passing a bit proxy to the functor you provide. That might actually be able to rival the manual method.
std::deque has a similar interface problem. Its performance shouldn't be that much slower than std::vector for sequential access. Yet unfortunately we access it sequentially using operator[], which is designed for random access, or through an iterator, and the internal rep of deques simply doesn't map very efficiently to an iterator-based design. If deque provided a for_each kind of method of its own, then it could potentially start to get a lot closer to std::vector's sequential access performance. These are some of the rare cases where the Sequence interface design comes with some efficiency overhead that optimizers often can't obliterate. Often good optimizers can make convenience come free of runtime cost in a production build, but unfortunately not in all cases.
Sorry!
Also sorry, in retrospect I wandered a bit with this post, talking about vector<bool> and deque in addition to bitset. It's because we had a codebase where the use of these three, and particularly iterating through them or using them with random access, were often hotspots.
Apples to Oranges
As emphasized in the old answer, comparing straightforward usage of bitset to primitive types with low-level bitwise logic is comparing apples to oranges. It's not like bitset is implemented very inefficiently for what it does. If you genuinely need to access a bunch of bits with a random access pattern which, for some reason or other, needs to check and set just one bit at a time, then it might be ideally implemented for such a purpose. But my point is that almost all use cases I've encountered didn't require that, and when it's not required, the old-school way involving bitwise operations tends to be significantly more efficient.
Answered by metamorphosis

Did a short test profiling std::bitset vs bool arrays for sequential and random access - you can too:
#include <iostream>
#include <bitset>
#include <cstdlib> // rand
#include <ctime>   // clock

inline unsigned long get_time_in_ms()
{
    return (unsigned long)((double(clock()) / CLOCKS_PER_SEC) * 1000);
}

void one_sec_delay()
{
    unsigned long end_time = get_time_in_ms() + 1000;

    while (get_time_in_ms() < end_time)
    {
    }
}

int main(int argc, char **argv)
{
    srand(get_time_in_ms());

    using namespace std;

    bitset<5000000> bits;
    bool *bools = new bool[5000000];

    unsigned long current_time, difference1, difference2;
    double total;

    one_sec_delay();

    total = 0;
    current_time = get_time_in_ms();

    for (unsigned int num = 0; num != 200000000; ++num)
    {
        bools[rand() % 5000000] = rand() % 2;
    }

    difference1 = get_time_in_ms() - current_time;
    current_time = get_time_in_ms();

    for (unsigned int num2 = 0; num2 != 100; ++num2)
    {
        for (unsigned int num = 0; num != 5000000; ++num)
        {
            total += bools[num];
        }
    }

    difference2 = get_time_in_ms() - current_time;

    cout << "Bool:" << endl << "sum total = " << total << ", random access time = " << difference1 << ", sequential access time = " << difference2 << endl << endl;

    one_sec_delay();

    total = 0;
    current_time = get_time_in_ms();

    for (unsigned int num = 0; num != 200000000; ++num)
    {
        bits[rand() % 5000000] = rand() % 2;
    }

    difference1 = get_time_in_ms() - current_time;
    current_time = get_time_in_ms();

    for (unsigned int num2 = 0; num2 != 100; ++num2)
    {
        for (unsigned int num = 0; num != 5000000; ++num)
        {
            total += bits[num];
        }
    }

    difference2 = get_time_in_ms() - current_time;

    cout << "Bitset:" << endl << "sum total = " << total << ", random access time = " << difference1 << ", sequential access time = " << difference2 << endl << endl;

    delete [] bools;

    cin.get();

    return 0;
}
Please note: outputting the sum total is necessary so the compiler doesn't optimise out the for loops - which some compilers do if the result of a loop isn't used.
Under GCC x64 with the following flags: -O2;-Wall;-march=native;-fomit-frame-pointer;-std=c++11; I get the following results:
Bool array: random access time = 4695, sequential access time = 390

Bitset: random access time = 5382, sequential access time = 749
Answered by cmaster - reinstate monica

In addition to what the other answers said about the performance of access, there may also be a significant space overhead: typical bitset<> implementations simply use the longest integer type to back their bits. Thus, the following code
#include <bitset>
#include <stdio.h>

struct Bitfield {
    unsigned char a:1, b:1, c:1, d:1, e:1, f:1, g:1, h:1;
};

struct Bitset {
    std::bitset<8> bits;
};

int main() {
    printf("sizeof(Bitfield) = %zd\n", sizeof(Bitfield));
    printf("sizeof(Bitset) = %zd\n", sizeof(Bitset));
    printf("sizeof(std::bitset<1>) = %zd\n", sizeof(std::bitset<1>));
}
produces the following output on my machine:
sizeof(Bitfield) = 1
sizeof(Bitset) = 8
sizeof(std::bitset<1>) = 8
As you see, my compiler allocates a whopping 64 bits to store a single bit; with the bitfield approach, I only need to round up to eight bits.
This factor of eight in space usage can become important if you have a lot of small bitsets.
Answered by Stewart

Not a great answer here, but rather a related anecdote:
A few years ago I was working on real-time software and we ran into scheduling problems. There was a module which was way over time-budget, and this was very surprising because the module was only responsible for some mapping and packing/unpacking of bits into/from 32-bit words.
It turned out that the module was using std::bitset. We replaced it with manual operations and the execution time decreased from 3 milliseconds to 25 microseconds. That was a significant performance issue and a significant improvement.
The point is, the performance issues caused by this class can be very real.
Answered by Yankes

Rhetorical question: Why is std::bitset written in such an inefficient way?
Answer: It isn't.
Another rhetorical question: What is the difference between:
std::bitset<128> a = src;
a[i] = true;
a = a << 64;
and
std::bitset<129> a = src;
a[i] = true;
a = a << 63;
Answer: a 50x difference in performance: http://quick-bench.com/iRokweQ6JqF2Il-T-9JSmR0bdyw
You need to be very careful what you ask for; bitset supports a lot of things, but each has its own cost. With correct handling you will get exactly the same behavior as raw code:
void f(std::bitset<64>& b, int i)
{
    b |= 1L << i;
    b = b << 15;
}

void f(unsigned long& b, int i)
{
    b |= 1L << i;
    b = b << 15;
}
Both generate the same assembly: https://godbolt.org/g/PUUUyd (64-bit GCC)
Another thing is that bitset is more portable, but this has a cost too:
void h(std::bitset<64>& b, unsigned i)
{
    b = b << i;
}

void h(unsigned long& b, unsigned i)
{
    b = b << i;
}
If i >= 64 then the bitset will be zero, while in the unsigned case we have UB.
void h(std::bitset<64>& b, unsigned i)
{
    if (i < 64) b = b << i;
}

void h(unsigned long& b, unsigned i)
{
    if (i < 64) b = b << i;
}
With the check preventing UB, both generate the same code.
Another place is set versus []: the first one is safe, meaning you will never get UB, but it will cost you a branch. [] has UB if you use a wrong value, but it is as fast as using var |= 1L << i;. Of course, that holds only if std::bitset does not need more bits than the biggest int available on the system, because otherwise you need to split the value to get the correct element in the internal table. This means that for std::bitset<N> the size N is very important for performance. If it is bigger or smaller than the optimal one, you will pay the cost of it.
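A minimal illustration of that trade-off (set() throws std::out_of_range on a bad index, while operator[] simply assumes the index is valid; the wrapper name is mine):

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>

// Range-checked write: pays for the check, but a bad index becomes an
// exception instead of undefined behavior.
bool checked_set(std::bitset<64>& b, std::size_t i)
{
    try {
        b.set(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

By contrast, b[i] = true; performs no check: it is as fast as the raw OR, but the caller carries the precondition i < 64.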
Overall, I find the best way is to use something like this:
constexpr size_t minBitSet = sizeof(std::bitset<1>)*8;
template<size_t N>
using fasterBitSet = std::bitset<minBitSet * ((N + minBitSet - 1) / minBitSet)>;
This will remove the cost of trimming the exceeding bits: http://quick-bench.com/Di1tE0vyhFNQERvucAHLaOgucAY
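A quick sanity check of the rounding helper (restated here so the sketch compiles on its own; on a typical 64-bit implementation minBitSet is 64, so e.g. fasterBitSet<100> becomes std::bitset<128>):

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t minBitSet = sizeof(std::bitset<1>) * 8;

template <std::size_t N>
using fasterBitSet = std::bitset<minBitSet * ((N + minBitSet - 1) / minBitSet)>;

// The requested size is rounded up to a whole number of storage words,
// so shift operations never need to mask off bits past the logical end.
```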