C++ std::bitset 如何比 std::vector<bool> 更快?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4156538/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How can std::bitset be faster than std::vector<bool>?
提问by Martin Ba
According to this answerthe poster expects a std::bitset
of size 100k bits to be faster than a std::vector<bool>
when querying individual bits. How can this be possible?
根据此答案,发布者预计在查询单个位时std::bitset
,大小为 100k 位的 a 比 a 快std::vector<bool>
。这怎么可能?
How could they even differ significantly in their implementation, if std::bitset
apparently allows for arbitrary sizes just like std::vector
?
如果std::bitset
显然允许像std::vector
?
采纳答案by Martin Ba
Measurements on Visual Studio 2010 show that std::bitset
is notgenerally faster than std::vector<bool>
. What the exact reason for this is I cannot say -- only that bitset is implemented significantly different from the std::vector full specialization.
Visual Studio 2010 上的测量表明,std::bitset
它通常不会比std::vector<bool>
. 我不能说具体的原因是什么——只是 bitset 的实现与 std::vector 完全专业化显着不同。
std::bitset stores it's full content in the object via a
std::bitset 通过一个对象将它的全部内容存储在对象中
template<size_t _Bits>
class bitset .....
_Ty _Array[_Words + 1]; // the set of bits
};
array and that makes large bitset unsuitable to be put on the stack -- which isn't a performance argument per se.
数组,这使得大 bitset 不适合放在堆栈上——这本身不是一个性能参数。
vector<bool>
doesn't suffer from the stack problem, and testing with a size of 1e6 and 1e7 it seems that on my box here querying values in a loop is actually 2x faster with a vector.
vector<bool>
不会受到堆栈问题的影响,并且使用 1e6 和 1e7 的大小进行测试时,似乎在我的盒子上,这里在循环中查询值实际上是使用向量的 2 倍。
Well. I guess the usual timing caveats apply and YMMV, but here's the test code I used should anyone care to try himself:
好。我想通常的时间警告适用和 YMMV,但这是我使用的测试代码,如果有人愿意尝试的话:
The output on my box is:
我的盒子上的输出是:
1
vector<bool> loop with a size of 10000000 and 10 iterations*n: 11187 ms
bitset<10000000> loop with 10 iterations*n: 22719 ms
101250010
Press any key to continue . . .
BitMap.cpp
位图文件
#include "stdafx.h"
#include "BitMap.h"
using namespace std;
// Global var to prevent optimizer from messing things up
volatile size_t ext;
volatile clock_t t1;
volatile clock_t t2;
double delta1;
double delta2;
int main(int argc, _TCHAR* argv[])
{
ext = 1;
printf("%d\n", ext);
vb_t *const vec = new vb_t(bssz);
bs_t *const bits = new bs_t(); // must put large bitset on heap
const int iter = 10;
delta1=0;
delta2=0;
for(int o=0; o<5; ++o) {
t1 = clock();
for(int i=0; i!=5; ++i)
bs_loop(iter, *vec);
t2 = clock();
delta1 += t2-t1;
t1 = clock();
for(int i=0; i!=5; ++i)
bs_loop(iter, *bits);
t2 = clock();
delta2 += t2-t1;
}
delta1 /= CLOCKS_PER_SEC;
delta2 /= CLOCKS_PER_SEC;
delta1 *= 1000;
delta2 *= 1000;
cout << "vector<bool> loop with a size of " << bssz << " and " << iter << " iterations*n: " << delta1 << " ms\n";
cout << "bitset<" << bssz << "> loop with " << iter << " iterations*n: " << delta2 << " ms\n";
printf("%d\n", ext);
delete vec;
delete bits;
return 0;
}
BitMap.h
位图.h
#pragma once
#include <vector>
#include <bitset>
extern volatile size_t ext;
const size_t bssz = size_t(1e7); // 1e7 ca 10m
using namespace std; // Test code, using here is OK.
typedef vector<bool> vb_t;
typedef bitset<bssz> bs_t;
template<class COLL>
void bs_loop(const int iterations, COLL const& v);
bs_loop.cpp
bs_loop.cpp
#include "stdafx.h"
#include "BitMap.h"
template<class COLL>
void bs_loop(const int iterations, COLL const& v)
{
ext = sizeof(COLL);
for(size_t i=0; i!=iterations; ++i) {
++ext;
for(size_t j=0, e=v.size(); j!=e; ++j) {
if(v[j]) {
--ext;
}
else {
++ext;
}
}
}
}
template
void bs_loop(const int iterations, vb_t const& v);
template
void bs_loop(const int iterations, bs_t const& v);
Compiler command line:
编译器命令行:
/Zi /nologo /W3 /WX- /O2 /Oi /Oy- /D "WIN32" /D "NDEBUG"
/D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm- /EHsc /GS /Gy
/fp:precise /Zc:wchar_t /Zc:forScope /Yu"StdAfx.h" /Fp"Release\BitMap.pch"
/Fa"Release\" /Fo"Release\" /Fd"Release\vc100.pdb" /Gd /analyze-
/errorReport:queue
note the /O2 and the missing/GL (no whole prg opt).
注意 /O2 和缺失的/GL(没有整个 prg 选项)。
回答by Steve M
Well, since I'm the guy you're basing this question on, here's where I got that idea from:
好吧,因为我是你提出这个问题的人,所以我从这里得到了这个想法:
"…it packs the bools and stores them as individual bits (inside, say, chars) in its internal representation. One consequence of this is that it can't just return a normal bool& from its operator[] or its dereferenced iterators[2]; instead, it has to play games with a helper "proxy" class that is bool-like but is definitely not a bool. Unfortunately, that also means that access into a vector<bool>
is slower, because we have to deal with proxies instead of direct pointers and references.
“……它将 bool 打包并在其内部表示中将它们存储为单独的位(例如,字符)。这样做的一个后果是它不能从其 operator[] 或其取消引用的迭代器 [2] 中返回一个普通的 bool& ]; 相反,它必须使用类似于 bool 但绝对不是 bool 的辅助“代理”类来玩游戏。不幸的是,这也意味着访问 avector<bool>
的速度较慢,因为我们必须处理代理而不是直接指针和引用。
…
…
Bottom line: If you care more about speed than you do about size, you shouldn't use std::vector<bool>
. Instead, you should hack around this optimization by using a std::vector<char>
or the like instead, which is unfortunate but still the best you can do."
底线:如果您更关心速度而不是大小,那么您不应该使用std::vector<bool>
. 相反,您应该通过使用 astd::vector<char>
或类似方法来绕过此优化,这很不幸,但仍然是您能做的最好的事情。”
Or, as I recommended, if you know the biggest size that your set will get, use std::bitset
.
或者,正如我所建议的,如果您知道您的集合将获得的最大尺寸,请使用std::bitset
.
回答by Raffaello
Honestly I think bitset it's better to use in the stack and not on the heap. moreover the two are not in conflict with one to another because an elegant solution can be something like this:
老实说,我认为 bitset 最好在堆栈中使用而不是在堆上使用。此外,两者并不相互冲突,因为一个优雅的解决方案可以是这样的:
vector< bitset<64> > v(100000) //or whatever...
it can be interesting a test in comparison of this two instead:
比较这两者的测试可能会很有趣:
vector<unsigned char> v1(1000000) //8 bits to manage them manually
vector< bitset<8> > v2(1000000) //8 bits managed by bitset
Moreover, just for adding to the answers here and remind how the compiler depend A LOT in the performance too, here's a simple test done with:
此外,只是为了添加到这里的答案并提醒编译器如何在性能中也有很多依赖,这里有一个简单的测试:
- VS2012
- mingw/g++ 4.7.0
- g++ 4.8.2 on Ubuntu
- VS2012
- 明威/克++ 4.7.0
- Ubuntu 上的 g++ 4.8.2
(but all these tests are a little bit tricky and maybe give us only the rough general idea for a DIRECT comparison. Profiling the project it's the only thing in the end to do.)
(但所有这些测试都有点棘手,也许只能为我们提供直接比较的粗略总体思路。对项目进行分析是最后要做的唯一事情。)
- VS2012 compiled in Release (default Release provided).
- g++ compiled with -O2
- g++ cimpiled with -O2 -std=c++11
- VS2012 在 Release 中编译(提供默认 Release)。
- g++ 用 -O2 编译
- g++ 用 -O2 -std=c++11 编译
NOTE:
笔记:
with size 10^7:
大小为 10^7:
- VS2012 crash at runtime. (So I can assume have a memory management different from g++)
- g++ it's ok.
- g++11 have a problem with test2() reporting time =0, I printed out some values just to trigger the code execution. (I suppose it's a compiler optimization).
- VS2012 在运行时崩溃。(所以我可以假设有一个不同于 g++ 的内存管理)
- g++ 没关系。
- g++11 有 test2() 报告时间 =0 的问题,我打印了一些值只是为了触发代码执行。(我想这是一个编译器优化)。
I included the overhead time of constructor and destructor of the objects too.
我也包括了对象的构造函数和析构函数的开销时间。
here the simple test code:
这里是简单的测试代码:
#include <iostream>
#include <vector>
#include <bitset>
#include <time.h>
using namespace std;
#define SIZE1 1000000000 //10e9
//#define SIZE2 10000000 //10e7 VS2012 crash at runtime, g++ OK
#define SIZE2 1000000 //10e6
void test1()
{
register bool j;
clock_t t1,t2;
cout.precision(10);
t1=clock();
vector<bool> *v = new vector<bool>(SIZE1);
for(register long int i=0; i<SIZE1;i++)
(*v)[i] = i%2 == 0? true :false;
for(register long int i=0; i<SIZE1;i++)
j=(*v)[i];
delete v;
t2=clock();
cout << "vector speed = " << (t2-t1) / (float) CLOCKS_PER_SEC << " (" << t2 << "," << t1 << ")" << endl;
t1=clock();
bitset<SIZE1> *b = new bitset<SIZE1>();
for(register long int i=0; i<SIZE1;i++)
(*b)[i] = i%2 == 0? true :false;
for(register long int i=0; i<SIZE1;i++)
j=(*b)[i];
delete b;
t2=clock();
cout << "bitset speed = " << (t2-t1) / (float) CLOCKS_PER_SEC << " (" << t2 << "," << t1 << ")" << endl;
}
void test2()
{
register bool j;
clock_t t1,t2;
cout.precision(10);
t1=clock();
vector<bool> v(SIZE2);
for(register int k=0; k<SIZE1/SIZE2; k++)
for(register long int i=0; i<SIZE2;i++)
(v)[i] = i%2 == 0? true :false;
for(register int k=0; k<SIZE1/SIZE2; k++)
for(register long int i=0; i<SIZE2;i++)
j=(v)[i];
t2=clock();
cout << "vector speed = " << (t2-t1) / (float) CLOCKS_PER_SEC << " (" << t2 << "," << t1 << ")" << endl;
cout << "v[1], v[2] " << (int) v[1] << ", "<< (int)v[2] << endl;
t1=clock();
bitset<SIZE2> b;
for(register int k=0; k<SIZE1/SIZE2; k++)
for(register long int i=0; i<SIZE2;i++)
(b)[i] = i%2 == 0? true :false;
for(register int k=0; k<SIZE1/SIZE2; k++)
for(register long int i=0; i<SIZE2;i++)
j=(b)[i];
t2=clock();
cout << "bitset speed = " << (t2-t1) / (float) CLOCKS_PER_SEC << " (" << t2 << "," << t1 << ")" << endl;
cout << "b[1], b[2] " << (int) b[1] << ", "<< (int)b[2] << endl;
}
int main(int argc, char* argv[])
{
test1();
test2();
return 0;
}
VS2012 output:
VS2012 输出:
vector speed = 3.105000019 (3105,0)
bitset speed = 10.44400024 (13551,3107)
vector speed = 3.987999916 (17542,13554)
v[1], v[2] 0, 1
bitset speed = 9.772999763 (27318,17545)
b[1], b[2] 0, 1
mingw/g++ output -O2:
mingw/g++ 输出 -O2:
vector speed = 1.519 (1520,1)
bitset speed = 1.647 (3168,1521)
vector speed = 1.383999944 (4554,3170)
v[1], v[2] 0, 1
bitset speed = 1.610000014 (6166,4556)
b[1], b[2] 0, 1
mingw/g++ output -O2 -std=c++11:
mingw/g++ 输出 -O2 -std=c++11:
vector speed = 1.528 (1529,1)
bitset speed = 1.685 (3215,1530)
vector speed = 1.409999967 (4626,3216)
v[1], v[2] 0, 1
bitset speed = 1.763000011 (6392,4629)
b[1], b[2] 0, 1
g++ 4.8.2 output -O2:
g++ 4.8.2 输出 -O2:
vector speed = 1.561391 (1564139,2748)
bitset speed = 1.681818 (3246051,1564233)
vector speed = 1.487877011 (4733975,3246098)
v[1], v[2] 0, 1
bitset speed = 1.685297012 (6419328,4734031)
b[1], b[2] 0, 1
g++ 4.8.2 output -O2 -std=c++11:
g++ 4.8.2 输出 -O2 -std=c++11:
vector speed = 1.561391 (1564139,2748)
bitset speed = 1.681818 (3246051,1564233)
vector speed = 1.487877011 (4733975,3246098)
v[1], v[2] 0, 1
bitset speed = 1.685297012 (6419328,4734031)
b[1], b[2] 0, 1
CONCLUSION:
结论:
As a rough idea vector seems faster for these use cases.
对于这些用例,粗略的想法向量似乎更快。
I don't run multiple istances and average the results, but more or less the values are always the same.
我不会运行多个实例并平均结果,但值或多或少总是相同的。
note on VS: I think it use a different memory management mechanism respect to gcc and for these use cases seems slower in the generated code.
关于 VS 的注意事项:我认为它使用与 gcc 不同的内存管理机制,对于这些用例,在生成的代码中似乎更慢。
回答by estan
Here's my unscientific benchmark of accessing/inserting 3 billion elements from/into bitset<>
and vector<bool>
of sizes 100K, 1M and 5M. The compiler is GCC 4.8.2 on 64 bit Linux machine (Core i7):
这是我从/向其中访问/插入 30 亿个元素bitset<>
以及vector<bool>
大小为 100K、1M 和 5M的不科学基准。编译器是 64 位 Linux 机器(Core i7)上的 GCC 4.8.2:
With optimization (compiler flags: -O2 -std=c++11
):
优化(编译器标志:)-O2 -std=c++11
:
[estan@pyret bitset_vs_vector]$ ./bitset_vs_vector
bitset<100000> (3 billion accesses/inserts): 132.424 ms
vector<bool>(100000) (3 billion accesses/inserts): 270.577 ms
bitset<1000000> (3 billion accesses/inserts): 67.752 ms
vector<bool>(1000000) (3 billion accesses/inserts): 268.193 ms
bitset<5000000> (3 billion accesses/inserts): 67.426 ms
vector<bool>(5000000) (3 billion accesses/inserts): 267.566 ms
Without optimization (compiler flags: -std=c++11
):
没有优化(编译器标志:)-std=c++11
:
[estan@pyret bitset_vs_vector]$ make
g++ -std=c++11 -o bitset_vs_vector *.cpp
[estan@pyret bitset_vs_vector]$ ./bitset_vs_vector
bitset<100000> (3 billion accesses/inserts): 1900.13 ms
vector<bool>(100000) (3 billion accesses/inserts): 1784.76 ms
bitset<1000000> (3 billion accesses/inserts): 1825.09 ms
vector<bool>(1000000) (3 billion accesses/inserts): 1768.03 ms
bitset<5000000> (3 billion accesses/inserts): 1846.73 ms
vector<bool>(5000000) (3 billion accesses/inserts): 1763.48 ms
So it seems under these conditions, bitset is faster than vector when the code is optimized, while vector actually comes out on top by a (very) small margin when it's not.
所以在这些条件下,当代码被优化时,bitset 比 vector 更快,而当它不是时,vector 实际上以(非常)小的幅度排在首位。
That said, if your code is time critical you should probably perform benchmarks yourself, since I suspect these numbers are highly compiler/environment specific.
也就是说,如果您的代码对时间要求严格,您可能应该自己执行基准测试,因为我怀疑这些数字是高度特定于编译器/环境的。
Benchmark code:
基准代码:
#include <iostream>
#include <functional>
#include <bitset>
#include <vector>
#include <ctime>
// Performs N access/insert on container.
template<class T>
void access_and_insert(T &container, int N)
{
const std::size_t size = container.size();
for (int i = 0; i < N; ++i) {
bool v = container[i % size];
container[i % size] = true;
}
}
// Measure the time in milliseconds required to call f.
double measure(std::function<void (void)> f)
{
clock_t start = std::clock();
f();
return 1000.0 * (std::clock() - start)/CLOCKS_PER_SEC;
}
int main (void)
{
// Benchmark with 100K elements.
std::bitset<100000> bitset100K;
std::vector<bool> vector100K(100000);
std::cout << "bitset<100000> (3 billion accesses/inserts): ";
std::cout << measure([&]() { access_and_insert(bitset100K, 3E7); }) << " ms " << std::endl;
std::cout << "vector<bool>(100000) (3 billion accesses/inserts): ";
std::cout << measure([&]() { access_and_insert(vector100K, 3E7); }) << " ms" << std::endl;
std::cout << std::endl;
// Benchmark with 1M elements.
std::bitset<1000000> bitset1M;
std::vector<bool> vector1M(1000000);
std::cout << "bitset<1000000> (3 billion accesses/inserts): ";
std::cout << measure([&]() { access_and_insert(bitset1M, 3E7); }) << " ms " << std::endl;
std::cout << "vector<bool>(1000000) (3 billion accesses/inserts): ";
std::cout << measure([&]() { access_and_insert(vector1M, 3E7); }) << " ms" << std::endl;
std::cout << std::endl;
// Benchmark with 5M elements.
std::bitset<5000000> bitset5M;
std::vector<bool> vector5M(5000000);
std::cout << "bitset<5000000> (3 billion accesses/inserts): ";
std::cout << measure([&]() { access_and_insert(bitset5M, 3E7); }) << " ms " << std::endl;
std::cout << "vector<bool>(5000000) (3 billion accesses/inserts): ";
std::cout << measure([&]() { access_and_insert(vector5M, 3E7); }) << " ms" << std::endl;
return 0;
}
回答by Armen Tsirunyan
the vector accesses its elements with iterators, which can't be a simple typedef for bool*, , which makes it slower than bitset, which doesn't provide iterators. Another thing that makes it fast is that its size is known compile-time and therefore it does no allocation with new, which is slower than stack allocation. Just random thoughts
该向量使用迭代器访问其元素,这不能是 bool*, 的简单 typedef,这使得它比不提供迭代器的 bitset 慢。另一个使它快速的原因是它的大小在编译时是已知的,因此它不使用 new 进行分配,这比堆栈分配慢。只是随意的想法
回答by Marcin
Also, note that a vector<bool>
is a specialization of the vector template, and is implemented quite differently than you might think.
另请注意, avector<bool>
是矢量模板的特化,其实现方式与您想象的完全不同。