C++ 哪个是查找速度最快的 STL 容器?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/6985572/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-28 21:04:28  来源:igfitidea点击:

Which is the fastest STL container for find?

c++optimizationstlfind

提问by AJG85

Alright as a preface I have a need to cache a relatively small subset of rarely modified data to avoid querying the database as frequently for performance reasons. This data is heavily used in a read-only sense as it is referenced often by a much larger set of data in other tables.

好的,作为序言,我需要缓存很少修改的数据的相对较小的子集,以避免出于性能原因频繁查询数据库。此数据在只读意义上被大量使用,因为它经常被其他表中的大量数据引用。

I've written a class which will have the ability to store basically the entirety of the two tables in question in memory while listening for commit changes in conjunction with a thread safe callback mechanism for updating the cached objects.

我编写了一个类,它能够将两个有问题的表的全部存储在内存中,同时侦听提交更改以及用于更新缓存对象的线程安全回调机制。

My current implementation has two std::vectorsone for the elements of each table. The class provides both access to the entirety of each vector as well as convenience methods for searching for a specific element of table data via std::find, std::find_if, etc.

我当前的实现std::vectors对每个表的元素有两个。该类提供两个访问每个矢量的整体以及方便的方法用于搜索通过表数据的特定元素std::findstd::find_if等。

Does anyone know if using std::list, std::set, or std::mapover std::vectorfor searching would be preferable? Most of the time that is what will be requested of these containers after populating once from the database when a new connection is made.

有谁知道使用std::list,std::setstd::mapoverstd::vector进行搜索是否更可取?大多数情况下,这是在建立新连接时从数据库填充一次后对这些容器的请求。

I'm also open to using C++0x features supported by VS2010 or Boost.

我也愿意使用 VS2010 或 Boost 支持的 C++0x 功能。

回答by R. Martinho Fernandes

For searching a particular value, with std::setand std::mapit takes O(log N) time, while with the other two it takes O(N) time; So, std::setor std::mapare probably better. Since you have access to C++0x, you could also use std::unordered_setor std::unordered_mapwhich take constant time on average.

对于搜索特定值,withstd::setstd::map它需要 O(log N) 时间,而其他两个则需要 O(N) 时间;所以,std::set或者std::map可能更好。由于您可以访问 C++0x,因此您还可以使用std::unordered_setstd::unordered_map平均花费恒定时间。

For find_if, there's little difference between them, because it takes an arbitrary predicate and containers cannot optimize arbitrarily, of course.

对于find_if,它们之间几乎没有区别,因为它需要任意谓词,当然容器不能任意优化。

However if you will be calling find_iffrequently with a certain predicate, you can optimize yourself: use a std::mapor std::setwith a custom comparator or special keys and use findinstead.

但是,如果您find_if经常使用某个谓词进行调用,则可以优化自己:使用std::maporstd::set与自定义比较器或特殊键并使用find

回答by Mark Ransom

A sorted vector using std::lower_boundcan be just as fast as std::setif you're not updating very often; they're both O(log n). It's worth trying both to see which is better for your own situation.

使用排序向量std::lower_bound可以像std::set您不经常更新一样快;它们都是 O(log n)。两者都值得尝试,看看哪个更适合您自己的情况。

回答by Matthieu M.

Since from your (extended) requirements you need to search on multiple fields, I would point you to Boost.MultiIndex.

由于您的(扩展)要求需要在多个字段上进行搜索,因此我将向您指出 Boost.MultiIndex。

This Boost library lets you build onecontainer (with only one exemplary of each element it contains) and index it over an arbitrary numberof indices. It also lets you precise which indices to use.

这个 Boost 库允许您构建一个容器(它包含的每个元素只有一个示例)并在任意数量的索引上对其进行索引。它还可以让您精确地使用哪些索引。

To determine the kind of index to use, you'll need extensive benchmarks. 500is a relatively low number of entries, so constant factors won't play nicely. Furthermore, there can be a noticeable difference between single-thread and multi-thread usage (most hash-table implementations can collapse on MT usage because they do not use linear-rehashing, and thus a single thread ends up rehashing the table, blocking all others).

要确定要使用的索引类型,您需要大量的基准测试。500条目数量相对较少,因此常数因子不会很好地发挥作用。此外,单线程和多线程使用之间可能存在显着差异(大多数哈希表实现可能会因 MT 使用而崩溃,因为它们不使用线性重新哈希,因此单个线程最终重新哈希表,阻塞所有其他)。

I would recommend a sorted index (skip-list like, if possible) to accomodate range requests (all names beginning by Abc?) if the performance difference is either unnoticeable or simply does not matter.

Abc如果性能差异不明显或根本无关紧要,我会推荐一个排序索引(如果可能的话,像跳过列表一样)来适应范围请求(所有名称都以?开头)。

回答by Piotr

Test it. It is very easy, containers are almost interchangeable in STL.

测试一下。这很容易,容器在 STL 中几乎可以互换。

回答by Anders Abel

If you only want to search for distinct values, one specific column in the table, then std::hashis fastest.

如果您只想搜索不同的值,则表中的特定列std::hash是最快的。

If you want to be able to search using several different predicates, you will need some kind of index structure. It can be implemented by extending your current vector based approach with several hash tables or maps, one for each field to search for, where the value is either an index into the vector, or a direct pointer to the element in the vector.

如果您希望能够使用多个不同的谓词进行搜索,您将需要某种索引结构。它可以通过使用多个哈希表或映射扩展当前基于向量的方法来实现,每个字段一个用于搜索,其中值是向量的索引,或指向向量中元素的直接指针。

Going further, if you want to be able to search for ranges, such as all occasions having a date in July you need an ordered data structure, where you can extract a range.

更进一步,如果您希望能够搜索范围,例如所有日期在 7 月的场合,您需要一个有序的数据结构,您可以在其中提取一个范围。

回答by Jonathan Yue

Not stl, but a commercial c++ container is abax container which has O(1) in search, delete, modify, O(logn) in insert.

不是 stl,而是商业 C++ 容器是 abax 容器,它在搜索、删除、修改中具有 O(1),在插入中具有 O(logn)。

回答by Rob K

Not an answer per se, but be sure to use a typedef to refer to the container type you do use, something like typedef std::vector< itemtype > data_table_cache;Then use your typedef type everywhere.

本身不是答案,但一定要使用 typedef 来指代您使用的容器类型,例如typedef std::vector< itemtype > data_table_cache;Then 到处使用您的 typedef 类型。