Warning: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2572678/

Date: 2020-08-27 23:57:44  Source: igfitidea

C++ STL Map vs Vector speed

Tags: c++, stl, vector, map

Asked by sub

In the interpreter for my experimental programming language I have a symbol table. Each symbol consists of a name and a value (the value can be, e.g., of type string, int, function, etc.).

At first I represented the table with a vector and iterated through the symbols, checking whether each symbol's name matched the given name.

Then I thought that using a map, in my case map<string,symbol>, would be better than iterating through the vector all the time, but:

It's a bit hard to explain this part but I'll try.

If a variable is retrieved for the first time in a program in my language, its position in the symbol table of course has to be found (currently by searching the vector). If I iterated through the vector every time the line is executed (think of a loop), it would be terribly slow (as it currently is: nearly as slow as Microsoft's batch).

So I could use a map to retrieve the variable: SymbolTable[ myVar.Name ]

But consider the following: still using a vector, when the variable is found for the first time, I can store its exact integer position in the vector along with it. That means the next time it is needed, my interpreter knows that it has been "cached" and, instead of searching the symbol table, does something like SymbolTable.at( myVar.CachedPosition ).
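
The vector-plus-cached-index scheme described above could look roughly like the sketch below. The names Symbol, VarRef, findSlot and resolve are hypothetical, and the value is simplified to an int. The cached index stays correct only as long as entries are never erased or reordered.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified symbol: a real interpreter would also hold
// strings, functions, etc. as values.
struct Symbol {
    std::string name;
    int value;
};

// Hypothetical variable reference from the parsed program; cachedPos stays
// empty until the first successful lookup.
struct VarRef {
    std::string name;
    std::optional<std::size_t> cachedPos;
};

// Linear O(N) search, as in the original vector-based table.
std::optional<std::size_t> findSlot(const std::vector<Symbol>& table,
                                    const std::string& name) {
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i].name == name) return i;
    }
    return std::nullopt;
}

// Searches only on first use and reuses the cached index afterwards.
// The index is only valid while symbols are never erased or reordered.
Symbol* resolve(std::vector<Symbol>& table, VarRef& ref) {
    if (!ref.cachedPos) {
        ref.cachedPos = findSlot(table, ref.name);
        if (!ref.cachedPos) return nullptr;  // unknown variable
    }
    return &table.at(*ref.cachedPos);
}
```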

Now my (rather hard?) question:

  • Should I use a vector for the symbol table together with caching the position of the variable in the vector?

  • Should I rather use a map? Why? How fast is the [] operator?

  • Should I use something completely different?

Accepted answer by Matthieu M.

You effectively have a number of alternatives.

Libraries exist:

Critique

  • Map lookup and retrieval take O(log N), but the items may be scattered throughout memory, so a map does not play well with CPU caches.
  • Vectors are more cache-friendly; however, unless you keep one sorted (and use binary search), find is O(N). Is that acceptable?
  • Why not use an unordered_map? It provides O(1) average lookup and retrieval (though the constant may be high) and is certainly suited to this task. If you look at Wikipedia's article on hash tables, you'll see that many strategies are available, and you can certainly pick one that suits your particular usage pattern.
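
A minimal sketch of the unordered_map option, assuming a hypothetical Symbol type with an int value and a hypothetical lookup helper. It uses find() so that a miss returns nullptr instead of creating an entry; the O(1) bound is an average, not a worst case.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical simplified symbol value.
struct Symbol {
    int value;
};

using SymbolTable = std::unordered_map<std::string, Symbol>;

// Average O(1) lookup by name; returns nullptr if the name is unknown.
// find() is used rather than operator[] so a miss inserts nothing.
Symbol* lookup(SymbolTable& table, const std::string& name) {
    auto it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
}
```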

Answer by sub

A map is a good thing to use for a symbol table, but operator[] for maps is not. In general, unless you are writing some trivial code, you should use the map's member functions insert() and find() instead of operator[]. The semantics of operator[] are somewhat complicated, and almost certainly don't do what you want if the symbol you are looking for is not in the map: in that case operator[] silently inserts a new, default-constructed value under that name.
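
To illustrate the difference, here is a sketch with a hypothetical Symbol whose value defaults to 0. The helper names readWithSubscript, readWithFind and declare are made up for illustration; only std::map's operator[], find() and insert() are standard.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical simplified symbol; default-constructs to value 0.
struct Symbol {
    int value = 0;
};

using SymbolTable = std::map<std::string, Symbol>;

// operator[] on a missing key silently inserts a default-constructed Symbol,
// so a typo in a variable name becomes a new variable with value 0.
int readWithSubscript(SymbolTable& table, const std::string& name) {
    return table[name].value;
}

// find() leaves the table unchanged and reports a miss explicitly.
bool readWithFind(const SymbolTable& table, const std::string& name, int& out) {
    auto it = table.find(name);
    if (it == table.end()) return false;
    out = it->second.value;
    return true;
}

// insert() only adds the symbol if the name is not present yet;
// the bool in the returned pair tells whether insertion happened.
bool declare(SymbolTable& table, const std::string& name, int value) {
    return table.insert({name, Symbol{value}}).second;
}
```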

As for the choice between map and unordered_map, the difference in performance is highly unlikely to be significant when implementing a simple interpretive language. If you use map, you are guaranteed that it will be supported by all current Standard C++ implementations.

Answer by Mike Dinsdale

Normally you'd use a symbol table to look up a variable given its name as it appears in the source. In this case you only have the name to work with, so there's nowhere to store the cached position of the variable in the symbol table. So I'd say a map is a good choice. The [] operator takes time proportional to the log of the number of elements in the map; if it turns out to be slow, you could use a hash map like std::tr1::unordered_map.

Answer by Tronic

std::map's operator[] takes O(log(n)) time. This means that it is quite efficient, but you should still avoid doing the lookups over and over again. Instead of storing an index, perhaps you can store a reference to the value, or an iterator into the container? This avoids the lookup entirely.
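
A sketch of the "store a reference instead of an index" idea, assuming a hypothetical VarNode that represents one use of a variable in the parsed program. It caches a pointer rather than a C++ reference because a reference cannot be left unset before the first lookup; the pointer stays valid when other entries are inserted into the std::map.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical simplified symbol value.
struct Symbol {
    int value;
};

using SymbolTable = std::map<std::string, Symbol>;

// Hypothetical AST node for a variable use. The pointer is filled in on the
// first evaluation; std::map never invalidates pointers to its elements on
// insertion, only when that element itself is erased.
struct VarNode {
    std::string name;
    Symbol* cached = nullptr;
};

// Returns the cached symbol, doing the O(log n) lookup only on first use.
Symbol* evalVar(SymbolTable& table, VarNode& node) {
    if (!node.cached) {
        auto it = table.find(node.name);
        if (it == table.end()) return nullptr;  // undeclared variable
        node.cached = &it->second;
    }
    return node.cached;
}
```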

Answer by Dietrich Epp

When most interpreters interpret code, they compile it into an intermediate language first. These intermediate languages often refer to variables by index or by pointer, instead of by name.

For example, Python (the C implementation) changes local variables into references by index, but global variables and class variables get referenced by name using a hash table.
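
The sketch below shows the general idea in C++; it is not CPython's actual mechanism. A hypothetical Resolver assigns every variable name a slot number once, while the program is compiled, and the hypothetical AddConst instruction then refers to variables by slot only, so the run loop never touches strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical compile-time resolver: each distinct variable name gets a
// slot number the first time it is seen; later calls return the same slot.
class Resolver {
public:
    std::size_t slotFor(const std::string& name) {
        // try_emplace does nothing if the name already has a slot.
        return slots_.try_emplace(name, slots_.size()).first->second;
    }
    std::size_t slotCount() const { return slots_.size(); }

private:
    std::unordered_map<std::string, std::size_t> slots_;
};

// Hypothetical instruction of the intermediate form: frame[dst] += constant.
struct AddConst {
    std::size_t dst;
    int constant;
};

// At run time only vector indexing is needed; no names are looked up.
void run(const std::vector<AddConst>& code, std::vector<int>& frame) {
    for (const AddConst& ins : code) frame[ins.dst] += ins.constant;
}
```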

I suggest looking at an introductory text on compilers.

Answer by peterchen

A std::map (O(log(n))) or a hash table ("amortized" O(1)) would be the first choice; use custom mechanisms only if you determine that lookup is a bottleneck. Generally, using a hash or tokenizing the input is the first optimization.

Before you have profiled it, it's most important that you isolate lookup, so you can easily replace and profile it.
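
One way to isolate lookup, sketched with a hypothetical SymbolTable interface and a std::map-backed implementation (values simplified to int). A hash-based or custom implementation could be dropped in behind the same interface and profiled against it without touching the rest of the interpreter.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical interface: the interpreter only talks to this, so the
// storage strategy can be swapped and profiled independently.
class SymbolTable {
public:
    virtual ~SymbolTable() = default;
    virtual void set(const std::string& name, int value) = 0;
    // Returns nullptr when the name is unknown.
    virtual const int* get(const std::string& name) const = 0;
};

// One possible implementation, backed by std::map.
class MapSymbolTable : public SymbolTable {
public:
    void set(const std::string& name, int value) override {
        table_.insert_or_assign(name, value);
    }
    const int* get(const std::string& name) const override {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, int> table_;
};
```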



std::map is likely a tad slower for a small number of elements (but then, it doesn't really matter).
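
For small tables, a common alternative to std::map is a vector kept sorted by name and searched with std::lower_bound, which is the "sort it" option mentioned in the accepted answer. The class below is a hypothetical sketch with int values; whether it actually beats std::map for your tables is something to measure rather than assume.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Symbol {
    std::string name;
    int value;
};

// Hypothetical "flat map": a vector kept sorted by name. Lookups are
// O(log n) binary searches over contiguous memory; insertions are O(n)
// because later elements have to shift.
class SortedSymbolTable {
public:
    // Inserts a new symbol at its sorted position or updates an existing one.
    void set(const std::string& name, int value) {
        auto it = std::lower_bound(symbols_.begin(), symbols_.end(), name, byName);
        if (it != symbols_.end() && it->name == name) {
            it->value = value;
        } else {
            symbols_.insert(it, Symbol{name, value});
        }
    }

    // Binary search; returns nullptr if the name is not present.
    const Symbol* get(const std::string& name) const {
        auto it = std::lower_bound(symbols_.begin(), symbols_.end(), name, byName);
        return (it != symbols_.end() && it->name == name) ? &*it : nullptr;
    }

private:
    static bool byName(const Symbol& s, const std::string& name) {
        return s.name < name;
    }
    std::vector<Symbol> symbols_;
};
```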

Answer by baol

You say: "If the variable, still using vector, is found the first time, I can store its exact integer position in the vector with it.".

You can do the same with the map: search for the variable using find and store the iterator pointing to it instead of the position.
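
A sketch of caching the iterator returned by find(), using a hypothetical CachedVar that holds the name and the iterator (values simplified to int). Unlike a vector position, a std::map iterator keeps pointing at the same element after other entries are inserted; only erasing that element invalidates it.

```cpp
#include <cassert>
#include <map>
#include <string>

using SymbolTable = std::map<std::string, int>;

// Hypothetical cached variable: the iterator is filled in on first use.
struct CachedVar {
    std::string name;
    SymbolTable::iterator it{};
    bool resolved = false;
};

// Returns a pointer to the variable's value, or nullptr if it is undeclared.
// The O(log n) find() runs only once per CachedVar.
int* valueOf(SymbolTable& table, CachedVar& var) {
    if (!var.resolved) {
        auto it = table.find(var.name);
        if (it == table.end()) return nullptr;
        var.it = it;
        var.resolved = true;
    }
    return &var.it->second;
}
```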

Answer by Nick Dandoulakis

For looking up values by a string key, the map data type is the appropriate one, as other users have mentioned.

STL map implementations are usually built on self-balancing binary search trees, such as the red-black tree data structure, and their operations take O(log n) time.

My advice is to wrap the table manipulation code in functions, like table_has(name), table_put(name) and table_get(name).

That way you can easily change the internal symbol table representation if you experience slow run-time performance, and you can later add caching inside those routines.
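
A sketch of those wrapper functions over a std::map, assuming a simplified int value type. The answer writes table_put(name); this sketch assumes it also takes the value to store. The file-scope table is only for brevity, and table_get deliberately uses at() so an unknown name throws instead of being inserted.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Simplified value type; a real interpreter would store strings, functions, etc.
using Value = int;

// The representation lives in one place; callers only use the three
// functions below, so replacing std::map later means editing only this code.
static std::map<std::string, Value> symbol_table;

bool table_has(const std::string& name) {
    return symbol_table.find(name) != symbol_table.end();
}

// Assumption: table_put also receives the value to store.
void table_put(const std::string& name, Value value) {
    symbol_table.insert_or_assign(name, value);
}

// Throws std::out_of_range for an undeclared name instead of inserting it.
Value table_get(const std::string& name) {
    return symbol_table.at(name);
}
```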

Answer by Puppy

A map will scale much better, which will be an important feature. However, don't forget that with a map you can (unlike with a vector) hold pointers and references to elements: inserting into a std::map never invalidates them, whereas a growing vector may reallocate and invalidate every pointer into it. In this case, you could easily "cache" variables with a map just as validly as with a vector. A map is almost certainly the right choice here.

Answer by Daniel Earwicker

A map is O(log N), so it is not as fast as positional lookup in an array. But the exact results will depend on a lot of factors, so the best approach is to interface with the container in a way that lets you swap implementations later on. That is, write a "lookup" function that can be implemented efficiently with any suitable container, so you can switch between implementations and compare their speeds.
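
A sketch of such a comparison: the hypothetical countHits template only relies on emplace(), find() and end(), so it can be instantiated with std::map or std::unordered_map and the measured times compared. Timings vary with the compiler, the standard library and the names used, so it is best run with identifiers taken from real programs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical benchmark helper: fills a map-like container with the given
// names, then times `rounds` passes of find() over all of them.
template <class Map>
std::size_t countHits(const std::vector<std::string>& names, int rounds,
                      std::chrono::nanoseconds& elapsed) {
    Map table;
    for (std::size_t i = 0; i < names.size(); ++i) {
        table.emplace(names[i], static_cast<int>(i));
    }

    std::size_t hits = 0;
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) {
        for (const auto& n : names) {
            if (table.find(n) != table.end()) ++hits;
        }
    }
    elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
    return hits;  // returned so the compiler cannot discard the loop
}
```

For example, countHits<std::unordered_map<std::string, int>>(names, 1000, t) returns the number of successful lookups and stores the elapsed time in t; calling it again with std::map gives the figure to compare.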