Python 字典键。“在”复杂性

Question

提问by tknickman

Quick question to mainly satisfy my curiosity on the topic.

快速问题主要是满足我对这个话题的好奇心。

I am writing some large python programs with an SQlite database backend and will be dealing with a large number of records in the future, so I need to optimize as much as I can.

我正在编写一些带有 SQlite 数据库后端的大型 python 程序，并且将来会处理大量记录，因此我需要尽可能多地优化。

For a few functions, I am searching through keys in a dictionary. I have been using the "in" keyword for prototyping and was planning on going back and optimizing those searches later as I know the "in" keyword is generally O(n) (as this just translates to python iterating over an entire list and comparing each element). But, as a python dict is basically just a hash map, is the python interpreter smart enough to interpret:

对于一些功能，我正在字典中搜索键。我一直在使用“in”关键字进行原型设计，并计划稍后返回并优化这些搜索，因为我知道“in”关键字通常是 O(n)（因为这只是转换为 python 迭代整个列表并进行比较每个元素）。但是，由于 python dict 基本上只是一个哈希映射，python 解释器是否足够聪明来解释：

if(key in dict.keys()):
    ...code...

to:

到：

if(dict[key] != None):
    ...code...

It is basically the same operation but the top would be O(n) and the bottom would be O(1).

它基本上是相同的操作，但顶部为 O(n)，底部为 O(1)。

It's easy for me to use the bottom version in my code, but then I was just curious and thought I would ask.

在我的代码中使用底部版本对我来说很容易，但后来我只是好奇并想我会问。

Answer 1

采纳答案by abarnert

First, key in d.keys()is guaranteed to give you the same value as key in dfor any dict d.

首先，key in d.keys()保证为您提供与key in d任何 dict相同的值d。

And the inoperation on a dict, or the dict_keysobject you get back from calling keys()on it (in 3.x), is notO(N), it's O(1).

in对 a的操作dict，或者dict_keys调用keys()它返回的对象（在 3.x 中），不是O(N)，而是 O(1)。

There's no real "optimization" going on; it's just that using the hash is the obvious way to implement __contains__on a hash table, just as it's the obvious way to implement __getitem__.

没有真正的“优化”正在进行；只是使用散列是在__contains__散列表上实现的明显方法，正如它是实现__getitem__.

You may ask where this is guaranteed.

你可能会问哪里有保证。

Well, it's not. Mapping Typesdefines dictas, basically, a hash table implementation of collections.abc.Mapping. There's nothing stopping someone from creating a hash table implementation of a Mapping, but still providing O(N) searches. But it would be extra work to make such a bad implementation, so why would they?

好吧，它不是。映射类型dict基本上定义为的哈希表实现collections.abc.Mapping。没有什么可以阻止某人创建映射的哈希表实现，但仍然提供 O(N) 搜索。但是做出如此糟糕的实现需要额外的工作，那么他们为什么要这样做呢？

If you really need to prove it to yourself, you can test every implementation you care about (with a profiler, or by using some type with a custom __hash__and __eq__that logs calls, or…), or read the source.

如果你真的需要向自己证明，你可以测试你关心的每一个实现（使用分析器，或者使用某种类型的自定义__hash__并__eq__记录调用，或者……），或者阅读源代码。

In 2.x, you do not want to call keys, because that generates a listof the keys, instead of a KeysView. You could use iterkeys, but that may generate an iterator or something else that's not O(1). So, just use the dict itself as a sequence.

在 2.x 中，您不想调用keys，因为这会生成 alist键，而不是 a KeysView。您可以使用iterkeys，但这可能会生成一个迭代器或其他不是 O(1) 的东西。因此，只需将 dict 本身用作序列。

Even in 3.x, you don't want to call keys, because there's no need to. Iterating a dict, checking its __contains__, and in general treating it like a sequence is alwaysequivalent to doing the same thing to its keys, so why bother? (And of course building the trivial KeyView, and accessing through it, are going to add a few nanoseconds to your running time and a few keystrokes to your program.)

即使在 3.x 中，您也不想调用keys，因为没有必要。迭代 a dict，检查它的__contains__，并且通常把它当作一个序列来对待总是等价于对它的键做同样的事情，所以为什么要麻烦呢？（当然，构建琐碎的KeyView，并通过它进行访问，将增加几纳秒的运行时间，并为您的程序增加一些按键。）

(It's not quite clear that using sequence operations is equivalent for d.keys()/d.iterkeys()and din 2.x. Other than performance issues, they areequivalent in every CPython, Jython, IronPython, and PyPy implementation, but it doesn't seem to be stated anywhere the way it is in 3.x. And it doesn't matter; just use key in d.)

（不太清楚使用序列操作对d.keys()/d.iterkeys()和d2.x是否等效。除了性能问题，它们在每个 CPython、Jython、IronPython 和 PyPy 实现中都是等效的，但似乎没有在任何地方说明它在 3.x. 中的方式并且没关系；只需使用key in d.)

While we're at it, note that this:

当我们这样做时，请注意：

if(dict[key] != None):

… is not going to work. If the keyis not in the dict, this will raise KeyError, not return None.

......不会起作用。如果key不在dict，这将提高KeyError，而不是返回None。

Also, you should never check Nonewith ==or !=; always use is.

此外，你永远不应该检查None与==或!=; 总是使用is.

You can do this with a try—or, more simply, do if dict.get(key, None) is not None. But again, there's no reason to do so. Also, that won't handle cases where Noneis a perfectly valid item. If that's the case, you need to do something like sentinel = object(); if dict.get(key, sentinel) is not sentinel:.

您可以使用try- 或者更简单地使用 do来完成此操作if dict.get(key, None) is not None。但同样，没有理由这样做。此外，这不会处理None完全有效的项目的情况。如果是这种情况，您需要执行类似sentinel = object(); if dict.get(key, sentinel) is not sentinel:.

So, the right thing to write is:

所以，正确的写法是：

if key in d:

More generally, this is not true:

更一般地说，这不是真的：

I know the "in" keyword is generally O(n) (as this just translates to python iterating over an entire list and comparing each element

我知道“in”关键字通常是 O(n) （因为这只是转换为 python 迭代整个列表并比较每个元素

The inoperator, like most other operators, is just a call to a __contains__method (or the equivalent for a C/Java/.NET/RPython builtin). listimplements it by iterating the list and comparing each element; dictimplements it by hashing the value and looking up the hash; blist.blistimplements it by walking a B+Tree; etc. So, it could be O(n), O(1), O(log n), or something completely different.

的in操作者，像大多数其他的运营商，只是所涉及的呼叫__contains__方法（或等效为一个C /爪哇/ .NET / RPython内建）。list通过迭代列表并比较每个元素来实现它；dict通过对值进行散列并查找散列来实现它；blist.blist通过遍历 B+Tree 来实现它；等等。所以，它可能是 O(n)、O(1)、O(log n) 或完全不同的东西。

Answer 2

回答by Ashwini Chaudhary

In Python 2 dict.keys()creates the whole list of keys first that's why it is an O(N)operation, while key in dictis an O(1)operation.

在 Python 2dict.keys()中，首先创建整个键列表，这就是为什么它是一个O(N)操作，而 whilekey in dict是一个O(1)操作。

if(dict[key] != None)will raise KeyErrorif key is not found in the dict, so it is not equivalent to the first code.

if(dict[key] != None)KeyError如果在 dict 中找不到 key，则会引发，因此它不等同于第一个代码。

Python 2 results:

Python 2 结果：

>>> dic = dict.fromkeys(range(10**5))
>>> %timeit 10000 in dic
1000000 loops, best of 3: 170 ns per loop
>>> %timeit 10000 in dic.keys()
100 loops, best of 3: 4.98 ms per loop
>>> %timeit 10000 in dic.iterkeys()
1000 loops, best of 3: 402 us per loop
>>> %timeit 10000 in dic.viewkeys()
1000000 loops, best of 3: 457 ns per loop

In Python 3 dict.keys()returns a view object which is quite faster than Python 2's keys()but still slower simple normal key in dict:

在 Python 3 中dict.keys()返回一个视图对象，它比 Python 2 快得多，keys()但仍然较慢，简单普通key in dict：

Python 3 results:

Python 3 结果：

>>> dic = dict.fromkeys(range(10**5))
>>> %timeit 10000 in dic
1000000 loops, best of 3: 295 ns per loop
>>> %timeit 10000 in dic.keys()
1000000 loops, best of 3: 475 ns per loop

Use just:

仅使用：

if key in dict:
   #code

Answer 3

回答by Matt Bryant

The proper way to do this would be

这样做的正确方法是

if key in dict:
    do stuff

the inoperator is O(1) for dictionaries and sets in python.

的在操作者是在python字典和集O（1）。

Answer 4

回答by prafi

The in operator for dict has average case time-complexity of O(1). For detailed information about time complexity of other dict() methods, visit this link.

dict 的 in 运算符的平均情况时间复杂度为 O(1)。有关其他 dict() 方法的时间复杂度的详细信息，请访问此链接。

Python 字典键。“在”复杂性

提问by tknickman

采纳答案by abarnert

回答by Ashwini Chaudhary

回答by Matt Bryant

回答by prafi

相关推荐

最近更新

标签

Python 字典键。“在”复杂性

提问by tknickman

采纳答案by abarnert

回答by Ashwini Chaudhary

回答by Matt Bryant

回答by prafi

相关推荐

Python 错误：“'dict' 对象没有属性 'iteritems'”

Python 如何在模型 Django 中创建密码字段

IPython Notebook 中的 [*] 是什么意思以及如何关闭它？

Python Pandas：固定宽度的直方图

相关推荐

最近更新

标签