Python 字典键。“在”复杂性
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/17539367/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Python dictionary keys. "In" complexity
提问by tknickman
Quick question to mainly satisfy my curiosity on the topic.
快速问题主要是满足我对这个话题的好奇心。
I am writing some large python programs with an SQlite database backend and will be dealing with a large number of records in the future, so I need to optimize as much as I can.
我正在编写一些带有 SQlite 数据库后端的大型 python 程序,并且将来会处理大量记录,因此我需要尽可能多地优化。
For a few functions, I am searching through keys in a dictionary. I have been using the "in" keyword for prototyping and was planning on going back and optimizing those searches later as I know the "in" keyword is generally O(n) (as this just translates to python iterating over an entire list and comparing each element). But, as a python dict is basically just a hash map, is the python interpreter smart enough to interpret:
对于一些功能,我正在字典中搜索键。我一直在使用“in”关键字进行原型设计,并计划稍后返回并优化这些搜索,因为我知道“in”关键字通常是 O(n)(因为这只是转换为 python 迭代整个列表并进行比较每个元素)。但是,由于 python dict 基本上只是一个哈希映射,python 解释器是否足够聪明来解释:
if(key in dict.keys()):
...code...
to:
到:
if(dict[key] != None):
...code...
It is basically the same operation but the top would be O(n) and the bottom would be O(1).
它基本上是相同的操作,但顶部为 O(n),底部为 O(1)。
It's easy for me to use the bottom version in my code, but then I was just curious and thought I would ask.
在我的代码中使用底部版本对我来说很容易,但后来我只是好奇并想我会问。
采纳答案by abarnert
First, key in d.keys()
is guaranteed to give you the same value as key in d
for any dict d
.
首先,key in d.keys()
保证为您提供与key in d
任何 dict相同的值d
。
And the in
operation on a dict
, or the dict_keys
object you get back from calling keys()
on it (in 3.x), is notO(N), it's O(1).
in
对 a的操作dict
,或者dict_keys
调用keys()
它返回的对象(在 3.x 中),不是O(N),而是 O(1)。
There's no real "optimization" going on; it's just that using the hash is the obvious way to implement __contains__
on a hash table, just as it's the obvious way to implement __getitem__
.
没有真正的“优化”正在进行;只是使用散列是在__contains__
散列表上实现的明显方法,正如它是实现__getitem__
.
You may ask where this is guaranteed.
你可能会问哪里有保证。
Well, it's not. Mapping Typesdefines dict
as, basically, a hash table implementation of collections.abc.Mapping
. There's nothing stopping someone from creating a hash table implementation of a Mapping, but still providing O(N) searches. But it would be extra work to make such a bad implementation, so why would they?
好吧,它不是。映射类型dict
基本上定义为 的哈希表实现collections.abc.Mapping
。没有什么可以阻止某人创建映射的哈希表实现,但仍然提供 O(N) 搜索。但是做出如此糟糕的实现需要额外的工作,那么他们为什么要这样做呢?
If you really need to prove it to yourself, you can test every implementation you care about (with a profiler, or by using some type with a custom __hash__
and __eq__
that logs calls, or…), or read the source.
如果你真的需要向自己证明,你可以测试你关心的每一个实现(使用分析器,或者使用某种类型的自定义__hash__
并__eq__
记录调用,或者……),或者阅读源代码。
In 2.x, you do not want to call keys
, because that generates a list
of the keys, instead of a KeysView
. You could use iterkeys
, but that may generate an iterator or something else that's not O(1). So, just use the dict itself as a sequence.
在 2.x 中,您不想调用keys
,因为这会生成 alist
键,而不是 a KeysView
。您可以使用iterkeys
,但这可能会生成一个迭代器或其他不是 O(1) 的东西。因此,只需将 dict 本身用作序列。
Even in 3.x, you don't want to call keys
, because there's no need to. Iterating a dict
, checking its __contains__
, and in general treating it like a sequence is alwaysequivalent to doing the same thing to its keys, so why bother? (And of course building the trivial KeyView
, and accessing through it, are going to add a few nanoseconds to your running time and a few keystrokes to your program.)
即使在 3.x 中,您也不想调用keys
,因为没有必要。迭代 a dict
,检查它的__contains__
,并且通常把它当作一个序列来对待总是等价于对它的键做同样的事情,所以为什么要麻烦呢?(当然,构建琐碎的KeyView
,并通过它进行访问,将增加几纳秒的运行时间,并为您的程序增加一些按键。)
(It's not quite clear that using sequence operations is equivalent for d.keys()
/d.iterkeys()
and d
in 2.x. Other than performance issues, they areequivalent in every CPython, Jython, IronPython, and PyPy implementation, but it doesn't seem to be stated anywhere the way it is in 3.x. And it doesn't matter; just use key in d
.)
(不太清楚使用序列操作对d.keys()
/d.iterkeys()
和d
2.x是否等效。除了性能问题,它们在每个 CPython、Jython、IronPython 和 PyPy 实现中都是等效的,但似乎没有在任何地方说明它在 3.x. 中的方式并且没关系;只需使用key in d
.)
While we're at it, note that this:
当我们这样做时,请注意:
if(dict[key] != None):
… is not going to work. If the key
is not in the dict
, this will raise KeyError
, not return None
.
......不会起作用。如果key
不在dict
,这将提高KeyError
,而不是返回None
。
Also, you should never check None
with ==
or !=
; always use is
.
此外,你永远不应该检查None
与==
或!=
; 总是使用is
.
You can do this with a try
—or, more simply, do if dict.get(key, None) is not None
. But again, there's no reason to do so. Also, that won't handle cases where None
is a perfectly valid item. If that's the case, you need to do something like sentinel = object(); if dict.get(key, sentinel) is not sentinel:
.
您可以使用try
- 或者更简单地使用 do来完成此操作if dict.get(key, None) is not None
。但同样,没有理由这样做。此外,这不会处理None
完全有效的项目的情况。如果是这种情况,您需要执行类似sentinel = object(); if dict.get(key, sentinel) is not sentinel:
.
So, the right thing to write is:
所以,正确的写法是:
if key in d:
More generally, this is not true:
更一般地说,这不是真的:
I know the "in" keyword is generally O(n) (as this just translates to python iterating over an entire list and comparing each element
我知道“in”关键字通常是 O(n) (因为这只是转换为 python 迭代整个列表并比较每个元素
The in
operator, like most other operators, is just a call to a __contains__
method (or the equivalent for a C/Java/.NET/RPython builtin). list
implements it by iterating the list and comparing each element; dict
implements it by hashing the value and looking up the hash; blist.blist
implements it by walking a B+Tree; etc. So, it could be O(n), O(1), O(log n), or something completely different.
的in
操作者,像大多数其他的运营商,只是所涉及的呼叫__contains__
方法(或等效为一个C /爪哇/ .NET / RPython内建)。list
通过迭代列表并比较每个元素来实现它;dict
通过对值进行散列并查找散列来实现它;blist.blist
通过遍历 B+Tree 来实现它;等等。所以,它可能是 O(n)、O(1)、O(log n) 或完全不同的东西。
回答by Ashwini Chaudhary
In Python 2 dict.keys()
creates the whole list of keys first that's why it is an O(N)
operation, while key in dict
is an O(1)
operation.
在 Python 2dict.keys()
中,首先创建整个键列表,这就是为什么它是一个O(N)
操作,而 whilekey in dict
是一个O(1)
操作。
if(dict[key] != None)
will raise KeyError
if key is not found in the dict, so it is not equivalent to the first code.
if(dict[key] != None)
KeyError
如果在 dict 中找不到 key,则会引发,因此它不等同于第一个代码。
Python 2 results:
Python 2 结果:
>>> dic = dict.fromkeys(range(10**5))
>>> %timeit 10000 in dic
1000000 loops, best of 3: 170 ns per loop
>>> %timeit 10000 in dic.keys()
100 loops, best of 3: 4.98 ms per loop
>>> %timeit 10000 in dic.iterkeys()
1000 loops, best of 3: 402 us per loop
>>> %timeit 10000 in dic.viewkeys()
1000000 loops, best of 3: 457 ns per loop
In Python 3 dict.keys()
returns a view object which is quite faster than Python 2's keys()
but still slower simple normal key in dict
:
在 Python 3 中dict.keys()
返回一个视图对象,它比 Python 2 快得多,keys()
但仍然较慢,简单普通key in dict
:
Python 3 results:
Python 3 结果:
>>> dic = dict.fromkeys(range(10**5))
>>> %timeit 10000 in dic
1000000 loops, best of 3: 295 ns per loop
>>> %timeit 10000 in dic.keys()
1000000 loops, best of 3: 475 ns per loop
Use just:
仅使用:
if key in dict:
#code
回答by Matt Bryant
The proper way to do this would be
这样做的正确方法是
if key in dict:
do stuff
the inoperator is O(1) for dictionaries and sets in python.
的在操作者是在python字典和集O(1)。