Python:“除了KeyError”比“if key in dict”快吗?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/20308567/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Python: is "except KeyError" faster than "if key in dict"?
提问by user985366
Edit 2: It was suggested that this is a copy of a similar question. I'd disagree since my question focuses on speed, while the other question asks what is more "readable" or "better" (without defining better). While the questions are similar, there is a big difference in the discussion/answers given.
编辑 2:有人建议这是一个类似问题的副本。我不同意,因为我的问题侧重于速度,而另一个问题则询问什么更“可读”或“更好”(没有定义更好)。虽然问题相似,但给出的讨论/答案却有很大不同。
EDIT: I realise from the questions that I could have been clearer. Sorry for code typos, yes it should be using the proper python operator for addition.
Regarding the input data, I just chose a list of random numbers since that's a common sample. In my case I'm using a dict where I expect a lot of keyerrors, probably 95% of the keys will not exist, and the few that exist will contain clusters of data.
I'm interested in a general discussion though, regardless of the input data set, but of course samples with running times are interesting.
编辑:我从问题中意识到我本来可以更清楚。抱歉代码错别字,是的,它应该使用正确的 python 运算符进行添加。
关于输入数据,我只是选择了一个随机数列表,因为这是一个常见的样本。在我的例子中,我使用了一个 dict,我预计会有很多键错误,可能 95% 的键不存在,而存在的少数键将包含数据簇。
我对一般性讨论很感兴趣,不管输入数据集如何,但当然,具有运行时间的样本很有趣。
My standard approach would be like so many other posts to write something like
我的标准方法会像许多其他帖子一样写一些类似的东西
list = (100 random numbers)
d = {}
for x in list:
if x in d:
d[x]+=1
else:
d[x]=1
But I just came to think of this being faster, since we dont have to check if the dictionary contains the key. We just assume it does, and if not, we handle that. Is there any difference or is Python smarter than I am?
但我刚刚想到这会更快,因为我们不必检查字典是否包含键。我们只是假设它确实存在,如果没有,我们会处理它。有什么区别或者Python比我更聪明吗?
list = (100 random numbers)
d = {}
for x in list:
try:
d[x]+=1
except KeyError:
d[x] = 1
The same approach with indexes in an array, out of bounds, negative indexes etc.
对数组中的索引、越界、负索引等使用相同的方法。
采纳答案by dawg
Your claim is absolutely falsedepends on the input.
您的声明绝对错误取决于输入。
If you have a diverse set of keys, and hits the exceptblock often, the performance is not good. If the tryblock is dominant the try/exceptidiom can be performant on smaller lists.
如果您有一组不同的键,并且except经常击中块,则性能不佳。如果try块占主导地位,则该try/except习语可以在较小的列表上执行。
Here is a benchmark showing several ways to do the same thing:
这是一个基准,显示了执行同一件事的几种方法:
from __future__ import print_function
import timeit
import random
import collections
def f1():
d={}
for x in tgt:
if x in d:
d[x]+=1
else:
d[x]=1
return d
def f2():
d = {}
for x in tgt:
try:
d[x]+=1
except KeyError:
d[x] = 1
return d
def f3():
d={}.fromkeys(tgt, 0)
for x in tgt:
d[x]+=1
return d
def f4():
d=collections.defaultdict(int)
for x in tgt:
d[x]+=1
return d
def f5():
return collections.Counter(tgt)
def f6():
d={}
for x in tgt:
d[x]=d.setdefault(x, 0)+1
return d
def f7():
d={}
for x in tgt:
d[x]=d.get(x,0)+1
return d
def cmpthese(funcs, c=10000, rate=True, micro=False):
"""Generate a Perl style function benchmark"""
def pprint_table(table):
"""Perl style table output"""
def format_field(field, fmt='{:,.0f}'):
if type(field) is str: return field
if type(field) is tuple: return field[1].format(field[0])
return fmt.format(field)
def get_max_col_w(table, index):
return max([len(format_field(row[index])) for row in table])
col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))]
for i,row in enumerate(table):
# left col
row_tab=[row[0].ljust(col_paddings[0])]
# rest of the cols
row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))]
print(' '.join(row_tab))
results={k.__name__:timeit.Timer(k).timeit(c) for k in funcs}
fastest=sorted(results,key=results.get, reverse=True)
table=[['']]
if rate: table[0].append('rate/sec')
if micro: table[0].append('usec/pass')
table[0].extend(fastest)
for e in fastest:
tmp=[e]
if rate:
tmp.append('{:,}'.format(int(round(float(c)/results[e]))))
if micro:
tmp.append('{:.3f}'.format(1000000*results[e]/float(c)))
for x in fastest:
if x==e: tmp.append('--')
else: tmp.append('{:.1%}'.format((results[x]-results[e])/results[e]))
table.append(tmp)
pprint_table(table)
if __name__=='__main__':
import sys
print(sys.version)
for j in [100,1000]:
for t in [(0,5), (0,50), (0,500)]:
tgt=[random.randint(*t) for i in range(j)]
print('{} rand ints between {}:'.format(j,t))
print('=====')
cmpthese([f1,f2,f3,f4,f5,f6,f7])
print()
I have included a small benchmark function based on timeitthat prints the functions from Slowest to Fastest with a percent difference between them.
我已经包含了一个基于timeit该函数的小型基准函数,该函数打印从最慢到最快的函数,它们之间存在百分比差异。
Here is the results for Python 3:
以下是 Python 3 的结果:
3.4.1 (default, May 19 2014, 13:10:29)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
100 rand ints between (0, 5):
=====
rate/sec f6 f7 f1 f2 f3 f4 f5
f6 52,756 -- -1.6% -26.2% -27.9% -30.7% -36.7% -46.8%
f7 53,624 1.6% -- -25.0% -26.7% -29.6% -35.7% -46.0%
f1 71,491 35.5% 33.3% -- -2.3% -6.1% -14.2% -28.0%
f2 73,164 38.7% 36.4% 2.3% -- -3.9% -12.2% -26.3%
f3 76,148 44.3% 42.0% 6.5% 4.1% -- -8.7% -23.3%
f4 83,368 58.0% 55.5% 16.6% 13.9% 9.5% -- -16.0%
f5 99,247 88.1% 85.1% 38.8% 35.6% 30.3% 19.0% --
100 rand ints between (0, 50):
=====
rate/sec f2 f6 f7 f4 f3 f1 f5
f2 39,405 -- -17.9% -18.7% -19.1% -41.8% -47.8% -56.3%
f6 47,980 21.8% -- -1.1% -1.6% -29.1% -36.5% -46.8%
f7 48,491 23.1% 1.1% -- -0.5% -28.4% -35.8% -46.2%
f4 48,737 23.7% 1.6% 0.5% -- -28.0% -35.5% -46.0%
f3 67,678 71.7% 41.1% 39.6% 38.9% -- -10.4% -24.9%
f1 75,511 91.6% 57.4% 55.7% 54.9% 11.6% -- -16.3%
f5 90,175 128.8% 87.9% 86.0% 85.0% 33.2% 19.4% --
100 rand ints between (0, 500):
=====
rate/sec f2 f4 f6 f7 f3 f1 f5
f2 25,748 -- -22.0% -41.4% -42.6% -57.5% -66.2% -67.8%
f4 32,996 28.1% -- -24.9% -26.4% -45.6% -56.7% -58.8%
f6 43,930 70.6% 33.1% -- -2.0% -27.5% -42.4% -45.1%
f7 44,823 74.1% 35.8% 2.0% -- -26.1% -41.2% -44.0%
f3 60,624 135.5% 83.7% 38.0% 35.3% -- -20.5% -24.2%
f1 76,244 196.1% 131.1% 73.6% 70.1% 25.8% -- -4.7%
f5 80,026 210.8% 142.5% 82.2% 78.5% 32.0% 5.0% --
1000 rand ints between (0, 5):
=====
rate/sec f7 f6 f1 f3 f2 f4 f5
f7 4,993 -- -6.7% -34.6% -39.4% -44.4% -50.1% -71.1%
f6 5,353 7.2% -- -29.9% -35.0% -40.4% -46.5% -69.0%
f1 7,640 53.0% 42.7% -- -7.3% -14.9% -23.6% -55.8%
f3 8,242 65.1% 54.0% 7.9% -- -8.2% -17.6% -52.3%
f2 8,982 79.9% 67.8% 17.6% 9.0% -- -10.2% -48.1%
f4 10,004 100.4% 86.9% 30.9% 21.4% 11.4% -- -42.1%
f5 17,293 246.4% 223.0% 126.3% 109.8% 92.5% 72.9% --
1000 rand ints between (0, 50):
=====
rate/sec f7 f6 f1 f2 f3 f4 f5
f7 5,051 -- -7.1% -26.5% -29.0% -34.1% -45.7% -71.2%
f6 5,435 7.6% -- -20.9% -23.6% -29.1% -41.5% -69.0%
f1 6,873 36.1% 26.5% -- -3.4% -10.3% -26.1% -60.8%
f2 7,118 40.9% 31.0% 3.6% -- -7.1% -23.4% -59.4%
f3 7,661 51.7% 41.0% 11.5% 7.6% -- -17.6% -56.3%
f4 9,297 84.0% 71.1% 35.3% 30.6% 21.3% -- -47.0%
f5 17,531 247.1% 222.6% 155.1% 146.3% 128.8% 88.6% --
1000 rand ints between (0, 500):
=====
rate/sec f2 f4 f6 f7 f3 f1 f5
f2 3,985 -- -11.0% -13.6% -14.8% -25.7% -40.4% -66.9%
f4 4,479 12.4% -- -2.9% -4.3% -16.5% -33.0% -62.8%
f6 4,613 15.8% 3.0% -- -1.4% -14.0% -31.0% -61.6%
f7 4,680 17.4% 4.5% 1.4% -- -12.7% -30.0% -61.1%
f3 5,361 34.5% 19.7% 16.2% 14.6% -- -19.8% -55.4%
f1 6,683 67.7% 49.2% 44.9% 42.8% 24.6% -- -44.4%
f5 12,028 201.8% 168.6% 160.7% 157.0% 124.3% 80.0% --
And Python 2:
和 Python 2:
2.7.6 (default, Dec 1 2013, 13:26:15)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)]
100 rand ints between (0, 5):
=====
rate/sec f5 f7 f6 f2 f1 f3 f4
f5 24,955 -- -41.8% -42.5% -51.3% -55.7% -61.6% -65.2%
f7 42,867 71.8% -- -1.2% -16.4% -23.9% -34.0% -40.2%
f6 43,382 73.8% 1.2% -- -15.4% -23.0% -33.2% -39.5%
f2 51,293 105.5% 19.7% 18.2% -- -9.0% -21.0% -28.5%
f1 56,357 125.8% 31.5% 29.9% 9.9% -- -13.2% -21.4%
f3 64,924 160.2% 51.5% 49.7% 26.6% 15.2% -- -9.5%
f4 71,709 187.3% 67.3% 65.3% 39.8% 27.2% 10.5% --
100 rand ints between (0, 50):
=====
rate/sec f2 f5 f7 f6 f4 f3 f1
f2 22,439 -- -4.7% -45.1% -45.5% -50.7% -63.3% -64.5%
f5 23,553 5.0% -- -42.4% -42.8% -48.3% -61.5% -62.8%
f7 40,878 82.2% 73.6% -- -0.7% -10.2% -33.2% -35.4%
f6 41,164 83.4% 74.8% 0.7% -- -9.6% -32.7% -34.9%
f4 45,525 102.9% 93.3% 11.4% 10.6% -- -25.6% -28.0%
f3 61,167 172.6% 159.7% 49.6% 48.6% 34.4% -- -3.3%
f1 63,261 181.9% 168.6% 54.8% 53.7% 39.0% 3.4% --
100 rand ints between (0, 500):
=====
rate/sec f2 f5 f4 f6 f7 f3 f1
f2 13,122 -- -39.9% -56.2% -63.2% -63.8% -75.8% -80.0%
f5 21,837 66.4% -- -27.1% -38.7% -39.8% -59.6% -66.7%
f4 29,945 128.2% 37.1% -- -16.0% -17.4% -44.7% -54.3%
f6 35,633 171.6% 63.2% 19.0% -- -1.7% -34.2% -45.7%
f7 36,257 176.3% 66.0% 21.1% 1.8% -- -33.0% -44.7%
f3 54,113 312.4% 147.8% 80.7% 51.9% 49.2% -- -17.5%
f1 65,570 399.7% 200.3% 119.0% 84.0% 80.8% 21.2% --
1000 rand ints between (0, 5):
=====
rate/sec f5 f7 f6 f1 f2 f3 f4
f5 2,787 -- -37.7% -38.4% -53.3% -59.9% -60.4% -67.0%
f7 4,477 60.6% -- -1.1% -25.0% -35.6% -36.3% -47.0%
f6 4,524 62.3% 1.1% -- -24.2% -34.9% -35.6% -46.5%
f1 5,972 114.3% 33.4% 32.0% -- -14.1% -15.0% -29.3%
f2 6,953 149.5% 55.3% 53.7% 16.4% -- -1.1% -17.7%
f3 7,030 152.2% 57.0% 55.4% 17.7% 1.1% -- -16.8%
f4 8,452 203.3% 88.8% 86.8% 41.5% 21.6% 20.2% --
1000 rand ints between (0, 50):
=====
rate/sec f5 f7 f6 f2 f1 f3 f4
f5 2,667 -- -37.8% -38.7% -53.0% -55.9% -61.1% -65.3%
f7 4,286 60.7% -- -1.5% -24.5% -29.1% -37.5% -44.2%
f6 4,351 63.1% 1.5% -- -23.4% -28.0% -36.6% -43.4%
f2 5,677 112.8% 32.4% 30.5% -- -6.1% -17.3% -26.1%
f1 6,045 126.6% 41.0% 39.0% 6.5% -- -11.9% -21.4%
f3 6,862 157.3% 60.1% 57.7% 20.9% 13.5% -- -10.7%
f4 7,687 188.2% 79.3% 76.7% 35.4% 27.2% 12.0% --
1000 rand ints between (0, 500):
=====
rate/sec f2 f5 f7 f6 f4 f3 f1
f2 2,018 -- -16.1% -44.1% -46.2% -53.4% -61.8% -63.0%
f5 2,405 19.1% -- -33.4% -35.9% -44.5% -54.4% -55.9%
f7 3,609 78.8% 50.1% -- -3.8% -16.7% -31.6% -33.8%
f6 3,753 85.9% 56.1% 4.0% -- -13.4% -28.9% -31.2%
f4 4,334 114.7% 80.2% 20.1% 15.5% -- -17.9% -20.5%
f3 5,277 161.5% 119.5% 46.2% 40.6% 21.8% -- -3.2%
f1 5,454 170.2% 126.8% 51.1% 45.3% 25.8% 3.3% --
So -- it depends.
所以 - 这取决于。
Conclusions:
结论:
TheCountermethod is almost always among the slowest- The
Countermethod is among the slowest on Python 2 but by far the fastest on Python 3.4 - The
try/exceptversion is usuallyamong the slowest - The
if key in dictversion is predictably one of the best/fastest regardless of the size or key count - The
{}.fromkeys(tgt, 0)is very predictable - The
defaultdictversion is fastest on larger lists. Smaller lists the longer setup time is amortized over too few elements.
该Counter方法几乎总是最慢的- 该
Counter方法在 Python 2 上是最慢的,但在 Python 3.4 上是最快的 - 该
try/except版本通常是最慢的 if key in dict无论大小或键数如何,该版本都是最好/最快的版本之一- 这
{}.fromkeys(tgt, 0)是非常可预测的 - 该
defaultdict版本在较大的列表中速度最快。列表越小,设置时间越长,元素就越少。
回答by Ramchandra Apte
NOTE: purely speculative
注:纯属推测
I think the first would be slower as it locates the key in the dictionary twice, first in the if statement, then in the C code for dictionary access. The try-except could be slower when many of the keys aren't in the dictionary, as handling the exception involves some overhead.
我认为第一个会更慢,因为它在字典中定位键两次,首先在 if 语句中,然后在用于字典访问的 C 代码中。当许多键不在字典中时,try-except 可能会更慢,因为处理异常涉及一些开销。
I set the list to range(100)and left the dictionary empty. The first using iftakes 8.003 seconds and the second using try-except takes 30.976 seconds! The overhead is quite significant in a case like this where there is nothing much else being done.
我将列表设置为range(100)并将字典留空。第一次使用if需要 8.003 秒,第二次使用 try-except 需要 30.976 秒!在这种情况下,没有其他任何事情要做,开销非常大。
回答by JDong
Update: Not sure if I was testing the right thing anymore, but still found the results interesting.
更新:不确定我是否在测试正确的东西,但仍然发现结果很有趣。
Python 2:
蟒蛇2:
0% missing keys, Standard access: 0.419198036194
0% missing keys, try/except access: 0.309811115265
50% missing keys, Standard access: 0.417014837265
50% missing keys, try/except access: 0.309100866318
100% missing keys, Standard access: 0.416236877441
100% missing keys, try/except access: 0.310797929764
I tested 3 dictionaries with varying amounts of keys, using the normal and the try/except method. The try/except method was faster each time for me.
我使用 normal 和 try/except 方法测试了 3 个具有不同键值的字典。对我来说,try/except 方法每次都更快。
My code:
我的代码:
from timeit import timeit
size = 2**10
allkeys = "0% missing keys", dict([(i, 0) for i in range(size)])
somekeys= "50% missing keys", dict([(i*2, 0) for i in range(size//2)])
nokeys = "100% missing keys", dict([])
def test_normal():
"""Standard access"""
for i in xrange(size):
if i in d:
d[i] += 1
else:
d[i] = 1
def test_try():
"""try/except access"""
for i in xrange(size):
try:
d[i] += 1
except KeyError:
d[i] = 1
for trial in (allkeys, somekeys, nokeys):
d = trial[1]
for test in (test_normal, test_try):
trial_time = timeit("test()",
setup="from __main__ import test",
number=2**10)
print "{0}, {1}: {2}".format(trial[0], test.__doc__, trial_time)
The code now uses timeit, which is probably more accurate.
代码现在使用 timeit,这可能更准确。
回答by Günther Jena
There is another point when it comes to coding style. As it's common python coding style to use EAFP(Easier to ask for forgiveness than permission) which assumes the existence of valid keys and catches exceptions if the assumption proves false.
在编码风格方面还有另外一点。因为使用EAFP(比许可更容易请求宽恕)是常见的 Python 编码风格,它假设存在有效密钥并在假设证明错误时捕获异常。
Due this common coding style I've always used the try/except approach and was sure that this is faster than LBYLstyle (Look before you leap). As I learned by the answers here it definitely depends. As long as you can expect an existing key I would go for the try/except approach.
由于这种常见的编码风格,我一直使用 try/except 方法,并确信这比LBYL风格更快(跳之前先看)。正如我从这里的答案中了解到的那样,这绝对取决于。只要您可以期待现有的密钥,我就会采用 try/except 方法。
回答by fgfsdg
import random
from pip._vendor.distlib.compat import raw_input
x=random.randint(1,99)
guess = int(raw_input("Enter a integer from 1 to 99:"))
while x !="guess":
print
if guess<x:
print ("guess is low")
guess= int(raw_input("Enter a integer from 1 to 99:"))
elif guess >x:
print ("guess is high")
guess = int(raw_input("Enter a integer from 1 to 99:"))
else:
print (" you guessed it !")
break
print

