Python OrderedDict vs defaultdict vs dict
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/19629682/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
OrderedDict vs defaultdict vs dict
提问by Abhijit
In python's library, we now have two Python Implementation of dictionaries which subclasses dict
over and above the native dict
type.
在python的库中,我们现在有两个字典的Python实现,它们dict
在本机dict
类型之上和之上进行了子类化。
Python's advocates have always preferred to defaultdict
over using dict.setdefault
where possible. Even the docquotes that This technique is simpler and faster than an equivalent technique using dict.setdefault():
Python 的拥护者总是倾向于在可能的情况下defaultdict
过度使用dict.setdefault
。甚至文档也引用了This technique is simpler and faster than an equivalent technique using dict.setdefault():
In similar ways, as dictionaries does not maintain order, using OrderedDict
over using dict
followed by sorting the items is preferred when ever possible for the alternative usage.
以类似的方式,由于字典不保持顺序,因此在可能的情况下,首选使用OrderedDict
over usingdict
然后对项目进行排序。
In both the above case, the code is definitely cleaner but at the cost of performance penalty.
在上述两种情况下,代码肯定更干净,但以性能损失为代价。
While answering and commenting on one of the question python unique list based on item, I stumbled upon the performance penalty over the native dict
when using defaultdict
and OrderedDict
. It also seems the size of the data is also not immaterial to the performance advantage dict
solution has over others.
在回答和评论基于 item 的 python unique list问题之一时,我偶然发现了dict
使用defaultdict
和时对本机的性能损失OrderedDict
。似乎数据的大小对于dict
解决方案相对于其他解决方案的性能优势也并非无关紧要。
I believe There should be one-- and preferably only one --obvious way to do it.
, so what is the preferred way?
我相信There should be one-- and preferably only one --obvious way to do it.
,那么首选的方式是什么?
回答by ojdo
I feel that your assumption - only one preferable way- does not hold. I see at least twocases with different requirements:
我觉得你的假设 -只有一种更好的方式- 不成立。我看到至少有两种不同要求的案例:
In maintenance-intensivecode (e.g. an option parser of an evolving utility class) I would always go for cleaner code, so that others and I can implement new features more easily. The performance is not critical, as only small quantities (e.g. a dict of user settings) are processed.
在维护密集型代码(例如,一个不断发展的实用程序类的选项解析器)中,我总是选择更简洁的代码,以便其他人和我可以更轻松地实现新功能。性能并不重要,因为只处理少量(例如用户设置的字典)。
while in
而在
the implementation of a performance-criticalalgorithm in a data processing task, I would not mind writing a bit more verbose code for much faster execution. If the alorithm is unlikely to change, the little less readable code won't become an issue.
在数据处理任务中实现性能关键算法,我不介意编写更冗长的代码以更快地执行。如果算法不太可能改变,那么可读性稍差的代码就不会成为问题。
回答by dawg
There is not one single answer and not one true and only dict. Among many variables, it depends on:
没有一个单一的答案,也没有一个真实且唯一的口述。在众多变量中,它取决于:
- Size of the data set;
- Number of unique keys vs the number of duplicate keys in the set of data mappings;
- Speed of the underlying factory for defaultdict;
- Speed of OrderDict vs some later ordering step;
- Version of Python.
- 数据集的大小;
- 数据映射集中唯一键的数量与重复键的数量;
- defaultdict 底层工厂的速度;
- OrderDict 的速度与一些稍后的订购步骤;
- Python的版本。
I am loatheto generalize, but here are some generalities:
我不愿意一概而论,但这里有一些概括:
- The statement
This technique is simpler and faster than an equivalent technique using dict.setdefault()
is just flat wrong. It depends on the data; setdefault
is faster and simpler with small data sets;defaultdict
is faster for larger data sets with more homogenous key sets (ie, how short the dict is after adding elements);setdefault
has an advantage with more heterogeneous key sets;- these results are different for Python 3 vs Python 2;
OrderedDict
is slower in all cases other than an algorithm that depends on order and order is not easy to reconstruct or sort;- Python 3 is generally faster for most
dict
operations; - Python 3.6's dict is now ordered by insertion order (reducing the usefulness of
OrderedDict
).
- 该声明
This technique is simpler and faster than an equivalent technique using dict.setdefault()
完全错误。这取决于数据; setdefault
使用小数据集更快更简单;defaultdict
对于具有更多同质键集的更大数据集(即添加元素后字典有多短),速度更快;setdefault
具有更多异构密钥集的优势;- Python 3 与 Python 2 的这些结果不同;
OrderedDict
除了依赖顺序和顺序不容易重构或排序的算法之外,在所有情况下都较慢;- 对于大多数
dict
操作,Python 3 通常更快; - Python 3.6 的 dict 现在按插入顺序排序(减少了 的用处
OrderedDict
)。
The only truth: It Depends!All three technique are useful.
唯一的真相:这取决于!这三种技术都很有用。
Here is some timing code to show:
这是一些要显示的计时代码:
from __future__ import print_function
from collections import defaultdict
from collections import OrderedDict
try:
t=unichr(100)
except NameError:
unichr=chr
def f1(li):
'''defaultdict'''
d = defaultdict(list)
for k, v in li:
d[k].append(v)
return d.items()
def f2(li):
'''setdefault'''
d={}
for k, v in li:
d.setdefault(k, []).append(v)
return d.items()
def f3(li):
'''OrderedDict'''
d=OrderedDict()
for k, v in li:
d.setdefault(k, []).append(v)
return d.items()
if __name__ == '__main__':
import timeit
import sys
print(sys.version)
few=[('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
fmt='{:>12}: {:10.2f} micro sec/call ({:,} elements, {:,} keys)'
for tag, m, n in [('small',5,10000), ('medium',20,1000), ('bigger',1000,100), ('large',5000,10)]:
for f in [f1,f2,f3]:
s = few*m
res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n)
st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s)))
print(st)
s = [(unichr(i%0x10000),i) for i in range(1,len(s)+1)]
res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n)
st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s)))
print(st)
print()
Python 2.7 result:
Python 2.7 结果:
2.7.5 (default, Aug 25 2013, 00:04:04)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]
defaultdict: 10.20 micro sec/call (25 elements, 3 keys)
defaultdict: 21.08 micro sec/call (25 elements, 25 keys)
setdefault: 13.41 micro sec/call (25 elements, 3 keys)
setdefault: 18.24 micro sec/call (25 elements, 25 keys)
OrderedDict: 49.47 micro sec/call (25 elements, 3 keys)
OrderedDict: 102.16 micro sec/call (25 elements, 25 keys)
defaultdict: 28.28 micro sec/call (100 elements, 3 keys)
defaultdict: 79.78 micro sec/call (100 elements, 100 keys)
setdefault: 45.68 micro sec/call (100 elements, 3 keys)
setdefault: 68.66 micro sec/call (100 elements, 100 keys)
OrderedDict: 117.78 micro sec/call (100 elements, 3 keys)
OrderedDict: 343.17 micro sec/call (100 elements, 100 keys)
defaultdict: 1123.60 micro sec/call (5,000 elements, 3 keys)
defaultdict: 4250.44 micro sec/call (5,000 elements, 5,000 keys)
setdefault: 2089.86 micro sec/call (5,000 elements, 3 keys)
setdefault: 3803.03 micro sec/call (5,000 elements, 5,000 keys)
OrderedDict: 4399.16 micro sec/call (5,000 elements, 3 keys)
OrderedDict: 16279.14 micro sec/call (5,000 elements, 5,000 keys)
defaultdict: 5609.39 micro sec/call (25,000 elements, 3 keys)
defaultdict: 25351.60 micro sec/call (25,000 elements, 25,000 keys)
setdefault: 10267.00 micro sec/call (25,000 elements, 3 keys)
setdefault: 24091.51 micro sec/call (25,000 elements, 25,000 keys)
OrderedDict: 22091.98 micro sec/call (25,000 elements, 3 keys)
OrderedDict: 94028.00 micro sec/call (25,000 elements, 25,000 keys)
Python 3.3 result:
Python 3.3 结果:
3.3.2 (default, May 21 2013, 11:50:47)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]
defaultdict: 8.58 micro sec/call (25 elements, 3 keys)
defaultdict: 21.18 micro sec/call (25 elements, 25 keys)
setdefault: 10.42 micro sec/call (25 elements, 3 keys)
setdefault: 14.58 micro sec/call (25 elements, 25 keys)
OrderedDict: 45.43 micro sec/call (25 elements, 3 keys)
OrderedDict: 92.69 micro sec/call (25 elements, 25 keys)
defaultdict: 20.47 micro sec/call (100 elements, 3 keys)
defaultdict: 77.48 micro sec/call (100 elements, 100 keys)
setdefault: 34.22 micro sec/call (100 elements, 3 keys)
setdefault: 54.86 micro sec/call (100 elements, 100 keys)
OrderedDict: 107.37 micro sec/call (100 elements, 3 keys)
OrderedDict: 318.98 micro sec/call (100 elements, 100 keys)
defaultdict: 714.70 micro sec/call (5,000 elements, 3 keys)
defaultdict: 3892.92 micro sec/call (5,000 elements, 5,000 keys)
setdefault: 1502.91 micro sec/call (5,000 elements, 3 keys)
setdefault: 2888.08 micro sec/call (5,000 elements, 5,000 keys)
OrderedDict: 3912.95 micro sec/call (5,000 elements, 3 keys)
OrderedDict: 14863.02 micro sec/call (5,000 elements, 5,000 keys)
defaultdict: 3649.02 micro sec/call (25,000 elements, 3 keys)
defaultdict: 22313.17 micro sec/call (25,000 elements, 25,000 keys)
setdefault: 7447.28 micro sec/call (25,000 elements, 3 keys)
setdefault: 18426.88 micro sec/call (25,000 elements, 25,000 keys)
OrderedDict: 19202.17 micro sec/call (25,000 elements, 3 keys)
OrderedDict: 85946.45 micro sec/call (25,000 elements, 25,000 keys)