Python OrderedDict vs defaultdict vs dict

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/19629682/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 14:14:02  来源:igfitidea点击:

OrderedDict vs defaultdict vs dict

pythondictionary

提问by Abhijit

In python's library, we now have two Python Implementation of dictionaries which subclasses dictover and above the native dicttype.

在python的库中,我们现在有两个字典的Python实现,它们dict在本机dict类型之上和之上进行了子类化。

Python's advocates have always preferred to defaultdictover using dict.setdefaultwhere possible. Even the docquotes that This technique is simpler and faster than an equivalent technique using dict.setdefault():

Python 的拥护者总是倾向于在可能的情况下defaultdict过度使用dict.setdefault。甚至文档也引用了This technique is simpler and faster than an equivalent technique using dict.setdefault():

In similar ways, as dictionaries does not maintain order, using OrderedDictover using dictfollowed by sorting the items is preferred when ever possible for the alternative usage.

以类似的方式,由于字典不保持顺序,因此在可能的情况下,首选使用OrderedDictover usingdict然后对项目进行排序。

In both the above case, the code is definitely cleaner but at the cost of performance penalty.

在上述两种情况下,代码肯定更干净,但以性能损失为代价。

While answering and commenting on one of the question python unique list based on item, I stumbled upon the performance penalty over the native dictwhen using defaultdictand OrderedDict. It also seems the size of the data is also not immaterial to the performance advantage dictsolution has over others.

在回答和评论基于 item 的 python unique list问题之一时,我偶然发现了dict使用defaultdict和时对本机的性能损失OrderedDict。似乎数据的大小对于dict解决方案相对于其他解决方案的性能优势也并非无关紧要。

I believe There should be one-- and preferably only one --obvious way to do it., so what is the preferred way?

我相信There should be one-- and preferably only one --obvious way to do it.,那么首选的方式是什么?

回答by ojdo

I feel that your assumption - only one preferable way- does not hold. I see at least twocases with different requirements:

我觉得你的假设 -只有一种更好的方式- 不成立。我看到至少有两种不同要求的案例:

In maintenance-intensivecode (e.g. an option parser of an evolving utility class) I would always go for cleaner code, so that others and I can implement new features more easily. The performance is not critical, as only small quantities (e.g. a dict of user settings) are processed.

维护密集型代码(例如,一个不断发展的实用程序类的选项解析器)中,我总是选择更简洁的代码,以便其他人和我可以更轻松地实现新功能。性能并不重要,因为只处理少量(例如用户设置的字典)。

while in

而在

the implementation of a performance-criticalalgorithm in a data processing task, I would not mind writing a bit more verbose code for much faster execution. If the alorithm is unlikely to change, the little less readable code won't become an issue.

在数据处理任务中实现性能关键算法,我不介意编写更冗长的代码以更快地执行。如果算法不太可能改变,那么可读性稍差的代码就不会成为问题。

回答by dawg

There is not one single answer and not one true and only dict. Among many variables, it depends on:

没有一个单一的答案,也没有一个真实且唯一的口述。在众多变量中,它取决于:

  1. Size of the data set;
  2. Number of unique keys vs the number of duplicate keys in the set of data mappings;
  3. Speed of the underlying factory for defaultdict;
  4. Speed of OrderDict vs some later ordering step;
  5. Version of Python.
  1. 数据集的大小;
  2. 数据映射集中唯一键的数量与重复键的数量;
  3. defaultdict 底层工厂的速度;
  4. OrderDict 的速度与一些稍后的订购步骤;
  5. Python的版本。

I am loatheto generalize, but here are some generalities:

不愿意一概而论,但这里有一些概括:

  1. The statement This technique is simpler and faster than an equivalent technique using dict.setdefault()is just flat wrong. It depends on the data;
  2. setdefaultis faster and simpler with small data sets;
  3. defaultdictis faster for larger data sets with more homogenous key sets (ie, how short the dict is after adding elements);
  4. setdefaulthas an advantage with more heterogeneous key sets;
  5. these results are different for Python 3 vs Python 2;
  6. OrderedDictis slower in all cases other than an algorithm that depends on order and order is not easy to reconstruct or sort;
  7. Python 3 is generally faster for most dictoperations;
  8. Python 3.6's dict is now ordered by insertion order (reducing the usefulness of OrderedDict).
  1. 该声明This technique is simpler and faster than an equivalent technique using dict.setdefault()完全错误。这取决于数据;
  2. setdefault使用小数据集更快更简单;
  3. defaultdict对于具有更多同质键集的更大数据集(即添加元素后字典有多短),速度更快;
  4. setdefault具有更多异构密钥集的优势;
  5. Python 3 与 Python 2 的这些结果不同;
  6. OrderedDict除了依赖顺序和顺序不容易重构或排序的算法之外,在所有情况下都较慢;
  7. 对于大多数dict操作,Python 3 通常更快;
  8. Python 3.6 的 dict 现在按插入顺序排序(减少了 的用处OrderedDict)。

The only truth: It Depends!All three technique are useful.

唯一的真相:这取决于!这三种技术都很有用。

Here is some timing code to show:

这是一些要显示的计时代码:

from __future__ import print_function
from collections import defaultdict
from collections import OrderedDict

try:
    t=unichr(100)
except NameError:
    unichr=chr

def f1(li):
    '''defaultdict'''
    d = defaultdict(list)
    for k, v in li:
        d[k].append(v)
    return d.items()

def f2(li):
    '''setdefault'''
    d={}
    for k, v in li:
        d.setdefault(k, []).append(v)
    return d.items()

def f3(li):
    '''OrderedDict'''
    d=OrderedDict()
    for k, v in li:
        d.setdefault(k, []).append(v)
    return d.items()      


if __name__ == '__main__':
    import timeit
    import sys
    print(sys.version)
    few=[('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
    fmt='{:>12}: {:10.2f} micro sec/call ({:,} elements, {:,} keys)'
    for tag, m, n in [('small',5,10000), ('medium',20,1000), ('bigger',1000,100), ('large',5000,10)]:
        for f in [f1,f2,f3]:
            s = few*m
            res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n)
            st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s)))
            print(st)
            s = [(unichr(i%0x10000),i) for i in range(1,len(s)+1)]
            res=timeit.timeit("{}(s)".format(f.__name__), setup="from __main__ import {}, s".format(f.__name__), number=n)
            st=fmt.format(f.__doc__, res/n*1000000, len(s), len(f(s)))
            print(st)            
        print() 

Python 2.7 result:

Python 2.7 结果:

2.7.5 (default, Aug 25 2013, 00:04:04) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)]
 defaultdict:      10.20 micro sec/call (25 elements, 3 keys)
 defaultdict:      21.08 micro sec/call (25 elements, 25 keys)
  setdefault:      13.41 micro sec/call (25 elements, 3 keys)
  setdefault:      18.24 micro sec/call (25 elements, 25 keys)
 OrderedDict:      49.47 micro sec/call (25 elements, 3 keys)
 OrderedDict:     102.16 micro sec/call (25 elements, 25 keys)

 defaultdict:      28.28 micro sec/call (100 elements, 3 keys)
 defaultdict:      79.78 micro sec/call (100 elements, 100 keys)
  setdefault:      45.68 micro sec/call (100 elements, 3 keys)
  setdefault:      68.66 micro sec/call (100 elements, 100 keys)
 OrderedDict:     117.78 micro sec/call (100 elements, 3 keys)
 OrderedDict:     343.17 micro sec/call (100 elements, 100 keys)

 defaultdict:    1123.60 micro sec/call (5,000 elements, 3 keys)
 defaultdict:    4250.44 micro sec/call (5,000 elements, 5,000 keys)
  setdefault:    2089.86 micro sec/call (5,000 elements, 3 keys)
  setdefault:    3803.03 micro sec/call (5,000 elements, 5,000 keys)
 OrderedDict:    4399.16 micro sec/call (5,000 elements, 3 keys)
 OrderedDict:   16279.14 micro sec/call (5,000 elements, 5,000 keys)

 defaultdict:    5609.39 micro sec/call (25,000 elements, 3 keys)
 defaultdict:   25351.60 micro sec/call (25,000 elements, 25,000 keys)
  setdefault:   10267.00 micro sec/call (25,000 elements, 3 keys)
  setdefault:   24091.51 micro sec/call (25,000 elements, 25,000 keys)
 OrderedDict:   22091.98 micro sec/call (25,000 elements, 3 keys)
 OrderedDict:   94028.00 micro sec/call (25,000 elements, 25,000 keys)

Python 3.3 result:

Python 3.3 结果:

3.3.2 (default, May 21 2013, 11:50:47) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))]
 defaultdict:       8.58 micro sec/call (25 elements, 3 keys)
 defaultdict:      21.18 micro sec/call (25 elements, 25 keys)
  setdefault:      10.42 micro sec/call (25 elements, 3 keys)
  setdefault:      14.58 micro sec/call (25 elements, 25 keys)
 OrderedDict:      45.43 micro sec/call (25 elements, 3 keys)
 OrderedDict:      92.69 micro sec/call (25 elements, 25 keys)

 defaultdict:      20.47 micro sec/call (100 elements, 3 keys)
 defaultdict:      77.48 micro sec/call (100 elements, 100 keys)
  setdefault:      34.22 micro sec/call (100 elements, 3 keys)
  setdefault:      54.86 micro sec/call (100 elements, 100 keys)
 OrderedDict:     107.37 micro sec/call (100 elements, 3 keys)
 OrderedDict:     318.98 micro sec/call (100 elements, 100 keys)

 defaultdict:     714.70 micro sec/call (5,000 elements, 3 keys)
 defaultdict:    3892.92 micro sec/call (5,000 elements, 5,000 keys)
  setdefault:    1502.91 micro sec/call (5,000 elements, 3 keys)
  setdefault:    2888.08 micro sec/call (5,000 elements, 5,000 keys)
 OrderedDict:    3912.95 micro sec/call (5,000 elements, 3 keys)
 OrderedDict:   14863.02 micro sec/call (5,000 elements, 5,000 keys)

 defaultdict:    3649.02 micro sec/call (25,000 elements, 3 keys)
 defaultdict:   22313.17 micro sec/call (25,000 elements, 25,000 keys)
  setdefault:    7447.28 micro sec/call (25,000 elements, 3 keys)
  setdefault:   18426.88 micro sec/call (25,000 elements, 25,000 keys)
 OrderedDict:   19202.17 micro sec/call (25,000 elements, 3 keys)
 OrderedDict:   85946.45 micro sec/call (25,000 elements, 25,000 keys)