python copy.deepcopy 与泡菜

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1410615/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-11-03 22:10:27  来源:igfitidea点击:

copy.deepcopy vs pickle

pythonpickledeep-copy

提问by Anurag Uniyal

I have a tree structure of widgets e.g. collection contains models and model contains widgets. I want to copy whole collection, copy.deepcopyis faster in comparison to 'pickle and de-pickle'ing the object but cPickle as being written in C is much faster, so

我有一个小部件的树结构,例如集合包含模型,模型包含小部件。我想复制整个集合,copy.deepcopy与“pickle 和 de-pickle”对象相比更快,但用 C 编写的 cPickle 要快得多,所以

  1. Why shouldn't I(we) always be using cPickle instead of deepcopy?
  2. Is there any other copy alternative? because pickle is slower then deepcopy but cPickle is faster, so may be a C implementation of deepcopy will be the winner
  1. 为什么我(我们)不应该总是使用 cPickle 而不是 deepcopy?
  2. 有没有其他的副本选择?因为pickle比deepcopy慢但cPickle更快,所以deepcopy的C实现可能是赢家

Sample test code:

示例测试代码:

import copy
import pickle
import cPickle

class A(object): pass

d = {}
for i in range(1000):
    d[i] = A()

def copy1():
    return copy.deepcopy(d)

def copy2():
    return pickle.loads(pickle.dumps(d, -1))

def copy3():
    return cPickle.loads(cPickle.dumps(d, -1))

Timings:

时间:

>python -m timeit -s "import c" "c.copy1()"
10 loops, best of 3: 46.3 msec per loop

>python -m timeit -s "import c" "c.copy2()"
10 loops, best of 3: 93.3 msec per loop

>python -m timeit -s "import c" "c.copy3()"
100 loops, best of 3: 17.1 msec per loop

回答by Alex Martelli

Problem is, pickle+unpickle can be faster (in the C implementation) because it's less generalthan deepcopy: many objects can be deepcopied but not pickled. Suppose for example that your class Awere changed to...:

问题是,pickle+unpickle 可以更快(在 C 实现中),因为它不如 deepcopy通用:许多对象可以被深度复制但不能被pickled。例如,假设您的班级A已更改为...:

class A(object):
  class B(object): pass
  def __init__(self): self.b = self.B()

now, copy1still works fine (A's complexity slows it downs but absolutely doesn't stop it); copy2and copy3break, the end of the stack trace says...:

现在,copy1仍然可以正常工作(A 的复杂性会减慢它的速度,但绝对不会阻止它);copy2copy3中断,堆栈跟踪的末尾说...:

  File "./c.py", line 20, in copy3
    return cPickle.loads(cPickle.dumps(d, -1))
PicklingError: Can't pickle <class 'c.B'>: attribute lookup c.B failed

I.e., pickling always assumes that classes and functions are top-level entities in their modules, and so pickles them "by name" -- deepcopying makes absolutely no such assumptions.

即,pickling 总是假设类和函数是其模块中的顶级实体,因此“按名称”对它们进行 pickle —— 深度复制绝对没有这样的假设。

So if you have a situation where speed of "somewhat deep-copying" is absolutely crucial, every millisecond matters, AND you want to take advantage of special limitations that you KNOW apply to the objects you're duplicating, such as those that make pickling applicable, or ones favoring other forms yet of serializations and other shortcuts, by all means go ahead - but if you do you MUST be aware that you're constraining your system to live by those limitations forevermore, and document that design decision very clearly and explicitly for the benefit of future maintainers.

因此,如果您遇到“有点深复制”的速度绝对至关重要的情况,那么每一毫秒都很重要,并且您想利用您知道适用于您正在复制的对象的特殊限制,例如那些进行酸洗的对象适用的,或者偏爱其他形式的序列化和其他快捷方式的人,无论如何都可以继续 - 但如果你这样做,你必须意识到你正在限制你的系统永远遵守这些限制,并非常清楚地记录该设计决定并且明确为了未来维护者的利益。

For the NORMAL case, where you want generality, use deepcopy!-)

对于 NORMAL 情况,您需要通用性,请使用deepcopy!-)

回答by wds

You should be using deepcopy because it makes your code more readable. Using a serialization mechanism to copy objects in memory is at the very least confusing to another developer reading your code. Using deepcopy also means you get to reap the benefits of future optimizations in deepcopy.

您应该使用 deepcopy,因为它使您的代码更具可读性。使用序列化机制复制内存中的对象至少会让阅读您代码的其他开发人员感到困惑。使用 deepcopy 还意味着您可以在 deepcopy 中获得未来优化的好处。

First rule of optimization: don't.

优化的第一条规则:不要。

回答by hans_meine

It is notalways the case that cPickle is faster than deepcopy(). While cPickle is probably always faster than pickle, whether it is faster than deepcopy depends on

这是不是每次都这样的cPickle比deepcopy的更快()。虽然 cPickle 可能总是比 pickle 快,但它是否比 deepcopy 快取决于

  • the size and nesting level of the structures to be copied,
  • the type of contained objects, and
  • the size of the pickled string representation.
  • 要复制的结构的大小和嵌套级别,
  • 包含对象的类型,以及
  • 腌制字符串表示的大小。

If something can be pickled, it can obviously be deepcopied, but the opposite is not the case: In order to pickle something, it needs to be fully serialized; this is not the case for deepcopying. In particular, you can implement __deepcopy__very efficiently by copying a structure in memory (think of extension types), without being able to save everything to disk. (Think of suspend-to-RAM vs. suspend-to-disk.)

如果可以pickle,显然可以deepcopy,但反之则不然:要pickle,需要完全序列化;深度复制不是这种情况。特别是,您可以__deepcopy__通过在内存中复制结构(想想扩展类型)来非常有效地实现,而无需将所有内容保存到磁盘。(想想挂起到 RAM 与挂起到磁盘。)

A well-known extension type that fulfills the conditions above may be ndarray, and indeed, it serves as a good counterexample to your observation: With d = numpy.arange(100000000), your code gives different runtimes:

满足上述条件的众所周知的扩展类型可能是ndarray,实际上,它可以作为您观察的一个很好的反例:使用d = numpy.arange(100000000),您的代码提供不同的运行时:

In [1]: import copy, pickle, cPickle, numpy

In [2]: d = numpy.arange(100000000)

In [3]: %timeit pickle.loads(pickle.dumps(d, -1))
1 loops, best of 3: 2.95 s per loop

In [4]: %timeit cPickle.loads(cPickle.dumps(d, -1))
1 loops, best of 3: 2.37 s per loop

In [5]: %timeit copy.deepcopy(d)
1 loops, best of 3: 459 ms per loop

If __deepcopy__is not implemented, copyand pickleshare common infrastructure (cf. copy_regmodule, discussed in Relationship between pickle and deepcopy).

如果__deepcopy__未实现,copypickle共享公共基础设施(参见copy_reg模块,在所讨论的泡菜的关系和deepcopy的)。

回答by Ned Batchelder

Even faster would be to avoid the copy in the first place. You mention that you are doing rendering. Why does it need to copy objects?

更快的是首先避免复制。你提到你正在做渲染。为什么需要复制对象?

回答by Lars

Short and somewhat late:

简短而迟到:

  • If you have to cPickle an object anyway, you might as well use the cPickle method to deepcopy (but document)
  • 如果无论如何你必须 cPickle 一个对象,你不妨使用 cPickle 方法来深度复制(但文档)

e.g. You might consider:

例如,您可能会考虑:

def mydeepcopy(obj):
    try:
       return cPickle.loads(cPickle.dumps(obj, -1))
    except PicklingError:
       return deepcopy(obj)