Is it possible to have an actual memory leak in Python because of your code?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/2017381/

Is it possible to have an actual memory leak in Python because of your code?

python, memory-leaks

Asked by orokusaki

I don't have a code example, but I'm curious whether it's possible to write Python code that results in essentially a memory leak.

Answered by Crast

It is possible, yes.

It depends on what kind of memory leak you are talking about. Within pure Python code, it's not possible to "forget to free" memory as in C, but it is possible to leave a reference hanging somewhere. Some examples:

an unhandled traceback object that is keeping an entire stack frame alive, even though the function is no longer running

import sys

while game.running():
    try:
        key_press = handle_input()
    except SomeException:
        etype, evalue, tb = sys.exc_info()
        # Do something with tb, like inspecting or printing the traceback.
        # Holding on to tb keeps every frame on that stack alive.

In this silly game-loop example, we assigned 'tb' to a local. We had good intentions, but this tb contains frame information about the stack of whatever was happening in our handle_input, all the way down to whatever it called. Presuming your game continues, this 'tb' is kept alive even in your next call to handle_input, and maybe forever. The docs for exc_info now talk about this potential circular-reference issue and recommend simply not assigning tb if you don't absolutely need it. If you need to get a traceback, consider e.g. traceback.format_exc.

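A minimal sketch of that safer pattern, keeping only the formatted string rather than the traceback object (handle_input here is a hypothetical stand-in for the game's real handler):

```python
import traceback

def handle_input():
    raise KeyError("missing binding")   # hypothetical failure

log = []
try:
    handle_input()
except KeyError:
    # Keep only the formatted text; no frame objects stay alive.
    log.append(traceback.format_exc())

print("KeyError" in log[0])  # → True
```

Because format_exc returns a plain string, nothing in `log` pins stack frames (and whatever locals they hold) in memory.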
storing values in a class or global scope instead of instance scope, and not realizing it.

This one can happen in insidious ways, but often happens when you define mutable types in your class scope.

class Money(object):
    name = ''
    symbols = []   # This is the dangerous line here

    def set_name(self, name):
        self.name = name

    def add_symbol(self, symbol):
        self.symbols.append(symbol)

In the above example, say you did

m = Money()
m.set_name('Dollar')
m.add_symbol('$')

You'll probably find this particular bug quickly, but in this case you put a mutable value at class scope, and even though you correctly access it at instance scope, the append actually "falls through" to the class object's __dict__.

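One sketch of the fix is to create the mutable value per instance in __init__, so nothing accumulates on the class object:

```python
class Money(object):
    def __init__(self):
        self.name = ''
        self.symbols = []   # created per instance, never shared on the class

    def set_name(self, name):
        self.name = name

    def add_symbol(self, symbol):
        self.symbols.append(symbol)

dollar = Money()
euro = Money()
dollar.add_symbol('$')
print(euro.symbols)   # → [] : each instance keeps its own list
```

With the original class-scope list, `euro.symbols` would also contain `'$'`, and the class-level list would keep growing for the life of the process.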
Used in certain contexts, such as holding objects, this could cause your application's heap to grow forever, and would cause issues in, say, a production web application that doesn't restart its processes occasionally.

Cyclic references in classes which also have a __del__ method.

Ironically, the existence of a __del__ makes it impossible for the cyclic garbage collector to clean an instance up (this applies to Python 2; since Python 3.4, per PEP 442, the collector can handle such cycles). Say you wanted a destructor for finalization purposes:

class ClientConnection(...):
    def __del__(self):
        if self.socket is not None:
            self.socket.close()
            self.socket = None

Now this works fine on its own, and you may be led to believe it's being a good steward of OS resources to ensure the socket is 'disposed' of.

However, if ClientConnection kept a reference to, say, User, and User kept a reference to the connection, you might be tempted to say that on cleanup we should have the user de-reference the connection. This is actually the flaw, however: the cyclic GC doesn't know the correct order of operations and cannot clean them up.

The solution to this is to ensure you do cleanup on, say, disconnect events by calling some sort of close method, but name that method something other than __del__.

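A hedged sketch of both ideas together: an explicit close method instead of __del__, plus a weak reference back to the owner so no User-to-Connection cycle forms in the first place (FakeSocket is a hypothetical stand-in for a real socket):

```python
import gc
import weakref

class FakeSocket(object):
    """Hypothetical stand-in for a real socket."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class User(object):
    def __init__(self):
        self.connection = None

class ClientConnection(object):
    def __init__(self, user, sock):
        # A weak reference back to the owner avoids a User <-> Connection cycle.
        self._user = weakref.ref(user)
        self.socket = sock

    def close(self):
        # Explicit cleanup, called from a disconnect handler, not from __del__.
        if self.socket is not None:
            self.socket.close()
            self.socket = None

user = User()
sock = FakeSocket()
conn = ClientConnection(user, sock)
user.connection = conn

conn.close()
print(sock.closed)           # → True
del user
gc.collect()
print(conn._user() is None)  # → True : the weakref no longer resolves
```

Because the connection never strongly references the user, deleting the user frees it immediately; no cycle, no __del__, nothing for the collector to choke on.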
poorly implemented C extensions, or C libraries not used the way they are supposed to be.

In Python, you trust in the garbage collector to throw away things you aren't using. But if you use a C extension that wraps a C library, the majority of the time you are responsible for making sure you explicitly close or de-allocate resources. Mostly this is documented, but a python programmer who is used to not having to do this explicit de-allocation might throw away the handle (like returning from a function or whatever) to that library without knowing that resources are being held.

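One defensive sketch for such wrappers is to put the explicit de-allocation in a context manager so it can't be forgotten, even on an exception. FakeNativeLib and its alloc/free methods below are hypothetical stand-ins for whatever handle-based API the C wrapper actually exposes:

```python
from contextlib import contextmanager

class FakeNativeLib(object):
    """Hypothetical stand-in for a C extension that hands out raw handles."""
    def __init__(self):
        self.live_handles = set()
    def alloc(self, size):
        handle = object()
        self.live_handles.add(handle)
        return handle
    def free(self, handle):
        self.live_handles.discard(handle)

@contextmanager
def native_buffer(lib, size):
    # Guarantee the explicit free the C library expects,
    # even if the body of the with-block raises.
    handle = lib.alloc(size)
    try:
        yield handle
    finally:
        lib.free(handle)

lib = FakeNativeLib()
with native_buffer(lib, 4096) as buf:
    pass  # use the buffer here

print(len(lib.live_handles))  # → 0 : nothing left allocated
```

The Python-level handle can then be thrown away freely; the resource was released deterministically at the end of the with-block rather than whenever (or if) the garbage collector got around to it.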
Scopes which contain closures which contain a whole lot more than you could've anticipated

class User:
    def set_profile(self, profile):
        def on_completed(result):
            if result.success:
                self.profile = profile

        self._db.execute(
            change={'profile': profile},
            on_complete=on_completed
        )

In this contrived example, we appear to be using some sort of 'async' call that will call us back at on_completed when the DB call is done (the implementation could have used promises; it ends up with the same outcome).

What you may not realize is that the on_completed closure binds a reference to self in order to execute the self.profile assignment. Now, perhaps the DB client keeps track of active queries and pointers to the closures to call when they're done (since it's async), and say it crashes for whatever reason. If the DB client doesn't correctly clean up callbacks etc., then the DB client now has a reference to on_completed, which has a reference to User, which keeps a _db: you've now created a circular reference that may never get collected.

(Even without a circular reference, the fact that closures bind locals and even instances sometimes may cause values you thought were collected to be living for a long time, which could include sockets, clients, large buffers, and entire trees of things)

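One hedged mitigation sketch is to have the closure capture only a weak reference to self, so a lingering callback cannot keep the User alive. FakeDB below is a hypothetical stand-in for such an async client:

```python
import weakref

class FakeDB(object):
    """Hypothetical async client that stores callbacks until queries finish."""
    def __init__(self):
        self.pending = []
    def execute(self, change, on_complete):
        self.pending.append(on_complete)

class User(object):
    def set_profile(self, db, profile):
        self_ref = weakref.ref(self)   # the closure captures only a weakref

        def on_completed(result):
            user = self_ref()          # None once the User has been collected
            if user is not None and result:
                user.profile = profile

        db.execute(change={'profile': profile}, on_complete=on_completed)

db = FakeDB()
user = User()
user.set_profile(db, {'name': 'orokusaki'})
db.pending[0](True)            # the query "completes"
print(user.profile)            # → {'name': 'orokusaki'}
```

If the User were garbage-collected before the callback fired, `self_ref()` would return None and the callback would become a harmless no-op instead of pinning the whole object graph.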
Default parameters which are mutable types

import time

def foo(a=[]):
    a.append(time.time())   # every call without an argument appends to the same list
    return a

This is a contrived example, but one could be led to believe that the default value of a being an empty list means a fresh list on each call, when it is in fact a reference to the same list. This again might cause unbounded growth without your knowing you did that.

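The idiomatic fix is the None-sentinel pattern, which really does give a fresh list per call:

```python
import time

def foo(a=None):
    if a is None:
        a = []              # a fresh list on every call, not one shared default
    a.append(time.time())
    return a

first = foo()
second = foo()
print(len(first), len(second))  # → 1 1 : no shared default list
```

With the original `a=[]` version, the second call would have returned a list of length 2, and the hidden default list would keep growing for as long as the process runs.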
Answered by Ned Batchelder

The classic definition of a memory leak is memory that was used once, and now is not, but has not been reclaimed. That's nearly impossible with pure Python code. But as Antoine points out, you can easily have the effect of consuming all your memory inadvertently by allowing data structures to grow without bound, even if you don't need to keep all of the data around.

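A minimal sketch of that effect, alongside one bounded alternative using collections.deque (the handle_request function is a hypothetical example, not from the original answer):

```python
from collections import deque

# Unbounded: every request appends and nothing ever trims the list,
# so the process's memory use grows for as long as it runs.
request_log = []

def handle_request(payload):
    request_log.append(payload)
    return len(request_log)

# Bounded alternative: old entries fall off automatically.
recent = deque(maxlen=100)
for i in range(10000):
    recent.append(i)
print(len(recent))  # → 100, no matter how many items were pushed
```

Nothing here is "leaked" in the C sense; every object is still reachable. That is exactly why the garbage collector can't help you.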
With C extensions, of course, you are back in unmanaged territory, and anything is possible.

Answered by Antoine P.

Of course you can. The typical example of a memory leak is if you build a cache that you never flush manually and that has no automatic eviction policy.

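A sketch of one common remedy: in Python 3, functools.lru_cache gives a cache a size bound with automatic least-recently-used eviction:

```python
from functools import lru_cache

@lru_cache(maxsize=256)   # bounded: least-recently-used entries are evicted
def expensive(n):
    return n * n

for i in range(10000):
    expensive(i)

# The cache never holds more than maxsize entries.
print(expensive.cache_info().currsize)  # → 256
```

An unbounded dict used the same way would retain all 10000 entries (and their values) for the life of the process, which is exactly the cache-without-eviction leak described above.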
Answered by Rob Curtis

In the sense of orphaning allocated objects after they go out of scope because you forgot to deallocate them, no; Python will automatically deallocate objects that go out of scope (garbage collection). But in the sense that @Antoine is talking about, yes.

Answered by sancelot

Since many modules are written in C, yes, it is possible to have memory leaks. Imagine you are using a GUI drawing context (e.g. with wxPython): you can create memory buffers, but if you forget to release them, you will have memory leaks. In this case, the C++ functions of the wx API are wrapped for Python.

An even worse usage: imagine you override these wx widget methods within Python... memory leaks assured.

Answered by peroksid

I create an object with a heavy attribute so that it shows up in the process memory usage.

Then I create a dictionary which refers to itself a large number of times.

Then I delete the object, and ask the GC to collect garbage. It collects none.

Then I check the process RAM footprint - it is the same.

Here you go, memory leak!

α python
Python 2.7.15 (default, Oct  2 2018, 11:47:18)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gc
>>> class B(object):
...     b = list(range(1 * 10 ** 8))
...
>>>
[1]+  Stopped                 python
~/Sources/plan9port [git branch:master]
α ps aux | grep python
alexander.pugachev 85164   0.0 19.0  7562952 3188184 s010  T     2:08pm   0:03.78 /usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
~/Sources/plan9port [git branch:master]
α fg
python

>>> b = B()
>>> for i in range(1000):
...     b.a = {'b': b}
...
>>>
[1]+  Stopped                 python
~/Sources/plan9port [git branch:master]
α ps aux | grep python
alexander.pugachev 85164   0.0 19.0  7579336 3188264 s010  T     2:08pm   0:03.79 /usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
~/Sources/plan9port [git branch:master]
α fg
python


>>> b.a['b'].a
{'b': <__main__.B object at 0x109204950>}
>>> del(b)
>>> gc.collect()
0
>>>
[1]+  Stopped                 python
~/Sources/plan9port [git branch:master]
α ps aux | grep python
alexander.pugachev 85164   0.0 19.0  7579336 3188268 s010  T     2:08pm   0:05.13 /usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python