Python: Memory leak debugging

Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/1339293/

Tags: python, django, debugging, memory-leaks

Asked by Paul Tarjan

I have a small multithreaded script running in django and over time it starts using more and more memory. Leaving it for a full day eats about 6GB of RAM and I start to swap.

Following http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks, I see these as the most common types (with only 800M of memory used):

(Pdb)  objgraph.show_most_common_types(limit=20)
dict                       43065
tuple                      28274
function                   7335
list                       6157
NavigableString            3479
instance                   2454
cell                       1256
weakref                    974
wrapper_descriptor         836
builtin_function_or_method 766
type                       742
getset_descriptor          562
module                     423
method_descriptor          373
classobj                   256
instancemethod             255
member_descriptor          218
property                   185
Comment                    183
__proxy__                  155

which doesn't show anything weird. What should I do now to help debug the memory problems?

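One possible next step with objgraph (a sketch; it assumes the graphviz tool is installed for rendering, and NavigableString is just a suspicious type picked from the table above) is to dump the back-reference graph of a sample instance to see what is keeping it alive:

import objgraph
import random

# Grab all live instances of a suspicious type, pick one at random,
# and write the chain of referrers keeping it alive to a PNG image.
objs = objgraph.by_type('NavigableString')
obj = random.choice(objs)
objgraph.show_backrefs([obj], max_depth=5, filename='backrefs.png')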

Update: Trying some things people are recommending. I ran the program overnight, and when I woke up, 50% * 8G == 4G of RAM was used.

(Pdb) from pympler import muppy
(Pdb) muppy.print_summary()
                                     types |   # objects |   total size
========================================== | =========== | ============
                                   unicode |      210997 |     97.64 MB
                                      list |        1547 |     88.29 MB
                                      dict |       41630 |     13.21 MB
                                       set |          50 |      8.02 MB
                                       str |      109360 |      7.11 MB
                                     tuple |       27898 |      2.29 MB
                                      code |        6907 |      1.16 MB
                                      type |         760 |    653.12 KB
                                   weakref |        1014 |     87.14 KB
                                       int |        3552 |     83.25 KB
                    function (__wrapper__) |         702 |     82.27 KB
                        wrapper_descriptor |         998 |     77.97 KB
                                      cell |        1357 |     74.21 KB
  <class 'pympler.asizeof.asizeof._Claskey |        1113 |     69.56 KB
                       function (__init__) |         574 |     67.27 KB

That doesn't sum to 4G, nor does it really give me any big data structures to go fix. The unicode objects are from a set() of "done" nodes, and the lists look like just random weakrefs.

I didn't use guppy since it requires a C extension, and I didn't have root, so it was going to be a pain to build.

None of the objects I was using have a __del__ method, and looking through the libraries, it doesn't look like django or python-mysqldb do either. Any other ideas?

Answer by Jameson Quinn

See http://opensourcehacker.com/2008/03/07/debugging-django-memory-leak-with-trackrefs-and-guppy/. Short answer: if you're running django but not in a web-request-based format, you need to manually run db.reset_queries() (and of course have DEBUG=False, as others have mentioned). Django automatically does reset_queries() after a web request, but in your format that never happens.

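In a long-running script that looks roughly like the following (a minimal sketch; process_batch() is a hypothetical stand-in for your own work):

from django import db

while True:
    process_batch()  # hypothetical: whatever work the script does
    # Outside the request/response cycle Django never flushes its
    # per-connection query log, so clear it manually each iteration:
    db.reset_queries()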

Answer by Paul Tarjan

Is DEBUG=False in settings.py?

If not, Django will happily store all the SQL queries you make, which adds up.

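The queries pile up in connection.queries; a quick sketch to check whether this is the leak:

from django.db import connection

# With DEBUG=True every executed query is appended here and never
# freed; if this number climbs steadily, you've found (one) leak.
print(len(connection.queries))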

Answer by Nicolas Dumazet

Have you tried gc.set_debug()?

You need to ask yourself simple questions:

  • Am I using objects with __del__ methods? Do I absolutely, unequivocally, need them?
  • Can I get reference cycles in my code? Can't we break these cycles before getting rid of the objects?

See, the main issue would be a cycle of objects containing __del__ methods:

import gc

class A(object):
    def __del__(self):
        print 'a deleted'
        if hasattr(self, 'b'):
            delattr(self, 'b')

class B(object):
    def __init__(self, a):
        self.a = a
    def __del__(self):
        print 'b deleted'
        del self.a


def createcycle():
    a = A()
    b = B(a)
    a.b = b
    return a, b

gc.set_debug(gc.DEBUG_LEAK)

a, b = createcycle()

# remove references
del a, b

# prints:
## gc: uncollectable <A 0x...>
## gc: uncollectable <B 0x...>
## gc: uncollectable <dict 0x...>
## gc: uncollectable <dict 0x...>
gc.collect()

# to solve this, we explicitly break the cycle:
a, b = createcycle()
del a.b

del a, b

# objects are removed correctly:
## a deleted
## b deleted
gc.collect()

I would really encourage you to flag objects/concepts that cycle in your application and focus on their lifetime: when you don't need them anymore, is anything still referencing them?

Even for cycles without __del__ methods, we can have an issue:

import gc

# class without destructor
class A(object): pass

def createcycle():
    # a -> b -> c 
    # ^         |
    # ^<--<--<--|
    a = A()
    b = A()
    a.next = b
    c = A()
    b.next = c
    c.next = a
    return a, b, c

gc.set_debug(gc.DEBUG_LEAK)

a, b, c = createcycle()
# since we have no __del__ methods, gc is able to collect the cycle:

del a, b, c
# no panic message, everything is collectable:
##gc: collectable <A 0x...>
##gc: collectable <A 0x...>
##gc: collectable <dict 0x...>
##gc: collectable <A 0x...>
##gc: collectable <dict 0x...>
##gc: collectable <dict 0x...>
gc.collect()

a, b, c = createcycle()

# but as long as we keep an exterior ref to the cycle...:
seen = dict()
seen[a] = True

# delete the cycle
del a, b, c
# nothing is collected
gc.collect()

If you have to use "seen"-like dictionaries or keep any kind of history, be careful to hold only the actual data you need, with no external references to the live objects.

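One sketch of that idea, assuming the tracked objects support weak references: a weak container remembers membership without keeping the objects alive.

import weakref

seen = weakref.WeakSet()

def mark(node):
    # No owning reference is taken: once the rest of the program
    # drops `node`, it silently disappears from `seen` as well.
    seen.add(node)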

I'm a bit disappointed by set_debug; I wish it could be configured to send its output somewhere other than stderr, but hopefully that will change soon.

Answer by zgoda

See this excellent blog post from Ned Batchelder on how they tracked down a real memory leak in HP's Tabblo. A classic, and worth reading.

Answer by Martin v. Löwis

I think you should use different tools. Apparently, the statistics you got cover only GC objects (i.e. objects which may participate in cycles); most notably, they lack strings.

I recommend using Pympler; this should provide you with more detailed statistics.

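The muppy.print_summary() call shown in the question's update comes from an older release; with recent Pympler versions the equivalent looks roughly like this (a sketch):

from pympler import muppy, summary

all_objects = muppy.get_objects()      # snapshot of live tracked objects
rows = summary.summarize(all_objects)  # aggregate by type, with total sizes
summary.print_(rows, limit=15)         # print the 15 largest rows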

Answer by rob

Do you use any extensions? They are a wonderful place for memory leaks, and they will not be tracked by Python tools.

Answer by iElectric

Try Guppy.

Basically, you need more information, or you need to be able to extract some. Guppy even provides graphical representations of the data.

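If you can get it built, a minimal heapy session looks something like this (a sketch; run_suspect_code() is a hypothetical placeholder for the code under test):

from guppy import hpy

hp = hpy()
hp.setrelheap()      # set a baseline: measure only allocations from here on
run_suspect_code()   # hypothetical: exercise the code you think is leaking
print(hp.heap())     # objects allocated since the baseline, grouped by type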