Python: Memory leak debugging
Note: this page is a mirror of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/1339293/
Asked by Paul Tarjan
I have a small multithreaded script running in django, and over time it starts using more and more memory. Leaving it for a full day eats about 6GB of RAM and I start to swap.
Following http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks, I see these as the most common types (with only 800M of memory used):
(Pdb) objgraph.show_most_common_types(limit=20)
dict 43065
tuple 28274
function 7335
list 6157
NavigableString 3479
instance 2454
cell 1256
weakref 974
wrapper_descriptor 836
builtin_function_or_method 766
type 742
getset_descriptor 562
module 423
method_descriptor 373
classobj 256
instancemethod 255
member_descriptor 218
property 185
Comment 183
__proxy__ 155
which doesn't show anything weird. What should I do now to help debug the memory problems?
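If objgraph is not at hand, a similar type census can be taken with just the stdlib `gc` module. This is a rough sketch (`most_common_types` is my own helper name, not an objgraph API); note that `gc` only sees container objects it tracks, so the counts will not match objgraph's exactly:

```python
import gc
from collections import Counter

def most_common_types(limit=20):
    """Count live gc-tracked objects, grouped by type name."""
    counts = Counter(type(o).__name__ for o in gc.get_objects())
    return counts.most_common(limit)

for name, count in most_common_types():
    print('%-30s %d' % (name, count))
```

Running this periodically and diffing the counts between runs is often more telling than a single snapshot: a type whose count grows monotonically is a leak candidate.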
Update: Trying some things people are recommending. I ran the program overnight, and when I woke up, 50% * 8G == 4G of RAM was used.
(Pdb) from pympler import muppy
(Pdb) muppy.print_summary()
types | # objects | total size
========================================== | =========== | ============
unicode | 210997 | 97.64 MB
list | 1547 | 88.29 MB
dict | 41630 | 13.21 MB
set | 50 | 8.02 MB
str | 109360 | 7.11 MB
tuple | 27898 | 2.29 MB
code | 6907 | 1.16 MB
type | 760 | 653.12 KB
weakref | 1014 | 87.14 KB
int | 3552 | 83.25 KB
function (__wrapper__) | 702 | 82.27 KB
wrapper_descriptor | 998 | 77.97 KB
cell | 1357 | 74.21 KB
<class 'pympler.asizeof.asizeof._Claskey | 1113 | 69.56 KB
function (__init__) | 574 | 67.27 KB
That doesn't sum to 4G, nor really give me any big data structures to go fix. The unicode strings are from a set() of "done" nodes, and the lists look like just random weakrefs.
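When a summary like Pympler's points at a few fat types but not at the culprit objects themselves, one stdlib-only trick is to sort the gc-tracked objects by size. A sketch under two caveats: `largest_objects` is a made-up helper name, and `sys.getsizeof` reports only the shallow size of a container, not what its elements occupy:

```python
import gc
import sys

def largest_objects(n=10):
    """Return the n gc-tracked objects with the largest shallow size."""
    objs = gc.get_objects()
    # default=0 guards against the rare object that refuses getsizeof
    objs.sort(key=lambda o: sys.getsizeof(o, 0), reverse=True)
    return objs[:n]

for obj in largest_objects(5):
    print(type(obj).__name__, sys.getsizeof(obj, 0))
```

Once a suspiciously large list or dict surfaces this way, `gc.get_referrers()` on it can show who is keeping it alive.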
I didn't use guppy since it requires a C extension, and I didn't have root, so it was going to be a pain to build.
None of the objects I was using have a __del__ method, and looking through the libraries, it doesn't look like django or python-mysqldb do either. Any other ideas?
Answered by Jameson Quinn
See http://opensourcehacker.com/2008/03/07/debugging-django-memory-leak-with-trackrefs-and-guppy/. Short answer: if you're running django but not in a web-request-based format, you need to manually run db.reset_queries() (and of course have DEBUG=False, as others have mentioned). Django automatically does reset_queries() after a web request, but in your format, that never happens.
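In a long-running script the fix amounts to calling `django.db.reset_queries()` at the end of each unit of work. Since that only runs inside a Django process, here is a self-contained analogy of the mechanism: `query_log` stands in for Django's `connection.queries`, and `run_query`/`worker_loop` are hypothetical stand-ins for your own code:

```python
# Minimal analogy of Django's per-connection query log: under DEBUG=True,
# every query is appended to a list that only reset_queries() clears.
query_log = []

def run_query(sql):
    query_log.append(sql)      # what Django does when DEBUG is True
    # ... actually execute the query here ...

def reset_queries():
    del query_log[:]           # what django.db.reset_queries() does

def worker_loop(batches):
    for batch in batches:
        for item in batch:
            run_query('SELECT ... WHERE id = %d' % item)
        reset_queries()        # without this, query_log grows forever

worker_loop([[1, 2, 3], [4, 5]])
print(len(query_log))  # 0 -- the log was cleared after each batch
```

The point is that the log is cleared per batch rather than per process lifetime, which is exactly what the request/response cycle gives you for free in a normal web deployment.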
Answered by Paul Tarjan
Is DEBUG=False in settings.py?

If not, Django will happily store all the SQL queries you make, which adds up.
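The setting itself is a single line in `settings.py` (a fragment, shown with the rest of the file omitted):

```python
# settings.py
DEBUG = False   # with DEBUG = True, Django appends every executed SQL query
                # to connection.queries, which grows without bound in a
                # long-running process
```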
Answered by Nicolas Dumazet
Have you tried gc.set_debug()?

You need to ask yourself simple questions:
- Am I using objects with __del__ methods? Do I absolutely, unequivocally, need them?
- Can I get reference cycles in my code? Can't we break these cycles before getting rid of the objects?
See, the main issue would be a cycle of objects containing __del__ methods:
import gc

class A(object):
    def __del__(self):
        print 'a deleted'
        if hasattr(self, 'b'):
            delattr(self, 'b')

class B(object):
    def __init__(self, a):
        self.a = a
    def __del__(self):
        print 'b deleted'
        del self.a

def createcycle():
    a = A()
    b = B(a)
    a.b = b
    return a, b

gc.set_debug(gc.DEBUG_LEAK)

a, b = createcycle()

# remove references
del a, b

# prints:
## gc: uncollectable <A 0x...>
## gc: uncollectable <B 0x...>
## gc: uncollectable <dict 0x...>
## gc: uncollectable <dict 0x...>
gc.collect()

# to solve this we break the cycles explicitly:
a, b = createcycle()
del a.b
del a, b

# objects are removed correctly:
## a deleted
## b deleted
gc.collect()
I would really encourage you to flag objects/concepts that are cycling in your application and focus on their lifetime: when you don't need them anymore, is anything still referencing them?
Even for cycles without __del__ methods, we can have an issue:
import gc

# class without destructor
class A(object): pass

def createcycle():
    # a -> b -> c
    # ^         |
    # +----<----+
    a = A()
    b = A()
    a.next = b
    c = A()
    b.next = c
    c.next = a
    return a, b, c

gc.set_debug(gc.DEBUG_LEAK)

a, b, c = createcycle()

# since we have no __del__ methods, gc is able to collect the cycle:
del a, b, c

# no panic message, everything is collectable:
##gc: collectable <A 0x...>
##gc: collectable <A 0x...>
##gc: collectable <dict 0x...>
##gc: collectable <A 0x...>
##gc: collectable <dict 0x...>
##gc: collectable <dict 0x...>
gc.collect()

a, b, c = createcycle()

# but as long as we keep an exterior ref to the cycle...:
seen = dict()
seen[a] = True

# delete the cycle
del a, b, c

# nothing is collected
gc.collect()
If you have to use "seen"-like dictionaries, or history, be careful that you keep only the actual data you need, and no external references to it.
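One way to keep a "seen" structure from pinning its entries in memory is to hold them through weak references. A sketch using the stdlib `weakref` module (the `Node` class is hypothetical; note this only works for weakly-referenceable objects, so not plain `str`/`int` keys):

```python
import gc
import weakref

class Node(object):
    pass

# entries disappear automatically when the key object dies
seen = weakref.WeakKeyDictionary()

a = Node()
seen[a] = True
print(len(seen))   # 1 -- a is alive, so the entry exists

del a
gc.collect()       # collect the node (and any cycle it was part of)
print(len(seen))   # 0 -- the entry vanished with the object
```

With a regular dict, the `seen[a] = True` entry alone would have kept `a` (and any cycle hanging off it) alive forever.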
I'm a bit disappointed now by set_debug; I wish it could be configured to output data somewhere other than stderr, but hopefully that should change soon.
Answered by zgoda
See this excellent blog post from Ned Batchelder on how they traced down a real memory leak in HP's Tabblo. A classic and worth reading.
Answered by Martin v. Löwis
I think you should use different tools. Apparently, the statistics you got are only about GC objects (i.e. objects which may participate in cycles); most notably, they lack strings.
I recommend using Pympler; this should provide you with more detailed statistics.
Answered by rob
Do you use any extensions? They are a wonderful place for memory leaks, and will not be tracked by Python tools.