Why are the Java and Python garbage collection methods different?
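Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you use it, you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow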
Original URL: http://stackoverflow.com/questions/21934/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Why are Java and Python garbage collection methods different?
Asked by popopome
Python uses reference counting to manage object lifetimes, so an object that is no longer in use is destroyed immediately.
In Java, by contrast, the GC (garbage collector) destroys objects that are no longer in use only at certain times, when the collector runs.
Why does Java choose this strategy, and what is the benefit of it?
Is this better than the Python approach?
Accepted answer by Daren Thomas
There are drawbacks to using reference counting. One of the most frequently mentioned is circular references: suppose A references B, B references C, and C references B. If A drops its reference to B, both B and C still have a reference count of 1 and won't be deleted under traditional reference counting. CPython (reference counting is not part of Python itself, but part of its C implementation) catches circular references with a separate garbage collection routine that it runs periodically...
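The A, B, C scenario above can be reproduced in a few lines. This is a minimal sketch that assumes CPython; the `Node` class and the `cycle_demo` function are hypothetical names made up for the illustration, and the periodic collector is switched off temporarily so that its timing does not affect the result.

```python
import gc
import weakref


class Node:
    """Hypothetical node type; `ref` points at another Node (or None)."""

    def __init__(self):
        self.ref = None


def cycle_demo():
    was_enabled = gc.isenabled()
    gc.disable()  # keep the periodic cycle collector out of the way
    try:
        a, b, c = Node(), Node(), Node()
        a.ref, b.ref, c.ref = b, c, b    # A -> B, B -> C, C -> B
        probe = weakref.ref(b)           # lets us ask whether B still exists
        del b, c                         # B and C now live only through A and the cycle
        a.ref = None                     # A drops its reference to B
        alive_after_drop = probe() is not None
        gc.collect()                     # run the cycle detector explicitly
        alive_after_collect = probe() is not None
        return alive_after_drop, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


print(cycle_demo())  # on CPython: (True, False)
```

B survives the loss of its last outside reference because C still counts as a referrer, and it goes away only once the separate collection routine runs.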
Another drawback: reference counting can make execution slower. Each time a reference to an object is created or dropped, the interpreter/VM must update the count, and on every decrement it must check whether the count has gone down to 0 (and deallocate the object if it has). A tracing garbage collector does not need to do this.
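To make that bookkeeping visible, here is a small sketch assuming CPython, where `sys.getrefcount` exposes the counter. The absolute numbers it reports depend on the interpreter version, so the hypothetical `count_delta` helper compares only differences.

```python
import sys


def count_delta():
    obj = object()
    before = sys.getrefcount(obj)
    holders = [obj, obj, obj]      # three new references: three increments
    during = sys.getrefcount(obj)
    holders.clear()                # three references dropped: three decrements
    after = sys.getrefcount(obj)
    return during - before, after - before


print(count_delta())  # on CPython: (3, 0)
```

Every one of those increments and decrements is work that happens during ordinary program execution rather than inside a collector.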
Also, garbage collection can be done in a separate thread (though it can be a bit tricky). On machines with lots of RAM, and for processes that use memory only slowly, you might not want to be doing GC at all! Reference counting would be a bit of a drawback there in terms of performance...
Answer by Luke Quinane
Actually, reference counting and the strategies used by the Sun JVM are all different types of garbage collection algorithms.
There are two broad approaches for tracking down dead objects: tracing and reference counting. In tracing, the GC starts from the "roots" (things like stack references) and traces all reachable (live) objects. Anything that can't be reached is considered dead. In reference counting, each time a reference is modified, the objects involved have their counts updated. Any object whose reference count drops to zero is considered dead.
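The tracing approach can be sketched on a toy heap. This is only an illustration, not how any real JVM represents objects: the heap is assumed to be a dictionary mapping an object name to the names it references, and `reachable` is a hypothetical helper.

```python
def reachable(roots, edges):
    """Return the set of names reachable from `roots`.

    `edges` maps each object name to the list of names it references.
    """
    live = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in live:
            continue               # already visited
        live.add(name)
        stack.extend(edges.get(name, ()))
    return live


# D is a root and references A; B and C reference only each other.
heap = {"A": [], "B": ["C"], "C": ["B"], "D": ["A"]}
live = reachable(["D"], heap)
dead = set(heap) - live
print(sorted(live), sorted(dead))  # ['A', 'D'] ['B', 'C']
```

Note that the B and C cycle is classified as dead without any special handling, because nothing reachable from a root points into it.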
With basically all GC implementations there are trade-offs, but tracing is usually good for high-throughput (i.e. fast) operation while having longer pause times (larger gaps where the UI or program may freeze up). Reference counting can operate in smaller chunks but will be slower overall. It may mean fewer freezes but poorer performance overall.
Additionally, a reference counting GC requires a cycle detector to clean up objects in a cycle, which won't be caught by their reference counts alone. Perl 5 didn't have a cycle detector in its GC implementation and could leak memory held in cycles.
Research has also been done to get the best of both worlds (low pause times, high throughput): http://cs.anu.edu.au/~Steve.Blackburn/pubs/papers/urc-oopsla-2003.pdf
Answer by Eli Courtwright
Daren Thomas gives a good answer. However, one big difference between the Java and Python approaches is that with reference counting, in the common case (no circular references), objects are cleaned up immediately rather than at some indeterminate later date.
For example, I can write sloppy, non-portable code in CPython such as
def parse_some_attrs(fname):
    return open(fname).read().split("~~~")[2:4]
and the file descriptor for the file I opened will be cleaned up immediately, because as soon as the reference to the open file goes away, the file object is garbage collected and the file descriptor is freed. Of course, if I run Jython or IronPython or possibly PyPy, then the garbage collector won't necessarily run until much later; I might run out of file descriptors first, and my program will crash.
So you SHOULD be writing code that looks like
def parse_some_attrs(fname):
    with open(fname) as f:
        return f.read().split("~~~")[2:4]
but sometimes people like to rely on reference counting to always free up their resources, because it can sometimes make the code a little shorter.
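As a quick check that the `with` version does not depend on the collector at all, the sketch below writes a throwaway file and confirms that the file object is closed as soon as the block exits. The `demo` function, the temporary file, and its sample contents are assumptions made up for the illustration.

```python
import os
import tempfile


def parse_some_attrs(fname):
    with open(fname) as f:
        return f.read().split("~~~")[2:4]


def demo():
    fd, path = tempfile.mkstemp(text=True)
    try:
        with os.fdopen(fd, "w") as out:
            out.write("a~~~b~~~c~~~d~~~e")   # made-up sample data
        with open(path) as f:
            pass
        closed = f.closed    # the with block closed the file on exit
        return parse_some_attrs(path), closed
    finally:
        os.remove(path)


print(demo())  # (['c', 'd'], True)
```

Because the close happens when the block exits rather than when the object is reclaimed, this behaves the same way on CPython, Jython, IronPython and PyPy.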
I'd say that the best garbage collector is the one with the best performance, which currently seems to be the Java-style generational garbage collectors that can run in a separate thread and have all these crazy optimizations, etc. The difference this makes to how you write your code should be negligible and ideally non-existent.
Answer by Espo
I think the article "Java theory and practice: A brief history of garbage collection" from IBM should help explain some of the questions you have.
Answer by mfx
Garbage collection is faster (more time efficient) than reference counting, if you have enough memory. For example, a copying GC traverses the "live" objects and copies them to a new space, and can reclaim all the "dead" objects in one step by marking a whole memory region as free. This is very efficient, if you have enough memory. Generational collectors use the knowledge that "most objects die young"; often only a few percent of objects have to be copied.
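The copying idea can be sketched on a toy heap as well. This is a simplified, Cheney-style illustration under stated assumptions, not a real collector: the from-space is modeled as a list in which each entry is the list of indices that object references, and `cheney_copy` is a hypothetical name.

```python
def cheney_copy(roots, from_space):
    """Copy the objects reachable from `roots` into a fresh to-space.

    `from_space[i]` is the list of from-space indices that object i references.
    Returns (new_roots, to_space) with all references rewritten to to-space indices.
    """
    to_space = []
    forward = {}  # forwarding table: old index -> new index

    def copy(old):
        if old not in forward:
            forward[old] = len(to_space)
            to_space.append(list(from_space[old]))  # fields still hold old indices
        return forward[old]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):
        # Rewrite this object's fields, copying referents on first sight.
        to_space[scan] = [copy(old) for old in to_space[scan]]
        scan += 1
    return new_roots, to_space


# Objects 0 and 2 are unreachable; object 3 is the only root.
old_heap = [[], [3], [1], [1, 4], []]
new_roots, new_heap = cheney_copy([3], old_heap)
print(new_roots, new_heap)  # [0] [[1, 2], [0], []]
```

The dead objects are never visited: the work done is proportional to the live data, and the whole from-space can then be reused in one step.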
[This is also the reason why GC can be faster than malloc/free]
Reference counting is much more space efficient than garbage collection, since it reclaims memory the very moment it becomes unreachable. This is nice when you want to attach finalizers to objects (e.g. to close a file once the File object becomes unreachable). A reference counting system can work even when only a few percent of the memory is free. But the management cost of having to increment and decrement counters upon each pointer assignment costs a lot of time, and some kind of garbage collection is still needed to reclaim cycles.
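The "precise finalizer" behavior can be observed with `weakref.finalize`. This is a minimal sketch assuming CPython; `TempHandle` and `finalizer_order` are hypothetical names, and on an implementation with a tracing collector the callback may run later than shown.

```python
import weakref


class TempHandle:
    """Hypothetical stand-in for an object that owns an external resource."""


def finalizer_order():
    log = []
    handle = TempHandle()
    # weakref.finalize runs the callback once `handle` has been reclaimed.
    weakref.finalize(handle, log.append, "finalizer ran")
    log.append("dropping last reference")
    del handle                 # count reaches zero here on CPython
    log.append("next statement")
    return log


print(finalizer_order())
# on CPython: ['dropping last reference', 'finalizer ran', 'next statement']
```

The finalizer runs before the next statement executes, which is the property that code relying on reference counting for resource cleanup depends on.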
So the trade-off is clear: if you have to work in a memory-constrained environment, or if you need precise finalizers, use reference counting. If you have enough memory and need the speed, use garbage collection.
Answer by Alejandro VD
One big disadvantage of Java's tracing GC is that from time to time it will "stop the world" and freeze the application for a relatively long time to do a full GC. If the heap is big and the object tree complex, it will freeze for a few seconds. Also, each full GC visits the whole object tree over and over again, which is probably quite inefficient. Another drawback of the way Java does GC is that you have to tell the JVM what heap size you want (if the default is not good enough); the JVM derives from that value several thresholds that trigger the GC process when too much garbage piles up in the heap.
I presume that this is actually the main cause of the jerky feel of Android (based on Java), even on the most expensive cellphones, in comparison with the smoothness of iOS (based on Objective-C, and using RC).
I'd love to see a JVM option to enable RC memory management, perhaps keeping the GC only as a last resort to run when there is no more memory left.
Answer by ckpwong
The latest Sun Java VM actually has multiple GC algorithms which you can tweak. The Java VM specification intentionally omits specifying actual GC behaviour, to allow different (and multiple) GC algorithms for different VMs.
For example, for all the people who dislike the "stop-the-world" approach of the default Sun Java VM GC behaviour, there are VMs such as IBM's WebSphere Real Time, which allows real-time applications to run on Java.
Since the Java VM spec is publicly available, there is (theoretically) nothing stopping anyone from implementing a Java VM that uses CPython's GC algorithm.
Answer by Tom Hawtin - tackline
Reference counting is particularly difficult to do efficiently in a multi-threaded environment. I don't know how you'd even start to do it without getting into hardware-assisted transactions or similar (currently) unusual atomic instructions.
Reference counting is easy to implement. JVMs have had a lot of money sunk into competing implementations, so it shouldn't be surprising that they implement very good solutions to very difficult problems. However, it's becoming increasingly easy to target your favourite language at the JVM.
Answer by David Cournapeau
Late in the game, but I think one significant rationale for RC in Python is its simplicity. See this email by Alex Martelli, for example.
(I could not find a link outside Google's cache; the email dates from 13 October 2005 on the Python list.)

