How does Java Garbage Collection work with Circular References?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/1910194/
Asked by AlexeyMK
From my understanding, garbage collection in Java cleans up some objects if nothing else is 'pointing' to that object.
My question is, what happens if we have something like this:
class Node {
    public Object value;
    public Node next;
    public Node(Object o, Node n) { value = o; next = n; }
}
//...some code
{
    Node a = new Node("a", null),
         b = new Node("b", a),
         c = new Node("c", b);
    a.next = c; // closes the cycle: a -> c -> b -> a
} //end of scope
//...other code
a, b, and c should be garbage collected, but they are all being referenced by other objects.
How does the Java garbage collection deal with this? (or is it simply a memory drain?)
Accepted answer by Bill the Lizard
Java's GC considers objects "garbage" if they aren't reachable through a chain starting at a garbage collection root, so these objects will be collected. Even though objects may point to each other to form a cycle, they're still garbage if they're cut off from the root.
See the section on unreachable objects in Appendix A: The Truth About Garbage Collection, in Java Platform Performance: Strategies and Tactics, for the gory details.
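One way to watch this happen is to hold only a weak reference to one node of the cycle and then ask the JVM to collect. The sketch below is illustrative rather than definitive: the class name CycleDemo and its helper methods are made up for this example, and System.gc() is only a request, so the result can vary between JVMs and settings. The cycle is built in a separate method because a local variable slot can keep an object alive until the enclosing method returns, even after the variable's block has ended.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class; not part of the original question.
class CycleDemo {
    static class Node {
        Object value;
        Node next;
        Node(Object value, Node next) { this.value = value; this.next = next; }
    }

    // Builds the a -> c -> b -> a cycle and returns only a weak reference,
    // which does not keep the cycle alive on its own.
    static WeakReference<Node> buildCycle() {
        Node a = new Node("a", null);
        Node b = new Node("b", a);
        Node c = new Node("c", b);
        a.next = c;
        return new WeakReference<>(a);
    }

    // Requests a collection a few times and reports whether the cycle is gone.
    static boolean cycleWasCollected() {
        WeakReference<Node> ref = buildCycle();
        for (int i = 0; i < 50 && ref.get() != null; i++) {
            System.gc(); // a hint to the JVM, not a guarantee
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("cycle collected: " + cycleWasCollected());
    }
}
```

If the weak reference is cleared, the whole cycle was treated as garbage even though each node was still referenced by another node.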
Answer by Amnon
Garbage collection doesn't usually mean "clean some object iff nothing else is 'pointing' to that object" (that's reference counting). Garbage collection roughly means finding objects that can't be reached from the program.
So in your example, after a, b, and c go out of scope, they can be collected by the GC, since you can't access these objects anymore.
Answer by TofuBeer
This article (no longer available) goes into depth about the garbage collector (conceptually... there are several implementations). The relevant part to your post is "A.3.4 Unreachable":
A.3.4 Unreachable
An object enters an unreachable state when no more strong references to it exist. When an object is unreachable, it is a candidate for collection. Note the wording: Just because an object is a candidate for collection doesn't mean it will be immediately collected. The JVM is free to delay collection until there is an immediate need for the memory being consumed by the object.
Answer by Sbodd
The Java GCs don't actually behave as you describe. It's more accurate to say that they start from a base set of objects, frequently called "GC roots", and will collect any object that can not be reached from a root.
GC roots include things like:
- static variables
- local variables (including all applicable 'this' references) currently in the stack of a running thread
So, in your case, once the local variables a, b, and c go out of scope at the end of your method, there are no more GC roots that contain, directly or indirectly, a reference to any of your three nodes, and they'll be eligible for garbage collection.
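To illustrate the role of a GC root, the following sketch keeps a two-node cycle reachable through a static field and then clears that field. The names StaticRootDemo, root, buildRootedCycle, and requestGc are assumptions made up for this example, and since System.gc() is only a hint, the second observation is what a typical JVM does rather than something the language guarantees.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class; the names are illustrative only.
class StaticRootDemo {
    static class Node {
        Node next;
    }

    // A static field acts as a GC root while its class is loaded.
    static Node root;

    // Creates a two-node cycle, publishes it through the static field, and
    // returns a weak reference so the cycle can be observed without being kept alive.
    static WeakReference<Node> buildRootedCycle() {
        Node a = new Node();
        Node b = new Node();
        a.next = b;
        b.next = a;
        root = a;
        return new WeakReference<>(b);
    }

    // Asks for a collection a few times, stopping early once the referent is gone.
    static void requestGc(WeakReference<?> ref) {
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc(); // only a request to the JVM
        }
    }

    public static void main(String[] args) {
        WeakReference<Node> ref = buildRootedCycle();
        requestGc(ref);
        System.out.println("still reachable while rooted: " + (ref.get() != null));

        root = null; // the cycle is now cut off from every root
        requestGc(ref);
        System.out.println("collected after clearing the root: " + (ref.get() == null));
    }
}
```

Node b is never stored in a root directly; it stays alive only because it can be reached indirectly through root.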
TofuBeer's link has more detail if you want it.
Answer by Claudiu
Bill answered your question directly. As Amnon said, your definition of garbage collection is just reference counting. I just wanted to add that even very simple algorithms like mark-and-sweep and copying collection easily handle circular references. So, nothing magic about it!
Answer by Jerry Coffin
A garbage collector starts from some "root" set of places that are always considered "reachable", such as the CPU registers, stack, and global variables. It works by finding any pointers in those areas, and recursively finding everything they point at. Once it's found all that, everything else is garbage.
There are, of course, quite a few variations, mostly for the sake of speed. For example, most modern garbage collectors are "generational", meaning that they divide objects into generations, and as an object gets older, the garbage collector goes longer and longer between times that it tries to figure out whether that object is still valid or not -- it just starts to assume that if it has lived a long time, chances are pretty good that it'll continue to live even longer.
Nonetheless, the basic idea remains the same: it's all based on starting from some root set of things that it takes for granted could still be used, and then chasing all the pointers to find what else could be in use.
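The pointer-chasing idea can be sketched on a toy object graph. This is a simplified model, not how a real JVM stores objects: objects are plain strings, references are a map from an object to the objects it points at, and the names ReachabilitySketch and reachable are invented for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of tracing; all names here are hypothetical.
class ReachabilitySketch {
    // Returns every object that can be reached from the roots by following references.
    static Set<String> reachable(Map<String, List<String>> refs, List<String> roots) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String target : refs.getOrDefault(current, List.of())) {
                if (seen.add(target)) { // the visited set keeps cycles from looping forever
                    pending.push(target);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
            "root", List.of("x"),
            "x", List.of("y"),
            "a", List.of("c"),   // a -> c -> b -> a is a cycle with no path from "root"
            "c", List.of("b"),
            "b", List.of("a"));
        Set<String> live = reachable(refs, List.of("root"));
        System.out.println("x is live: " + live.contains("x")); // true
        System.out.println("a is live: " + live.contains("a")); // false
    }
}
```

Anything missing from the returned set is garbage in this model, which is why the a, b, c cycle counts as garbage even though every node in it is pointed at by another node.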
Interesting aside: many people are surprised by the degree of similarity between this part of a garbage collector and code for marshaling objects for things like remote procedure calls. In each case, you're starting from some root set of objects, and chasing pointers to find all the other objects those refer to...
Answer by Jörg W Mittag
You are correct. The specific form of garbage collection you describe is called "reference counting". The way it works in the simplest case (conceptually, at least; most modern implementations of reference counting are actually implemented quite differently) looks like this:
- whenever a reference to an object is added (e.g. it is assigned to a variable or a field, passed to a method, and so on), its reference count is increased by 1
- whenever a reference to an object is removed (the method returns, the variable goes out of scope, the field is re-assigned to a different object or the object which contains the field gets itself garbage collected), the reference count is decreased by 1
- as soon as the reference count hits 0, there is no more reference to the object, which means nobody can use it anymore, therefore it is garbage and can be collected
And this simple strategy has exactly the problem you describe: if A references B and B references A, then both of their reference counts can never be less than 1, which means they will never get collected.
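The counting rules above, and the way a cycle defeats them, can be sketched with a toy model. This is a naive illustration written for this thread, not how any real runtime implements reference counting, and every name in it (RefCountSketch, Obj, addRef, release, setField, buildAndDrop) is hypothetical.

```java
// Naive reference counting on toy objects; all names are illustrative.
class RefCountSketch {
    static class Obj {
        final String name;
        int refCount = 0;
        Obj field;              // a single outgoing reference
        boolean freed = false;
        Obj(String name) { this.name = name; }
    }

    // A new reference to the object was created.
    static void addRef(Obj o) {
        if (o != null) o.refCount++;
    }

    // A reference to the object went away; free it once the count hits 0.
    static void release(Obj o) {
        if (o == null) return;
        o.refCount--;
        if (o.refCount == 0) {
            o.freed = true;
            release(o.field);   // a freed object drops the reference it held
            o.field = null;
        }
    }

    // Re-points owner.field at target, adjusting both counts.
    static void setField(Obj owner, Obj target) {
        addRef(target);
        release(owner.field);
        owner.field = target;
    }

    // Builds first -> second (plus second -> first when cyclic is true),
    // then drops the two "local variable" references.
    static Obj[] buildAndDrop(boolean cyclic) {
        Obj first = new Obj("first");
        Obj second = new Obj("second");
        addRef(first);
        addRef(second);
        setField(first, second);
        if (cyclic) setField(second, first);
        release(second);
        release(first);
        return new Obj[] { first, second };
    }

    public static void main(String[] args) {
        Obj[] chain = buildAndDrop(false);
        System.out.println("chain freed: " + chain[0].freed + ", " + chain[1].freed); // true, true
        Obj[] cycle = buildAndDrop(true);
        System.out.println("cycle counts: " + cycle[0].refCount + ", " + cycle[1].refCount); // 1, 1
    }
}
```

In the acyclic case both counts fall to 0 and both objects are freed. In the cyclic case each object still holds one reference to the other, so both counts stay at 1 and neither is ever freed.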
There are four ways to deal with this problem:
- Ignore it. If you have enough memory, your cycles are small and infrequent and your runtime is short, maybe you can get away with simply not collecting cycles. Think of a shell script interpreter: shell scripts typically only run for a few seconds and don't allocate much memory.
- Combine your reference counting garbage collector with another garbage collector which doesn't have problems with cycles. CPython does this, for example: the main garbage collector in CPython is a reference counting collector, but from time to time a tracing garbage collector is run to collect the cycles.
- Detect the cycles. Unfortunately, detecting cycles in a graph is a rather expensive operation. In particular, it requires pretty much the same overhead that a tracing collector would, so you could just as well use one of those.
- Don't implement the algorithm in the naive way you and I would: since the 1970s, there have been multiple quite interesting algorithms developed that combine cycle detection and reference counting in a single operation in a clever way that is significantly cheaper than either doing them both separately or doing a tracing collector.
By the way, the other major way to implement a garbage collector (and I have already hinted at that a couple of times above) is tracing. A tracing collector is based on the concept of reachability. You start out with some root set that you know is always reachable (global constants, for example, or the Object class, the current lexical scope, the current stack frame), and from there you trace all objects that are reachable from the root set, then all objects that are reachable from the objects reachable from the root set, and so on, until you have the transitive closure. Everything that is not in that closure is garbage.
Since a cycle is only reachable within itself, but not reachable from the root set, it will be collected.
Answer by Aniket Thakur
Yes, the Java garbage collector handles circular references!
How?
There are special objects called garbage-collection roots (GC roots). These are always reachable, and so is any object that can be traced back to one of them.
A simple Java application has the following GC roots:
- Local variables in the main method
- The main thread
- Static variables of the main class
To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. It works as follows:
- The algorithm traverses all object references, starting with the GC roots, and marks every object found as alive.
- All of the heap memory that is not occupied by marked objects is reclaimed. It is simply marked as free, essentially swept free of unused objects.
So if any object is not reachable from the GC roots (even if it is self-referenced or cyclic-referenced) it will be subjected to garbage collection.
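The mark and sweep steps described above can be sketched on a toy heap. This is a simplified model under stated assumptions, not the JVM's actual collector: the heap is a plain list, the roots are a second list, and the names MarkSweepSketch, Obj, allocate, and collect are invented for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy mark-and-sweep over a list-based heap; all names are hypothetical.
class MarkSweepSketch {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();   // every allocated object
    final List<Obj> roots = new ArrayList<>();  // stand-in for the GC roots

    Obj allocate(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    // Mark phase: flag obj and everything reachable from it as alive.
    private void mark(Obj obj) {
        if (obj.marked) return; // already visited, so cycles terminate
        obj.marked = true;
        for (Obj ref : obj.refs) mark(ref);
    }

    // Runs one collection and returns the names of the objects that were swept.
    List<String> collect() {
        for (Obj root : roots) mark(root);
        List<String> swept = new ArrayList<>();
        Iterator<Obj> it = heap.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;   // survivor: reset the flag for the next run
            } else {
                swept.add(o.name);  // never marked, so unreachable from any root
                it.remove();
            }
        }
        return swept;
    }

    public static void main(String[] args) {
        MarkSweepSketch gc = new MarkSweepSketch();
        Obj a = gc.allocate("a"), b = gc.allocate("b"), c = gc.allocate("c");
        a.refs.add(c); c.refs.add(b); b.refs.add(a); // a -> c -> b -> a
        Obj kept = gc.allocate("kept");
        gc.roots.add(kept);
        System.out.println("swept: " + gc.collect()); // swept: [a, b, c]
    }
}
```

Had a been added to roots before the call to collect(), all three nodes would have been marked and nothing in the cycle would have been swept.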
Of course, this can sometimes lead to a memory leak if the programmer forgets to clear a reference to an object that is no longer needed.
Source : Java Memory Management