Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors on Stack Overflow (not to this page's maintainer).
Original question: http://stackoverflow.com/questions/5009869/
How to implement garbage collection in C++
Asked by Josh Morrison
I saw some posts about implementing GC in C, and some people said it's impossible because C is weakly typed. I want to know how to implement GC in C++.
I want some general idea about how to do it. Thank you very much!
This is a Bloomberg interview question that a friend told me about. He did badly on it at the time, and we would like to hear your ideas.
Answer by templatetypedef
Garbage collection is a difficult topic in both C and C++, for a few reasons:
Pointers can be typecast to integers and vice versa. This means that a block of memory might be reachable only by taking an integer, typecasting it to a pointer, and then dereferencing it. A garbage collector has to be careful not to treat a block as unreachable when it can in fact still be reached.
Pointers are not opaque. Many garbage collectors, such as stop-and-copy collectors, like to move blocks of memory around or compact them to save space. Since you can explicitly look at pointer values in C and C++, this can be difficult to implement correctly. If someone was doing something tricky with typecasting to integers, the collector would have to correctly update that integer whenever it moved the block of memory.
Memory management can be done explicitly. Any garbage collector will need to take into account that the user is able to explicitly free blocks of memory at any time.
In C++, there is a separation between allocation/deallocation and object construction/destruction. A block of memory can be allocated with sufficient space to hold an object without any object actually being constructed there. A good garbage collector would need to know, when it reclaims memory, whether or not to call the destructor for any objects that might live there. This is especially true for the standard library containers, which often use std::allocator to exploit this separation for efficiency reasons.
Memory can be allocated from different areas. C and C++ can get memory from the built-in free store (malloc/free or new/delete), from the OS via mmap or other system calls, and, in the case of C++, from get_temporary_buffer (released with return_temporary_buffer). Programs might also get memory from some third-party library. A good garbage collector needs to be able to track references to memory in these other pools and (possibly) would have to be responsible for cleaning them up.
Pointers can point into the middle of objects or arrays. In many garbage-collected languages like Java, object references always point to the start of the object. In C and C++ pointers can point into the middle of arrays, and in C++ into the middle of objects (if multiple inheritance is used). This can greatly complicate the logic for detecting what's still reachable.
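To make the first and last points above concrete, here is a minimal sketch of two ways a live block can end up with no ordinary pointer to its start. The helper names (hide_pointer, reveal_pointer, kMask and the two read_through_* functions) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mask: XOR-ing with it makes the stored value look nothing
// like a valid address, so even a scan for "pointer-like" integers misses it.
constexpr std::uintptr_t kMask = ~std::uintptr_t{0};

// Disguises a pointer as a plain integer.
inline std::uintptr_t hide_pointer(int* p) {
    return reinterpret_cast<std::uintptr_t>(p) ^ kMask;
}

// Recovers the original pointer from the disguised integer.
inline int* reveal_pointer(std::uintptr_t hidden) {
    return reinterpret_cast<int*>(hidden ^ kMask);
}

// The block is reachable only through an integer for part of its lifetime.
inline int read_through_hidden_pointer() {
    int* block = new int(42);
    std::uintptr_t hidden = hide_pointer(block);
    block = nullptr;                      // no ordinary pointer remains
    int* again = reveal_pointer(hidden);  // ...yet the block is still in use
    assert(again != nullptr);
    int value = *again;
    delete again;
    return value;
}

// The block is reachable only through a pointer into its middle.
inline int read_through_interior_pointer() {
    int* array = new int[4]{10, 20, 30, 40};
    int* middle = array + 2;  // interior pointer to the third element
    array = nullptr;          // the pointer to the start of the block is gone
    int value = *middle;
    delete[] (middle - 2);    // the start must be recomputed to free the block
    return value;
}
```

A collector that reclaimed either block while only the disguised integer or the interior pointer existed would break a program that is otherwise valid C++.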
So, in short, it's extremely hard to build a garbage collector for C or C++. Most libraries that do garbage collection in C and C++ are extremely conservative in their approach and are technically unsound - they assume that you won't, for example, take a pointer, cast it to an integer, write it to disk, and then load it back in at some later time. They also assume that any value in memory that's the size of a pointer could possibly be a pointer, and so sometimes refuse to free unreachable memory because there's a nonzero chance that there's a pointer to it.
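The "any pointer-sized value might be a pointer" assumption can be sketched as follows. This is a toy illustration, not how any particular library is implemented: the Block record and the conservative_mark function are hypothetical, and the scan covers only a flat list of root words rather than following references between blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping record for one heap block known to the collector.
struct Block {
    std::uintptr_t start;  // address of the first byte
    std::size_t size;      // size in bytes
    bool marked;           // set when something appears to point into the block
};

// Scans `words` (standing in for the stack or global data) and marks every
// block that some word appears to point into, including interior addresses.
// The scan cannot tell a real pointer from an integer with the same bits.
// Returns the number of blocks newly marked.
inline std::size_t conservative_mark(const std::vector<std::uintptr_t>& words,
                                     std::vector<Block>& blocks) {
    std::size_t newly_marked = 0;
    for (std::uintptr_t word : words) {
        for (Block& b : blocks) {
            if (!b.marked && word >= b.start && word - b.start < b.size) {
                b.marked = true;
                ++newly_marked;
            }
        }
    }
    return newly_marked;
}
```

If a root word holds the integer 0x1010 and a block occupies 0x1000 to 0x103F, that block is kept alive even when the word is really a loop counter or a hash value. That is the sense in which such a collector sometimes refuses to free unreachable memory.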
As others have pointed out, the Boehm GC does do garbage collection for C and C++, but subject to the aforementioned restrictions.
Interestingly, C++11 includes some new library functions that allow the programmer to mark regions of memory as reachable and unreachable in anticipation of future garbage collection efforts. It may be possible in the future to build a really good C++11 garbage collector with this sort of information. In the meantime though, you'll need to be extremely careful not to break any of the above rules.
Answer by Steve314
C isn't C++, but both have the same "weakly typed" issues. It's not the implicit typecasts that cause an issue, though, but the tendency towards "punning" (subverting the type system), especially in data structure libraries.
There are garbage collectors out there for C and/or C++. The Boehm conservative collector is probably the best known. It's conservative in that, if it sees a bit pattern that looks like a pointer to some object, it doesn't collect that object. That value might be some other type of value completely, so the object could be collected, but "conservative" means playing safe.
Even a conservative collector can be fooled, though, if you use calculated pointers. There's a data structure, for example, where every list node has a field giving the difference between the next-node and previous-node addresses. The idea is to give double-linked list behaviour with a single link per node, at the expense of more complex iterators. Since there's no explicit pointer anywhere to most of the nodes, they may be wrongly collected.
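A minimal sketch of such a list follows. Note that it combines the neighbour addresses with XOR rather than with the subtraction described above; the consequence for a collector is the same, since no node stores a plain pointer to its neighbours. XorNode, addr, sum_forward and demo_xor_list are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// One link field per node: address(prev) XOR address(next).
// The field is an integer, not a pointer a collector could follow.
struct XorNode {
    int value;
    std::uintptr_t link;
};

inline std::uintptr_t addr(const XorNode* n) {
    return reinterpret_cast<std::uintptr_t>(n);
}

// Walks the list from `head`, recovering each next pointer from the link
// field and the address of the node visited just before.
inline int sum_forward(XorNode* head) {
    int total = 0;
    XorNode* prev = nullptr;
    XorNode* cur = head;
    while (cur != nullptr) {
        total += cur->value;
        XorNode* next = reinterpret_cast<XorNode*>(cur->link ^ addr(prev));
        prev = cur;
        cur = next;
    }
    return total;
}

// Builds a three-node list, sums it, and frees it.
inline int demo_xor_list() {
    XorNode* a = new XorNode{1, 0};
    XorNode* b = new XorNode{2, 0};
    XorNode* c = new XorNode{3, 0};
    a->link = addr(nullptr) ^ addr(b);
    b->link = addr(a) ^ addr(c);
    c->link = addr(b) ^ addr(nullptr);
    int total = sum_forward(a);
    assert(total == a->value + b->value + c->value);
    delete a;
    delete b;
    delete c;
    return total;
}
```

The demo keeps local pointers to all three nodes only so that it can free them. In a real container only the head and tail would be held, so the middle node's address would appear nowhere in memory as a recognisable pointer.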
Of course this is a very exceptional special case.
More important - you can either have reliable destructors or garbage collection, not both. When a garbage cycle is collected, the collector cannot decide which destructor to call first.
Since the RAII pattern is pervasive in C++, and that relies on destructors, there is IMO a conflict. There may be valid exceptions, but my view is that if you want garbage collection, you should use a language that's designed from the ground up for garbage collection (Java, C#, ...).
Answer by AJG85
You could either use smart pointers or create your own container object which will track references and handle memory allocation etc. Smart pointers would probably be preferable. Often you can avoid dynamic heap allocation altogether.
For example:
char* pCharArray = new char[128];
// do some stuff with characters
delete [] pCharArray;
The danger with the above is that if anything throws between the new and the delete, the delete will never be executed. Code like this can easily be replaced with safer "garbage collected" code:
std::vector<char> charArray;
// do some stuff with characters
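The exception-safety claim can be checked with a small sketch. The Tracked type and the two functions below are hypothetical and exist only to count how many elements are alive; the point is that the vector's destructor runs during stack unwinding, so nothing is leaked when the work in the middle throws.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical instrumented type that counts live instances.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    Tracked(const Tracked&) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Creates three elements in a vector, then fails before any cleanup code.
inline void work_that_throws() {
    std::vector<Tracked> items(3);
    assert(Tracked::live == 3);
    throw std::runtime_error("failure between allocation and cleanup");
}

// Returns how many Tracked objects are still alive after the exception.
inline int live_after_exception() {
    try {
        work_that_throws();
    } catch (const std::runtime_error&) {
        // The vector was already destroyed while the stack unwound.
    }
    return Tracked::live;
}
```

With a raw new[] in place of the vector, the same throw would skip the matching delete[] and the elements would never be destroyed.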
Bloomberg's interview questions are notoriously irrelevant from a practical coding standpoint. Like most interviewers, though, they are more concerned with how you think and with your communication skills than with the actual solution.
Answer by yan
Look into the Boehm Garbage Collector.
Answer by Marcelo Cantos
The claim you saw is false; the Boehm collector supports C and C++. I suggest reading the Boehm collector's documentation (particularly this page) for a good overview of how one might write a garbage collector in C or C++.
Answer by Yochai Timmer
You can read about the shared_ptr class template.
It implements a simple reference-counting garbage collector.
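As a rough illustration of what reference counting with shared_ptr does and does not handle, the sketch below uses a hypothetical Node type and three hypothetical functions. It relies only on std::shared_ptr, std::weak_ptr and std::make_shared from the standard <memory> header.

```cpp
#include <cassert>
#include <memory>

// Hypothetical list node with an owning link and a non-owning back link.
struct Node {
    std::shared_ptr<Node> next;  // owning: contributes to the reference count
    std::weak_ptr<Node> prev;    // non-owning: does not keep the target alive
};

// Copying a shared_ptr raises the count shared by both copies.
inline long count_after_copy() {
    auto p = std::make_shared<int>(5);
    auto q = p;
    assert(p.use_count() == q.use_count());
    return q.use_count();
}

// With a weak back link, both nodes are destroyed when the scope ends.
inline bool weak_back_link_is_freed() {
    std::weak_ptr<Node> observer;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->prev = a;
        observer = a;
    }
    return observer.expired();
}

// With two owning links the nodes keep each other alive after the scope ends.
inline bool strong_cycle_survives_scope() {
    std::weak_ptr<Node> observer;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;  // strong cycle: neither count can reach zero
        observer = a;
    }
    bool still_alive = !observer.expired();
    if (auto a = observer.lock()) {
        a->next.reset();  // break the cycle by hand so the demo does not leak
    }
    return still_alive;
}
```

The last function shows the usual limitation of plain reference counting: a cycle of owning pointers is never reclaimed unless something breaks it, which is one reason a tracing collector is a different design.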
If you want a real garbage collector, you can overload the new operator.
Create a struct similar to shared_ptr, call it Object.
This will wrap the new object created. Now, by overloading its operators, you can control the GC.
All you need to do now is implement one of the many GC algorithms.
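One of those algorithms, mark-and-sweep, can be sketched in a few dozen lines. This is a toy under stated assumptions, not the design described above: it does not overload operator new or wrap objects in an Object handle. Instead, a hypothetical ToyHeap class hands out objects through an explicit allocate() call, and each hypothetical GcObject lists its outgoing references by hand in a children vector.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical collectable object. References to other objects must be
// registered in `children` so the collector can trace them.
struct GcObject {
    bool marked = false;
    std::vector<GcObject*> children;
};

// Hypothetical heap that owns every object it allocates.
class ToyHeap {
public:
    ToyHeap() = default;
    ToyHeap(const ToyHeap&) = delete;
    ToyHeap& operator=(const ToyHeap&) = delete;
    ~ToyHeap() { for (GcObject* o : objects_) delete o; }

    GcObject* allocate() {
        GcObject* o = new GcObject();
        objects_.insert(o);
        return o;
    }
    void add_root(GcObject* o) { roots_.insert(o); }
    void remove_root(GcObject* o) { roots_.erase(o); }
    std::size_t live_count() const { return objects_.size(); }

    // Marks everything reachable from the roots, then frees the rest.
    // Returns the number of objects freed.
    std::size_t collect() {
        std::vector<GcObject*> worklist(roots_.begin(), roots_.end());
        while (!worklist.empty()) {
            GcObject* o = worklist.back();
            worklist.pop_back();
            if (o->marked) continue;
            o->marked = true;
            for (GcObject* child : o->children) worklist.push_back(child);
        }
        std::size_t freed = 0;
        for (auto it = objects_.begin(); it != objects_.end();) {
            GcObject* o = *it;
            if (o->marked) {
                o->marked = false;  // reset for the next collection
                ++it;
            } else {
                it = objects_.erase(it);
                delete o;
                ++freed;
            }
        }
        return freed;
    }

private:
    std::unordered_set<GcObject*> objects_;
    std::unordered_set<GcObject*> roots_;
};
```

Because reachability is traced from the roots, two objects that reference only each other are freed by collect(), unlike in a reference-counting scheme. The sketch avoids the hard problems raised in the other answers only because every reference is declared explicitly and objects never move.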