C++: How can I know which parts of the code are never used?
Original question: http://stackoverflow.com/questions/4813947/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How can I know which parts in the code are never used?
Asked by user63898
I have legacy C++ code that I'm supposed to remove unused code from. The problem is that the code base is large.
How can I find out which code is never called/never used?
Accepted answer by Matthieu M.
There are two varieties of unused code:
- the local one, that is, in some functions some paths or variables are unused (or used but in no meaningful way, like written but never read)
- the global one: functions that are never called, global objects that are never accessed
For the first kind, a good compiler can help:
- -Wunused (GCC, Clang) should warn about unused variables; Clang's unused-variable analysis has even been extended to warn about variables that are never read (even though used).
- -Wunreachable-code (older GCC, removed in 2010) should warn about local blocks that are never reached (this happens with early returns or conditions that always evaluate to true).
- There is no option I know of to warn about unused catch blocks, because the compiler generally cannot prove that no exception will be thrown.
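As a rough illustration of the local kind, the hypothetical function below (all names are made up for this sketch) contains a variable that is never used, one that is written but never read, and a statement placed after an unconditional return. Building it with warnings enabled should flag some or all of these, though exactly which diagnostics appear depends on the compiler and its version.

```cpp
// Hypothetical example; compile with e.g.
//   g++ -Wall -c local_dead.cpp
// or
//   clang++ -Wall -Wunreachable-code -c local_dead.cpp

int scaleByTwo(int value)
{
    int neverUsed = 42;     // declared but never referenced again
    int writtenOnly = 0;    // assigned below, but its value is never read
    writtenOnly = value;

    return value * 2;

    value = 0;              // placed after an unconditional return: never executed
}
```

The function still compiles and behaves correctly, which is exactly why this kind of dead code tends to accumulate unnoticed when warnings are switched off.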
For the second kind, it's much more difficult. Statically it requires whole program analysis, and even though link time optimization may actually remove dead code, in practice the program has been so much transformed at the time it is performed that it is near impossible to convey meaningful information to the user.
There are therefore two approaches:
- The theoretical one is to use a static analyzer: a piece of software that examines the whole code at once in great detail and finds all the flow paths. In practice I don't know of any that would work here.
- The pragmatic one is to use a heuristic: use a code coverage tool (in the GNU chain it's gcov; note that specific flags must be passed during compilation for it to work properly). You run the code coverage tool with a good set of varied inputs (your unit tests or non-regression tests); the dead code is necessarily within the unreached code, so you can start from there.
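To make the coverage approach a little more concrete, here is a minimal sketch. The function and the file names (sign.cpp, driver.cpp) are invented for illustration, and the commands in the comment are only one possible way to drive gcov with GCC.

```cpp
// Hypothetical example for a coverage run. One possible workflow with GCC,
// assuming a separate driver.cpp that holds main() and the test inputs:
//   g++ --coverage -O0 -c sign.cpp driver.cpp
//   g++ --coverage sign.o driver.o -o demo
//   ./demo
//   gcov sign.cpp
// In the resulting sign.cpp.gcov listing, lines that were never executed
// are marked with #####.

int sign(int n)
{
    if (n > 0) {
        return 1;
    }
    if (n < 0) {
        return -1;   // never reached if the inputs are all positive
    }
    return 0;        // likewise never reached in that case
}
```

Unreached lines are only candidates: a branch may show up as unexecuted simply because the test inputs never exercised it, so each one still needs a manual look before it is deleted.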
If you are extremely interested in the subject, and have the time and inclination to actually work out a tool by yourself, I would suggest using the Clang libraries to build such a tool.
- Use the Clang library to get an AST (abstract syntax tree)
- Perform a mark-and-sweep analysis from the entry points onward
Because Clang will parse the code for you and perform overload resolution, you won't have to deal with the C++ language rules, and you'll be able to concentrate on the problem at hand.
However this kind of technique cannot identify the virtual overrides that are unused, since they could be called by third-party code you cannot reason about.
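The mark-and-sweep step can be sketched independently of Clang. The snippet below assumes the call graph has already been extracted into a plain map; the CallGraph type and the findUnreachable function are hypothetical names for this illustration, not part of any Clang API.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of the functions it calls.
// A real tool would fill this in by walking the Clang AST; every function is
// assumed to appear as a key, even if it calls nothing.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Mark phase: flag everything reachable from the entry points.
// Sweep phase: return every function in the graph that was never marked.
std::set<std::string> findUnreachable(const CallGraph& graph,
                                      const std::vector<std::string>& entryPoints)
{
    std::set<std::string> marked;
    std::vector<std::string> worklist(entryPoints.begin(), entryPoints.end());

    while (!worklist.empty()) {
        std::string current = worklist.back();
        worklist.pop_back();
        if (!marked.insert(current).second) {
            continue;  // already visited
        }
        auto it = graph.find(current);
        if (it != graph.end()) {
            for (const std::string& callee : it->second) {
                worklist.push_back(callee);
            }
        }
    }

    std::set<std::string> unreachable;
    for (const auto& entry : graph) {
        if (marked.count(entry.first) == 0) {
            unreachable.insert(entry.first);
        }
    }
    return unreachable;
}
```

In line with the caveat about virtual overrides, a tool built this way would have to treat anything callable from outside the analyzed code as an additional entry point.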
Answer by olsner
For the case of unused whole functions (and unused global variables), GCC can actually do most of the work for you provided that you're using GCC and GNU ld.
When compiling the source, use -ffunction-sections and -fdata-sections, then when linking use -Wl,--gc-sections,--print-gc-sections. The linker will now list all the functions that could be removed because they were never called, and all the globals that were never referenced.
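A small sketch of what this looks like in practice follows. The file names (helpers.cpp, main.cpp) and the function names are invented, and the commands in the comment assume GCC with GNU ld as described above.

```cpp
// Hypothetical example. One possible build, assuming GCC and GNU ld and a
// separate main.cpp whose main() calls only usedHelper():
//   g++ -ffunction-sections -fdata-sections -c helpers.cpp main.cpp
//   g++ -Wl,--gc-sections,--print-gc-sections helpers.o main.o -o demo
// The linker should then report the sections holding neverCalled() and
// neverReferencedTable among those it removed.

int neverReferencedTable[4] = {1, 2, 3, 4};   // global that nothing reads

int usedHelper(int x)
{
    return x + 1;
}

int neverCalled(int x)                        // no caller anywhere in the program
{
    return x * 100;
}
```

Because each function and each global lands in its own section, the linker can drop them individually; without the two -f flags everything in helpers.o would share a section and be kept or dropped as a whole.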
(Of course, you can also skip the --print-gc-sections part and let the linker remove the functions silently, but keep them in the source.)
Note: this will only find unused complete functions; it won't do anything about dead code within functions. Functions called from dead code in live functions will also be kept around.
Some C++-specific features will also cause problems, in particular:
- Virtual functions. Without knowing which subclasses exist and which are actually instantiated at run time, you can't know which virtual functions you need to exist in the final program. The linker doesn't have enough information about that so it will have to keep all of them around.
- Globals with constructors, and their constructors. In general, the linker can't know that the constructor for a global doesn't have side effects, so it must run it. Obviously this means the global itself also needs to be kept.
In both cases, anything used by a virtual function or a global-variable constructor also has to be kept around.
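The second case can be illustrated with a self-registering global. Everything in this sketch (the registry function, the PluginRegistrar type, the object name) is hypothetical and only meant to show why the linker has to be conservative.

```cpp
#include <string>
#include <vector>

// Hypothetical registry, used only for this illustration.
std::vector<std::string>& pluginRegistry()
{
    static std::vector<std::string> registry;  // function-local static avoids init-order issues
    return registry;
}

struct PluginRegistrar {
    explicit PluginRegistrar(const std::string& name)
    {
        pluginRegistry().push_back(name);      // side effect at static-initialization time
    }
};

// No code ever names this object again, yet its constructor must still run,
// so the object, the constructor, and everything the constructor uses
// all have to stay in the final program.
PluginRegistrar legacyPluginRegistrar("legacy-plugin");
```

Whether such a registration is still needed is a question about program behavior, not about references, which is why this pattern has to be reviewed by hand.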
An additional caveat is that if you're building a shared library, the default settings in GCC will export every function in the shared library, causing it to be "used" as far as the linker is concerned. To fix that you need to set the default to hiding symbols instead of exporting them (using e.g. -fvisibility=hidden), and then explicitly select the functions that you need to export.
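One way to select the exported functions explicitly is the GCC/Clang visibility attribute. In the sketch below the macro name MYLIB_API, the file name, and both functions are invented for illustration; the build commands in the comment are only one possible setup.

```cpp
// Hypothetical shared-library source; one possible build with GCC:
//   g++ -fPIC -fvisibility=hidden -ffunction-sections -fdata-sections -c mylib.cpp
//   g++ -shared -Wl,--gc-sections,--print-gc-sections mylib.o -o libmylib.so

// MYLIB_API is a made-up macro name; the attribute itself marks a symbol as
// exported even when the default visibility has been set to hidden.
#define MYLIB_API __attribute__((visibility("default")))

int unusedInternalHelper(int x)    // hidden: removable if nothing in the library calls it
{
    return x * 2;
}

MYLIB_API int publicEntry(int x)   // explicitly exported: always treated as used
{
    return x + 1;
}
```

With this arrangement only the functions tagged with the macro count as roots for the linker, so unused internal helpers can show up in the --print-gc-sections output again.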
Answer by UmmaGumma
Well, if you are using g++ you can use the flag -Wunused
According to the documentation:
Warn whenever a variable is unused aside from its declaration, whenever a function is declared static but never defined, whenever a label is declared but not used, and whenever a statement computes a result that is explicitly not used.
http://docs.freebsd.org/info/gcc/gcc.info.Warning_Options.html
Edit: Here is another useful flag: -Wunreachable-code
According to the documentation:
This option is intended to warn when the compiler detects that at least a whole line of source code will never be executed, because some condition is never satisfied or because it is after a procedure that never returns.
Update: I found a similar topic: Dead code detection in legacy C/C++ project
Answer by Carlos V
I think you are looking for a code coverage tool. A code coverage tool will analyze your code as it is running, and it will let you know which lines of code were executed and how many times, as well as which ones were not.
You could try giving this open source code coverage tool a chance: TestCocoon, a code coverage tool for C/C++ and C#.
Answer by Justin Morgan
The real answer here is: You can never really know for sure.
At least, for nontrivial cases, you can't be sure you've gotten all of it. Consider the following from Wikipedia's article on unreachable code:
double x = sqrt(2);
if (x > 5)
{
    doStuff();
}
As Wikipedia correctly notes, a clever compiler may be able to catch something like this. But consider a modification:
int y;
cin >> y;
double x = sqrt((double)y);
if (x != 0 && x < 1)
{
    doStuff();
}
Will the compiler catch this? Maybe. But to do that, it will need to do more than run sqrt against a constant scalar value. It will have to figure out that (double)y will always be an integer (easy), and then understand the mathematical range of sqrt for the set of integers (hard). A very sophisticated compiler might be able to do this for the sqrt function, or for every function in math.h, or for any fixed-input function whose domain it can figure out. This gets very, very complex, and the complexity is basically limitless. You can keep adding layers of sophistication to your compiler, but there will always be a way to sneak in some code that will be unreachable for any given set of inputs.
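For what it's worth, the claim that the modified branch can never run is easy to spot-check by brute force over a finite range. The two helper functions below are hypothetical and simply restate the condition from the snippet; such a check gives evidence for the inputs it tries, not the kind of proof a compiler would need.

```cpp
#include <cmath>

// Returns true if the guarded branch from the example would be entered for y.
bool branchWouldRun(int y)
{
    double x = std::sqrt(static_cast<double>(y));
    return x != 0 && x < 1;
}

// Brute-force check over a finite range of inputs; returns how many values
// of y in [low, high] would reach the branch.
int countInputsReachingBranch(int low, int high)
{
    int count = 0;
    for (int y = low; y <= high; ++y) {
        if (branchWouldRun(y)) {
            ++count;
        }
    }
    return count;
}
```

For zero the square root is 0, for positive integers it is at least 1, and for negative integers it is NaN, which compares false against 1, so no integer input satisfies the condition.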
And then there are the input sets that simply never get entered: input that would make no sense in real life, or that gets blocked by validation logic elsewhere. There's no way for the compiler to know about those.
The end result of this is that while the software tools others have mentioned are extremely useful, you're never going to know for sure that you caught everything unless you go through the code manually afterward. Even then, you'll never be certain that you didn't miss anything.
The only real solution, IMHO, is to be as vigilant as possible, use the automation at your disposal, refactor where you can, and constantly look for ways to improve your code. Of course, it's a good idea to do that anyway.
Answer by Mr Shark
Answer by Tony
You could try using PC-lint/FlexeLint from Gimpel Software. It claims to
find unused macros, typedef's, classes, members, declarations, etc. across the entire project
I've used it for static analysis and found it very good but I have to admit that I have not used it to specifically find dead code.
Answer by Simon Richter
My normal approach to finding unused stuff is:
- make sure the build system handles dependency tracking correctly
- set up a second monitor, with a full-screen terminal window, running repeated builds and showing the first screenful of output. watch "make 2>&1" tends to do the trick on Unix.
- run a find-and-replace operation on the entire source tree, adding "//? " at the beginning of every line
- fix the first error flagged by the compiler, by removing the "//?" in the corresponding lines.
- repeat until there are no errors left.
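The find-and-replace step would normally be done with an editor or a shell one-liner, but to make the transformation explicit, here is a minimal sketch of it as a function. The name commentOutEveryLine is hypothetical, and the sketch works on a single file's contents held in a string.

```cpp
#include <sstream>
#include <string>

// Prefixes every line of `source` with the "//? " marker, mimicking the
// find-and-replace step applied to one file.
std::string commentOutEveryLine(const std::string& source)
{
    std::istringstream input(source);
    std::string line;
    std::string result;
    while (std::getline(input, line)) {
        result += "//? " + line + "\n";
    }
    return result;
}
```

The distinctive "//?" marker matters: it keeps the mass-commented lines distinguishable from ordinary comments, so whatever still carries the marker once the build is clean is the code the build never needed.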
This is a somewhat lengthy process, but it does give good results.
Answer by Lie Ryan
Mark as many public functions and variables as you can as private or protected without causing compilation errors, and try to refactor the code while you do so. By making functions private, and to some extent protected, you reduce your search area, since private functions can only be called from the same class (unless there are stupid macros or other tricks to circumvent access restrictions, and if that's the case I'd recommend you find a new job). It is much easier to determine that you don't need a private function, since only the class you're currently working on can call it. This method is easier if your code base has small classes and is loosely coupled. If your code base does not have small classes or is very tightly coupled, I suggest cleaning that up first.
Next, mark all the remaining public functions and make a call graph to figure out the relationships between the classes. From this graph, try to figure out which branches look like they can be trimmed.
The advantage of this method is that you can do it on a per-module basis, so it is easy to keep your unit tests passing without long periods during which the code base is broken.
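A tiny sketch of the first step, using a made-up class: once members have been moved to the private section and the build still succeeds, the search for callers shrinks to the class itself.

```cpp
// Hypothetical class, invented for illustration.
class ReportBuilder {
public:
    int totalWithTax(int net) const
    {
        return net + taxFor(net);
    }

private:
    // Formerly public. Now that it is private, only ReportBuilder's own
    // members (and friends) need to be searched for callers.
    int taxFor(int net) const
    {
        return net / 5;
    }

    // Formerly public as well. The build still succeeded after the move and
    // no member calls it, so it is a strong candidate for removal.
    int legacyDiscount(int net) const
    {
        return net / 10;
    }
};
```

If moving a member to private breaks the build, the compiler errors point straight at its external callers, which is useful information in its own right.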
Answer by Adam Higuera
If you are on Linux, you may want to look into callgrind, a C/C++ program analysis tool that is part of the valgrind suite, which also contains tools that check for memory leaks and other memory errors (which you should be using as well). It analyzes a running instance of your program and produces data about its call graph and about the performance costs of nodes on the call graph. It is usually used for performance analysis, but it also produces a call graph for your application, so you can see which functions are called, as well as their callers.
This is obviously complementary to the static methods mentioned elsewhere on the page, and it will only be helpful for eliminating wholly unused classes, methods, and functions - it will not help find dead code inside methods which are actually called.