Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/191614/

Date: 2020-08-27 13:34:44  Source: igfitidea

How to detect code duplication during development?

Tags: c++, code-duplication

Asked by David Dibben

We have a fairly large code base, 400K LOC of C++, and code duplication is something of a problem. Are there any tools which can effectively detect duplicated blocks of code?

Ideally this would be something that developers could use during development rather than just run occasionally to see where the problems are. It would also be nice if we could integrate such a tool with CruiseControl to give a report after each check-in.

I had a look at Duploc some time ago. It showed a nice graph, but it requires a Smalltalk environment, which makes running it automatically rather difficult.

Free tools would be nice, but if there are some good commercial tools I would also be interested.

Answered by Simon Steele

Simian detects duplicate code in C++ projects.

Update: Also works with Java, C#, C, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic, Groovy source code and even plain text files

Answered by user39039

I've used PMD's Copy/Paste Detector (CPD) and integrated it into CruiseControl using the following wrapper script (make sure the PMD jar is on the classpath).

Our check runs nightly. If you want to limit the output to files from the current change set, you might need some custom programming (idea: check all files and list only the duplicates that involve at least one changed file; you have to check all files because a change could duplicate code from an unchanged file). This should be doable by using the XML output and parsing the result. Don't forget to post that script when it's done ;)
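A minimal sketch of that filtering idea in Python, assuming CPD's XML report has the layout `<pmd-cpd><duplication lines=".." tokens=".."><file line=".." path=".."/>...</duplication></pmd-cpd>` (that is what CPD's `format="xml"` output looks like in the versions I know; check yours). The function name `duplicates_touching` and the suffix-based path matching are my own assumptions, not part of PMD: a changed file matches when the reported path equals it or ends with `/` plus it, so VCS-relative paths can match CPD's absolute ones.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Iterable


def _norm(path: str) -> str:
    # Compare paths with forward slashes so Windows and POSIX spellings match.
    return path.replace("\\", "/")


def duplicates_touching(cpd_xml: str | bytes, changed_files: Iterable[str]) -> list:
    """Keep only CPD duplications that have at least one occurrence in a changed file.

    Assumes the CPD XML layout described above (duplication/file elements).
    All files must still be scanned by CPD; this only filters the report.
    """
    changed = [_norm(p) for p in changed_files]
    hits = []
    for dup in ET.fromstring(cpd_xml).iter("duplication"):
        locations = [(_norm(loc.get("path", "")), int(loc.get("line", "0")))
                     for loc in dup.findall("file")]
        # Report the duplication if any occurrence lives in a changed file.
        if any(path == c or path.endswith("/" + c)
               for path, _ in locations for c in changed):
            hits.append({"lines": int(dup.get("lines", "0")),
                         "tokens": int(dup.get("tokens", "0")),
                         "locations": locations})
    return hits
```

Usage, assuming the Ant target is switched to `format="xml"` and writes a (hypothetical) `duplicates.xml`: open it in binary mode, pass `fh.read()` together with the list of files from the change set, and print each hit's `lines` and `locations`.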

For starters, the "text" output should be OK, but you will want to display the results in a user-friendly way. For that, I use a Perl script to generate HTML files from CPD's "xml" output. They are made accessible by publishing them to the Tomcat instance where CruiseControl's reporting JSP resides. The developers can view them from there and see the results of their dirty hacking :)
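The Perl script itself isn't shown. As a rough stand-in (Python rather than Perl, and not the author's script), the sketch below turns a CPD XML report into a single HTML table with one row per duplication. It assumes the same `duplication`/`file`/`codefragment` element names as CPD's XML output; the function name `cpd_xml_to_html` and the page layout are hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def cpd_xml_to_html(cpd_xml: str) -> str:
    """Render a CPD XML report as a minimal HTML page, one table row per duplication."""
    root = ET.fromstring(cpd_xml)
    rows = []
    for dup in root.iter("duplication"):
        # Every place the duplicated block occurs, as path:line.
        places = "<br>".join(
            f"{html.escape(loc.get('path', ''))}:{html.escape(loc.get('line', ''))}"
            for loc in dup.findall("file"))
        # The copied code itself; escaped so C++ like "a < b" renders literally.
        fragment = dup.findtext("codefragment", default="")
        rows.append(
            f"<tr><td>{html.escape(dup.get('lines', ''))}</td>"
            f"<td>{places}</td>"
            f"<td><pre>{html.escape(fragment)}</pre></td></tr>")
    return ("<html><body><table border=\"1\">"
            "<tr><th>Lines</th><th>Locations</th><th>Code</th></tr>"
            + "".join(rows) + "</table></body></html>")
```

Writing the returned string into the directory that Tomcat serves is all that is needed to publish it next to the CruiseControl reports.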

It runs quite fast: less than 2 seconds on 150 KLOC of code (blank lines and comments are not counted in that number).

duplicatecheck.xml:

<project name="duplicatecheck" default="cpd">

    <property name="files.dir" value="dir containing your sources"/>
    <property name="output.dir" value="dir containing results for publishing"/>

    <target name="cpd">
        <!-- CPDTask lives in the PMD jar, which must be on the classpath -->
        <taskdef name="cpd" classname="net.sourceforge.pmd.cpd.CPDTask"/>
        <!-- report duplicated blocks of 100 or more C++ tokens -->
        <cpd minimumTokenCount="100"
             language="cpp"
             outputFile="${output.dir}/duplicates.txt"
             ignoreLiterals="false"
             ignoreIdentifiers="false"
             format="text">
            <fileset dir="${files.dir}/">
                <include name="**/*.h"/>
                <include name="**/*.cpp"/>
                <!-- exclude third-party stuff -->
                <exclude name="boost/"/>
                <exclude name="cppunit/"/>
            </fileset>
        </cpd>
    </target>

</project>

Answered by benno

duplo appears to be a C implementation of the algorithm used in Duploc. It is simple to compile and install, and while the options are limited, it seems to work more or less out of the box.
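The Duploc approach, as I understand it, is a "dot plot": normalize every line, mark each pair of equal lines from two files in a matrix, and report runs along a diagonal as copied blocks. A simplified Python sketch follows; the normalization (drop `//` comments and all whitespace, skip blank lines) and the `min_len` threshold are my assumptions, not duplo's exact rules.

```python
from __future__ import annotations


def normalize(line: str) -> str:
    """Drop a trailing // comment and all whitespace (simplified; ignores string literals)."""
    return "".join(line.split("//", 1)[0].split())


def diagonal_runs(a: list[str], b: list[str], min_len: int = 3) -> list[tuple[int, int, int]]:
    """Find runs of consecutive equal lines on the diagonals of the a-vs-b dot plot.

    Returns (start_in_a, start_in_b, length) with 0-based line indices.
    """
    na = [normalize(x) for x in a]
    nb = [normalize(x) for x in b]
    runs = []
    # A diagonal is identified by its offset d = j - i.
    for d in range(-(len(na) - 1), len(nb)):
        i = max(0, -d)
        length = 0
        while i < len(na) and i + d < len(nb):
            j = i + d
            if na[i] and na[i] == nb[j]:  # blank lines never count as matches
                length += 1
            else:
                if length >= min_len:
                    runs.append((i - length, j - length, length))
                length = 0
            i += 1
        if length >= min_len:  # run that reaches the edge of the matrix
            runs.append((i - length, i - length + d, length))
    return runs
```

This compares every line with every line, so it is quadratic per file pair; real tools hash the normalized lines first so only equal lines are ever compared.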

Answered by Andy Lester

Look at the PMD project.

I've never used it, but have always wanted to.

Answered by SamB

These Debian packages seem to do something along these lines:

P.S. There ought to be a debtags tag for all tools related to finding [near] duplication. (But what would it be called?)

Answered by Ira Baxter

Well, you can run a clone detector on your source code base every night.

Many clone detectors work by comparing source lines, and can only find exact duplicate code.
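A minimal sketch of the line-comparison approach: slide a window of `window` consecutive non-blank lines over every file and group identical windows. Lines are compared after stripping surrounding whitespace only, so, as described above, this finds exact copies and nothing more. The function name and the default window size are hypothetical choices for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def duplicate_windows(files: dict[str, list[str]], window: int = 4) -> list[list[tuple[str, int]]]:
    """Group identical runs of `window` consecutive non-blank lines across files.

    Returns groups of (filename, 1-based starting line) that share the same text.
    """
    seen: dict[tuple[str, ...], list[tuple[str, int]]] = defaultdict(list)
    for name, lines in files.items():
        # Drop blank lines but remember each kept line's original line number.
        kept = [(n, s.strip()) for n, s in enumerate(lines, start=1) if s.strip()]
        for k in range(len(kept) - window + 1):
            key = tuple(text for _, text in kept[k:k + window])
            seen[key].append((name, kept[k][0]))
    # Only windows that occur more than once are duplicates.
    return [locs for locs in seen.values() if len(locs) > 1]
```

A copied block longer than `window` lines shows up as several overlapping groups; a real tool would merge them into one maximal match.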

CCFinder (mentioned in another answer) works by comparing language tokens, so it isn't sensitive to whitespace changes. It can detect clones that are variants of the original code if there are only single-token changes (e.g., changing a variable X to Y in the clone).
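To illustrate why token comparison tolerates renamed variables, here is a toy Python sketch: tokenize, replace every identifier with `ID` and every literal with `LIT`, then look for a shared run of `min_tokens` tokens. CCFinder's real lexer and transformation rules are far more elaborate; the regex lexer and the short `KEYWORDS` set below are hypothetical simplifications (multi-character operators such as `+=` simply split into single characters).

```python
from __future__ import annotations

import re

# Tiny C++-ish lexer: identifiers, numbers, string/char literals, single-char punctuation.
TOKEN_RE = re.compile(r"""[A-Za-z_]\w*|\d[\w.]*|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\S""")
# Hypothetical, deliberately incomplete keyword list; keywords are kept as-is.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "double", "void", "const"}


def normalized_tokens(src: str) -> list[str]:
    """Tokenize C++-like source, mapping identifiers to ID and literals to LIT."""
    src = re.sub(r"//[^\n]*", "", src)  # drop line comments
    out = []
    for tok in TOKEN_RE.findall(src):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit() or tok[0] in "\"'":
            out.append("LIT")
        else:
            out.append(tok)
    return out


def token_clones(a: str, b: str, min_tokens: int = 10) -> bool:
    """True if a and b share a run of at least min_tokens normalized tokens."""
    ta, tb = normalized_tokens(a), normalized_tokens(b)
    windows = {tuple(ta[i:i + min_tokens]) for i in range(len(ta) - min_tokens + 1)}
    return any(tuple(tb[i:i + min_tokens]) in windows
               for i in range(len(tb) - min_tokens + 1))
```

For example, `int total = 0; for (...) { total += v[i]; }` and the same loop with `sum`, `k` and `w` normalize to identical token sequences, whereas a line-based comparison would see no match at all.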

Ideally, what you want is the above plus the ability to find clones whose variations are relatively arbitrary, e.g., replacing a variable with an expression, or a statement with a block.
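One way to picture "arbitrary variations" is to compare syntax trees instead of tokens: walk two trees in parallel and record each place where they differ; those differences become the parameters of the clone. The sketch below is only an illustration of that idea, not how CloneDR works internally. Because parsing C++ needs a full front end, it uses Python's own `ast` module on Python source (requires Python 3.9+ for `ast.unparse`); the function names are hypothetical.

```python
from __future__ import annotations

import ast


def _show(x: object) -> str:
    """Readable text for an AST node, a list of nodes, or a plain field value."""
    if isinstance(x, ast.AST):
        return ast.unparse(x)
    if isinstance(x, list):
        return "; ".join(_show(item) for item in x)
    return repr(x)


def _diff(a: object, b: object, out: set) -> None:
    """Walk two ASTs in parallel, collecting the subtrees or values where they differ."""
    if isinstance(a, ast.AST) and isinstance(b, ast.AST) and type(a) is type(b):
        for field in a._fields:
            _diff(getattr(a, field, None), getattr(b, field, None), out)
    elif isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        for x, y in zip(a, b):
            _diff(x, y, out)
    elif a != b:
        # Different node kinds (variable vs. expression), different list lengths
        # (one statement vs. a block), or different leaf values (names, constants).
        out.add((_show(a), _show(b)))


def clone_parameters(src_a: str, src_b: str) -> set:
    """Differences that would become parameters if the two fragments were treated
    as one clone; an empty set means they are structurally identical."""
    out: set = set()
    _diff(ast.parse(src_a), ast.parse(src_b), out)
    return out
```

A detector built on this would report two fragments as clones when the parameter set is small relative to the fragment size; here, a sum loop and a sum-of-squares loop differ only in renamed identifiers plus one variable replaced by the expression `y * y`.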

Our CloneDR clone detector does this for Java, C#, C++, COBOL, VB.net, VB6, Fortran and a variety of other languages. It can be seen at: http://www.semdesigns.com/Products/Clone/index.html

As well as handling multiple languages, the CloneDR engine can handle a variety of input encodings, including ASCII, ISO-8859-1, UTF-8, UTF-16, EBCDIC, a number of Microsoft encodings, and (Japanese) Shift-JIS.

The site has several clone detection run example reports, including one for C++.

EDIT Feb 2014: Now handles all of C++14.

Answered by Sean McMillan

Same (http://sourceforge.net/projects/same/) is extremely plain, but it works on text lines instead of tokens, which is useful if you're using a language that isn't supported by one of the fancier clone finders.

Answered by bk1e

CCFinderX is a free (for in-house use) cloned-code detector that supports multiple programming languages (Java, C, C++, COBOL, VB, C#).

Answered by bk1e

ConQAT is a great tool that supports C++ code analysis. It can find duplicates while ignoring whitespace, and it has extremely handy GUI and console interfaces. Because of its flexibility, it is not easy to set up. I found this blog post very useful for setting up a C++ project.

Answered by Rudolf FERENC

You can use our SourceMeter tool for detecting code duplication. It is a command-line tool (very similar to compilers), so you can easily integrate it into continuous integration tools such as CruiseControl, which you mentioned, or Jenkins.
