C++: Similar code detector
Original URL: http://stackoverflow.com/questions/10912349/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Similar code detector
Asked by Šimon Tóth
I'm searching for a tool that can compare source code for similarity.
We have a very trivial system right now that produces a huge number of false positives, and the real positives easily get buried among them.
My requirements are:
- reasonably small amount of false positives
- good detection rate (yeah these are going against each other)
- ideally with a more complex output than just a single value
- usable for C (C99) and C++ (C++03 and optimally C++11)
- still maintained
- usable for comparing two source files against each other
- usable in non-interactive mode
EDIT:
To avoid confusion, the following two code snippets are identical and should be detected as such:
for (int i = 0; i < 10; i++) { bla; }
int i = 0; while (i < 10) { bla; i++; }
The same here:
int x = 10; y = x + 5;
int a = 10; y = a + 5;
Answer by Throwback1986
I've used MOSS (http://theory.stanford.edu/~aiken/moss/) in the past to detect plagiarized code. Since it works on a semantic level, it will detect the situations you presented above. The tool is language-aware, so comments are not considered in the analysis, and it goes a long way toward detecting code that has been modified through simple search-and-replace of variable and/or function names.
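To illustrate why renaming variables does not have to defeat a similarity check, here is a minimal C++ sketch. It is not how MOSS is implemented; it only shows one common idea: replace every identifier with a placeholder numbered by first use, then compare the sets of token 3-grams with a Jaccard ratio. The function names `normalize` and `similarity`, the tiny keyword list, and the choice of 3-grams are all assumptions made for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Split source text into tokens and replace every identifier that is not a
// known keyword with a placeholder (ID0, ID1, ...) numbered by first use.
// Hypothetical helper: a real tool would use a full C/C++ lexer.
std::vector<std::string> normalize(const std::string& src) {
    static const std::unordered_set<std::string> keywords = {
        "int", "for", "while", "if", "else", "return"};
    std::unordered_map<std::string, std::string> names;
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier or keyword.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            std::string word = src.substr(start, i - start);
            if (keywords.count(word)) {
                tokens.push_back(word);
            } else {
                auto it = names.find(word);
                if (it == names.end())
                    it = names.emplace(word, "ID" + std::to_string(names.size())).first;
                tokens.push_back(it->second);
            }
        } else if (std::isdigit(c)) {
            // Integer literal, kept verbatim.
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back(src.substr(start, i - start));
        } else {
            // Any other character becomes a one-character token.
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}

// Jaccard similarity between the sets of k-token windows of two snippets.
// Hypothetical helper; k = 3 is an arbitrary choice for this sketch.
double similarity(const std::string& a, const std::string& b, std::size_t k = 3) {
    auto grams = [k](const std::vector<std::string>& t) {
        std::set<std::string> out;
        for (std::size_t i = 0; i + k <= t.size(); ++i) {
            std::string g;
            for (std::size_t j = 0; j < k; ++j) g += t[i + j] + '\x1f';
            out.insert(g);
        }
        return out;
    };
    std::set<std::string> ga = grams(normalize(a));
    std::set<std::string> gb = grams(normalize(b));
    if (ga.empty() && gb.empty()) return 1.0;
    std::size_t common = 0;
    for (const auto& g : ga) common += gb.count(g);
    return static_cast<double>(common) / (ga.size() + gb.size() - common);
}
```

With this sketch, `similarity("int x = 10; y = x + 5;", "int a = 10; y = a + 5;")` returns 1.0, because both snippets normalize to the same token stream. The for/while pair from the question scores below 1.0 here, since a purely token-based comparison does not see through a change of loop construct.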
Note: I used the tool a few years ago when I taught computer science in grad school, and it worked wonderfully in detecting code that had been yanked from the internet. Here is a well-documented account of a similar application: http://fie2012.org/sites/fie2012.org/history/fie99/papers/1110.pdf
If you google "measure software similarity", you should find a few more useful hits: http://www.ics.heacademy.ac.uk/resources/assessment/plagiarism/detectiontools_sourcecode.html
Answer by Yavar
In computer science terminology, your problem may be stated as source code plagiarism detection. A good start would be to read this article on Dr Dobbs: Detecting Source-Code Plagiarism. It lists algorithms for detecting plagiarism in source code.
Note: What you have asked for is indeed a tough computing problem :)
Answer by Benjamin Bannier
Answer by Pierre Jean
I've started using JPlag (https://github.com/jplag/jplag) to check code similarity and compare students' work in Java and text files. It works well for detecting identical code structure and variable substitution.
Answer by Jefferey Cave
(response is late, but the question's relevance never goes away)
I faced a similar problem and wrote a web-based application.
https://jefferey-cave.gitlab.io/miss/
I was teaching in JavaScript and Python, so those are the languages it handles. It does not handle C/C++ (currently). I'd be curious to see how the JavaScript interpreter handles C.
The problem I faced was that it was illegal to submit student code across international boundaries (so MOSS was forbidden), and I needed something that would run locally. The implementation runs purely client-side in the browser.
I found it more useful in determining group dynamics in the classroom (who is working/studying with whom).
It has some fun live graphics, so it was useful to show to an Undergrad class after they submitted their first assignment. There was always a high degree of similarity in the first assignment, so no harm in demonstrating it live (with the submission names anonymized).
I always tell the story of the student I thought was (grossly and blatantly) cheating. Their work showed remarkable similarity to another student's very unique answer. Comparing the student's work with the rest of the class showed no significant similarity to anyone else. This led to a deeper investigation of the submission ... it turns out there had been a tutorial, and its style showed through, but the work was unique.
Nothing happened, and those students never knew how close they came.