Good tools for creating a C/C++ parser/analyzer

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/526797/

Time: 2020-08-27 15:48:21. Source: igfitidea

c++ c parsing yacc lex

Asked by Matt Ball

What are some good tools for getting a quick start for parsing and analyzing C/C++ code?

In particular, I'm looking for open source tools that handle the C/C++ preprocessor and language. Preferably, these tools would use lex/yacc (or flex/bison) for the grammar, and not be too complicated. They should handle the latest ANSI C/C++ definitions.

Here's what I've found so far, but haven't looked at them in detail (thoughts?):

  • CScope - Old-school C analyzer. Doesn't seem to do a full parse, though. Described as a glorified 'grep' for finding C functions.
  • GCC - Everybody's favorite open source compiler. Very complicated, but seems to do it all. There's a related project for creating GCC extensions called GEM, but it hasn't been updated since GCC 4.1 (2006).
  • PUMA - The PUre MAnipulator. (From the page: "The intention of this project is to provide a library of classes for the analysis and manipulation of C/C++ sources. For this purpose PUMA provides classes for scanning, parsing and of course manipulating C/C++ sources.") This looks promising, but hasn't been updated since 2001. Apparently PUMA has been incorporated into AspectC++, but even this project hasn't been updated since 2006.
  • Various C/C++ raw grammars. You can get c-c++-grammars-1.2.tar.gz, but this has been unmaintained since 1997. A little Google searching pulls up other basic lex/yacc grammars that could serve as a starting place.
  • Any others?

I'm hoping to use this as a starting point for translating C/C++ source into a new toy language.

Thanks! -Matt

(Added 2/9): Just a clarification: I want to extract semantic information from the preprocessor in addition to the C/C++ code itself. I don't want "#define foo 42" to disappear into the integer "42", but to remain attached to the name "foo". This, unfortunately, excludes several solutions that run the preprocessor first and only deliver the C/C++ parse tree.
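
To make the "#define foo 42" requirement concrete, here is a minimal Python sketch that records macro names together with their replacement text before any preprocessing happens. It is not a real preprocessor: it assumes simple object-like macros on a single line, it does not strip comments or handle line continuations, and the names MACRO_RE and collect_object_macros are made up for this illustration.

```python
import re

# Matches simple object-like macros such as "#define foo 42".
# Assumption: one definition per line, no parameters, no line continuations.
MACRO_RE = re.compile(
    r'^[ \t]*#[ \t]*define[ \t]+([A-Za-z_]\w*)(?:[ \t]+(.*?))?[ \t]*$'
)


def collect_object_macros(source):
    """Return {name: replacement_text} for simple #define lines in source."""
    macros = {}
    for line in source.splitlines():
        match = MACRO_RE.match(line)
        if match:
            name, body = match.group(1), match.group(2)
            # A macro defined without a body maps to the empty string.
            macros[name] = body or ""
    return macros
```

Function-like macros such as "#define max(a,b) ..." are deliberately skipped by this pattern, since the name is not followed by whitespace or end of line. A real tool would need a proper preprocessor model to keep this information through conditional compilation and includes.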

Accepted answer by Adam Rosenfield

Parsing C++ is extremely hard because the grammar is undecidable. To quote Yossi Kreinin:

Outstandingly complicated grammar

"Outstandingly" should be interpreted literally, because all popular languages have context-free (or "nearly" context-free) grammars, while C++ has undecidable grammar. If you like compilers and parsers, you probably know what this means. If you're not into this kind of thing, there's a simple example showing the problem with parsing C++: is AA BB(CC); an object definition or a function declaration? It turns out that the answer depends heavily on the code before the statement - the "context". This shows (on an intuitive level) that the C++ grammar is quite context-sensitive.
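
The AA BB(CC); example can be sketched as a toy classifier. This is only an illustration of the idea that the same text is read differently depending on earlier declarations; it assumes the "context" is reduced to a plain set of known type names, which ignores scopes, templates and everything else real C++ name lookup involves. The names STATEMENT_RE and classify are hypothetical.

```python
import re

# Toy pattern for statements shaped like "AA BB(CC);" (three identifiers).
STATEMENT_RE = re.compile(r'^\s*(\w+)\s+(\w+)\s*\(\s*(\w+)\s*\)\s*;\s*$')


def classify(statement, known_types):
    """Classify 'AA BB(CC);' using the set of type names declared earlier."""
    match = STATEMENT_RE.match(statement)
    if match is None:
        raise ValueError("not of the form 'AA BB(CC);'")
    _, declared_name, argument = match.groups()
    if argument in known_types:
        # CC names a type: BB is a function taking a CC and returning AA.
        return "function declaration of " + declared_name
    # CC names a value: BB is an object of type AA initialised with CC.
    return "object definition of " + declared_name
```

Calling classify("AA BB(CC);", {"AA", "CC"}) and classify("AA BB(CC);", {"AA"}) gives different answers for identical text, which is why a parser for C++ has to track declarations while it parses.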

Answer by epatel

You can look at clang, which uses llvm for parsing.

It supports C++ fully now (link).

Answer by Sean McCauliff

The ANTLR parser generator has a grammar for C/C++ as well as the preprocessor. I've never used it so I can't say how complete its parsing of C++ is going to be. ANTLR itself has been a useful tool for me on a couple of occasions for parsing much simpler languages.

Answer by Łukasz Lew

Depending on your problem, GCCXML might be your answer. Basically it parses the source using GCC and then gives you easily digestible XML of the parse tree. With GCCXML you are done once and for all.

Answer by Eli Bendersky

pycparser is a complete parser for C (C99) written in Python. It has a fully configurable AST backend, so it can be used as a basis for any kind of language processing you might need.

It doesn't support C++, though. Granted, that's much harder than C.

Update (2012): at this time the answer, without any doubt, would be Clang - it's modular, supports the full C++ (with many C++11 features) and has a relatively friendly code base. It also has a C API for bindings to high-level languages (e.g. Python).

Answer by Andy Dent

Have a look at how doxygen works; full source code is available and it's flex-based.

A misleading candidate is GOLD, which is a free Windows-based parser toolkit explicitly for creating translators. Their list of supported languages refers to the languages in which one can implement parsers, not the list of supported parse grammars.

They only have grammars for C and C#, no C++.

Answer by none

Parsing C++ is a very complex challenge.

There's the Boost/Spirit framework, and a couple of years ago they did play with the idea of implementing a C++ parser, but it's far from complete.

Fully and properly parsing ISO C++ is far from trivial, and there have in fact been many related efforts. But it is an inherently complex job that isn't easily accomplished without rewriting a full compiler frontend that understands all of C++ and the preprocessor. A preprocessor implementation called "wave" is available from the Spirit folks.

That said, you might want to have a look at pork/oink (elsa-based), which is a C++ parser toolkit specifically meant for source code transformation. It is being used by the Mozilla project to do large-scale static source code analysis and automated code rewriting. The most interesting part is that it supports not only most of C++, but also the preprocessor itself!

On the other hand there's indeed one single proprietary solution available: the EDG frontend, which can be used for pretty much all C++ related efforts.

Personally, I would check out the elsa-based pork/oink suite which is used at Mozilla. Apart from that, the FSF has now approved work on gcc plugins using the runtime library license, so I'd assume that things are going to change rapidly once people can easily leverage the gcc-based C++ parser for such purposes using binary plugins.

So, in a nutshell: if you have the bucks, EDG; if you need something free/open source now, elsa/oink are fairly promising; if you have some time, you might want to use gcc for your project.

Another option just for C code is cscout.

Answer by Charlie Martin

The grammar for C++ is sort of notoriously hairy. There's a good thread at Lambda about it, but the gist is that C++ grammar can require arbitrarily much lookahead.

For the kind of thing I imagine you might be doing, I'd think about hacking either Gnu CC, or Splint. Gnu CC in particular does separate out the language generation part pretty thoroughly, so you might be best off building a new g++ backend.

Answer by Brett Rossier

Actually, PUMA and AspectC++ are both still actively maintained and updated. I was looking into using AspectC++ and was wondering about the lack of updates myself. I e-mailed the author, who said that both AspectC++ and PUMA are still being developed. You can get to the source code through SVN at https://svn.aspectc.org/repos/ or you can get regular binary builds at http://akut.aspectc.org. As with a lot of excellent C++ projects these days, the author doesn't have time to keep up with web page maintenance. Makes sense if you've got a full time job and a life.

Answer by user52875

Elsa beats everything else I know hands down for C++ parsing, even though it is not 100% compliant. I'm a fan. There's a module that prints out C++, so that may be a good starting point for your toy project.