Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/5586358/

Time: 2020-08-25 21:53:05  Source: igfitidea

Any decent PHP parser written in PHP?

Tags: php, parsing

Asked by NikiC

I do lots of work manipulating and analyzing PHP code. Normally I just use the Tokenizer to do this. For most applications this is sufficient. But sometimes parsing using a lexer just isn't reliable enough (obviously).
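
To make the Tokenizer approach concrete, here is a minimal sketch that lists named functions by scanning the output of `token_get_all`. The helper name `findFunctionNames` is made up for illustration. It works for simple cases, but the token stream is flat: nothing in it says which class or block a function belongs to, so anything structural needs hand-written bookkeeping on top.

```php
<?php
// Sketch: collect the names of named functions by scanning the flat token stream.
// findFunctionNames() is a hypothetical helper, not part of the Tokenizer extension.
function findFunctionNames(string $code): array
{
    $tokens = token_get_all($code);
    $count = count($tokens);
    $names = [];

    for ($i = 0; $i < $count; $i++) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // Look past whitespace and an optional by-reference "&" for the name.
        for ($j = $i + 1; $j < $count; $j++) {
            $token = $tokens[$j];
            $text = is_array($token) ? $token[1] : $token;
            if ($text === '&' || (is_array($token) && $token[0] === T_WHITESPACE)) {
                continue;
            }
            if (is_array($token) && $token[0] === T_STRING) {
                $names[] = $text;
            }
            break; // anything else (for example "(") is treated as a closure without a name
        }
    }
    return $names;
}

$code = '<?php function foo() {} $f = function () {}; function &bar() {}';
print_r(findFunctionNames($code));
```

The closure is skipped only because of the special case in the inner loop; each further construct (methods, nested functions, comments between the keyword and the name) would need another special case, which is where a real parser starts to pay off.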

Thus I am looking for some PHP parser written in PHP. I found hnw/PhpParser and kumatch/stagehand-php-parser. Both are created by an automated conversion of zend_language_parser.y to a .y file with PHP instead of C (and then compiled to a LALR(1) parser). But this automated conversion just can't be worked with.

So, is there any decent PHP parser written in PHP? (I need one for PHP 5.2 and one for 5.3. But just one of them would be a good starting point, too.)

Answer by NikiC

After no complete and stable parser was found here I decided to write one myself. Here is the result:

PHP-Parser: A PHP parser written in PHP

The project supports parsing code written for any PHP version between PHP 5.2 and PHP 7.1.

Apart from the parser itself the library provides some related components:

  • Compilation of the AST back to PHP ("pretty printing")
  • Infrastructure for traversing and changing the AST
  • Serialization to and from XML (as well as dumping in a human readable form)
  • Resolution of namespaced names (aliases etc.)

For a usage overview see the "Usage of basic components" section of the documentation.
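
As a rough illustration of the parser and pretty printer listed above, the following sketch parses a snippet and prints it back. It assumes the library is installed through Composer as nikic/php-parser and that `vendor/autoload.php` sits next to the script (both are assumptions). The helper `reprintWithPhpParser` is hypothetical, and the factory call differs between library versions, so check the documentation for the version you have.

```php
<?php
// Assumption: nikic/php-parser was installed with Composer; this autoload path is hypothetical.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require $autoload;
}

// Hypothetical helper: parse PHP source into an AST and pretty-print it back.
// Returns null when the library is not available.
function reprintWithPhpParser(string $code): ?string
{
    if (!class_exists(\PhpParser\ParserFactory::class)) {
        return null;
    }
    $factory = new \PhpParser\ParserFactory();
    // Newer releases offer createForNewestSupportedVersion(); older ones use create().
    $parser = method_exists($factory, 'createForNewestSupportedVersion')
        ? $factory->createForNewestSupportedVersion()
        : $factory->create(\PhpParser\ParserFactory::PREFER_PHP7);

    $ast = $parser->parse($code); // throws PhpParser\Error on a syntax error
    $printer = new \PhpParser\PrettyPrinter\Standard();
    return $printer->prettyPrintFile($ast ?? []);
}

$result = reprintWithPhpParser('<?php echo   1+2 ;');
echo $result ?? "nikic/php-parser is not installed\n";
```

The AST returned by `parse()` is an array of statement nodes, which is what the traversal and name-resolution components operate on.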

Answer by Charles

This isn't going to be a great option for you, as it violates the pure-PHP constraint, but:

A while ago, the php-internals folks decided that they would switch to Lemon as their parsing technology. There's a branch in the PHP svn repo that contains the required changes.

They decided not to continue with this, as they found that their Lemon solution is about 10-15% slower. But, the branch is still there.

There's an older Lemon parser written as a PHP extension. You might be able to work with it. There's also this PEAR package. There's also this other lemon package (via this blog post about PGN).

Of course, even if you get it working, I'm not sure what you'd do with the data, or what the data even looks like.

Another wacky option would be peeking at Quercus, a PHP implementation in Java. They must have written a parser, so it might be worth investigating.

Answer by naderman

The metrics tool PHP Depend contains code, written entirely in PHP, that generates an AST from PHP source. It does make use of PHP's own token_get_all for the tokenization, however.

The source code is available on github: https://github.com/manuelpichler/pdepend/tree/master/src/main/php/PHP/Depend

The implementation of the AST for some parts like mathematical expressions was not yet complete last I checked, but according to its author that is the goal.
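
To show the general idea of building an AST on top of `token_get_all`, here is a small sketch for arithmetic expressions. It is not PHP Depend's code: the function names and the array-based node layout are invented for illustration, and it only handles numbers, `+ - * /` and parentheses (no unary operators).

```php
<?php
// Sketch only: a tiny precedence-climbing parser over PHP's own token stream.
// parseArithmetic(), parseBinary() and parsePrimary() are hypothetical names.
function parseArithmetic(string $expr): array
{
    $tokens = [];
    foreach (token_get_all('<?php ' . $expr) as $token) {
        if (!is_array($token)) {
            $tokens[] = $token; // single-character tokens such as "+" or "("
        } elseif ($token[0] === T_LNUMBER || $token[0] === T_DNUMBER) {
            // Number nodes keep the literal source text as their value.
            $tokens[] = ['type' => 'number', 'value' => $token[1]];
        } elseif ($token[0] !== T_OPEN_TAG && $token[0] !== T_WHITESPACE) {
            throw new InvalidArgumentException("Unexpected token: {$token[1]}");
        }
    }
    $pos = 0;
    $ast = parseBinary($tokens, $pos, 0);
    if ($pos !== count($tokens)) {
        throw new InvalidArgumentException('Unexpected trailing input');
    }
    return $ast;
}

function parseBinary(array $tokens, int &$pos, int $minPrec): array
{
    static $prec = ['+' => 1, '-' => 1, '*' => 2, '/' => 2];
    $left = parsePrimary($tokens, $pos);
    while ($pos < count($tokens) && is_string($tokens[$pos])
        && isset($prec[$tokens[$pos]]) && $prec[$tokens[$pos]] >= $minPrec) {
        $op = $tokens[$pos++];
        // Left-associative: the right operand must bind tighter than $op.
        $right = parseBinary($tokens, $pos, $prec[$op] + 1);
        $left = ['type' => 'binary', 'op' => $op, 'left' => $left, 'right' => $right];
    }
    return $left;
}

function parsePrimary(array $tokens, int &$pos): array
{
    $token = $tokens[$pos] ?? null;
    if (is_array($token)) {
        $pos++;
        return $token; // a number node
    }
    if ($token === '(') {
        $pos++;
        $node = parseBinary($tokens, $pos, 0);
        if (($tokens[$pos] ?? null) !== ')') {
            throw new InvalidArgumentException('Expected ")"');
        }
        $pos++;
        return $node;
    }
    throw new InvalidArgumentException('Expected a number or "("');
}

print_r(parseArithmetic('1 + 2 * 3'));
```

The lexer does the tedious part (number formats, whitespace), while the parser only decides how tokens nest, which is the split described above.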

Answer by Ira Baxter

Well, this isn't in PHP, sorry, but building this kind of machinery is hard, and PHP isn't particularly suited for the task of language processing.

Our PHP Front End provides full PHP 4.x and 5.x parsing (EDIT 9/2016: now handles PHP 7), automatically builds ASTs with all the details of a full PHP grammar, and can generate compilable source text from the ASTs. This is harder than it might sound when you consider all the screwy details, including weird string literals, captured comments, numbers-with-radix, etc.

But ASTs are hardly enough (you've already observed that tokens aren't even barely enough).

The foundation on which it is built, the DMS Software Reengineering Toolkit, provides support for analysis and arbitrary transformations of the ASTs. It will also read large sets of files at once, enabling analysis and transformations across PHP files.

Answer by Vladislav Rastrusny

There is a port of ANTLR to PHP: http://code.google.com/p/antlrphpruntime/w/list

It's abandoned, but I think it should still work.
