Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3048583/

Time: 2020-08-25 08:31:53  Source: igfitidea

What is the fastest XML parser in PHP?

php xml performance

Asked by Jakub Lédl

For a certain project, I need some way to parse XML and get data from it. So I wonder, which of the built-in parsers is the fastest?

Also, it would be nice if the parser could accept an XML string as input. I have my own thread-safe implementation for working with files, and I don't want some nasty non-thread-safe library to make my efforts useless.

Accepted answer by Evan Carroll

The fastest parser will be SAX: it doesn't have to build a DOM, and it can work on partial XML or progressively. Info on the PHP SAX parser (Expat) can be found here. Alternatively, there is a libxml-based DOM parser named SimpleXML. A DOM-based parser will be easier to work with, but it is typically a few orders of magnitude slower.
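To make the "partial or progressive" point concrete, here is a minimal sketch using PHP's Expat-based XML Parser functions (`xml_parser_create`, `xml_set_element_handler`, `xml_set_character_data_handler`, `xml_parse`). The `<books>`/`<title>` document and the way it is split into chunks are made up for illustration; in practice the chunks might come from a socket or a file read loop.

```php
<?php
// Collect the text of every <title> element while the XML arrives in pieces.
$titles  = [];
$inTitle = false;
$buffer  = '';

$parser = xml_parser_create();
// Expat upper-cases element names by default; keep them as written.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Called for every opening tag.
    function ($parser, $name, $attrs) use (&$inTitle, &$buffer) {
        if ($name === 'title') {
            $inTitle = true;
            $buffer  = '';
        }
    },
    // Called for every closing tag.
    function ($parser, $name) use (&$inTitle, &$buffer, &$titles) {
        if ($name === 'title') {
            $titles[] = $buffer;
            $inTitle  = false;
        }
    }
);

// Text may be delivered in several calls, so append rather than assign.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$inTitle, &$buffer) {
        if ($inTitle) {
            $buffer .= $data;
        }
    }
);

// Hypothetical chunks: note the first one ends in the middle of a text node.
$chunks = [
    '<books><book><title>Fir',
    'st</title></book><book>',
    '<title>Second</title></book></books>',
];

foreach ($chunks as $i => $chunk) {
    $isFinal = ($i === count($chunks) - 1);
    if (xml_parse($parser, $chunk, $isFinal) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
}

print_r($titles);
```

No tree is ever built here: the handlers see each tag once and the script keeps only what it chooses to store, which is where the speed and memory advantage comes from. The cost is that you have to track state (such as `$inTitle`) yourself.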

Answer by Josiah

This is geared primarily toward those who are starting out with XML parsing and are not sure which parser to use.

There are two "big" ways to go about parsing - you can either load the XML into memory and find what you need (DOM, SimpleXML) or you can stream it - read it and execute code based on what you read (XMLReader, SAX).

According to Microsoft, SAX is a "push" parser, which sends every piece of information to your application for your application to process. XMLReader is a "pull" parser, which allows you to skip chunks of data and only grab what you need. According to Microsoft, this can both simplify and accelerate your application, and I would assume the .NET and PHP implementations are similar. I suppose your choice would depend on your needs: if you're pulling out just a few tags from a larger chunk and can use $xml->next('Element') to skip significant chunks, you may find that XMLReader is faster than SAX.
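As a rough illustration of the "pull" style, the sketch below reads an XML string with XMLReader, grabs each `<ASIN>` value, and calls `next()` to jump over a `<Details>` subtree without visiting its children. The element names and the sample data are invented for this example and are not taken from a real Amazon feed.

```php
<?php
$xml = <<<XML
<Products>
  <Product>
    <ASIN>B000000001</ASIN>
    <Details><Huge>lots of data we do not need</Huge></Details>
  </Product>
  <Product>
    <ASIN>B000000002</ASIN>
    <Details><Huge>more data we do not need</Huge></Details>
  </Product>
</Products>
XML;

$reader = new XMLReader();
$reader->XML($xml);            // parse from a string rather than a file

$asins    = [];
$sawHuge  = false;             // stays false if the skipping works
$more     = $reader->read();

while ($more) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        if ($reader->name === 'ASIN') {
            // readString() returns the element's text without moving the cursor.
            $asins[] = $reader->readString();
        } elseif ($reader->name === 'Details') {
            // next() moves past the whole <Details> subtree in one step.
            // We are already on a new node afterwards, so do not read() again.
            $more = $reader->next();
            continue;
        } elseif ($reader->name === 'Huge') {
            $sawHuge = true;
        }
    }
    $more = $reader->read();
}
$reader->close();

print_r($asins);
var_dump($sawHuge);
```

The `continue` after `next()` matters: `next()` already leaves the cursor on the following node, so calling `read()` straight afterwards would silently step over that node.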

When parsing "small" (<30kb, 700 lines) XML files repetitively, you might not expect a huge time difference between the methods of parsing. I was surprised to find that there was. I ran a comparison of a small feed processed in SimpleXML and XMLReader. Hopefully this will help someone else to visualize how significant the difference is. For a real-life comparison, this is parsing the response to two Amazon MWS Product Information request feeds.

Each Parse Time is the time required to take 2 XML strings and return about 120 variables containing values from each string. Each loop takes different data, but each of the tests was on the same data in the same order.

SimpleXML loads the document into memory. I used microtime to check both the time to complete the parse (extract the relevant values) and the time spent creating the element (when new SimpleXMLElement($xml) was called). I have rounded these to 4 decimal places.
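The measurement described above could be set up roughly as follows. This is only a sketch of the approach, not the code that produced the figures in this answer: the synthetic `<Products>` document stands in for the real feeds, and the variable names are my own.

```php
<?php
// Synthetic stand-in for a feed; the real responses are not reproduced here.
$xml = '<Products>'
     . str_repeat('<Product><ASIN>B000000001</ASIN><Price>9.99</Price></Product>', 500)
     . '</Products>';

$start = microtime(true);
$root  = new SimpleXMLElement($xml);     // builds the whole tree in memory
$built = microtime(true);

$asins = [];
foreach ($root->Product as $product) {   // locate and extract the values
    $asins[] = (string) $product->ASIN;
}
$done = microtime(true);

$makeTime  = $built - $start;            // time spent creating the element
$parseTime = $done - $start;             // total: creation plus extraction

printf("Time Spent Making the Elements: %.4f seconds\n", $makeTime);
printf("Parse Time: %.4f seconds\n", $parseTime);
```

Taking the two timestamps around the constructor separately is what makes it possible to say how much of the total is tree construction and how much is lookup.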

Parse Time: 0.5866 seconds
Parse Time: 0.3045 seconds 
Parse Time: 0.1037 seconds
Parse Time: 0.0151 seconds 
Parse Time: 0.0282 seconds 
Parse Time: 0.0622 seconds 
Parse Time: 0.7756 seconds
Parse Time: 0.2439 seconds  
Parse Time: 0.0806 seconds 
Parse Time: 0.0696 seconds
Parse Time: 0.0218 seconds
Parse Time: 0.0542 seconds
__________________________
            2.3500 seconds
            0.1958 seconds average

Time Spent Making the Elements: 0.5232 seconds 
Time Spent Making the Elements: 0.2974 seconds 
Time Spent Making the Elements: 0.0980 seconds 
Time Spent Making the Elements: 0.0097 seconds 
Time Spent Making the Elements: 0.0231 seconds 
Time Spent Making the Elements: 0.0091 seconds 
Time Spent Making the Elements: 0.7190 seconds 
Time Spent Making the Elements: 0.2410 seconds 
Time Spent Making the Elements: 0.0765 seconds 
Time Spent Making the Elements: 0.0637 seconds 
Time Spent Making the Elements: 0.0081 seconds 
Time Spent Making the Elements: 0.0507 seconds 
______________________________________________
                                2.1195 seconds
                                0.1766 seconds average

Over 90% of the total time is spent loading elements into the DOM.

Only 0.2305 seconds is spent locating the elements and returning them.

With XMLReader, which is stream based, I was able to skip a significant chunk of one of the XML feeds, since the data I wanted was near the top of each element. "Your Mileage May Vary."

Parse Time: 0.1059 seconds  
Parse Time: 0.0169 seconds 
Parse Time: 0.0214 seconds 
Parse Time: 0.0665 seconds 
Parse Time: 0.0255 seconds 
Parse Time: 0.0241 seconds 
Parse Time: 0.0234 seconds 
Parse Time: 0.0225 seconds 
Parse Time: 0.0183 seconds 
Parse Time: 0.0202 seconds 
Parse Time: 0.0245 seconds 
Parse Time: 0.0205 seconds 
__________________________
            0.3897 seconds
            0.0325 seconds average

What is striking is that although locating elements is slightly faster in SimpleXML once everything is loaded, it is actually over 6 times faster to use XMLReader overall.

You can find some information on using XMLReader in the question "How to use XMLReader in PHP?"

Answer by Bill Karwin

Each XML extension has its own strengths and weaknesses. For example, I have a script that parses the XML data dump from Stack Overflow. The posts.xml file is 2.8GB! For this large XML file, I had to use XMLReader because it reads XML in a streaming mode, instead of trying to load and represent the whole XML document in memory at once, as the DOM extension does.
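A streaming pass over a file with XMLReader might look like the sketch below. It is not the script mentioned above: the tiny temporary file stands in for a multi-gigabyte dump, and the `<row Id="..." Score="..."/>` layout is an assumption made for the example.

```php
<?php
// Create a small stand-in file; a real dump would already be on disk.
$path = tempnam(sys_get_temp_dir(), 'posts');
file_put_contents(
    $path,
    '<posts><row Id="1" Score="5"/><row Id="2" Score="7"/><row Id="3" Score="-1"/></posts>'
);

$reader = new XMLReader();
if (!$reader->open($path)) {
    throw new RuntimeException("Cannot open $path");
}

$rows       = 0;
$totalScore = 0;

// Only the current node is held at any time, so memory use stays flat
// no matter how large the file is.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
        $rows++;
        $totalScore += (int) $reader->getAttribute('Score');
    }
}

$reader->close();
unlink($path);

echo "$rows rows, total score $totalScore\n";
```

The same loop works unchanged on a much larger file, because nothing is accumulated except the two counters.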

So you need to be more specific about describing how you are going to use the XML, in order to decide which PHP extension to use.

All of PHP's XML extensions provide some method to read XML data from a string.
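For reference, here is a short sketch of the string entry point in four of the bundled extensions: `simplexml_load_string()`, `DOMDocument::loadXML()`, `XMLReader::XML()`, and `xml_parse_into_struct()`. The sample document is made up, and each snippet simply pulls out the text of the second `<item>`.

```php
<?php
$xml = '<root><item id="1">alpha</item><item id="2">beta</item></root>';

// SimpleXML: parse the string into an object tree.
$sx            = simplexml_load_string($xml);
$fromSimpleXml = (string) $sx->item[1];

// DOM: loadXML() takes the document as a string.
$dom = new DOMDocument();
$dom->loadXML($xml);
$fromDom = $dom->getElementsByTagName('item')->item(1)->textContent;

// XMLReader: XML() sets a string as the input to stream over.
$reader = new XMLReader();
$reader->XML($xml);
$fromReader = null;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT
        && $reader->getAttribute('id') === '2') {
        $fromReader = $reader->readString();
        break;
    }
}
$reader->close();

// XML Parser (Expat): parse the string into a flat array of tag records.
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep names as written
xml_parse_into_struct($parser, $xml, $values);
$fromExpat = null;
foreach ($values as $record) {
    if ($record['tag'] === 'item' && ($record['attributes']['id'] ?? null) === '2') {
        $fromExpat = $record['value'];
    }
}

echo implode(', ', [$fromSimpleXml, $fromDom, $fromReader, $fromExpat]), "\n";
```

Since none of these calls touches the filesystem, they fit the requirement in the question of doing the file handling yourself and handing the parser a string.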

Answer by Tobias P.

There are not really many parsers in PHP.

The most effective will be those provided with PHP. Write a benchmark with DOM and SimpleXML and check which performs better.
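One possible shape for such a benchmark is sketched below. The synthetic document, the `timeIt()` helper, and the run count are all assumptions chosen for illustration; results will depend heavily on your real XML and on what you extract from it.

```php
<?php
// Synthetic input; replace it with XML that resembles your real data.
$xml = '<items>'
     . str_repeat('<item><name>x</name><value>42</value></item>', 1000)
     . '</items>';

// Hypothetical helper: average wall-clock seconds per call over $runs calls.
function timeIt(callable $fn, int $runs = 20): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $runs;
}

// Same task in both extensions: parse the string and sum every <value>.
$viaSimpleXml = function () use ($xml): int {
    $sum = 0;
    foreach (simplexml_load_string($xml)->item as $item) {
        $sum += (int) $item->value;
    }
    return $sum;
};

$viaDom = function () use ($xml): int {
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $sum = 0;
    foreach ($dom->getElementsByTagName('value') as $node) {
        $sum += (int) $node->textContent;
    }
    return $sum;
};

printf("SimpleXML: %.5f s per run\n", timeIt($viaSimpleXml));
printf("DOM:       %.5f s per run\n", timeIt($viaDom));
```

Both closures do the parse and the extraction inside the timed call, so the comparison covers the full cost of each approach rather than just the lookup step.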
