在 PHP 中处理大型 XML 的最佳方法
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1167062/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverFlow
Best way to process large XML in PHP
提问by Petruza
I have to parse large XML files in php, one of them is 6.5 MB and they could be even bigger. The SimpleXML extension as I've read, loads the entire file into an object, which may not be very efficient. In your experience, what would be the best way?
我必须在 php 中解析大型 XML 文件,其中之一是 6.5 MB,它们可能更大。我读过的 SimpleXML 扩展将整个文件加载到一个对象中,这可能不是很有效。根据您的经验,最好的方法是什么?
采纳答案by Eric Petroelje
For a large file, you'll want to use a SAX parser rather than a DOM parser.
对于大文件,您需要使用SAX 解析器而不是 DOM 解析器。
With a DOM parser it will read in the whole file and load it into an object tree in memory. With a SAX parser, it will read the file sequentially and call your user-defined callback functions to handle the data (start tags, end tags, CDATA, etc.)
使用 DOM 解析器,它将读取整个文件并将其加载到内存中的对象树中。使用 SAX 解析器,它将顺序读取文件并调用您的用户定义的回调函数来处理数据(开始标记、结束标记、CDATA 等)。
With a SAX parser you'll need to maintain state yourself (e.g. what tag you are currently in) which makes it a bit more complicated, but for a large file it will be much more efficient memory wise.
使用 SAX 解析器,您需要自己维护状态(例如您当前所在的标签),这使得它有点复杂,但对于大文件,它在内存方面会更高效。
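A rough sketch of what "maintaining state yourself" looks like with PHP's built-in Expat-based SAX functions. The books.xml file and the <title> element are made-up examples, not from the question; the point is that the "current tag" lives in a variable you manage, while the file is fed to the parser in small chunks so memory use stays flat:

```php
<?php
// SAX-style parsing: we track the current tag ourselves and collect
// the text of every <title> element as it streams past.
file_put_contents('books.xml',
    '<books><book><title>One</title></book><book><title>Two</title></book></books>');

$currentTag = ''; // state we maintain ourselves
$titles = [];

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false); // keep tag names as-is
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$currentTag) { $currentTag = $name; },
    function ($p, $name) use (&$currentTag) { $currentTag = ''; }
);
xml_set_character_data_handler($parser, function ($p, $data) use (&$currentTag, &$titles) {
    if ($currentTag === 'title') {
        $titles[] = $data; // may fire more than once for very large text nodes
    }
});

// Feed the file in chunks; the whole document is never in memory at once.
$fp = fopen('books.xml', 'r');
while (($chunk = fread($fp, 8192)) !== false && $chunk !== '') {
    xml_parse($parser, $chunk, feof($fp));
}
fclose($fp);
xml_parser_free($parser);

print_r($titles); // the two collected titles: One, Two
```

Note the closures capture `$currentTag` by reference; without that, the handlers would each see a stale copy of the state.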
回答by oskarth
My take on it:
我的看法:
https://github.com/prewk/XmlStreamer
https://github.com/prewk/XmlStreamer
A simple class that will extract all children of the XML root element while streaming the file. Tested on a 108 MB XML file from pubmed.com.
一个简单的类,在流式传输文件的同时提取 XML 根元素的所有子元素。已在来自 pubmed.com 的一个 108 MB 的 XML 文件上测试过。
class SimpleXmlStreamer extends XmlStreamer {
    public function processNode($xmlString, $elementName, $nodeIndex) {
        $xml = simplexml_load_string($xmlString);
        // Do something with your SimpleXML object
        return true;
    }
}

$streamer = new SimpleXmlStreamer("myLargeXmlFile.xml");
$streamer->parse();
回答by COil
When using a DOMDocument with large XML files, don't forget to pass the LIBXML_PARSEHUGE flag in the options of the load() method. (The same applies to the other load methods of the DOMDocument object.)
在处理大型 XML 文件并使用 DOMDocument 时,不要忘记在 load() 方法的选项中传递 LIBXML_PARSEHUGE 标志。(同样适用于 DOMDocument 对象的其他 load 方法。)
$checkDom = new \DOMDocument('1.0', 'UTF-8');
$checkDom->load($filePath, LIBXML_PARSEHUGE);
(Works with a 120 MB XML file)
(适用于 120 MB 的 XML 文件)
回答by kenleycapps
A SAX parser, as Eric Petroelje recommends, would be better for large XML files. A DOM parser loads the entire XML file and allows you to run XPath queries; a SAX (Simple API for XML) parser simply reads the document sequentially and gives you hook points for processing.
正如 Eric Petroelje 推荐的那样,SAX 解析器更适合大型 XML 文件。DOM 解析器会加载整个 XML 文件,并允许您运行 XPath 查询;而 SAX(Simple API for XML)解析器只是顺序读取文档,并为您提供处理数据的挂钩点。
回答by gahooa
It really depends on what you want to do with the data. Do you need it all in memory to work with it effectively?
这真的取决于您想用这些数据做什么。您是否需要把它全部放在内存中才能有效地处理?
6.5 MB is not that big, in terms of today's computers. You could, for example, call ini_set('memory_limit', '128M');
就今天的计算机而言,6.5 MB 并不算大。例如,您可以调用 ini_set('memory_limit', '128M');
However, if your data can be streamed, you may want to look at using a SAX parser. It really depends on your usage needs.
但是,如果您的数据可以流式传输,您可能需要考虑使用 SAX 解析器。这实际上取决于您的使用需求。
回答by Benedict Cohen
A SAX parser is the way to go. I've found that SAX parsing can get messy if you don't stay organised.
SAX 解析器是正确的选择。我发现如果不保持条理,SAX 解析会变得混乱。
I use an approach based on STX (Streaming Transformations for XML) to parse large XML files. I use the SAX methods to build a SimpleXML object to keep track of the data in the current context (i.e. just the nodes between the root and the current node). Other functions are then used for processing the SimpleXML document.
我使用基于 STX(XML 流转换)的方法来解析大型 XML 文件。我使用 SAX 方法来构建一个 SimpleXML 对象来跟踪当前上下文中的数据(即,只是根节点和当前节点之间的节点)。然后使用其他函数来处理 SimpleXML 文档。
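The answer's own code isn't shown, but one common way to sketch this stream-plus-context hybrid in PHP is XMLReader for the streaming and SimpleXML for just the current subtree. The data.xml file and the <record>/<id> names below are assumptions for illustration:

```php
<?php
// Hybrid sketch: XMLReader streams the file; each repeating element is
// handed to SimpleXML so it can be queried comfortably. Only one
// <record> subtree is ever materialised at a time.
file_put_contents('data.xml',
    '<root><record><id>1</id></record><record><id>2</id></record></root>');

$reader = new XMLReader();
$reader->open('data.xml');

// Skip forward to the first <record>.
while ($reader->read() && $reader->name !== 'record');

$ids = [];
while ($reader->name === 'record') {
    // readOuterXML() returns just this subtree as a string,
    // which SimpleXML can parse on its own.
    $node = simplexml_load_string($reader->readOuterXML());
    $ids[] = (string) $node->id;
    $reader->next('record'); // jump to the next sibling <record>, skipping its subtree
}
$reader->close();

print_r($ids);
```

The key call is next('record'): it moves directly to the next sibling of that name without descending into the subtree that was just processed.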
回答by Liam
I needed to parse a large XML file that happened to have an element on each line (the StackOverflow data dump). In this specific case it was sufficient to read the file one line at a time and parse each line using SimpleXML. For me this had the advantage of not having to learn anything new.
我需要解析一个大型 XML 文件,该文件碰巧每行都有一个元素(StackOverflow 数据转储)。在这种特定情况下,一次读取文件一行并使用 SimpleXML 解析每一行就足够了。对我来说,这样做的好处是不必学习任何新东西。
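A minimal sketch of that line-by-line approach. The posts.xml file and the <row> attributes below mimic the shape of the Stack Overflow data dump, but are assumptions here; the idea is simply that each line is a complete element, so plain fgets() plus SimpleXML is enough:

```php
<?php
// One complete, self-closing element per line: read lines, parse each
// line independently with SimpleXML, and never hold the file in memory.
file_put_contents('posts.xml',
    "<posts>\n<row Id=\"1\" Score=\"10\"/>\n<row Id=\"2\" Score=\"3\"/>\n</posts>\n");

$handle = fopen('posts.xml', 'r');
$scores = [];
while (($line = fgets($handle)) !== false) {
    $line = trim($line);
    if (strpos($line, '<row') !== 0) {
        continue; // skip the wrapper element and blank lines
    }
    // Each line is valid XML on its own, so SimpleXML can load it directly.
    $row = simplexml_load_string($line);
    $scores[(string) $row['Id']] = (int) $row['Score'];
}
fclose($handle);

print_r($scores);
```

This only works when the producer guarantees the one-element-per-line layout; a generic XML file may break elements across lines, in which case one of the streaming parsers above is the safer choice.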

