Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/1835177/

Time: 2020-08-25 04:00:35  Source: igfitidea

How to use XMLReader in PHP?

Tags: php, xml, parsing, simplexml, xmlreader

Asked by Shadi Almosri

I have the following XML file. It is rather large, and I haven't been able to get SimpleXML to open and read it, so I'm trying XMLReader in PHP, with no success so far.

<?xml version="1.0" encoding="ISO-8859-1"?>
<products>
    <last_updated>2009-11-30 13:52:40</last_updated>
    <product>
        <element_1>foo</element_1>
        <element_2>foo</element_2>
        <element_3>foo</element_3>
        <element_4>foo</element_4>
    </product>
    <product>
        <element_1>bar</element_1>
        <element_2>bar</element_2>
        <element_3>bar</element_3>
        <element_4>bar</element_4>
    </product>
</products>

Unfortunately, I haven't found a good tutorial on this for PHP, and I would love to see how I can read each element's content and store it in a database.

Answered by Josh Davis

It all depends on how big the unit of work is, but I guess you're trying to process each <product/> node in succession.

For that, the simplest way would be to use XMLReader to get to each node, then use SimpleXML to access them. This way, you keep the memory usage low because you're treating one node at a time and you still leverage SimpleXML's ease of use. For instance:

$z = new XMLReader;
$z->open('data.xml');

$doc = new DOMDocument;

// move to the first <product /> node
while ($z->read() && $z->name !== 'product');

// now that we're at the right depth, hop to the next <product/> until the end of the tree
while ($z->name === 'product')
{
    // either one should work
    //$node = new SimpleXMLElement($z->readOuterXML());
    $node = simplexml_import_dom($doc->importNode($z->expand(), true));

    // now you can use $node without going insane about parsing
    var_dump($node->element_1);

    // go to next <product />
    $z->next('product');
}
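Since the question asks how to get each element's content into a database, here is a minimal sketch of the same XMLReader + SimpleXML loop wrapped in a generator that yields one plain array per <product/>. The function name iterateProducts() and the inline sample XML are assumptions made for illustration, not part of the original answer; each yielded row could then be bound to a prepared INSERT statement.

```php
<?php
// Hypothetical helper (not from the original answer): walks the reader from
// one <product> to the next and yields each as an associative array.
function iterateProducts(XMLReader $reader): Generator
{
    // skip ahead to the first <product> element
    while ($reader->read() && $reader->name !== 'product');

    while ($reader->name === 'product') {
        // parse only the current <product> subtree with SimpleXML
        $node = new SimpleXMLElement($reader->readOuterXml());

        $row = [];
        foreach ($node->children() as $child) {
            $row[$child->getName()] = (string) $child;
        }
        yield $row;

        // jump to the next <product> sibling, skipping the subtree just handled
        $reader->next('product');
    }
}

// Shortened copy of the question's document, inlined so the sketch is self-contained.
$xml = <<<XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<products>
    <last_updated>2009-11-30 13:52:40</last_updated>
    <product><element_1>foo</element_1><element_2>foo</element_2></product>
    <product><element_1>bar</element_1><element_2>bar</element_2></product>
</products>
XML;

$reader = new XMLReader();
$reader->XML($xml);   // for a file on disk, use $reader->open('data.xml') instead

$rows = iterator_to_array(iterateProducts($reader), false);
$reader->close();
```

Because the generator only holds one <product/> in memory at a time, a real import would normally consume it with foreach rather than iterator_to_array(), which is used here only to make the result easy to inspect.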

Quick overview of pros and cons of different approaches:

XMLReader only

  • Pros: fast, uses little memory

  • Cons: excessively hard to write and debug, requires lots of userland code to do anything useful. Userland code is slow and prone to error. Plus, it leaves you with more lines of code to maintain
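To give a feel for how much userland code the XMLReader-only route needs, here is a rough sketch that rebuilds each <product/> as an array using nothing but read(), nodeType, name and value. The variable names and the inline sample XML are assumptions for illustration, and the sketch deliberately ignores edge cases such as empty <product/> elements.

```php
<?php
// Sample input, inlined so the sketch is self-contained (assumed shape).
$xml = '<products><product><element_1>foo</element_1><element_2>foo</element_2></product>'
     . '<product><element_1>bar</element_1><element_2>bar</element_2></product></products>';

$reader = new XMLReader();
$reader->XML($xml);

$products = [];     // finished products
$current  = null;   // product being built, or null when outside <product>
$field    = null;   // child element whose text we are currently collecting

while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            if ($reader->name === 'product') {
                $current = [];                  // a new product starts
            } elseif ($current !== null) {
                $field = $reader->name;         // a child of <product> starts
                $current[$field] = '';
            }
            break;

        case XMLReader::TEXT:
        case XMLReader::CDATA:
            if ($current !== null && $field !== null) {
                $current[$field] .= $reader->value;
            }
            break;

        case XMLReader::END_ELEMENT:
            if ($reader->name === 'product') {
                $products[] = $current;         // product complete
                $current = null;
            }
            $field = null;
            break;
    }
}
$reader->close();
```

All the state tracking ($current, $field) is bookkeeping that SimpleXML or DOM would otherwise do for you, which is the maintenance cost described above.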

XMLReader + SimpleXML

  • Pros: doesn't use much memory (only the memory needed to process one node) and SimpleXML is, as the name implies, really easy to use.

  • Cons: creating a SimpleXMLElement object for each node is not very fast. You really have to benchmark it to understand whether it's a problem for you. Even a modest machine would be able to process a thousand nodes per second, though.

XMLReader + DOM

  • Pros: uses about as much memory as SimpleXML, and XMLReader::expand() is faster than creating a new SimpleXMLElement. I wish it was possible to use simplexml_import_dom() but it doesn't seem to work in that case.

  • Cons: DOM is annoying to work with. It's halfway between XMLReader and SimpleXML. Not as complicated and awkward as XMLReader, but light years away from working with SimpleXML.

My advice: write a prototype with SimpleXML, see if it works for you. If performance is paramount, try DOM. Stay as far away from XMLReader as possible. Remember that the more code you write, the higher the possibility of you introducing bugs or introducing performance regressions.
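If you do end up trying the XMLReader + DOM combination, a minimal sketch could look like the following. It assumes the same <product/> layout as the question, uses an inline sample string instead of a file, and reads only element_1; the variable names are illustrative.

```php
<?php
// Sample input, inlined so the sketch is self-contained (assumed shape).
$xml = '<products><product><element_1>foo</element_1></product>'
     . '<product><element_1>bar</element_1></product></products>';

$reader = new XMLReader();
$reader->XML($xml);
$doc = new DOMDocument();

$values = [];

// move to the first <product> element
while ($reader->read() && $reader->name !== 'product');

while ($reader->name === 'product') {
    // expand() returns a DOMNode for the current element;
    // importNode() copies it (deeply) into our own document
    $product = $doc->importNode($reader->expand(), true);

    // plain DOM access from here on
    $values[] = $product->getElementsByTagName('element_1')->item(0)->textContent;

    $reader->next('product');
}
$reader->close();
```

The DOM calls are wordier than SimpleXML's property access, but only one <product/> subtree is materialized at a time.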

Answered by try5tan3

For xml formatted with attributes...

data.xml:

<building_data>
<building address="some address" lat="28.902914" lng="-71.007235" />
<building address="some address" lat="48.892342" lng="-75.0423423" />
<building address="some address" lat="58.929753" lng="-79.1236987" />
</building_data>

PHP code:

$reader = new XMLReader();

if (!$reader->open("data.xml")) {
    die("Failed to open 'data.xml'");
}

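while ($reader->read()) {
  // only react to opening <building> tags
  if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'building') {
    $address = $reader->getAttribute('address');
    $latitude = $reader->getAttribute('lat');
    $longitude = $reader->getAttribute('lng');
    // use $address, $latitude and $longitude here (e.g. insert them into a database)
  }
}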

$reader->close();
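As a small follow-up sketch, the same attribute loop can collect every <building/> into an array instead of overwriting three variables on each pass. The $buildings array, the float casts and the inline sample XML are assumptions added for illustration.

```php
<?php
// Sample input taken from data.xml above, inlined so the sketch is self-contained.
$xml = <<<XML
<building_data>
<building address="some address" lat="28.902914" lng="-71.007235" />
<building address="some address" lat="48.892342" lng="-75.0423423" />
</building_data>
XML;

$reader = new XMLReader();
$reader->XML($xml);   // for a file on disk, use $reader->open('data.xml') instead

$buildings = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'building') {
        $buildings[] = [
            'address' => $reader->getAttribute('address'),
            // getAttribute() returns strings; cast the coordinates to floats
            'lat'     => (float) $reader->getAttribute('lat'),
            'lng'     => (float) $reader->getAttribute('lng'),
        ];
    }
}
$reader->close();
```

For a very large file you would probably process or store each entry inside the loop rather than accumulate them all, to keep memory usage flat.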

Answered by Francis Lewis

The accepted answer gave me a good start, but brought in more classes and more processing than I would have liked; so this is my interpretation:

$xml_reader = new XMLReader;
$xml_reader->open($feed_url);

// move the pointer to the first product
while ($xml_reader->read() && $xml_reader->name != 'product');

// loop through the products
while ($xml_reader->name == 'product')
{
    // load the current xml element into simplexml and we're off and running!
    $xml = simplexml_load_string($xml_reader->readOuterXML());

    // now you can use your simpleXML object ($xml).
    echo $xml->element_1;

    // move the pointer to the next product
    $xml_reader->next('product');
}

// don't forget to close the file
$xml_reader->close();

Answered by Josiah

Most of my XML parsing life is spent extracting nuggets of useful information out of truckloads of XML (Amazon MWS). As such, my answer assumes you want only specific information and you know where it is located.

I find the easiest way to use XMLReader is to know which tags I want the information out of and use them. If you know the structure of the XML and it has lots of unique tags, I find that using the first case is the easiest. Cases 2 and 3 are just to show you how it can be done for more complex tags. This is extremely fast; I discuss speed in my answer to "What is the fastest XML parser in PHP?"

The most important thing to remember when doing tag-based parsing like this is to use if ($myXML->nodeType == XMLReader::ELEMENT) {... }, which checks that we're only dealing with opening nodes and not whitespace, closing nodes, or anything else.
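To see why the nodeType check matters, here is a tiny sketch that counts how often the reader stops on a node named b, with and without the check. The three-line sample document and the counter names are assumptions for illustration.

```php
<?php
// A minimal document with indentation, so whitespace nodes are present too.
$xml = "<a>\n  <b>text</b>\n</a>";

$reader = new XMLReader();
$reader->XML($xml);

$byName = 0;          // matches on the name alone
$byNameAndType = 0;   // matches on name AND opening-element type

while ($reader->read()) {
    if ($reader->name === 'b') {
        $byName++;    // fires for <b> and again for </b>
    }
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'b') {
        $byNameAndType++;   // fires for <b> only
    }
}
$reader->close();
```

Matching on the name alone triggers twice for a single <b> element, because the closing tag reports the same name; adding the XMLReader::ELEMENT check narrows it to the opening tag.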

function parseMyXML ($xml) { //pass in an XML string
    $myXML = new XMLReader();
    $myXML->xml($xml);

    while ($myXML->read()) { //start reading.
        if ($myXML->nodeType == XMLReader::ELEMENT) { //only opening tags.
            $tag = $myXML->name; //make $tag contain the name of the tag
            switch ($tag) {
                case 'Tag1': //this tag contains no child elements, only the content we need. And it's unique.
                    $variable = $myXML->readInnerXML(); //now variable contains the contents of tag1
                    break;

                case 'Tag2': //this tag contains child elements, of which we only want one.
                    while($myXML->read()) { //so we tell it to keep reading
                        if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') { // and when it finds the amount tag...
                            $variable2 = $myXML->readInnerXML(); //...put it in $variable2. 
                            break;
                        }
                    }
                    break;

                case 'Tag3': //tag3 also has children, which are not unique, but we need two of the children this time.
                    while($myXML->read()) {
                        if ($myXML->nodeType == XMLReader::END_ELEMENT && $myXML->name === 'Tag3') {
                            break; //reached </Tag3>, so stop scanning its children
                        } else if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') {
                            $variable3 = $myXML->readInnerXML(); //keep reading: we still need Currency
                        } else if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Currency') {
                            $variable4 = $myXML->readInnerXML();
                        }
                    }
                    break;

            }
        }
    }
    $myXML->close();
}

Answered by sebob

Simple example:

public function productsAction()
{
    $saveFileName = 'ceneo.xml';
    $filename = $this->path . $saveFileName;
    if(file_exists($filename)) {
        $reader = new XMLReader();
        $reader->open($filename);

        $countElements = 0;
        $nodeName = null;

        while($reader->read()) {
            // remember the name of the most recently opened element
            if($reader->nodeType == XMLReader::ELEMENT) {
                $nodeName = $reader->name;
            }

            // text nodes carry the content of that element
            if($reader->nodeType == XMLReader::TEXT && !empty($nodeName)) {
                switch ($nodeName) {
                    case 'id':
                        var_dump($reader->value);
                        break;
                }
            }

            // count each completed <offer> element
            if($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'offer') {
                $countElements++;
            }
        }
        $reader->close();

        echo '<pre>';
        var_dump($countElements);
        exit;
    }
}

Answered by Percutio

XMLReader is well documented on the PHP site. It is an XML pull parser, which means it is used to iterate through the nodes (or DOM nodes) of a given XML document. For example, you could go through the entire document you gave like this:

<?php
$reader = new XMLReader();
if (!$reader->open("data.xml"))
{
    die("Failed to open 'data.xml'");
}
while($reader->read())
{
    $node = $reader->expand();
    // process $node...
}
$reader->close();
?>

It is then up to you to decide how to deal with the node returned by XMLReader::expand().
