Best way to parse RSS/Atom feeds with PHP

Source: Stack Overflow question, http://stackoverflow.com/questions/250679/. Provided under the CC BY-SA 4.0 license: you are free to use and share it, but you must keep the same license and attribute it to the original authors.

Tags: php, parsing, rss, atom-feed

Asked by carson

I'm currently using Magpie RSS, but it sometimes falls over when the RSS or Atom feed isn't well formed. Are there any other options for parsing RSS and Atom feeds with PHP?

Accepted answer by Philip Morton

Answer by Brian Cline

I've always used the SimpleXML functions built into PHP to parse XML documents. It's one of the few generic parsers out there with an intuitive structure, which makes it extremely easy to build a meaningful class for something specific like an RSS feed. Additionally, it will detect XML warnings and errors, and upon finding any you could simply run the source through something like HTML Tidy (as ceejayoz mentioned) to clean it up and attempt it again.
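The error detection mentioned above could be sketched as follows. This assumes libxml's internal error buffer is used to collect the problems; the helper name `loadFeedOrErrors` is hypothetical and not part of SimpleXML.

```php
<?php
// Hypothetical helper: returns [SimpleXMLElement|null, string[]].
function loadFeedOrErrors(string $xml): array
{
    // Collect libxml problems in a buffer instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $feed = simplexml_load_string($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling mode for the rest of the script.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$feed === false ? null : $feed, $messages];
}

// The closing tag of <title> is missing, so the document is not well formed.
[$feed, $errors] = loadFeedOrErrors('<rss><channel><title>Broken</channel></rss>');
if ($feed === null) {
    // The caller can now decide whether to repair the source and retry.
    echo "Feed rejected:\n" . implode("\n", $errors) . "\n";
}
```

Returning the messages instead of printing them leaves the decision (reject, log, or clean up and retry) to the calling code.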

Consider this very rough, simple class using SimpleXML:

class BlogPost
{
    public $date;
    public $ts;
    public $link;

    public $title;
    public $text;
    public $summary; // set by BlogFeed; declared here to avoid a dynamic property
}

class BlogFeed
{
    public $posts = array();

    function __construct($file_or_url)
    {
        $file_or_url = $this->resolveFile($file_or_url);
        if (!($x = simplexml_load_file($file_or_url)))
            return;

        foreach ($x->channel->item as $item)
        {
            $post = new BlogPost();
            $post->date  = (string) $item->pubDate;
            $post->ts    = strtotime((string) $item->pubDate);
            $post->link  = (string) $item->link;
            $post->title = (string) $item->title;
            $post->text  = (string) $item->description;

            // Create summary as a shortened body and remove images, 
            // extraneous line breaks, etc.
            $post->summary = $this->summarizeText($post->text);

            $this->posts[] = $post;
        }
    }

    private function resolveFile($file_or_url) {
        if (!preg_match('|^https?:|', $file_or_url))
            $feed_uri = $_SERVER['DOCUMENT_ROOT'] .'/shared/xml/'. $file_or_url;
        else
            $feed_uri = $file_or_url;

        return $feed_uri;
    }

    private function summarizeText($summary) {
        $summary = strip_tags($summary);

        // Truncate summary line to 100 characters
        $max_len = 100;
        if (strlen($summary) > $max_len)
            $summary = substr($summary, 0, $max_len) . '...';

        return $summary;
    }
}

Answer by PJunior

With four lines, I import an RSS feed into an array.

$feed = implode(file('http://yourdomains.com/feed.rss'));
$xml = simplexml_load_string($feed);
$json = json_encode($xml);
$array = json_decode($json,TRUE);
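A self-contained sketch of the same JSON round trip, using an inline sample feed (an assumption, so that it runs without network access). It also shows two details worth checking in practice: the `LIBXML_NOCDATA` flag, and the fact that a feed with a single `<item>` decodes to one associative array rather than a list.

```php
<?php
// Inline sample feed (hypothetical content) so the sketch runs offline.
$feed = <<<XML
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>Only post</title><link>http://example.com/1</link></item>
  </channel>
</rss>
XML;

// LIBXML_NOCDATA merges CDATA sections into plain text nodes so their
// content is kept when the element is serialized to JSON.
$xml   = simplexml_load_string($feed, 'SimpleXMLElement', LIBXML_NOCDATA);
$array = json_decode(json_encode($xml), true);

// With exactly one <item>, the round trip yields a single associative
// array instead of a list, so normalize before looping.
$items = $array['channel']['item'] ?? [];
if ($items !== [] && !isset($items[0])) {
    $items = [$items];
}

foreach ($items as $item) {
    echo $item['title'], ' -> ', $item['link'], "\n";
}
```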

For a more complex solution:

$feed = new DOMDocument();
$feed->load('file.rss');
$json = array();
$json['title'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$json['description'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
$json['link'] = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('link')->item(0)->firstChild->nodeValue;
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');

$json['item'] = array();

foreach ($items as $key => $item) {
    $title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
    $description = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
    $pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
    $guid = $item->getElementsByTagName('guid')->item(0)->firstChild->nodeValue;

    $json['item'][$key]['title'] = $title;
    $json['item'][$key]['description'] = $description;
    $json['item'][$key]['pubdate'] = $pubDate;
    $json['item'][$key]['guid'] = $guid;
}

echo json_encode($json);
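The chained lookups above fail with an error when a feed omits an element such as `<guid>`, because `item(0)` then returns null. One possible way to guard against that is a small helper; `firstText` is a hypothetical name, and the inline feed is sample data, not from the original answer.

```php
<?php
// Hypothetical helper: text of the first <$tag> under $parent, or $default
// when the element is absent.
function firstText(DOMElement $parent, string $tag, ?string $default = null): ?string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? $default : trim($node->textContent);
}

// Sample feed in which the second item has no <guid>.
$source = '<rss version="2.0"><channel>'
    . '<title>Example feed</title>'
    . '<item><title>First</title><guid>a1</guid></item>'
    . '<item><title>Second</title></item>'
    . '</channel></rss>';

$doc = new DOMDocument();
$doc->loadXML($source);

$channel = $doc->getElementsByTagName('channel')->item(0);
$result  = ['title' => firstText($channel, 'title'), 'item' => []];

foreach ($channel->getElementsByTagName('item') as $item) {
    $result['item'][] = [
        'title' => firstText($item, 'title'),
        'guid'  => firstText($item, 'guid'), // null when the feed omits <guid>
    ];
}

echo json_encode($result), "\n";
```

Fetching the channel element once also avoids repeating the `getElementsByTagName('channel')->item(0)` chain for every field.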

Answer by Vladimir Lukyanov

I would like to introduce a simple script to parse RSS:

$i = 0; // counter
$url = "http://www.banki.ru/xml/news.rss"; // url to parse
$rss = simplexml_load_file($url); // XML parser

print '<h2><img style="vertical-align: middle;" src="'.$rss->channel->image->url.'" /> '.$rss->channel->title.'</h2>'; // channel title + img with src

// RSS items loop
foreach ($rss->channel->item as $item) {
    if ($i < 10) { // print only the first 10 items
        print '<a href="'.$item->link.'">'.$item->title.'</a><br />';
    }

    $i++;
}
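Feed content comes from a third party, so it may be worth escaping it before printing it into HTML. A minimal sketch of that idea, which also stops the loop once the limit is reached; `renderItems` is a hypothetical helper and the inline feed is sample data. Escaping alone does not validate the URL scheme of the link.

```php
<?php
// Hypothetical helper: render up to $limit items as escaped HTML links.
function renderItems(SimpleXMLElement $rss, int $limit = 10): string
{
    $html  = '';
    $count = 0;
    foreach ($rss->channel->item as $item) {
        if ($count++ >= $limit) {
            break; // stop instead of iterating over the remaining items
        }
        $html .= sprintf(
            '<a href="%s">%s</a><br />' . "\n",
            htmlspecialchars((string) $item->link, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars((string) $item->title, ENT_QUOTES, 'UTF-8')
        );
    }
    return $html;
}

// Sample feed with characters that need escaping in HTML output.
$rss = simplexml_load_string(
    '<rss><channel>'
    . '<item><title>Rates &amp; fees</title><link>http://example.com/a?x=1&amp;y=2</link></item>'
    . '<item><title>Second</title><link>http://example.com/b</link></item>'
    . '</channel></rss>'
);

echo renderItems($rss, 1);
```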

Answer by Kornel

If the feed isn't well-formed XML, you're supposed to reject it, no exceptions. You're entitled to call the feed creator a bozo.

Otherwise you're paving the way to the mess that HTML ended up in.

Answer by ceejayoz

The HTML Tidy library is able to fix some malformed XML files. Running your feeds through that before passing them on to the parser may help.
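A rough sketch of that approach, assuming PHP's optional tidy extension is installed (the code falls back to plain parsing when it is not). `parseWithTidyFallback` is a hypothetical helper name, and whether Tidy can actually repair a given feed depends on how it is broken.

```php
<?php
// Hypothetical helper: parse $xml, retrying once through Tidy on failure.
function parseWithTidyFallback(string $xml): ?SimpleXMLElement
{
    // Keep parse errors in libxml's buffer instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);

    if ($feed === false && function_exists('tidy_repair_string')) {
        // input-xml / output-xml tell Tidy to treat the source as XML
        // rather than HTML; wrap => 0 disables line wrapping.
        $repaired = tidy_repair_string(
            $xml,
            ['input-xml' => true, 'output-xml' => true, 'wrap' => 0],
            'utf8'
        );
        if (is_string($repaired)) {
            $feed = simplexml_load_string($repaired);
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $feed === false ? null : $feed;
}

$feed = parseWithTidyFallback('<rss><channel><title>Ok</title></channel></rss>');
echo $feed === null ? "unusable\n" : $feed->channel->title . "\n";
```

Trying the parser first and only calling Tidy on failure keeps well-formed feeds untouched.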

Answer by ceejayoz

I use SimplePie to parse a Google Reader feed and it works pretty well and has a decent feature set.

Of course, I haven't tested it with non-well-formed RSS / Atom feeds, so I don't know how it copes with those; I'm assuming Google's are fairly standards-compliant! :)

Answer by Adam

Personally, I use BNC Advanced Feed Parser. I like its template system, which is very easy to use.

Answer by Thinol

The PHP RSS reader (http://www.scriptol.com/rss/rss-reader.php) is a complete but simple parser used by thousands of users...

Answer by Lucas

Another great free parser: http://bncscripts.com/free-php-rss-parser/ It's very light (only 3 KB) and simple to use!
