Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/443991/

Time: 2020-09-17 20:37:48  Source: igfitidea

How to parse rss-feeds / xml in a shell script

xml bash rss scripting

Asked by Oli

I'd like to parse RSS feeds and download podcasts on my ReadyNas, which is running 24/7 anyway.

So I'm thinking about having a shell script periodically check the feeds and spawn wget to download the files.

What is the best way to do the parsing?

Thanks!

Answered by leo

Sometimes a simple one-liner built from standard shell commands is enough for this:

 wget -q -O- "http://www.rss-specifications.com/rss-podcast.xml" | grep -o '<enclosure url="[^"]*' | grep -o '[^"]*$' | xargs wget -c

Sure, this does not work in every case, but it is often good enough.
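
The one-liner above expects `url` to be the first attribute of the tag and to be double-quoted. A slightly more tolerant variant might look like the sketch below. `extract_enclosures` is a hypothetical helper name, and this is still text matching rather than real XML parsing, so comments, CDATA sections or unusual formatting can defeat it.

```shell
#!/bin/sh
# Sketch: print the enclosure URLs of an RSS document read from stdin.
# extract_enclosures is a hypothetical name, not part of any existing tool.
extract_enclosures() {
    # Join all lines, then start a new line at every "<" so that each tag
    # sits on a line of its own (the "<" itself is dropped by tr).
    tr '\n' ' ' | tr '<' '\n' |
        # Keep only enclosure tags and print the value of the url attribute,
        # whatever its position and whichever quote character is used.
        sed -n "s/^enclosure[^>]*[[:space:]]url=\([\"']\)\([^\"']*\)\1.*/\2/p" |
        # XML escapes "&" as "&amp;" inside attribute values; undo that.
        sed 's/&amp;/\&/g'
}
```

It could replace the two grep calls in the pipeline, for example `wget -q -O- "$feed_url" | extract_enclosures | xargs -n 1 wget -c`, where `feed_url` is a placeholder variable.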

Answered by cddr

Do you have access to awk? Maybe you could use XMLGawk.
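
If only a plain awk is available, a rough approximation is possible without the XML extension. The sketch below does not use XMLGawk's own interface; it simply splits the input on `<` with standard awk, and `extract_enclosures_awk` is a hypothetical name. It assumes the `url` attribute is double-quoted.

```shell
#!/bin/sh
# Sketch: plain awk (no XML extension) printing enclosure URLs from stdin.
# extract_enclosures_awk is a hypothetical helper name.
extract_enclosures_awk() {
    # With RS set to "<", every record starts right after a tag opener.
    awk 'BEGIN { RS = "<" }
         /^enclosure[ \t\n]/ {
             # Find  url="..."  preceded by whitespace and print only the
             # part between the double quotes.
             if (match($0, /[ \t\n]url="[^"]*"/))
                 print substr($0, RSTART + 6, RLENGTH - 7)
         }'
}
```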

Answered by Oli

I read about XMLStarlet here and there.

But is there a port available for the ReadyNas NV+?
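
Where XMLStarlet is installed, listing the enclosure URLs might look like the sketch below. It assumes the binary is called `xmlstarlet` (some packages install it as `xml`) and that the feed is plain RSS 2.0 saved to a local file; `list_enclosures_xmlstarlet` is a hypothetical name. Whether a build exists for the ReadyNas NV+ is not something this sketch answers.

```shell
#!/bin/sh
# Sketch: list enclosure URLs of a local RSS 2.0 file with XMLStarlet.
# list_enclosures_xmlstarlet is a hypothetical helper name.
list_enclosures_xmlstarlet() {
    # sel = select mode, -t = start a template, -m = match an XPath,
    # -v = print the value of an expression, -n = print a newline.
    xmlstarlet sel -t -m '//item/enclosure' -v '@url' -n "$1"
}
```

A possible use would be `list_enclosures_xmlstarlet feed.xml | xargs -n 1 wget -c`, with `feed.xml` standing in for a previously downloaded feed.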

Answered by kenorb

I wrote the following simple script for downloading XML from Amazon S3, so it may be useful for parsing different kinds of XML files:

#!/bin/bash
#
# Download all files from the Amazon feed
#
# Usage:
#  ./dl_amazon_feed_files.sh http://example.s3.amazonaws.com/
# Note: Don't forget about slash at the end
#

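# $1 is the feed URL from the usage line above; each <Key> value is appended
# to it, hence the trailing slash. -I% already runs wget once per input line.
wget -qO- "$1" | grep -o '<Key>[^<]*' | grep -o "[^>]*$" | xargs -I% wget -c "$1%"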

This is a similar approach to @leo's answer.

Answered by Giacomo

You can use xsltproc (shipped with libxslt, which is built on libxml2) and write a simple XSL stylesheet that parses the RSS and outputs a list of links.
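
A minimal sketch of that idea follows, assuming a plain RSS 2.0 feed saved to a local file and `xsltproc` on the PATH. The function names and the temporary stylesheet are illustrative only; Atom feeds or namespaced elements would need different XPath expressions.

```shell
#!/bin/sh
# Sketch: print one enclosure URL per line using xsltproc.
# write_enclosure_xsl and list_enclosures_xslt are hypothetical names.
write_enclosure_xsl() {
    # Write a small text-output stylesheet to the path given as $1.
    cat > "$1" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//item/enclosure">
      <xsl:value-of select="@url"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF
}

list_enclosures_xslt() {
    # $1 is a local feed file; the stylesheet lives in a temporary file.
    xsl=$(mktemp) || return 1
    write_enclosure_xsl "$xsl"
    xsltproc "$xsl" "$1"
    status=$?
    rm -f "$xsl"
    return "$status"
}
```

The output can then be handed to wget, for instance `list_enclosures_xslt feed.xml | xargs -n 1 wget -c`, where `feed.xml` is a placeholder for a downloaded feed.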