bash: Using curl, grep, and sed to extract data from HTML

Question

Asked by user2397282

I am trying to learn some terminal commands, and saw this one that grabs the link to the latest Google doodle and copies it to your clipboard:

$ curl http://www.google.com/doodles#oodles/archive |
grep -A5 'latest-doodle on' | grep 'img src' |
sed s/.*'<img src="\/\/'/''/ | sed s/'" alt=".*'/''/ | pbcopy

I tried to do something similar - this command should copy the word of the day to your clipboard:

curl "http://www.merriam-webster.com/word-of-the-day/" |
grep -A5 'main_entry_word' | sed s/.*'<strong class="main_entry_word">'/''/ |
sed s/'</\strong>.*'/''/ | pbcopy

I got an error that said:

sed: 1: "s/</\strong>.*//": bad flag in substitute command: '/'

I'm not really sure what I'm doing and I've tried some tutorials on other websites but I can't figure it out. I think the main problem is that I don't understand what most of the 'sed' command does.

Can someone help me please?

Answer 1

Accepted answer by Bruce K

sed s/'<\/strong>.*'/''/

or

sed s@'</strong>.*'@''@
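To see why the original command failed: in s/'</\strong>.*'/''/ the backslash sits before the s of strong rather than before the slash, so sed treats the / in </ as the end of the pattern and then finds a stray / where it expects flags. A minimal sketch of both fixes follows; the sample line is a made-up stand-in for the fetched HTML, not the real page.

```shell
# Hypothetical sample line standing in for one line of the downloaded page.
line='<strong class="main_entry_word">palmy</strong> extra'

# Fix 1: escape the slash inside the pattern so it is not read as a delimiter.
escaped=$(printf '%s\n' "$line" | sed 's/<\/strong>.*//')

# Fix 2: use a delimiter (@) that does not occur in the pattern at all.
alt_delim=$(printf '%s\n' "$line" | sed 's@</strong>.*@@')

# Both variants strip everything from </strong> to the end of the line.
printf '%s\n' "$escaped" "$alt_delim"
```

Quoting the whole sed expression, as done here, also keeps the shell from interpreting characters such as < and *.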

Answer 2

Answer by Kent

If I understand your requirement correctly, you want to extract the text between <strong...class="..."> and </strong>. I would use a single grep in place of your grep|grep|sed|sed pipeline:

Also use the -s option of curl:

kent$  curl -s "link"|grep -Po '<strong\s+class="main_entry_word">\K.*?(?=</strong>)'

output:

palmy
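The -P option is a GNU grep extension, and the grep that ships with macOS (where pbcopy comes from) generally does not support it. As a possible alternative, the sketch below does the same extraction with a single sed call using a capture group. The HTML snippet is a hypothetical stand-in; the real page markup may differ, so the pattern may need adjusting.

```shell
# Hypothetical HTML fragment standing in for the output of curl -s.
html='<div>
<h2><strong class="main_entry_word">palmy</strong></h2>
</div>'

# -n suppresses default output; the p flag prints only lines where the
# substitution matched. \( ... \) captures the text between the tags.
word=$(printf '%s\n' "$html" |
  sed -n 's/.*<strong class="main_entry_word">\([^<]*\)<\/strong>.*/\1/p')

printf '%s\n' "$word"
```

In a real pipeline, the printf would be replaced by the curl -s call and the result piped to pbcopy.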
