bash: piping curl output to grep

Question

Asked by David Xie

Just a little disclaimer, I am not very familiar with programming so please excuse me if I'm using any terms incorrectly/in a confusing way.

I want to be able to extract specific information from a webpage and tried doing this by piping the output of a curl function into grep. Oh and this is in cygwin if that matters.

When just typing in

$ curl www.ncbi.nlm.nih.gov/gene/823951

The terminal prints the whole webpage in what I believe to be HTML. From here I thought I could just pipe this output into grep with whatever search term I want:

$ curl www.ncbi.nlm.nih.gov/gene/823951 | grep "Gene Symbol"

But instead of printing the webpage at all, the terminal gives me:

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  142k    0  142k    0     0  41857      0 --:--:--  0:00:03 --:--:-- 42083

Can anyone explain why it does this/how I can search for specific lines of text in a webpage? I eventually want to compile information like gene names, types, and descriptions into a database, so I was hoping to export the results from the grep function into a text file after that.

Any help is extremely appreciated, thanks in advance!

Answer 1

Answered by retrospectacus

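curl detects that its output is not going to a terminal and shows you the progress meter. The meter is written to stderr, so it still appears on screen while the page itself goes down the pipe. You can suppress the progress meter with -s.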
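
To see why the meter shows up while the HTML does not, here is a minimal sketch that mimics the behavior without touching the network. Note that fake_fetch is a hypothetical stand-in for curl, not a real command; it only imitates how curl splits its output between stderr and stdout.

```shell
#!/bin/sh
# fake_fetch is a hypothetical stand-in for curl: it writes a "progress"
# line to stderr and the page body to stdout, as curl does when piped.
fake_fetch() {
    echo "100  142k  (progress meter)" >&2
    echo '<dt class="noline"> Gene symbol </dt>'
}

# Only stdout travels through the pipe. stderr is discarded here to show
# that grep still receives the HTML.
matched=$(fake_fetch 2>/dev/null | grep "Gene symbol")
echo "$matched"
```

With the real curl, -s silences the meter; combining it with -S (as -sS) keeps error messages visible if the request fails.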

The HTML data is indeed being sent to grep. However, that page does not contain the text "Gene Symbol". grep is case-sensitive (unless invoked with -i), and the text on the page is "Gene symbol".

$ curl -s www.ncbi.nlm.nih.gov/gene/823951 | grep "Gene symbol"
    <dt class="noline"> Gene symbol </dt>
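
If you would rather not worry about capitalization, -i makes the match case-insensitive. A small sketch, using the line shown above as sample input instead of fetching the page:

```shell
#!/bin/sh
# Sample line copied from the page output shown above.
html='    <dt class="noline"> Gene symbol </dt>'

# Case-sensitive search for the original term finds nothing...
# (|| true keeps the script going, since grep exits non-zero on no match)
strict=$(printf '%s\n' "$html" | grep -c "Gene Symbol" || true)
# ...while -i ignores case and matches.
loose=$(printf '%s\n' "$html" | grep -ci "Gene Symbol" || true)

echo "case-sensitive matches: $strict"
echo "case-insensitive matches: $loose"
```

Here -c just counts matching lines so the difference is easy to see; drop it to print the matching lines themselves.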

You probably also want the next line of HTML, which grep prints when given the -A option (-A1 prints one line after each match):

$ curl -s www.ncbi.nlm.nih.gov/gene/823951 | grep -A1 "Gene symbol"
    <dt class="noline"> Gene symbol </dt>
    <dd class="noline">AT3G47960</dd>
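
Since the goal is to collect values into a text file, one possible next step is to strip the tags and append the result to a file. This is only a sketch: it assumes the value always sits on the line right after the match, and genes.txt is a hypothetical file name. The two sample lines stand in for the real page.

```shell
#!/bin/sh
# Two lines copied from the grep -A1 output above, standing in for the page.
html='    <dt class="noline"> Gene symbol </dt>
    <dd class="noline">AT3G47960</dd>'

# Keep the line after the match, then strip HTML tags and surrounding spaces.
symbol=$(printf '%s\n' "$html" \
    | grep -A1 "Gene symbol" \
    | tail -n 1 \
    | sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')

# Append the value to a text file (genes.txt is a hypothetical name).
outfile=genes.txt
printf '%s\n' "$symbol" >> "$outfile"
echo "$symbol"
```

To run it against the live page, replace the printf with curl -s www.ncbi.nlm.nih.gov/gene/823951. Picking values out of HTML by line position is fragile, so expect to adjust it if the page layout changes.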

See man curl and man grep for more information about these and other options.
