Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36458128/

Date: 2020-09-18 14:28:33. Source: igfitidea

Piping curl output into grep

Tags: bash, search, curl, grep, cygwin

Asked by David Xie

Just a little disclaimer, I am not very familiar with programming so please excuse me if I'm using any terms incorrectly/in a confusing way.

I want to be able to extract specific information from a webpage and tried doing this by piping the output of a curl function into grep. Oh and this is in cygwin if that matters.

When just typing in

$ curl www.ncbi.nlm.nih.gov/gene/823951

The terminal prints the whole webpage in what I believe to be HTML. From here I thought I could just pipe this output into grep with whatever search term I want:

$ curl www.ncbi.nlm.nih.gov/gene/823951 | grep "Gene Symbol"

But instead of printing the webpage at all, the terminal gives me:

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  142k    0  142k    0     0  41857      0 --:--:--  0:00:03 --:--:-- 42083

Can anyone explain why it does this/how I can search for specific lines of text in a webpage? I eventually want to compile information like gene names, types, and descriptions into a database, so I was hoping to export the results from the grep function into a text file after that.

Any help is extremely appreciated, thanks in advance!

Answered by retrospectacus

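curl detects that its output is not going to a terminal, so it shows the progress meter. The meter is written to stderr, which is why it still reaches your screen while the page itself goes down the pipe. You can suppress the progress meter with -s.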
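
To make the split between the two output streams concrete, here is a minimal offline sketch. fetch_page is a hypothetical stand-in for curl (it is not part of curl or of the original answer); the only thing it is assumed to share with curl is that the page body goes to stdout and the progress text goes to stderr.

```shell
#!/usr/bin/env bash
# fetch_page is a hypothetical stand-in for curl: it writes the page body
# to stdout and a progress line to stderr, mimicking curl's behavior.
fetch_page() {
    echo "100  142k  (progress meter)" >&2
    echo '<dt class="noline"> Gene symbol </dt>'
}

# Only stdout travels through the pipe, so grep sees just the HTML.
# Here stderr is discarded so nothing extra reaches the screen.
matches=$(fetch_page 2>/dev/null | grep -c "Gene symbol")
echo "matching lines: $matches"

# Swapping the redirections captures stderr alone, showing where the
# progress text actually went.
progress=$(fetch_page 2>&1 >/dev/null)
echo "stderr carried: $progress"
```

With the real curl, -s silences the meter; adding -S as well (curl -sS) keeps error messages visible while still hiding the meter.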

The HTML data is indeed being sent to grep. However, that page does not contain the text "Gene Symbol". grep is case-sensitive (unless invoked with -i), and the text on the page is "Gene symbol", with a lowercase "s".

$ curl -s www.ncbi.nlm.nih.gov/gene/823951 | grep "Gene symbol"
    <dt class="noline"> Gene symbol </dt>
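
The case difference can be checked without touching the network. This is a small sketch that feeds grep the one line shown above (the variable names are made up for illustration) and counts matches with and without -i.

```shell
#!/usr/bin/env bash
# One line of the page, copied from the grep output above.
line='    <dt class="noline"> Gene symbol </dt>'

# grep -c prints the number of matching lines. It exits non-zero when
# nothing matches, so "|| true" keeps the script going in that case.
sensitive=$(printf '%s\n' "$line" | grep -c "Gene Symbol" || true)
insensitive=$(printf '%s\n' "$line" | grep -ci "Gene Symbol" || true)

echo "case-sensitive matches:   $sensitive"
echo "case-insensitive matches: $insensitive"
```

The case-sensitive search finds nothing because of the capital "S", while the -i search matches the line.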

You probably also want the next line of HTML, which grep will print if you add the -A option (-A1 means one line of context after each match):

$ curl -s www.ncbi.nlm.nih.gov/gene/823951 | grep -A1 "Gene symbol"
    <dt class="noline"> Gene symbol </dt>
    <dd class="noline">AT3G47960</dd>
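
Since the goal in the question is to collect values into a text file, the following is one possible sketch of the remaining steps: strip the tags from the <dd> line and append the value to a file. It assumes the page keeps the two-line layout shown above. sample_page and results.txt are hypothetical names used for illustration, and the sample data stands in for the real download so the script runs offline.

```shell
#!/usr/bin/env bash
# sample_page is a hypothetical stand-in for the download, using the two
# lines shown above. For the live page, replace it with:
#   curl -s www.ncbi.nlm.nih.gov/gene/823951
sample_page() {
    printf '%s\n' \
        '    <dt class="noline"> Gene symbol </dt>' \
        '    <dd class="noline">AT3G47960</dd>'
}

# Keep the match plus one following line, take the last of those lines
# (the <dd> value), then strip the tags and the surrounding whitespace.
symbol=$(sample_page | grep -A1 "Gene symbol" | tail -n 1 \
    | sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')

# Append a tab-separated record to a text file (hypothetical file name).
outfile="results.txt"
printf 'Gene symbol\t%s\n' "$symbol" >> "$outfile"
echo "saved: $symbol"
```

Matching HTML line by line like this is fragile: it breaks if the site changes its markup or puts both tags on one line, so treat it as a starting point rather than a robust parser.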

See man curl and man grep for more information about these and other options.