Original question: http://stackoverflow.com/questions/36458128/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Piping curl output into grep
Question by David Xie
Just a little disclaimer: I am not very familiar with programming, so please excuse me if I use any terms incorrectly or in a confusing way.
I want to extract specific information from a webpage, and I tried doing this by piping the output of a curl command into grep. Oh, and this is in Cygwin, if that matters.
When I just type in:
$ curl www.ncbi.nlm.nih.gov/gene/823951
The terminal prints the whole webpage in what I believe to be HTML. From there, I thought I could just pipe this output into grep with whatever search term I want:
$ curl www.ncbi.nlm.nih.gov/gene/823951 | grep "Gene Symbol"
But instead of printing the webpage at all, the terminal gives me:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  142k    0  142k    0     0  41857      0 --:--:--  0:00:03 --:--:-- 42083
Can anyone explain why it does this, and how I can search for specific lines of text in a webpage? I eventually want to compile information such as gene names, types, and descriptions into a database, so after that I was hoping to export the results from grep into a text file.
Any help is extremely appreciated, thanks in advance!
Answer by retrospectacus
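curl detects that its output is not going to a terminal, so it shows you the progress meter. The meter is written to stderr, which is still your terminal, while the page itself goes down the pipe to grep. You can suppress the progress meter with -s (--silent).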
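A minimal sketch of the difference, assuming curl and mktemp are installed. To avoid depending on the NCBI site, it serves a hypothetical local copy of the page through curl's file:// support; the two HTML lines in it are taken from the output shown in this answer, not from the full real page.

```shell
#!/bin/sh
# Hypothetical local stand-in for the NCBI page, built from the two lines of
# HTML shown in this answer, so the example runs without network access.
page=$(mktemp)
trap 'rm -f "$page"' EXIT
printf '%s\n' '<dt class="noline"> Gene symbol </dt>' \
    '<dd class="noline">AT3G47960</dd>' > "$page"

# stdout is a pipe, not a terminal, so curl prints its progress meter on
# stderr; the page content itself still travels down the pipe to grep.
curl "file://$page" | grep "Gene symbol"

# -s (--silent) suppresses the progress meter; adding -S (--show-error)
# brings back error messages only, so real failures are not hidden.
curl -sS "file://$page" | grep "Gene symbol"

# Discarding stderr also hides the meter, but it hides errors as well.
curl "file://$page" 2>/dev/null | grep "Gene symbol"
```

For the real page, replace "file://$page" with www.ncbi.nlm.nih.gov/gene/823951. Using -sS rather than plain -s is a judgment call: it keeps the output clean while still telling you when a download fails.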
The HTML data is indeed being sent to grep. However, that page does not contain the text "Gene Symbol". grep is case-sensitive (unless invoked with -i), and the page actually contains "Gene symbol", with a lowercase "s".
$ curl -s www.ncbi.nlm.nih.gov/gene/823951 | grep "Gene symbol"
<dt class="noline"> Gene symbol </dt>
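To see the case-sensitivity issue without any network access, the sketch below feeds grep the two HTML lines from this answer's output as a hypothetical local sample. It compares the question's original pattern with and without -i.

```shell
#!/bin/sh
# Hypothetical sample: two lines of HTML copied from the output in this answer.
sample='<dt class="noline"> Gene symbol </dt>
<dd class="noline">AT3G47960</dd>'

# grep is case-sensitive by default: "Gene Symbol" (capital S) does not
# match "Gene symbol", so this prints nothing and exits with status 1.
printf '%s\n' "$sample" | grep "Gene Symbol"

# -i makes the match case-insensitive, so the same pattern now matches.
printf '%s\n' "$sample" | grep -i "Gene Symbol"
```

Matching the exact case, as the answer does, is usually preferable when you know the page text; -i is handy when you are not sure how a label is capitalized.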
You probably also want the next line of HTML, which you can get grep to print with the -A (--after-context) option; -A1 prints each matching line plus the one line after it:
$ curl -s www.ncbi.nlm.nih.gov/gene/823951 | grep -A1 "Gene symbol"
<dt class="noline"> Gene symbol </dt>
<dd class="noline">AT3G47960</dd>
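Since the question's end goal is to collect values like the gene symbol into a text file, here is a rough sketch of one way to do that. It assumes the value always sits on the line right after the label, inside a single <dd>...</dd> element as in the output above; the sample input and the output file name gene_symbols.txt are hypothetical. Extracting from HTML with regular expressions is fragile, so treat this as a quick-and-dirty approach rather than a robust parser.

```shell
#!/bin/sh
# Hypothetical sample: the two lines grep -A1 returned in this answer.
# For real data, pipe `curl -s www.ncbi.nlm.nih.gov/gene/823951` in instead.
sample='<dt class="noline"> Gene symbol </dt>
<dd class="noline">AT3G47960</dd>'

# grep -A1 keeps the label line and the line after it; sed -n then prints
# only lines where the substitution succeeds (the <dd> line), with the
# surrounding tags stripped, leaving the bare value.
printf '%s\n' "$sample" \
    | grep -A1 "Gene symbol" \
    | sed -n 's/.*<dd[^>]*>\(.*\)<\/dd>.*/\1/p' > gene_symbols.txt

cat gene_symbols.txt
```

The > redirection overwrites gene_symbols.txt each time; when looping over several gene IDs, use >> to append instead. The "--" separators that GNU grep prints between groups of context lines are also dropped, because sed -n only prints lines it has substituted.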
See man curl and man grep for more information about these and other options.