
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/3742983/

Date: 2020-08-03 23:35:33  Source: igfitidea

How to get the contents of a webpage in a shell variable?

Tags: linux, bash, shell, wget

Asked by Aillyn

In Linux, how can I fetch a URL and get its contents into a variable in a shell script?

Accepted answer by codaddict

You can use the wget command to download the page and read it into a variable:

content=$(wget -q -O - google.com)
echo "$content"

We use wget's -O option, which lets us specify the name of the file into which wget dumps the page contents. We specify - to send the dump to standard output and collect it into the variable content. The -q (quiet) option turns off wget's own output.
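One detail worth noting (my addition, not part of the original answer): quoting matters when you echo the variable back. Unquoted, the shell word-splits the value, so newlines and runs of spaces collapse. A quick offline sketch, using printf as a stand-in for wget output:

```shell
# Stand-in for a downloaded page: two lines of text.
content=$(printf 'line one\nline two\n')

echo $content      # unquoted: word splitting collapses the newline -> line one line two
echo "$content"    # quoted: the original two-line structure survives
```

For HTML this often doesn't matter, but if you later pipe the variable into grep or a parser, use the quoted form.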

You can use the curl command for this as well:

content=$(curl -L google.com)
echo "$content"

We need the -L option because the page we are requesting might have moved; in that case we need to fetch it from the new location. The -L (or --location) option tells curl to follow such redirects.
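A fetch can also simply fail (DNS error, HTTP 404, and so on), so it is safer to check the exit status before trusting the variable. A hedged sketch going beyond the original answer: it uses curl's -f flag, which turns server errors into a non-zero exit status, and a local file:// URL (a stand-in for a real http:// URL) so the example runs without network access:

```shell
# Create a small local page so the example works offline.
printf '<p>hello</p>\n' > /tmp/demo_page.html
url='file:///tmp/demo_page.html'    # stand-in for a real http:// URL

# -f: fail on server errors, -s: silent, -S: still show errors, -L: follow redirects
if content=$(curl -fsSL "$url"); then
    echo "fetched: $content"
else
    echo "download failed" >&2
fi
```

With wget the equivalent check is the same shape: `if content=$(wget -q -O - "$url"); then ...`.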

Answered by Colin Hebert

There is the wget command, or curl.

You can use the file you downloaded with wget, or you can handle a stream with curl.




Answered by Jim Lewis

content=$(wget -O - "$url")

Answered by Giacomo

You can use curl or wget to retrieve the raw data, or you can use w3m -dump to get a nice text rendering of a web page.

$ foo=$(w3m -dump http://www.example.com/); echo $foo
You have reached this web page by typing "example.com", "example.net","example.org" or "example.edu" into your web browser. These domain names are reserved for use in documentation and are not available for registration. See RFC 2606, Section 3.

Answered by julianvdb

There are many ways to get a page from the command line... but it also depends on whether you want the page source or the rendered page itself:

If you need the page source:

with curl:


curl $url

with wget:


wget -O - $url

but if you want what you can see in a browser, lynx can be useful:

lynx -dump $url
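The two flavors above can be wrapped in a tiny helper. A sketch (the function name fetch_page is mine, not from the answer, and it assumes curl and lynx are installed):

```shell
# fetch_page MODE URL - print either the raw source or the rendered text.
fetch_page() {
    case "$1" in
        source) curl -s "$2" ;;      # raw HTML, as a browser receives it
        text)   lynx -dump "$2" ;;   # rendered text, as a browser displays it
        *)      echo "usage: fetch_page source|text URL" >&2; return 2 ;;
    esac
}

# Example use: page=$(fetch_page source "$url")
```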

I think you can find many solutions for this little problem; maybe you should read the man pages for those commands. And don't forget to replace $url with your URL :)

Good luck :)


Answered by ephemient

If you have LWP installed, it provides a binary simply named "GET".

$ GET http://example.com
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
  <META http-equiv="Content-Type" content="text/html; charset=utf-8">
  <TITLE>Example Web Page</TITLE>
</HEAD> 
<body>  
<p>You have reached this web page by typing &quot;example.com&quot;,
&quot;example.net&quot;,&quot;example.org&quot;
  or &quot;example.edu&quot; into your web browser.</p>
<p>These domain names are reserved for use in documentation and are not available 
  for registration. See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC 
  2606</a>, Section 3.</p>
</BODY>
</HTML>

wget -O-, curl, and lynx -source behave similarly.