Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/29204103/

Date: 2020-09-18 12:36:21  Source: igfitidea

Wget does not fetch google search results

bash wget

Asked by anubhava

I noticed that when running wget https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=foo and similar queries, I don't get the search results, but the Google homepage.

There seems to be some redirect within the Google page. Does anyone know a fix for wget so that it works?

Answered by anubhava

You can use this curl command to pull Google query results:

curl -sA "Chrome" -L 'http://www.google.com/search?hl=en&q=time' -o search.html

To use an https URL:

curl -k -sA "Chrome" -L 'https://www.google.com/search?hl=en&q=time' -o ssearch.html

The -A option sets a custom user-agent, Chrome, in the request to Google.

Answered by Dolda2000

The #q=foo part is your hint: that is a fragment ID, which never gets sent to the server. I'm guessing you just took this URL from your browser's URL bar while using the live-search function. Since that is implemented with a lot of client-side magic, you cannot rely on it to work; try using Google with live search disabled instead. A URL pattern that seems to work looks like this: http://www.google.com/search?hl=en&q=foo
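To make the fragment point concrete, here is a small sketch that splits the URL from the question at the first # using plain POSIX parameter expansion. The variable names are made up for illustration. The URL is single-quoted because an unquoted & is a control operator in the shell and would cut the command short.

```shell
#!/bin/sh
# Illustrative only: variable names are hypothetical.
# Single quotes keep the shell from interpreting '&' and '#'.
url='https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=foo'

# Everything before the first '#' is what an HTTP client actually requests.
request_url="${url%%#*}"

# Everything after the first '#' is the fragment, which stays on the client.
fragment="${url#*#}"

printf 'sent to server: %s\n' "$request_url"
printf 'kept locally:   %s\n' "$fragment"
```

The query q=foo sits entirely in the fragment, so the server only ever sees a request for /webhp with the remaining parameters, which matches the homepage being returned.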

However, I do notice that Google returns 403 Forbidden when called naïvely with wget, indicating that they don't want that. You can easily get past it by setting some other user-agent string, but do consider all the implications before doing so on a regular basis.
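As a rough sketch of what setting another user-agent string could look like with wget itself: the helper names search_url and fetch_results below are hypothetical, and only the -q, -U (user agent) and -O (output file) options are standard wget flags. Whether Google accepts such a request is not guaranteed and may change, so treat this as an illustration rather than a recommended practice.

```shell
#!/bin/sh
# Hypothetical helper: build a search URL from a query string.
# Only spaces are encoded (as '+'); other special characters are left as-is.
search_url() {
  encoded=$(printf '%s' "$1" | tr ' ' '+')
  printf 'http://www.google.com/search?hl=en&q=%s\n' "$encoded"
}

# Hypothetical helper: fetch results for query $1 into file $2,
# sending "Chrome" as the user-agent instead of wget's default.
fetch_results() {
  wget -q -U "Chrome" -O "$2" "$(search_url "$1")"
}

# Example call (performs a network request, so it is left commented out):
# fetch_results "foo" search.html
```

Keeping the URL construction separate from the download makes it easy to check the request that would be sent before running it.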
