Google search with the Python requests library
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/22623798/
Asked by James
(I've tried looking but all of the other answers seem to be using urllib2)
I've just started trying to use requests, but I'm still not very clear on how to send or request something additional from the page. For example, I'll have
import requests
r = requests.get('http://google.com')
but I have no idea how to now, for example, do a google search using the search bar presented. I've read the quickstart guide but I'm not very familiar with HTML POST and the like, so it hasn't been very helpful.
Is there a clean and elegant way to do what I am asking?
Accepted answer by Trimax
Request Overview
The Google search request is a standard HTTP GET command. It includes a collection of parameters relevant to your queries. These parameters are included in the request URL as name=value pairs separated by ampersand (&) characters. Parameters include data like the search query and a unique CSE ID (cx) that identifies the CSE that is making the HTTP request. The WebSearch or Image Search service returns XML results in response to your HTTP requests.
First, get your CSE ID (the cx parameter) from the Custom Search Engine control panel.
Then see the official Google Developers site for Custom Search.
There are many examples like this:
http://www.google.com/search?
start=0
&num=10
&q=red+sox
&cr=countryCA
&lr=lang_fr
&client=google-csbe
&output=xml_no_dtd
&cx=00255077836266642015:u-scht7a-8i
The list of parameters you can use is also explained there.
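The name=value pairs in the example URL above can be assembled from a dict rather than by hand. The following is a minimal sketch using only the standard library; build_search_url is a hypothetical helper name, the parameter comments reflect my reading of the example rather than the official documentation, and the cx value is just the one from the example, so substitute your own.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the example URL above.
BASE_URL = "http://www.google.com/search"


def build_search_url(query, cx, start=0, num=10):
    """Return the GET URL for a search query (hypothetical helper)."""
    params = {
        "start": start,          # offset of the first result (assumed meaning)
        "num": num,              # number of results requested (assumed meaning)
        "q": query,              # the search terms
        "client": "google-csbe",
        "output": "xml_no_dtd",  # ask for XML results
        "cx": cx,                # your CSE ID
    }
    # urlencode joins the name=value pairs with "&" and escapes the values.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("red sox", "00255077836266642015:u-scht7a-8i")
print(url)

# Round-trip check: the query string decodes back to the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"], decoded["cx"])
```

With the requests library, the same dict can be passed as params= to requests.get, which encodes the query string for you.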
Answer by abhishake
input:
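import requests

def googleSearch(query):
    # Reuse one session for the request; it is closed when the block exits.
    with requests.session() as c:
        url = 'https://www.google.co.in'
        params = {'q': query}
        # requests URL-encodes the params dict into the query string.
        urllink = c.get(url, params=params)
        print(urllink.url)

googleSearch('Linkin Park')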
output:
https://www.google.co.in/?q=Linkin+Park
Answer by Ben
import requests
from bs4 import BeautifulSoup

headers_Get = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
}

def google(q):
    s = requests.Session()
    # Let requests URL-encode the query instead of joining the words with '+'.
    params = {'q': q, 'ie': 'utf-8', 'oe': 'utf-8'}
    r = s.get('https://www.google.com/search', params=params, headers=headers_Get)
    soup = BeautifulSoup(r.text, "html.parser")
    output = []
    # This selector may change in the future, depending on Google's page structure.
    for searchWrapper in soup.find_all('h3', {'class': 'r'}):
        link = searchWrapper.find('a')
        if link is None or not link.has_attr('href'):
            continue  # skip headings that do not wrap a link
        result = {'text': link.text.strip(), 'url': link['href']}
        output.append(result)
    return output
This returns a list of Google results in {'text': text, 'url': url} format. The top result's URL would be google('search query')[0]['url'].
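Because the h3 selector depends on Google's markup, it can help to exercise the parsing step on its own against a saved HTML snippet before making live requests. The following is a rough sketch using only the standard library's html.parser; ResultParser, parse_results, and SAMPLE_HTML are hypothetical names, and the sample markup is a made-up stand-in for the h3 class "r" structure the answer assumes, not real Google output.

```python
from html.parser import HTMLParser


class ResultParser(HTMLParser):
    """Collect the <a> links found inside <h3 class="r"> elements."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_h3 = False    # currently inside an <h3 class="r">?
        self._current = None   # result dict being filled, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3" and "r" in (attrs.get("class") or "").split():
            self._in_h3 = True
        elif tag == "a" and self._in_h3 and self._current is None:
            self._current = {"text": "", "url": attrs.get("href")}

    def handle_data(self, data):
        # Accumulate link text, including text inside nested tags like <b>.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.results.append(self._current)
            self._current = None
        elif tag == "h3":
            self._in_h3 = False


def parse_results(html):
    """Return a list of {'text': ..., 'url': ...} dicts (hypothetical helper)."""
    parser = ResultParser()
    parser.feed(html)
    return parser.results


# Made-up markup imitating the structure the selector expects.
SAMPLE_HTML = """
<div><h3 class="r"><a href="https://example.com/a">First result</a></h3></div>
<div><h3 class="other"><a href="https://example.com/skip">Not a result</a></h3></div>
<div><h3 class="r"><a href="https://example.com/b"> Second <b>result</b> </a></h3></div>
"""

results = parse_results(SAMPLE_HTML)
print(results)
```

If the live page stops matching, only the tag and class checks in handle_starttag should need updating, and the sample snippet can be replaced with a saved copy of the actual response.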

