Searching in Google with Python
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/38635419/
Asked by Yarden
I want to search for text on Google using a Python script and return the name, description, and URL of each result. I'm currently using this code:
    from google import search

    ip = input("What would you like to search for? ")  # raw_input() on Python 2
    for url in search(ip, stop=20):
        print(url)
This returns only the URLs. How can I return the name and description for each URL?
Accepted answer by Yarden
Not exactly what I was looking for, but I found a nice solution for now (I might edit this if I manage to improve it). I combined searching Google as before (which returns only URLs) with the Beautiful Soup package for parsing the HTML pages:
    from google import search
    from urllib.request import urlopen  # urllib.urlopen on Python 2
    from bs4 import BeautifulSoup

    def google_scrape(url):
        # Download the page and return the text of its <title> tag
        thepage = urlopen(url)
        soup = BeautifulSoup(thepage, "html.parser")
        return soup.title.text

    query = 'search this'
    # enumerate replaces the manual counter i
    for i, url in enumerate(search(query, stop=10), start=1):
        a = google_scrape(url)
        print(str(i) + ". " + a)
        print(url)
        print(" ")
This gives me a list of page titles and their links.
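The question also asks for a description. One possible sketch, using only the standard library, reads the page's own meta description tag next to the title. Note the assumption: this is the description the page publishes about itself, which is not necessarily the snippet Google shows, and many pages have no such tag. The names TitleDescriptionParser and title_and_description are made up for this example.

```python
from html.parser import HTMLParser


class TitleDescriptionParser(HTMLParser):
    """Collect the <title> text and the meta description of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self._in_title:
            self.title += data


def title_and_description(html):
    """Return (title, description); either may be an empty string."""
    parser = TitleDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description.strip()


# Hand-written sample page, not fetched from the network
sample = """<html><head><title> Example page </title>
<meta name="description" content="A short summary."></head><body></body></html>"""
print(title_and_description(sample))
```

In the loop above, the HTML text of each downloaded page could be passed to title_and_description instead of reading soup.title.text.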
Another good solution:
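    from google import search
    import requests

    # ip is the query string read in the question; everything_between is a
    # helper that this answer does not define
    for url in search(ip, stop=10):
        r = requests.get(url)
        title = everything_between(r.text, '<title>', '</title>')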
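The snippet above calls everything_between without showing it. A minimal sketch of such a helper follows, assuming it should return the text between the first begin marker and the next end marker, and an empty string when either is missing. The body is a guess at the intent, not the answer author's code.

```python
def everything_between(text, begin, end):
    """Return the text between the first `begin` and the next `end`, or "" if either is missing."""
    start = text.find(begin)
    if start == -1:
        return ""
    start += len(begin)
    stop = text.find(end, start)
    if stop == -1:
        return ""
    return text[start:stop].strip()


# Hand-written sample instead of a downloaded page
html = "<html><head><title> My page </title></head></html>"
print(everything_between(html, "<title>", "</title>"))
```

Plain string matching like this is fragile: it misses tags written as <TITLE> or carrying attributes, which is one reason the Beautiful Soup version is more robust.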
Answer by Jokab
I assume you are using this library by Mario Vilas because of the stop=20 argument, which appears in his code. The library does not seem able to return anything but the URLs, which makes it badly underdeveloped. As such, what you want to do is not possible with the library you are currently using.
I would suggest you instead use abenassi/Google-Search-API. Then you can simply do:
    from google import google

    num_page = 3
    search_results = google.search("This is my query", num_page)
    for result in search_results:
        print(result.description)
Answer by Piyush Rumao
I tried most of these approaches, but they either did not work for me or raised errors such as a search module not being found even though the packages were installed. I also tried Selenium WebDriver, which works well with Firefox, Chrome, or PhantomJS, but it felt slow because it has to drive a browser before it can return the search results.
So I thought of using the Google API, which is very fast and returns accurate results.
Before I share the code, here are a few quick steps to follow:
- Register on Google API to get a Google API key (free version).
- Search for Google Custom Search and set up your free account to get a custom search ID.
- Add the google-api-python-client package to your Python project (for example with pip install google-api-python-client, or !pip install google-api-python-client in a notebook).
That's it. All you have to do now is run this code:
    from googleapiclient.discovery import build

    my_api_key = "your API KEY TYPE HERE"
    my_cse_id = "YOUR CUSTOM SEARCH ENGINE ID TYPE HERE"

    def google_search(search_term, api_key, cse_id, **kwargs):
        # Build a Custom Search client and run a single query
        service = build("customsearch", "v1", developerKey=api_key)
        res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
        # 'items' is absent when the query has no results
        return res.get('items', [])

    results = google_search("YOUR SEARCH QUERY HERE", my_api_key, my_cse_id, num=10)
    for result in results:
        print(result["link"])
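The code above prints only the link, while the question asks for the name and description as well. Each item returned by the Custom Search JSON API is a dictionary that also carries "title" and "snippet" keys, so a small sketch for collecting all three could look like this. The function name summarize_items and the sample data are invented for illustration, and the sample is hand-written rather than real API output.

```python
def summarize_items(items):
    """Return (title, snippet, link) tuples from Custom Search result items."""
    rows = []
    for item in items:
        # .get() keeps the loop running when a result lacks one of the fields
        rows.append((item.get("title", ""), item.get("snippet", ""), item.get("link", "")))
    return rows


# Hand-written sample shaped like the items list; not real API output
sample_items = [
    {"title": "Example Domain", "snippet": "An illustrative page.", "link": "https://example.com/"},
    {"title": "No snippet here", "link": "https://example.org/"},
]

for title, snippet, link in summarize_items(sample_items):
    print(title)
    print(snippet)
    print(link)
```

With a real key and search engine ID, the list returned by google_search could be passed in place of sample_items.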
Answer by Hartator
You can also use a third-party service like Serp API, which is a Google search engine results API. It removes the need to rent proxies and parse the HTML results yourself. The JSON output is particularly rich.
It's easy to integrate with Python:
    from lib.google_search_results import GoogleSearchResults

    params = {
        "q" : "Coffee",
        "location" : "Austin, Texas, United States",
        "hl" : "en",
        "gl" : "us",
        "google_domain" : "google.com",
        "api_key" : "demo",
    }

    query = GoogleSearchResults(params)
    dictionary_results = query.get_dictionary()
GitHub: https://github.com/serpapi/google-search-results-python
Answer by Strange
Usually, you cannot use the Google search function in Python 3 by importing the google package, but you can use it in Python 2.
Even with requests.get(url+query), scraping will not work, because Google blocks it by redirecting the request to a CAPTCHA page.
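As a side note on requests.get(url+query): if you do experiment with that form, the query has to be URL-encoded before it is appended, otherwise spaces and characters such as & break the request. A small sketch with the standard library follows. The base URL is an assumption, since the answer does not say which one it used, and the CAPTCHA redirect described above still applies to the resulting request.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"  # assumed base URL; the answer does not give one


def build_search_url(query, base_url=BASE_URL):
    """Return base_url with the query percent-encoded as the q parameter."""
    return base_url + "?" + urlencode({"q": query})


print(build_search_url("python & google search"))
```

The requests library can do the same encoding itself if the query is passed through its params argument instead of being concatenated by hand.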
Possible ways:
- You can write the code in Python 2.
- If you want to write it in Python 3, make two files and retrieve the search results from the Python 2 script.
- If that proves difficult, the best way is to use Google Colab or Jupyter Notebook with a Python 3 runtime. You won't get any errors.