Disclaimer: This page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also comply with CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/37083058/

Time: 2020-08-19 18:47:18  Source: igfitidea

Programmatically searching google in Python using custom search

python google-custom-search

Question by user2399453

I have a snippet of code using the pygoogle Python module that allows me to programmatically search Google for some term succinctly:

 g = pygoogle(search_term)
 g.pages = 1
 results = g.get_urls()[0:10]

I just found out that, unfortunately, this has been discontinued and replaced by something called Google Custom Search. I looked at the other related questions on SO but didn't find anything I could use. I have two questions:

1) Does Google Custom Search allow me to do exactly what I am doing in the three lines above?

2) If yes, where can I find example code to do exactly what I am doing above? If not, what is the alternative to what I did using pygoogle?

Answer by mbdevpl

It is possible to do this. The setup is... not very straightforward, but the end result is that you can search the entire web from Python with a few lines of code.

There are 3 main steps in total.

1st step: get Google API key

The pygoogle page states:

Unfortunately, Google no longer supports the SOAP API for search, nor do they provide new license keys. In a nutshell, PyGoogle is pretty much dead at this point.

You can use their AJAX API instead. Take a look here for sample code: http://dcortesi.com/2008/05/28/google-ajax-search-api-example-python-code/

... but you actually can't use the AJAX API either. You have to get a Google API key: https://developers.google.com/api-client-library/python/guide/aaa_apikeys. For simple experimental use, I suggest a "server key".
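Once you have the key, one option (my assumption, not something Google requires) is to keep it out of the script and read it from an environment variable. A minimal sketch; the variable name `GOOGLE_API_KEY` and the helper `get_api_key` are hypothetical:

```python
import os


def get_api_key(env_var="GOOGLE_API_KEY"):
    """Read the Google API key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key to Google.
        raise RuntimeError(f"Set the {env_var} environment variable to your Google API key.")
    return key
```

You could then write `my_api_key = get_api_key()` instead of hard-coding the key string in the search code.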

2nd step: setup Custom Search Engine so that you can search the entire web

Indeed, the old API is not available. The best new API that is available is Custom Search. It seems to support only searching within specific domains; however, after following this SO answer, you can search the whole web:

  1. From the Google Custom Search homepage (http://www.google.com/cse/), click Create a Custom Search Engine.
  2. Type a name and description for your search engine.
  3. Under Define your search engine, in the Sites to Search box, enter at least one valid URL (for now, just put www.anyurl.com to get past this screen; more on this later).
  4. Select the CSE edition you want and accept the Terms of Service, then click Next. Select the layout option you want, and then click Next.
  5. Click any of the links under the Next steps section to navigate to your Control panel.
  6. In the left-hand menu, under Control Panel, click Basics.
  7. In the Search Preferences section, select Search the entire web but emphasize included sites.
  8. Click Save Changes.
  9. In the left-hand menu, under Control Panel, click Sites.
  10. Delete the site you entered during the initial setup process.

This approach is also recommended by Google: https://support.google.com/customsearch/answer/2631040
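With both pieces in hand, the API key and the engine's ID (passed as the `cx` parameter), a single search is one HTTPS GET to the Custom Search JSON API. The standard-library-only sketch below may help show what the client library in the next step does for you; the endpoint is the API's documented `https://www.googleapis.com/customsearch/v1`, while `build_search_url` and `search` are illustrative names of my own:

```python
import json
import urllib.parse
import urllib.request

# REST endpoint of the Custom Search JSON API.
ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, cse_id, **params):
    """Return the GET URL for one search; extra params (e.g. num, start) are passed through."""
    query_params = {"key": api_key, "cx": cse_id, "q": query, **params}
    return ENDPOINT + "?" + urllib.parse.urlencode(query_params)


def search(query, api_key, cse_id, **params):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(build_search_url(query, api_key, cse_id, **params)) as resp:
        return json.load(resp)
```

For example, `search("stackoverflow", my_api_key, my_cse_id, num=10).get("items", [])` should give the same kind of result list as the client-library code in this answer.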

3rd step: install Google API client for Python

pip install google-api-python-client, more info here:
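Note that the pip package name (`google-api-python-client`) differs from the module the code imports (`googleapiclient`). A small sketch for checking that the module is visible to the interpreter you will run the search with, without importing it; `module_available` is a hypothetical helper:

```python
import importlib.util


def module_available(name):
    """Return True if top-level module `name` can be found by this interpreter."""
    # find_spec locates the module without executing it; None means "not installed here".
    return importlib.util.find_spec(name) is not None
```

Running `print(module_available("googleapiclient"))` after `pip install google-api-python-client` should print `True` if pip installed into the same environment.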

4th step (bonus): do the search

So, after setting this up, you can follow the code samples from a few places:

and end up with this:

from googleapiclient.discovery import build
import pprint

my_api_key = "Google API key"
my_cse_id = "Custom Search Engine ID"

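def google_search(search_term, api_key, cse_id, **kwargs):
    # Build a Custom Search JSON API (v1) client and run a single query.
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    # 'items' is absent from the response when nothing matches.
    return res.get('items', [])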

results = google_search(
    'stackoverflow site:en.wikipedia.org', my_api_key, my_cse_id, num=10)
for result in results:
    pprint.pprint(result)

After some tweaking, you could write some functions that behave exactly like your snippet, but I'll skip this step here.
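One possible version of that skipped step, as a minimal sketch rather than a drop-in pygoogle replacement: it assumes the same `build("customsearch", "v1", ...)` client as above, and `get_urls` is a hypothetical name chosen to mirror pygoogle's method. Because the API's `num` parameter only accepts 1 to 10 results per request, the helper pages through results with the `start` parameter:

```python
# Hypothetical helpers approximating pygoogle's
#   g = pygoogle(search_term); g.pages = 1; g.get_urls()[0:10]
# on top of the Custom Search JSON API (google-api-python-client).

PAGE_SIZE = 10  # the API accepts num values from 1 to 10 per request


def extract_urls(response):
    """Return the 'link' of every result in one Custom Search response dict."""
    return [item["link"] for item in response.get("items", [])]


def page_starts(max_results, page_size=PAGE_SIZE):
    """1-based 'start' offsets needed to collect max_results results."""
    return list(range(1, max_results + 1, page_size))


def get_urls(search_term, api_key, cse_id, max_results=10):
    """Return up to max_results result URLs, requesting at most PAGE_SIZE per call."""
    # Imported lazily so the pure helpers above work without the library installed.
    from googleapiclient.discovery import build

    service = build("customsearch", "v1", developerKey=api_key)
    urls = []
    for start in page_starts(max_results):
        num = min(PAGE_SIZE, max_results - start + 1)
        res = service.cse().list(q=search_term, cx=cse_id,
                                 start=start, num=num).execute()
        page = extract_urls(res)
        urls.extend(page)
        if len(page) < num:  # fewer results than requested: no further pages
            break
    return urls
```

With a real key and engine ID, `results = get_urls(search_term, my_api_key, my_cse_id)` would play the role of the three pygoogle lines in the question.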

Answer by Amit Yungman

@mbdevpl's response helped me a lot, so all credit goes to them. But there have been a few changes in the UI, so here is an update:

A. Install google-api-python-client

  1. If you don't already have a Google account, sign up.
  2. If you have never created a Google APIs Console project, read the Managing Projects page and create a project in the Google API Console.
  3. Install the library.

B. To create an API key:

  1. Navigate to the APIs & Services → Credentials panel in Cloud Console.
  2. Select Create credentials, then select API key from the drop-down menu.
  3. The API key created dialog box displays your newly created key.
  4. You now have an API_KEY

C. Setup Custom Search Engine so you can search the entire web

  1. Create a custom search engine at this link.
  2. In Sites to search, add any valid URL (e.g. www.stackoverflow.com).
  3. That's all you have to fill in; the rest doesn't matter. In the left-side menu, click Edit search engine → {your search engine name} → Setup.
  4. Set Search the entire web to ON.
  5. Remove the URL you added from the list of Sites to search.
  6. Under Search engine ID you'll find the search-engine-ID.

Search example

from googleapiclient.discovery import build

my_api_key = "AIbaSyAEY6egFSPeadgK7oS/54iQ_ejl24s4Ggc" # The API_KEY you acquired
my_cse_id = "012345678910111213141:abcdef10g2h" # The search-engine-ID you created


def google_search(search_term, api_key, cse_id, **kwargs):
    # Build a Custom Search JSON API (v1) client and run a single query.
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    # 'items' is absent from the response when nothing matches.
    return res.get('items', [])


results = google_search('"god is a woman" "thank you next" "7 rings"', my_api_key, my_cse_id, num=10)
for result in results:
    print(result)

Important! On the first run, you might have to enable the API in your account. The error message should contain a link for enabling it, something like: https://console.developers.google.com/apis/api/customsearch.googleapis.com/overview?project={your project name}.

You'll be asked to create a service name (it doesn't matter what it is) and give it roles. I gave it the Viewer and Service Usage Admin roles, and it works.
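If you'd rather get a readable hint than a raw traceback on that first run, a sketch like the one below could wrap the search. It assumes the failure arrives as googleapiclient's `HttpError` with a 403 status, which is what a disabled API typically returns (an exhausted quota can also produce 403); the status-to-hint mapping and the names `explain_http_status` and `google_search_checked` are my own, hypothetical additions:

```python
def explain_http_status(status):
    """Return a hint for common Custom Search API HTTP error statuses, or None."""
    if status == 403:
        return ("403 Forbidden: the Custom Search API may not be enabled for this "
                "project yet (or the daily quota is used up). Follow the link in "
                "the error message to enable it.")
    if status == 400:
        return "400 Bad Request: often an invalid API key or search engine ID (cx)."
    return None


def google_search_checked(search_term, api_key, cse_id, **kwargs):
    # Imported here so explain_http_status works without the client library installed.
    from googleapiclient.discovery import build
    from googleapiclient.errors import HttpError

    service = build("customsearch", "v1", developerKey=api_key)
    try:
        res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    except HttpError as err:
        hint = explain_http_status(err.resp.status)  # HTTP status code as an int
        if hint:
            raise RuntimeError(hint) from err
        raise
    return res.get("items", [])
```

The original `HttpError` stays attached as the exception's cause, so the enable-API link in Google's message is still visible in the traceback.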

Answer by Marius Johan

Answer from 2020

Google isn't providing any API anymore, for some reason, but https://github.com/bisoncorps/search-engine-parser is developing a Python package for scraping Google.

Installation

pip install search-engine-parser

Usage

from search_engine_parser import GoogleSearch

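def google(query):
    # Arguments: the query string and the results page number (1 = first page).
    search_args = (query, 1)
    gsearch = GoogleSearch()
    gresults = gsearch.search(*search_args)
    # The results object can be indexed by field name, e.g. 'links'.
    return gresults['links']

print(google('Is it illegal to scrape google results'))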

I don't know how legal this is, but as long as you aren't commercializing your product, I think you can get away with it. Besides, Google hasn't really sued anyone for using its product this way; it has just banned their IP addresses.
For more information, see "Is it ok to scrape data from Google results?"
