How can you search Google programmatically (Java API)?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3727662/

Date: 2020-08-14 03:58:10  Source: igfitidea

java google-search-api

Asked by Dan

Does anyone know if and how it is possible to search Google programmatically - especially if there is a Java API for it?

Accepted answer by BalusC

Some facts:

  1. Google offers a public search webservice API which returns JSON: http://ajax.googleapis.com/ajax/services/search/web. Documentation here

  2. Java offers java.net.URL and java.net.URLConnection to fire and handle HTTP requests.

  3. JSON can be converted in Java to a full-fledged Javabean object using any Java JSON API. One of the best is Google Gson.

Now do the math:

import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLEncoder;

import com.google.gson.Gson;

public class GoogleSearch {

    public static void main(String[] args) throws Exception {
        String google = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=";
        String search = "stackoverflow";
        String charset = "UTF-8";

        URL url = new URL(google + URLEncoder.encode(search, charset));

        // Stream the JSON response straight into Gson; the reader is closed automatically.
        try (Reader reader = new InputStreamReader(url.openStream(), charset)) {
            GoogleResults results = new Gson().fromJson(reader, GoogleResults.class);

            // Show title and URL of 1st result.
            System.out.println(results.getResponseData().getResults().get(0).getTitle());
            System.out.println(results.getResponseData().getResults().get(0).getUrl());
        }
    }
}

With this Javabean class representing the most important JSON data as returned by Google (it actually returns more data, but it's left up to you as an exercise to expand this Javabean code accordingly):

import java.util.List;

public class GoogleResults {

    private ResponseData responseData;
    public ResponseData getResponseData() { return responseData; }
    public void setResponseData(ResponseData responseData) { this.responseData = responseData; }
    public String toString() { return "ResponseData[" + responseData + "]"; }

    static class ResponseData {
        private List<Result> results;
        public List<Result> getResults() { return results; }
        public void setResults(List<Result> results) { this.results = results; }
        public String toString() { return "Results[" + results + "]"; }
    }

    static class Result {
        private String url;
        private String title;
        public String getUrl() { return url; }
        public String getTitle() { return title; }
        public void setUrl(String url) { this.url = url; }
        public void setTitle(String title) { this.title = title; }
        public String toString() { return "Result[url:" + url +",title:" + title + "]"; }
    }

}

Update: since November 2010 (2 months after the above answer), the public search webservice has been deprecated (and the last day on which the service was offered was September 29, 2014). Your best bet is now to query http://www.google.com/search directly with an honest user agent and then parse the result using an HTML parser. If you omit the user agent, you get a 403 back. If you lie in the user agent and simulate a web browser (e.g. Chrome or Firefox), you get a much larger HTML response back, which is a waste of bandwidth and performance.
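
As a rough illustration of the request described above, here is a minimal sketch that uses only the JDK. The class and method names are made up for this example, and the user agent string is a placeholder; the request is only prepared, not sent.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchRequestSketch {

    // Placeholder bot identity (an assumption): replace with your own name and homepage.
    static final String USER_AGENT = "ExampleBot 1.0 (+http://example.com/bot)";

    /** Builds the search URL with the query encoded as form data (spaces become '+'). */
    static String buildSearchUrl(String query) {
        return "http://www.google.com/search?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    /** Prepares, but does not send, a GET request that carries the user agent. */
    static HttpURLConnection prepare(String query) throws Exception {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(buildSearchUrl(query)).toURL().openConnection();
        connection.setRequestProperty("User-Agent", USER_AGENT);
        return connection;
    }

    public static void main(String[] args) throws Exception {
        HttpURLConnection connection = prepare("stack overflow");
        System.out.println(connection.getURL());
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```

Reading connection.getInputStream() would actually send the request; the returned HTML still has to be handed to an HTML parser.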

Here's a kickoff example using Jsoup as the HTML parser:

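// Imports needed by this snippet:
// import java.net.URLDecoder;
// import java.net.URLEncoder;
// import org.jsoup.Jsoup;
// import org.jsoup.nodes.Element;
// import org.jsoup.select.Elements;

String google = "http://www.google.com/search?q=";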
String search = "stackoverflow";
String charset = "UTF-8";
String userAgent = "ExampleBot 1.0 (+http://example.com/bot)"; // Change this to your company's name and bot homepage!

Elements links = Jsoup.connect(google + URLEncoder.encode(search, charset)).userAgent(userAgent).get().select(".g>.r>a");

for (Element link : links) {
    String title = link.text();
    String url = link.absUrl("href"); // Google returns URLs in format "http://www.google.com/url?q=<url>&sa=U&ei=<someKey>".
    url = URLDecoder.decode(url.substring(url.indexOf('=') + 1, url.indexOf('&')), "UTF-8");

    if (!url.startsWith("http")) {
        continue; // Ads/news/etc.
    }

    System.out.println("Title: " + title);
    System.out.println("URL: " + url);
}
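
The substring call above assumes that the link always contains both '=' and '&'. A slightly more defensive way to pull the target out of such a redirect link could look like the sketch below. It uses only the JDK, the class and method names are invented for illustration, and the sample link is made up in the format mentioned in the comment above.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class RedirectUrlSketch {

    /**
     * Returns the decoded value of the "q" parameter of a redirect link,
     * or null if the link has no query string or no "q" parameter.
     */
    static String extractTarget(String href) {
        int queryStart = href.indexOf('?');
        if (queryStart < 0) {
            return null;
        }
        for (String pair : href.substring(queryStart + 1).split("&")) {
            if (pair.startsWith("q=")) {
                return URLDecoder.decode(pair.substring(2), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up link in the "http://www.google.com/url?q=<url>&sa=U&ei=<someKey>" format.
        String href = "http://www.google.com/url?q=http%3A%2F%2Fstackoverflow.com%2F&sa=U&ei=abc";
        System.out.println(extractTarget(href)); // prints http://stackoverflow.com/
    }
}
```

Returning null instead of throwing lets the calling loop skip links that do not follow the expected format.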

Answer by Manuel Selva

In the Terms of Service of Google we can read:

5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) and shall ensure that you comply with the instructions set out in any robots.txt file present on the Services.

So I guess the answer is no. Moreover, the SOAP API is no longer available.

Answer by Sai Sunder

Indeed, there is an API to search Google programmatically. The API is called Google Custom Search. To use this API, you will need a Google Developer API key and a cx key. A simple procedure for accessing Google search from a Java program is explained in my blog.

The blog is now dead; here is the Wayback Machine link.

Answer by Alex Fedulov

The Google TOS were relaxed a bit in April 2014. They now state:

"Don't misuse our Services. For example, don't interfere with our Services or try to access them using a method other than the interface and the instructions that we provide."

So the passage about "automated means" and scripts is gone now. This evidently is still not the way Google wants its services to be accessed, but I think it is now formally open to interpretation what exactly an "interface" is and whether it makes any difference how exactly the returned HTML is processed (rendered or parsed). Anyhow, I have written a Java convenience library, and it is up to you to decide whether to use it:

https://github.com/afedulov/google-web-search

Answer by Stan Smulders

In light of those TOS alterations last year, we built an API that gives access to Google's search. It was for our own use only, but after some requests we decided to open it up. We're planning to add additional search engines in the future!

Should anyone be looking for an easy way to implement / acquire search results, you are free to sign up and give the REST API a try: https://searchapi.io

It returns JSON results and should be easy enough to implement with the detailed docs.

It's a shame that Bing and Yahoo are miles ahead of Google in this regard. Their APIs aren't cheap, but they are at least available.

Answer by Petter Friberg

To search Google through an API you should use Google Custom Search; scraping the web page is not allowed.

In Java you can use the CustomSearch API Client Library for Java.

The maven dependency is:

<dependency>
    <groupId>com.google.apis</groupId>
    <artifactId>google-api-services-customsearch</artifactId>
    <version>v1-rev57-1.23.0</version>
</dependency> 

Example code for searching with the Google CustomSearch API Client Library:

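// Imports needed by this method:
// import java.io.IOException;
// import java.security.GeneralSecurityException;
// import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
// import com.google.api.client.json.jackson2.JacksonFactory;
// import com.google.api.services.customsearch.Customsearch;
// import com.google.api.services.customsearch.CustomsearchRequestInitializer;
// import com.google.api.services.customsearch.model.Result;
// import com.google.api.services.customsearch.model.Search;

public static void main(String[] args) throws GeneralSecurityException, IOException {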

    String searchQuery = "test"; //The query to search
    String cx = "002845322276752338984:vxqzfa86nqc"; //Your search engine

    //Instance Customsearch
    Customsearch cs = new Customsearch.Builder(GoogleNetHttpTransport.newTrustedTransport(), JacksonFactory.getDefaultInstance(), null)
                   .setApplicationName("MyApplication") 
                   .setGoogleClientRequestInitializer(new CustomsearchRequestInitializer("your api key")) 
                   .build();

    //Set search parameter
    Customsearch.Cse.List list = cs.cse().list(searchQuery).setCx(cx); 

    //Execute search
    Search result = list.execute();
    if (result.getItems()!=null){
        for (Result ri : result.getItems()) {
            //Get title, link, body etc. from search
            System.out.println(ri.getTitle() + ", " + ri.getLink());
        }
    }

}

As you can see, you will need to request an API key and set up your own search engine ID, cx.

Note that you can search the whole web by selecting "Search entire web" on basic tab settings during setup of cx, but results will not be exactly the same as a normal browser google search.
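
For readers who would rather not pull in the client library, the same API key and cx can presumably be passed straight to the Custom Search JSON API over plain HTTP. The sketch below only builds the request URL with the JDK. The endpoint and the key, cx and q parameter names are stated here as assumptions to check against Google's documentation, and the class and method names are invented for this example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CustomSearchUrlSketch {

    // Assumed REST endpoint of the Custom Search JSON API.
    static final String ENDPOINT = "https://www.googleapis.com/customsearch/v1";

    /** Builds the request URL from the API key, the search engine ID (cx) and the query. */
    static String buildUrl(String apiKey, String cx, String query) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("key", apiKey);
        params.put("cx", cx);
        params.put("q", query);
        return ENDPOINT + "?" + params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // "your api key" is a placeholder, as in the example above.
        System.out.println(buildUrl("your api key", "002845322276752338984:vxqzfa86nqc", "test"));
    }
}
```

The response is JSON, so it can be mapped to Javabeans with any Java JSON library.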

Currently (as of the date of this answer) you get 100 API calls per day for free; beyond that, Google would like a share of your profit.

Answer by Hartator

As an alternative to BalusC's answer, since the API it uses has been deprecated and scraping requires proxies, you can use this package. Code sample:

Map<String, String> parameter = new HashMap<>();
parameter.put("q", "Coffee");
parameter.put("location", "Portland");
GoogleSearchResults serp = new GoogleSearchResults(parameter);

JsonObject data = serp.getJson();
JsonArray results = (JsonArray) data.get("organic_results");
JsonObject first_result = results.get(0).getAsJsonObject();
System.out.println("first coffee: " + first_result.get("title").getAsString());

Library on GitHub

Answer by Prashanth

Just an alternative. Searching google and parsing the results can also be done in a generic way using any HTML Parser such as Jsoup in Java. Following is the link to the mentioned example.

https://www.codeforeach.com/java/example-how-to-search-google-using-java
