Java: 403 error when getting Google results using jsoup

Notice: This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/14467459/

Time: 2020-10-31 16:23:35  Source: igfitidea

403 error while getting Google results using jsoup

java jsoup http-status-code-403

Asked by lakshman

I'm trying to get Google results using the following code:

Document doc = con.connect("http://www.google.com/search?q=lakshman").timeout(5000).get();

But I get this exception:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403,URL=http://www.google.com/search?q=lakshman

A 403 error means the server is forbidding access, but I can load this URL in a web browser just fine. Why does Jsoup get a 403 error?

Answered by Liang

You just need to add a User-Agent header to the HTTP request, as follows:

Document doc = Jsoup.connect(itemUrl)
     .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
     .get();
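
To see why the header matters, here is a minimal, self-contained sketch that touches neither Google nor jsoup. It starts a local stand-in server that returns 403 unless the `User-Agent` header starts with `Mozilla/`, then requests it with the JDK's built-in `HttpClient` (Java 11+), first with the client's default header and then with a browser-like one. The class name `UserAgentGateDemo`, the `/search` path, and the server's rule are all assumptions made for illustration; this is not how Google is known to decide which requests to reject.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class UserAgentGateDemo {

    /** Starts a local stand-in server. Its rule is hypothetical, not Google's real logic. */
    static HttpServer startServer() throws IOException {
        // Port 0 asks the OS for any free port on the loopback interface.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/search", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean browserLike = ua != null && ua.startsWith("Mozilla/");
            byte[] body = (browserLike ? "results" : "forbidden").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(browserLike ? 200 : 403, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Sends a GET and returns the status code; a null userAgent keeps the client's default header. */
    static int fetchStatus(String url, String userAgent) throws IOException, InterruptedException {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        if (userAgent != null) {
            builder.header("User-Agent", userAgent);
        }
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        HttpResponse<String> response = client.send(builder.build(), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    /** Runs both requests against the stand-in server and returns {defaultStatus, browserStatus}. */
    static int[] demoStatuses() {
        try {
            HttpServer server = startServer();
            try {
                String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/search?q=lakshman";
                return new int[] { fetchStatus(url, null), fetchStatus(url, "Mozilla/5.0") };
            } finally {
                server.stop(0);
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] statuses = demoStatuses();
        System.out.println("default client User-Agent: " + statuses[0]);
        System.out.println("browser-like User-Agent:   " + statuses[1]);
    }
}
```

Run as written, the two lines should report 403 and then 200. Only the header changed between the two requests, which is the effect the answer above relies on.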

Answered by Rowandish

Google doesn't allow robots, so you can't use jsoup to connect to Google. You can use the Google Web Search API (deprecated), but the number of requests you can make per day is limited.

Answered by Phani Rahul

Actually, you can evade the 403 error just by adding a user agent:

doc = Jsoup.connect(url).timeout(timeout)
                    .userAgent("Mozilla")
                    .get();

But I think that is against Google's policy.

EDIT: Google catches robots quicker than you think. You can, however, use this as a temporary solution.

Answered by Mahdi-Malv

Try this:

Document doc = Jsoup.connect("http://www.google.com/search?q=lakshman").ignoreHttpErrors(true).timeout(5000).get();

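Use this in case setting the user agent does not work, as it didn't for me. Note that ignoreHttpErrors(true) only stops jsoup from throwing the exception; it does not change the server's response, so the parsed document may be the error page rather than search results.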

Answered by Oleg

In some cases you need to set a referrer. It helped in my case.

The full source is here:

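    // Requires: import org.jsoup.Jsoup; import java.io.IOException;
    try {
        // Send a Referer header with the request, then print the page text
        String strText = Jsoup
                .connect("http://www.whatismyreferer.com")
                .referrer("http://www.google.com")
                .get()
                .text();

        System.out.println(strText);
    } catch (IOException ioe) {
        System.out.println("Exception: " + ioe);
    }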
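
jsoup's `referrer(...)` sets the HTTP `Referer` request header (the header name keeps its historical misspelling). As a rough illustration of what ends up on the request, the sketch below prepares the same headers with the JDK's `HttpURLConnection` instead of jsoup and reads them back without sending anything over the network. The class name `RefererHeaderSketch`, the `prepare` helper, and the `Mozilla/5.0` value are hypothetical choices for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class RefererHeaderSketch {

    /** Prepares (but does not send) a GET request carrying Referer and User-Agent headers. */
    static HttpURLConnection prepare(String url, String referrer, String userAgent) {
        try {
            // openConnection() only creates the object; no network I/O happens until connect().
            HttpURLConnection connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setRequestMethod("GET");
            connection.setConnectTimeout(5000);
            connection.setReadTimeout(5000);
            connection.setRequestProperty("Referer", referrer);
            connection.setRequestProperty("User-Agent", userAgent);
            return connection;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection connection =
                prepare("http://www.whatismyreferer.com", "http://www.google.com", "Mozilla/5.0");
        // Read the headers back to confirm what would be sent.
        System.out.println("Referer: " + connection.getRequestProperty("Referer"));
        System.out.println("User-Agent: " + connection.getRequestProperty("User-Agent"));
    }
}
```

Whether a given site checks the referrer, the user agent, both, or neither depends on that site, so treat these headers as things to try rather than a guaranteed fix.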

Answered by Java Enthusiast

Replace the statement

Document doc = con.connect("http://www.google.com/search?q=lakshman").timeout(5000).get();

with the statement

Document doc = Jsoup.connect("http://www.google.com/search?q=lakshman").userAgent("Chrome").get();