java: 403 error when fetching Google results with jsoup

Question

Asked by lakshman

I'm trying to get Google results using the following code:

Document doc = con.connect("http://www.google.com/search?q=lakshman").timeout(5000).get();

But I get this exception:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403,URL=http://www.google.com/search?q=lakshman

A 403 error means the server is forbidding access, but I can load this URL in a web browser just fine. Why does Jsoup get a 403 error?

Answer 1

Answered by Liang

You just need to add a User-Agent header to the HTTP request, as follows:

// Send a browser-like User-Agent header instead of the default one
Document doc = Jsoup.connect(itemUrl)
     .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
     .get();
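At the HTTP level, userAgent(...) only sets the User-Agent request header, which a browser always sends and which Google presumably uses, among other signals, to tell browsers from scripts. Jsoup is not part of the JDK, so this sketch swaps in the JDK's own HttpURLConnection (it is not Jsoup's API) to show the same idea: the header is attached to the request before anything is sent. The class and method names are hypothetical, and openConnection() only prepares the connection, so nothing goes over the network here.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Sketch only: uses the JDK's HttpURLConnection, not Jsoup, to show what
// setting a browser-like User-Agent means at the HTTP level.
class UserAgentSketch {
    // Same browser string as in the answer above.
    static final String BROWSER_UA =
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 "
        + "(KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36";

    // Hypothetical helper: prepares (but does not send) a GET request with
    // the given User-Agent and 5000 ms connect and read timeouts.
    static HttpURLConnection prepare(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare("http://www.google.com/search?q=lakshman", BROWSER_UA);
        // Reads back the header we set; no request has been sent yet.
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

Calling getResponseCode() or getInputStream() on the prepared connection is what actually sends the request; whether Google then answers with 200 or 403 is up to Google.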

Answer 2

Answered by Rowandish

Google doesn't allow robots, so you can't use jsoup to connect to Google. You can use the Google Web Search API (deprecated), but the number of requests you can make per day is limited.

Answer 3

Answered by Phani Rahul

Actually, you can avoid the 403 error just by adding a user agent:

doc = Jsoup.connect(url).timeout(timeout)
                    .userAgent("Mozilla")
                    .get();

But I think that is against Google's policy.

EDIT: Google catches robots quicker than you think. You can, however, use this as a temporary solution.

Answer 4

Answered by Mahdi-Malv

Try this:

Document doc = Jsoup.connect("http://www.google.com/search?q=lakshman").ignoreHttpErrors(true).timeout(5000).get();

Use this in case setting the userAgent does not work, as it didn't for me.
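Note that ignoreHttpErrors(true) does not make the 403 go away; it only stops Jsoup from throwing HttpStatusException, so you get back whatever page the server sent along with the error status. If you use it, check the status before trusting the parsed document (with Jsoup, for example, by calling execute() and reading statusCode() on the returned response). This sketch shows only that decision step, as plain Java with a hypothetical classify helper and no network access; singling out 403 is an assumption about what is worth distinguishing in this case.

```java
// Sketch of the check to do once HTTP errors are returned instead of thrown
// (as with Jsoup's ignoreHttpErrors(true)). No network access involved.
class StatusCheck {
    enum Outcome { OK, FORBIDDEN, OTHER_ERROR }

    // Hypothetical helper: any 2xx status is usable, 403 means the server
    // refused the request, and everything else is treated as another failure.
    static Outcome classify(int status) {
        if (status >= 200 && status < 300) {
            return Outcome.OK;
        }
        if (status == 403) {
            return Outcome.FORBIDDEN;
        }
        return Outcome.OTHER_ERROR;
    }

    public static void main(String[] args) {
        for (int status : new int[] {200, 403, 500}) {
            System.out.println(status + " -> " + classify(status));
        }
    }
}
```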

Answer 5

Answered by Oleg

In some cases you need to set a referrer. It helped in my case.

Here is the full source:

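    import java.io.IOException;
    import org.jsoup.Jsoup;

    public class ReferrerExample {
        public static void main(String[] args) {
            try {
                // Send the request with a Referer header, then print the page text
                String strText = Jsoup
                        .connect("http://www.whatismyreferer.com")
                        .referrer("http://www.google.com")
                        .get()
                        .text();

                System.out.println(strText);
            } catch (IOException ioe) {
                System.out.println("Exception: " + ioe);
            }
        }
    }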
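Jsoup's referrer(...) sets the HTTP Referer request header; the header name really is spelled "Referer" in HTTP, not "Referrer". As a JDK-only sketch (HttpURLConnection swapped in for Jsoup, with hypothetical class and method names), this prepares, without sending, a request carrying the same Referer value as the code above:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Sketch only: uses the JDK's HttpURLConnection, not Jsoup, to show the
// header that Jsoup's referrer(...) corresponds to.
class ReferrerSketch {
    // Hypothetical helper: prepares (but does not send) a GET request that
    // names the given page as its referrer.
    static HttpURLConnection prepare(String url, String referrer) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        // HTTP spells the header name "Referer", not "Referrer".
        conn.setRequestProperty("Referer", referrer);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare("http://www.whatismyreferer.com", "http://www.google.com");
        // Reads back the header we set; no request has been sent yet.
        System.out.println(conn.getRequestProperty("Referer"));
    }
}
```

Calling getInputStream() on the prepared connection would send the request and return the page body.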

Answer 6

Answered by Java Enthusiast

Replace the statement

Document doc =con.connect("http://www.google.com/search?q=lakshman").timeout(5000).get();

with the statement

Document doc = Jsoup.connect("http://www.google.com/search?q=lakshman").userAgent("Chrome").get();