Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/36780047/
Java JSoup error fetching URL
Asked by PICKAB00
I'm creating an application that fetches values from a specific website and prints them to the console. The value comes from a <span> element, and I'm using JSoup.
My problem is this error:
Error fetching URL
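Here is my Java code:
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TestSl {
    public static void main(String[] args) throws IOException {
        // No User-Agent is set explicitly on this request.
        Document doc = Jsoup.connect("https://stackoverflow.com/questions/11970938/java-html-parser-to-extract-specific-data").get();
        Elements spans = doc.select("span[class=hidden-text]");
        for (Element span : spans) {
            System.out.println(span.text());
        }
    }
}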
And here is the error on the console:
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=https://stackoverflow.com/questions/11970938/java-html-parser-to-extract-specific-data
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:216)
    at TestSl.main(TestSl.java:19)
What am I doing wrong and how can I resolve it?
Answered by Jared Rummler
Set the user-agent header:
.userAgent("Mozilla")
Example:
Document document = Jsoup.connect("https://stackoverflow.com/questions/11970938/java-html-parser-to-extract-specific-data")
        .userAgent("Mozilla")
        .get();
Elements elements = document.select("span.hidden-text");
for (Element element : elements) {
    System.out.println(element.text());
}
Output:
Stack Exchange
Inbox
Reputation and Badges
Source: https://stackoverflow.com/a/7523425/1048340
This may be related: https://meta.stackexchange.com/questions/277369/a-terms-of-service-update-restricting-companies-that-scrape-your-profile-informa
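The fix in the answer works by sending a User-Agent request header with the fetch. To make that step visible without jsoup, here is a minimal sketch that builds the same kind of request with the JDK's own java.net.http client (Java 11 or later). This is a swapped-in client for illustration, not what jsoup does internally, and the class and method names (UserAgentRequest, build, fetchStatus) are hypothetical. Whether a given User-Agent value is accepted is up to the server.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class; not part of jsoup or of the original answer.
class UserAgentRequest {

    // Builds a GET request that carries an explicit User-Agent header.
    static HttpRequest build(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    // Sends the request and returns the HTTP status code (needs network access).
    static int fetchStatus(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }
}
```

Calling UserAgentRequest.fetchStatus(UserAgentRequest.build(url, "Mozilla")) returns the status code for that URL. If it is still 403 with the header set, the server is probably rejecting the request for some other reason.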