如何在 Java 中获取 HTML
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/31462/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverFlow
How to fetch HTML in Java
提问 by pek
Without the use of any external library, what is the simplest way to fetch a website's HTML content into a String?
在不使用任何外部库的情况下,将网站的 HTML 内容提取到字符串中的最简单方法是什么?
采纳答案 by pek
I'm currently using this:
我目前正在使用这个:
import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

String content = null;
URLConnection connection = null;
try {
    connection = new URL("http://www.google.com").openConnection();
    Scanner scanner = new Scanner(connection.getInputStream());
    scanner.useDelimiter("\\Z"); // "\\Z" matches end of input, so next() returns the whole stream as one token
    content = scanner.next();
    scanner.close();
} catch (Exception ex) {
    ex.printStackTrace();
}
System.out.println(content);
But not sure if there's a better way.
但不确定是否有更好的方法。
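For reference, on Java 11 and later the JDK's built-in java.net.http.HttpClient can do the same thing without any external library. A minimal sketch (using the same example URL as above; send() throws checked exceptions that the surrounding method must handle or declare):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Requires Java 11+: fetch the page body as a String
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder(URI.create("http://www.google.com")).build();
String content = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
System.out.println(content);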
回答 by Justin Bennett
I just left this post in your other thread, though what you have above might work as well. I don't think either would be any easier than the other. The Apache packages can be accessed by just using import org.apache.commons.HttpClient at the top of your code.
我刚刚在你的另一个帖子里贴了这个回答,不过你上面的代码应该也能用。我认为两者的难易程度差不多。只需在代码顶部加上 import org.apache.commons.HttpClient 即可使用 Apache 的包。
Edit: Forgot the link ;)
编辑:忘记链接;)
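Since the link was never added, here is a rough sketch of what fetching a page with Apache Commons HttpClient 3.x typically looks like. This is a reconstruction rather than the original author's code, and it assumes the library is on the classpath:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

// Commons HttpClient 3.x usage (assumes the library is on the classpath)
HttpClient client = new HttpClient();
GetMethod get = new GetMethod("http://www.google.com");
try {
    client.executeMethod(get);                       // send the GET request
    String content = get.getResponseBodyAsString();  // read the response body
    System.out.println(content);
} finally {
    get.releaseConnection();                         // always free the connection
}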
回答 by Scott Bennett-McLeish
This has worked well for me:
这对我来说效果很好:
import java.io.InputStream;
import java.net.URL;

URL url = new URL(theURL); // theURL holds the address of the page to fetch
InputStream is = url.openStream();
int ptr = 0;
StringBuffer buffer = new StringBuffer();
while ((ptr = is.read()) != -1) { // read one byte at a time until end of stream
    buffer.append((char) ptr);
}
is.close();
Not sure as to whether the other solution(s) provided are any more efficient or not.
不确定提供的其他解决方案是否更有效。
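Reading one byte at a time works, but it ignores the page's character encoding. A minimal sketch of the same idea using a BufferedReader with an explicit charset (UTF-8 is assumed here; theURL is the same placeholder as above, and Java 7+ is assumed for try-with-resources):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

StringBuilder sb = new StringBuilder();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new URL(theURL).openStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) { // read line by line until end of stream
        sb.append(line).append('\n');
    }
}
String content = sb.toString();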
回答 by Scott Bennett-McLeish
Whilst not vanilla-Java, I'll offer up a simpler solution. Use Groovy ;-)
虽然不是 vanilla-Java,但我将提供一个更简单的解决方案。使用 Groovy ;-)
String siteContent = new URL("http://www.google.com").text
回答 by dinesh kandpal
It's not a library but a tool named curl, which is generally installed on most servers, or you can easily install it on Ubuntu with
它不是库,而是一个名为 curl 的工具,通常安装在大多数服务器中,或者您可以通过以下方式轻松安装在 ubuntu 中
sudo apt install curl
Then fetch any HTML page and store it in a local file, for example:
然后获取任何 html 页面并将其存储到您的本地文件中,例如
curl https://www.facebook.com/ > fb.html
You will get the home page HTML. You can open it in your browser as well.
您将获得主页 html。您也可以在浏览器中运行它。
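If you still need the result inside a Java program, one option is simply to shell out to curl. A rough sketch (assumes curl is on the PATH, Java 9+ for readAllBytes, and that the surrounding method handles or declares the checked exceptions):

import java.nio.charset.StandardCharsets;

// Run curl as a subprocess and capture its output as the page HTML
Process process = new ProcessBuilder("curl", "-s", "https://www.facebook.com/")
        .redirectErrorStream(true)   // merge stderr into stdout
        .start();
String html = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
process.waitFor();
System.out.println(html);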