How to fetch HTML in Java

Question

Asked by pek

Without the use of any external library, what is the simplest way to fetch a website's HTML content into a String?

Answer 1

Accepted answer by pek

I'm currently using this:

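import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

String content = null;
try {
    URLConnection connection = new URL("http://www.google.com").openConnection();
    // try-with-resources closes the scanner (and the stream under it) even if reading fails.
    // "UTF-8" is an assumption about the page's encoding.
    try (Scanner scanner = new Scanner(connection.getInputStream(), "UTF-8")) {
        // "\\Z" matches the end of input, so next() returns the whole body as one token.
        scanner.useDelimiter("\\Z");
        content = scanner.next();
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
System.out.println(content);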

But not sure if there's a better way.
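
For comparison, here is a rough sketch of the same fetch using the java.net.http.HttpClient that ships with the JDK since Java 11, so it still needs no external library. The class name FetchHtml, the helper names buildRequest and fetch, the 10-second timeout, and the choice to follow redirects are all my own assumptions for illustration, not something from the question.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class FetchHtml {
    // Builds a plain GET request; the 10-second timeout is an arbitrary choice.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Sends the request and returns the response body as a String.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java FetchHtml <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

On Java 11 or later this can be run directly with: java FetchHtml.java http://www.google.com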

Answer 2

Answered by Justin Bennett

I just left this post in your other thread, though what you have above might work as well. I don't think either would be any easier than the other. The Apache packages can be accessed by just using import org.apache.commons.HttpClient at the top of your code.

Edit: Forgot the link ;)

Answer 3

Answered by Scott Bennett-McLeish

This has worked well for me:

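import java.io.*;
import java.net.URL;
import java.nio.charset.StandardCharsets;

URL url = new URL(theURL);  // theURL is a String holding the address to fetch
StringBuilder buffer = new StringBuilder();
// Decode through a Reader (UTF-8 assumed) instead of casting raw bytes to char,
// and let try-with-resources close the stream.
try (Reader in = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
    int ptr;
    while ((ptr = in.read()) != -1) {
        buffer.append((char) ptr);
    }
}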

Not sure as to whether the other solution(s) provided are any more efficient or not.
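
On the efficiency point, one common variation is to read the stream in chunks through a Reader rather than one value at a time. The following is only a sketch under my own assumptions: the class name StreamReading, the helper readAll, the 8192-character chunk size, and UTF-8 as the encoding are all hypothetical choices, and a local byte array stands in for url.openStream() so it runs without a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamReading {
    // Reads the whole stream in 8192-character chunks, decoding with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[8192];
        // try-with-resources closes the reader and the wrapped stream.
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                out.append(chunk, 0, n);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(); the markup here is made up for the demo.
        byte[] page = "<html><body>caf\u00e9</body></html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8));
    }
}
```

With a real page, the first argument would be url.openStream() and the charset would ideally come from the response's Content-Type header rather than being hard-coded.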

Answer 4

Answered by Scott Bennett-McLeish

Whilst not vanilla-Java, I'll offer up a simpler solution. Use Groovy ;-)

String siteContent = new URL("http://www.google.com").text

Answer 5

Answered by dinesh kandpal

It's not a library but a tool named curl, which is generally installed on most servers; on Ubuntu you can easily install it with:

sudo apt install curl

Then fetch any HTML page and store it in a local file, for example:

curl https://www.facebook.com/ > fb.html

You will get the home page HTML. You can open it in your browser as well.
