
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/359439/

Date: 2020-10-29 12:04:13  Source: igfitidea

How do I retrieve a URL from a web site using Java?

javahttpurlconnection

Asked by Johnny Maelstrom

I want to use HTTP GET and POST commands to retrieve URLs from a website and parse the HTML. How do I do this?


Answer by Rob Hruska

You can use HttpURLConnection in combination with URL.


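import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

URL url = new URL("http://example.com"); // enclosing method must declare throws IOException
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.connect();

// Read the contents using an InputStreamReader (assumes a UTF-8 response body)
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
    reader.lines().forEach(System.out::println);
}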
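
The question also asks about POST, which the GET snippet does not cover. Below is a minimal sketch using the same HttpURLConnection API, assuming a form-encoded (application/x-www-form-urlencoded) request body, a UTF-8 response, and Java 10+ for the URLEncoder.encode(String, Charset) overload. The class name PostExample, its helper methods, and the endpoint in main are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class PostExample {

    // Encode fields as application/x-www-form-urlencoded (UTF-8, spaces become '+').
    static String formEncode(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // POST the form and return the response body; getInputStream() throws on 4xx/5xx.
    static String post(String address, Map<String, String> params) throws IOException {
        byte[] body = formEncode(params).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        try {
            connection.setRequestMethod("POST");
            connection.setDoOutput(true); // required before writing a request body
            connection.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded; charset=UTF-8");
            try (OutputStream out = connection.getOutputStream()) {
                out.write(body);
            }
            try (InputStream in = connection.getInputStream()) {
                // Assumes a UTF-8 response; real code should check the Content-Type charset.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("name", "Johnny");
        // Hypothetical endpoint: replace with a URL that accepts form POSTs.
        System.out.println(post("http://example.com/form", form));
    }
}
```

The two differences from GET are setDoOutput(true) and writing the encoded body to getOutputStream() before reading the response.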

Answer by Johnny Maelstrom

The accepted answer is from robhruska, thank you. It shows the most basic way to do this, and it is easy to follow once you understand what a simple URL connection requires. However, the longer-term strategy would be to use HTTP Client, which offers more advanced and feature-rich ways to complete this task.


Thank you, everyone. Here's the quick answer again:


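import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

URL url = new URL("http://example.com"); // enclosing method must declare throws IOException
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.connect();

// Read the contents using an InputStreamReader (assumes a UTF-8 response body)
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
    reader.lines().forEach(System.out::println);
}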
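
The HTTP Client recommended above is a separate library. As an alternative that needs no extra dependency, Java 11 and later ship a built-in java.net.http.HttpClient; the sketch below uses that JDK client, which is not necessarily the library the answer had in mind. The class HttpClientGet and its method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class HttpClientGet {

    // An HttpClient is immutable once built and can be reused for many requests.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Build a GET request that asks for HTML.
    static HttpRequest buildGet(String address) {
        return HttpRequest.newBuilder(URI.create(address))
                .header("Accept", "text/html")
                .GET()
                .build();
    }

    // Send the request and return the body as a String; fail on non-2xx statuses.
    static String fetch(String address) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildGet(address), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch("http://example.com"));
    }
}
```

Unlike HttpURLConnection, this client does not throw on 4xx/5xx responses, which is why the status code is checked explicitly.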

Answer by kgiannakakis

The easiest way to do a GET is to use the built-in java.net.URL. However, as mentioned, httpclient is the proper way to go, since among other things it lets you handle redirects.
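
To illustrate redirect handling with an HTTP client library, here is a minimal sketch using the JDK's java.net.http.HttpClient (Java 11+) as a stand-in for the httpclient library the answer mentions. With this client, redirect handling is a builder option: the default policy is NEVER, while Redirect.NORMAL follows redirects except HTTPS to HTTP downgrades. RedirectExample and redirectingClient are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RedirectExample {

    // Follow redirects automatically, except HTTPS -> HTTP downgrades.
    static HttpClient redirectingClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://example.com")).build();
        HttpResponse<String> response =
                redirectingClient().send(request, HttpResponse.BodyHandlers.ofString());
        // uri() is where the final response came from, after any redirects.
        System.out.println("Final URL: " + response.uri());
        // previousResponse() exposes the preceding hop of the redirect chain, if any.
        response.previousResponse().ifPresent(prev ->
                System.out.println("Redirected from: " + prev.uri() + " (" + prev.statusCode() + ")"));
        System.out.println(response.body().length() + " characters received");
    }
}
```

For comparison, HttpURLConnection does follow same-protocol redirects by default, but it does not follow redirects that switch between HTTP and HTTPS.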


For parsing the HTML, you can use html parser.
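
The html parser library mentioned above is a third-party dependency. As a dependency-free stand-in (not that library), the JDK bundles an older Swing HTML parser in javax.swing.text.html.parser. It targets HTML 3.2, so treat it as a rough tool for simple pages. The sketch below collects the href of every a tag, which fits the question's goal of retrieving URLs from a page; LinkExtractor and extractLinks are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {

    // Collect the href attribute of every <a> tag, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared in the page; the String is already decoded.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a String
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"/one\">One</a> "
                + "<a href=\"http://example.com/two\">Two</a></body></html>";
        System.out.println(extractLinks(page));
    }
}
```

extractLinks takes a String so it can be fed a page body fetched by any of the approaches above. Relative links such as /one come back unresolved; URI.create(pageUrl).resolve(href) turns them into absolute URLs.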


Answer by Markus

I have used JTidy in a project and it worked quite well. A list of other parsers is here, but apart from JTidy I don't know any of them.
