java: Getting page content from an Apache Commons HTTP request

Question

Asked by Chiggins

So I'm using Apache Commons HTTP to make a request to a webpage. I cannot for the life of me figure out how to get the actual content from the page; I can only get its header information. How can I get the actual content from it?

Here is my example code:

HttpGet request = new HttpGet("http://URL_HERE/");

HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(request);

System.out.println("Response: " + response.toString());

Thanks!

Answer 1

Answered by SecondSun24

BalusC's comment will work just fine. If you're using version 4 or newer of Apache HttpComponents, there is a convenience method you can use as well: EntityUtils.toString(HttpEntity);

Here's what it'll look like in your code:

HttpGet request = new HttpGet("http://URL_HERE/");
HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(request);
HttpEntity entity = response.getEntity();
String entityContents = EntityUtils.toString(entity);

I hope this is helpful to you.

Not sure if this is due to different versions, but I had to rewrite it like this:

HttpGet request = new HttpGet("http://URL_HERE/");
CloseableHttpClient httpClient = HttpClients.createDefault();
HttpResponse response = httpClient.execute(request);
HttpEntity entity = response.getEntity();
String entityContents = EntityUtils.toString(entity);

Answer 2

Answered by BalusC

Use HttpResponse#getEntity() and then HttpEntity#getContent() to obtain it as an InputStream.

InputStream input = response.getEntity().getContent();
// Read it the usual way.
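
For anyone unsure what "the usual way" looks like, here is a minimal sketch using only the standard library. The StreamReader class and its readAll method are hypothetical names made up for this illustration, and UTF-8 is an assumption; in real code you would want the charset the server declares in its Content-Type header. The main method uses an in-memory stream as a stand-in for response.getEntity().getContent().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamReader {
    // Hypothetical helper: drains the stream into a String using the given charset.
    // The stream is closed when reading finishes or fails.
    static String readAll(InputStream input, Charset charset) throws IOException {
        try (InputStream in = input) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // read() returns -1 once the end of the stream is reached.
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for response.getEntity().getContent(); UTF-8 is assumed here.
        InputStream fake = new ByteArrayInputStream(
                "<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(fake, StandardCharsets.UTF_8));
    }
}
```

With a real response you would pass response.getEntity().getContent() as the first argument instead of the in-memory stream.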

Note that HttpClient isn't part of Apache Commons. It's part of Apache HttpComponents.

Answer 3

Answered by Brian Roach

response.getEntity();

You really want to look at the Javadocs. The example for HttpClient shows you how to get at all the info in the response: http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/index.html

Answer 4

Answered by JeanK

If you just want the content of the URL, you can use the URL API, like this:

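import java.io.IOException;
import java.net.URL;
import java.util.Scanner;

public class URLTest {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.google.com.br");
        // url.openStream() gives you the input stream, so you can do whatever you want with it.
        // try-with-resources closes the Scanner and the underlying stream when done.
        // "UTF-8" is an assumption; use the charset the page actually declares.
        try (Scanner in = new Scanner(url.openStream(), "UTF-8")) {
            // Print the page line by line.
            while (in.hasNextLine()) {
                System.out.println(in.nextLine());
            }
        }
    }
}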
