Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/5240241/

Date: 2020-10-30 10:09:41  Source: igfitidea

Get page content from Apache Commons HTTP Request

Tags: java, http, apache-commons

Asked by Chiggins

So I'm using Apache Commons HTTP to make a request to a webpage. I cannot for the life of me figure out how to get the actual content from the page; I can only get its header information. How can I get the actual content from it?

Here is my example code:

HttpGet request = new HttpGet("http://URL_HERE/");

HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(request);

System.out.println("Response: " + response.toString());

Thanks!

Answered by SecondSun24

BalusC's comment will work just fine. If you're using version 4 or newer of Apache HttpComponents, there is also a convenience method you can use: EntityUtils.toString(HttpEntity)

Here's what it'll look like in your code:

HttpGet request = new HttpGet("http://URL_HERE/");
HttpClient httpClient = new DefaultHttpClient();
HttpResponse response = httpClient.execute(request);
HttpEntity entity = response.getEntity();
String entityContents = EntityUtils.toString(entity);

I hope this is helpful to you.

Not sure if this is due to different versions, but I had to rewrite it like this:

HttpGet request = new HttpGet("http://URL_HERE/");
CloseableHttpClient httpClient = HttpClients.createDefault();
HttpResponse response = httpClient.execute(request);
HttpEntity entity = response.getEntity();
String entityContents = EntityUtils.toString(entity);

Answered by BalusC

Use HttpResponse#getEntity() and then HttpEntity#getContent() to obtain it as an InputStream.

InputStream input = response.getEntity().getContent();
// Read it the usual way.
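As a rough illustration of what "the usual way" might look like, here is a minimal sketch that drains an InputStream into a String using only the JDK. The class name StreamReading and the helper readFully are made up for this example, the charset is assumed to be UTF-8 (a real response may declare a different one), and a ByteArrayInputStream stands in for response.getEntity().getContent() so the snippet runs without HttpClient on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamReading {
    // Hypothetical helper (not part of HttpClient): reads the stream to the end,
    // decodes the bytes with the given charset, and closes the stream.
    static String readFully(InputStream input, Charset charset) throws IOException {
        try (InputStream in = input) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // read() returns -1 once the end of the stream is reached
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for: InputStream input = response.getEntity().getContent();
        InputStream input = new ByteArrayInputStream(
                "<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFully(input, StandardCharsets.UTF_8));
    }
}
```

With a real response, you would pass response.getEntity().getContent() to readFully instead of the in-memory stream.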

Note that HttpClient isn't part of Apache Commons. It's part of Apache HttpComponents.

Answered by Brian Roach

response.getEntity();

You really want to look at the Javadocs; the example for HttpClient shows you how to get at all the info in the response: http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/index.html

Answered by JeanK

If you just want the content of the URL, you can use the URL API, like this:

import java.io.IOException;
import java.net.URL;
import java.util.Scanner;

public class URLTest {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.google.com.br");
        // Here you have the input stream, so you can do whatever you want with it.
        // try-with-resources closes the Scanner and the underlying stream.
        try (Scanner in = new Scanner(url.openStream())) {
            // Print the first line of the page.
            System.out.println(in.nextLine());
        }
    }
}
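
The snippet above only reads the first line. If you want the whole page as a single String, one possible sketch is to set the Scanner delimiter to "\\A" (start of input) so that the entire stream comes back as one token. The class name URLContent and the helper fetch are invented for this example, UTF-8 is an assumption, and main uses a temporary local file URL instead of a real website so it runs without network access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

public class URLContent {
    // Hypothetical helper: returns everything behind the URL as one String.
    // Assumes the content is UTF-8 encoded.
    static String fetch(URL url) throws IOException {
        try (InputStream stream = url.openStream();
             Scanner in = new Scanner(stream, "UTF-8")) {
            // "\\A" matches only at the start of input, so the whole stream is one token.
            in.useDelimiter("\\A");
            // An empty stream has no token at all, so guard with hasNext().
            return in.hasNext() ? in.next() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary local file stands in for a real web page (assumption for the demo).
        Path page = Files.createTempFile("page", ".html");
        Files.write(page, "<html>\n<body>hello</body>\n</html>".getBytes(StandardCharsets.UTF_8));
        try {
            System.out.println(fetch(page.toUri().toURL()));
        } finally {
            Files.delete(page);
        }
    }
}
```

For an actual page, you would call fetch(new URL("http://www.google.com.br")) or similar.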