Java: Get page content from URL?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/4216455/

Time: 2020-08-14 14:40:39  Source: igfitidea

Get page content from URL?

java url

Asked by tiendv

I want to get the content of a page from a URL with this code:

public static String getContentResult(URL url) throws IOException{

    InputStream in = url.openStream();
    StringBuffer sb = new StringBuffer();

    byte [] buffer = new byte[256];

    while(true){
        int byteRead = in.read(buffer);
        if(byteRead == -1)
            break;
        for(int i = 0; i < byteRead; i++){
            sb.append((char)buffer[i]);
        }
    }
    return sb.toString();
}

But with this URL: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315 I can't get the abstract: "Database management systems will continue to manage..."

Can you give me a solution to this problem? Thanks in advance.

Accepted answer by dacwe

Printing the headers of the GET response gives:

HTTP/1.1 302 Moved Temporarily
Connection: close
Date: Thu, 18 Nov 2010 15:35:24 GMT
Server: Microsoft-IIS/6.0
location: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE
Content-Type: text/html; charset=UTF-8

This means that the server wants you to request the address given in the location header instead. So either you read that header directly from the URLConnection and follow the link yourself, or you use HttpClient, which follows redirects automatically. The code below is based on HttpClient:

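import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class HttpTest {
    public static void main(String... args) throws Exception {

        System.out.println(readPage(new URL("http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315")));
    }

    private static String readPage(URL url) throws Exception {

        // DefaultHttpClient belongs to Apache HttpClient 4.x (deprecated since 4.3).
        // It follows the 302 redirect automatically.
        DefaultHttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url.toURI());
        HttpResponse response = client.execute(request);

        Reader reader = null;
        try {
            // Decode as UTF-8, the charset announced in the Content-Type header above,
            // instead of relying on the platform default.
            reader = new InputStreamReader(response.getEntity().getContent(), StandardCharsets.UTF_8);

            StringBuilder sb = new StringBuilder();
            {
                // Read the body in chunks until the end of the stream.
                int read;
                char[] cbuf = new char[1024];
                while ((read = reader.read(cbuf)) != -1)
                    sb.append(cbuf, 0, read);
            }

            return sb.toString();

        } finally {
            // Always close the reader, even if reading failed.
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}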
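
The other option mentioned above, reading the location header from the URLConnection and following it by hand, might look roughly like the sketch below. It is only an illustration under some assumptions: the class name RedirectFollower, the helper names, and the limit of 5 redirects are made up for this example, and the body is assumed to be UTF-8. Note that HttpURLConnection already follows same-protocol redirects by default; the sketch switches that off so each step is visible.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class RedirectFollower {

    // Hypothetical limit, chosen only to avoid endless redirect loops.
    private static final int MAX_REDIRECTS = 5;

    /** True for the status codes that normally carry a Location header. */
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a possibly relative Location value against the current URL. */
    public static String resolveLocation(String currentUrl, String location) {
        return URI.create(currentUrl).resolve(location).toString();
    }

    /** Fetches the page, following redirects manually. Assumes a UTF-8 body. */
    public static String readPage(URL url) throws IOException {
        String current = url.toString();
        for (int i = 0; i <= MAX_REDIRECTS; i++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(current).openConnection();
            conn.setInstanceFollowRedirects(false); // handle redirects ourselves
            int status = conn.getResponseCode();
            if (isRedirect(status)) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location header");
                }
                current = resolveLocation(current, location);
                continue; // request the new address
            }
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                StringBuilder sb = new StringBuilder();
                char[] buf = new char[1024];
                int read;
                while ((read = reader.read(buf)) != -1) {
                    sb.append(buf, 0, read);
                }
                return sb.toString();
            }
        }
        throw new IOException("Too many redirects");
    }
}
```

Calling RedirectFollower.readPage(new URL(...)) with the URL from the question would then request each location in turn until a non-redirect response arrives.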

Answer by Victor Sorokin

There's no "Database management..." text at the given URL. Perhaps it's loaded dynamically by JavaScript. You'll need a more sophisticated application to download such content ;)

Answer by stacker

The content you're looking for is not included in the page at this URL. Open it in your browser and view the source code: instead of the abstract, many JavaScript files are loaded. I think the content is fetched later by AJAX calls. You would need to learn how the content is loaded.

The Firefox plugin Firebug could be helpful for a more detailed analysis.
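
As a rough first step before reaching for browser tooling, one could list the script files that the downloaded HTML references, to get an idea of what is loaded after the initial page. The sketch below is an assumption-laden illustration, not a real HTML parser: the class ScriptLister and its method are hypothetical names, and the regular expression only catches simple script tags with a quoted src attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptLister {

    // Matches <script ... src="..."> or src='...'. A regex is only a rough
    // heuristic here; it does not handle every form of valid HTML.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the src values of the script tags found in the given HTML. */
    public static List<String> scriptSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = SCRIPT_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }
}
```

Passing the string returned by a page download to ScriptLister.scriptSources shows which scripts to inspect; the AJAX requests themselves still have to be observed in the browser.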

Answer by user3111525

The URL that you should be using is:

http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE

This is because the original URL you posted (as mentioned by dacwe) sends a redirect.