
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/5371943/

Time: 2020-10-30 10:48:10  Source: igfitidea

Reading from a URL Connection Java

java url html-parsing urlconnection datainputstream

Asked by Penny

I'm trying to read HTML from a URL connection. In one case, the HTML file I'm trying to read includes 5 line breaks before the actual doctype declaration. In this case the input stream reader throws an EOFException.

URL pageUrl = 
    new URL(
        "http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html"
    );

URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
//some read method here

Has anyone run into a problem like this?

URL pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
String urlData = "";
while ((urlData = dis.readUTF()) != null)
    System.out.println(urlData);

//exception thrown

java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
    at java.io.DataInputStream.readUTF(DataInputStream.java:572)
    at java.io.DataInputStream.readUTF(DataInputStream.java:547)

With a BufferedReader, readLine() just returns null and reads nothing further:

pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
BufferedReader br = new BufferedReader(new InputStreamReader(getConn.getInputStream()));
String urlData = "";
while (true) {
    urlData = br.readLine();
    System.out.println(urlData);
}

outputs null

Answered by seh

You're using DataInputStream to read data that wasn't encoded using DataOutputStream. Examine the documented behavior of your call to DataInputStream#readUTF(): it first reads two bytes to form a 16-bit integer indicating the number of bytes that follow, which make up the UTF-encoded string. The data you're reading from the HTTP server is not encoded in this format.
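To make the length-prefix behavior concrete, here is a minimal sketch that needs no network access. The class name ReadUtfDemo and its two helper methods are hypothetical names chosen for illustration; only DataOutputStream#writeUTF and DataInputStream#readUTF come from the JDK. It shows that readUTF succeeds on bytes produced by writeUTF, and hits end of stream on plain text such as HTML with leading newlines, or on an empty body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadUtfDemo {

    // Hypothetical helper: encodes a string with writeUTF, which emits a
    // 2-byte big-endian length followed by the modified UTF-8 bytes.
    public static byte[] writeUtf(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hypothetical helper: calls readUTF on arbitrary bytes and returns null
    // if the stream ends before the announced number of bytes is available.
    public static String tryReadUtf(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (EOFException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encoded = writeUtf("hello");
        System.out.println(encoded.length);      // 2 length bytes + 5 data bytes
        System.out.println(tryReadUtf(encoded)); // round trip succeeds

        // Plain text: the first two newline bytes are misread as the length
        // 0x0A0A (2570), far more than the bytes that actually follow.
        byte[] html = "\n\n\n\n\n<!DOCTYPE html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(tryReadUtf(html));

        // An empty body fails even earlier, while reading the length itself.
        System.out.println(tryReadUtf(new byte[0]));
    }
}
```

For text responses, a Reader (for example InputStreamReader wrapped in BufferedReader) is the more suitable tool, since it decodes characters rather than expecting a length prefix.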

Instead, the HTTP server sends headers encoded in ASCII, per RFC 2616 sections 6.1 and 2.2. You need to read the headers as text, and then determine how the message body (the "entity") is encoded.
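One way to determine the body encoding is to look at the charset parameter of the Content-Type header. The sketch below is an illustration under stated assumptions, not a complete header parser: the class ContentTypeCharset and its methods are hypothetical names, and the ISO-8859-1 fallback is an assumption based on the RFC 2616 default for text media types. With a URLConnection, the header value would typically come from getContentType().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // header value such as "text/html; charset=UTF-8". Returns the fallback
    // when the header is missing, has no charset, or names an unknown charset.
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decodes a response body using the charset named in the header.
    // Assumption: fall back to ISO-8859-1 when no charset is given.
    public static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

The resulting Charset can be passed as the second argument to InputStreamReader, so that readLine() decodes the body with the encoding the server declared rather than the platform default.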

Answered by duffymo

This works fine:

package url;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;

/**
 * UrlReader
 * @author Michael
 * @since 3/20/11
 */
public class UrlReader
{

    public static void main(String[] args)
    {
        UrlReader urlReader = new UrlReader();

        for (String url : args)
        {
            try
            {
                String contents = urlReader.readContents(url);
                System.out.printf("url: %s contents: %s\n", url, contents);
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
        }
    }


    public String readContents(String address) throws IOException
    {
        StringBuilder contents = new StringBuilder(2048);
        BufferedReader br = null;

        try
        {
            URL url = new URL(address);
            br = new BufferedReader(new InputStreamReader(url.openStream()));
            String line;
            // Check for end of stream before appending, so the literal "null" is never added.
            while ((line = br.readLine()) != null)
            {
                contents.append(line);
            }
        }
        finally
        {
            close(br);
        }

        return contents.toString();
    }

    private static void close(Reader br)
    {
        try
        {
            if (br != null)
            {
                br.close();
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

Answered by Brian Roach

This:

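import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class Main {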
    public static void main(String[] args) 
        throws MalformedURLException, IOException 
    {
        URL pageUrl = new URL("http://www.google.com");
        URLConnection getConn = pageUrl.openConnection();
        getConn.connect();
        BufferedReader dis = new BufferedReader( 
                                 new InputStreamReader(
                                     getConn.getInputStream()));
        String myString;
        while ((myString = dis.readLine()) != null)
        {
            System.out.println(myString);
        }
    }
}

Works perfectly. The URL you are supplying, however, returns nothing.
