Java Socket InputStream and UTF-8

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute the content to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24393112/

Time: 2020-08-14 11:56:47  Source: igfitidea

Socket InputStream and UTF-8

Tags: java, sockets, encoding, utf-8

Asked by Davide Rain

First things first: sorry for my bad English and for my noob question. I'm trying to make a chat with Java. Everything works fine, except that special characters don't work. I think it's an encoding problem, because in my output stream I encode the string as UTF-8 like this:

protected void send(String msg) {

    try {
        msg += "\r\n";
        OutputStream outStream = socket.getOutputStream();
        outStream.write(msg.getBytes("UTF-8"));
        System.out.println(msg.getBytes("UTF-8"));
        outStream.flush();
    }
    catch (IOException ex) {
        ex.printStackTrace();
    }
}

But in my receive method I couldn't find a way to do the same:

public String receive() throws IOException {

    String line = "";
    InputStream inStream = socket.getInputStream();    

    int read = inStream.read();
    while (read!=10 && read > -1) {
      line+=String.valueOf((char)read);
      read = inStream.read();
    }
    if (read==-1) return null;
    line+=String.valueOf((char)read);       
    return line; 

  }

So is there a quick way to specify that the bytes read from the stream are encoded as UTF-8?

EDIT: Okay, I tried a BufferedReader like this:

 public String receive() throws IOException {

    String line = "";           
    in = new BufferedReader(new InputStreamReader(socket.getInputStream(), "UTF-8"));           
    String readLine = "";   

    while ((readLine = in.readLine()) != null) {
        line+=readLine;
    }

    System.out.println("Line:"+line);

    return line;

  }

But it doesn't work. It seems that the socket doesn't receive anything.

Accepted answer by Jean-François Savard

Try:

BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream(), "UTF-8"));

Then:

String line = "";
String readLine;
while ((readLine = in.readLine()) != null) {
    line += readLine;
}
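A note on the loop above: `readLine()` returns `null` only at end of stream, so a loop that runs until `null` keeps waiting until the peer closes the connection. That is one plausible reason the attempt in the question's EDIT appears to receive nothing. A minimal sketch of the alternative, returning one line per call, is shown here. The class name `LineReceiver` is hypothetical, and a byte array stands in for `socket.getInputStream()` so the sketch runs without a network.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: wraps an input stream once and returns one line per call.
final class LineReceiver {
    private final BufferedReader in;

    LineReceiver(InputStream stream) {
        // Create the reader once. A fresh BufferedReader per call could lose
        // bytes that an earlier reader had already buffered.
        this.in = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
    }

    // Returns the next line without its terminator, or null at end of stream.
    String receive() throws IOException {
        return in.readLine();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for socket.getInputStream(): two UTF-8 encoded lines.
        byte[] wire = "ciao \u00e8\r\nsecond line\r\n".getBytes(StandardCharsets.UTF_8);
        LineReceiver receiver = new LineReceiver(new ByteArrayInputStream(wire));
        System.out.println(receiver.receive());
        System.out.println(receiver.receive());
    }
}
```

With a real socket, the constructor would take `socket.getInputStream()` and `receive()` would block only until the next line terminator arrives.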

Answer by Brett Okken

Use an InputStreamReader and an OutputStreamWriter, both created with UTF-8 as the character encoding.

If you want to read entire lines of content, you can wrap the InputStreamReader with a BufferedReader. Similarly, you can use a BufferedWriter or PrintWriter wrapped around the OutputStreamWriter to write out data as lines.
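As a rough illustration of the writing side, the sketch below wraps an output stream in an OutputStreamWriter and a BufferedWriter and sends one `\r\n`-terminated line per call. The class name `Utf8LineWriter` is an assumption made for this example, and a ByteArrayOutputStream stands in for `socket.getOutputStream()`.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes every line as UTF-8 before it reaches the stream.
final class Utf8LineWriter {
    private final BufferedWriter out;

    Utf8LineWriter(OutputStream stream) {
        // OutputStreamWriter is the bridge that turns chars into UTF-8 bytes.
        this.out = new BufferedWriter(new OutputStreamWriter(stream, StandardCharsets.UTF_8));
    }

    // Writes the message plus an explicit line terminator, then flushes so the
    // bytes leave the buffer immediately.
    void send(String msg) throws IOException {
        out.write(msg);
        out.write("\r\n");
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        Utf8LineWriter writer = new Utf8LineWriter(wire);
        writer.send("caf\u00e9");
        // Decode the captured bytes to confirm they round-trip as UTF-8.
        System.out.print(new String(wire.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

An explicit `"\r\n"` is used here instead of `newLine()` or `println()`, which emit the platform line separator; that keeps the wire format the same on every operating system.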

Answer by yshavit

You should understand the difference between Unicode chars and bytes. The short of it is that Unicode code points (Java chars, more or less) are the same regardless of the encoding. The encoding changes what chars a given byte sequence translates to.

In your code, you've got a String, which is really just a sequence of chars. You translate that to a sequence of bytes using getBytes("UTF-8"). When you read it back, you're reading back each individual byte (as an int, but that's a detail), not each char. You try to convert these bytes to chars using plain casting, which only works when the code point value of the char is exactly equal to the int value of the byte; for UTF-8, this is only the case for "normal" characters (ASCII, code points 0 to 127).
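The difference can be seen in a small sketch. `castEachByte` imitates what the question's receive method does, and `decode` lets the charset decoder combine the bytes; both method names and the class name are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

final class CastVersusDecode {
    // Imitates the question's receive(): turns every byte into its own char.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // b & 0xFF is the 0..255 value that InputStream.read() would return.
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Lets the UTF-8 decoder combine multi-byte sequences into single chars.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e8"; // e with grave accent, two bytes in UTF-8
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes on the wire: " + utf8.length);
        System.out.println("cast length:       " + castEachByte(utf8).length());
        System.out.println("decoded correctly: " + decode(utf8).equals(original));
    }
}
```

For pure ASCII input the two methods return the same string, which is why the chat in the question works until a special character shows up.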

You should instead reconstruct a String based on the bytes from the input stream and the charset. One way to do this is to read the InputStream into a byte[] and then call new String(byte[] bytes, String charset).
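One way this could look, keeping the question's byte-by-byte loop but decoding only once the whole line has been collected, is sketched below. The class name `ByteLineReader` is hypothetical. Unlike the question's method, this version returns a partial last line instead of null when the stream ends without a newline.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ByteLineReader {
    // Collects raw bytes up to and including '\n', then decodes them as UTF-8
    // in one step. Returns null if the stream ends before any byte is read.
    static String receive(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int read;
        while ((read = in.read()) != -1) {
            buffer.write(read);
            // Byte 10 never occurs inside a multi-byte UTF-8 sequence, so it
            // is safe to look for the newline at the byte level.
            if (read == '\n') {
                break;
            }
        }
        if (buffer.size() == 0) {
            return null;
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for socket.getInputStream().
        byte[] wire = "ciao \u00e8\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(receive(new ByteArrayInputStream(wire)));
    }
}
```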

You could also use a Reader, which represents a readable stream of characters. InputStreamReader reads an InputStream as the source of its character stream, and BufferedReader can then take that character stream and use it to produce Strings, one line at a time, as ProgrammerJeff's answer illustrates.

Answer by hagrawal

Trying to shed more light for future visitors.

Rule of thumb: the server and client HAVE TO agree on the encoding scheme. If the client sends data encoded with one scheme and the server reads it with another, the expected results can NEVER be achieved.

An important thing to note for those who try to test this: do not encode with ASCII on the client side and decode with UTF-8 on the server side. UTF-8 is backward compatible with ASCII, so that test passes and you may conclude that the rule of thumb is wrong. It is not. Use UTF-8 on the client side and UTF-16 on the server side instead, and you will see the problem.
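The test described above can be tried without any sockets. The sketch below encodes a string with one charset and decodes the bytes with another; `roundTrip` and the class name are hypothetical, chosen only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatch {
    // Encodes text with one charset and decodes the resulting bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String text = "hello";
        // ASCII bytes are also valid UTF-8, so this mismatch goes unnoticed.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.US_ASCII, StandardCharsets.UTF_8)));
        // UTF-8 bytes read as UTF-16 do not come back as the original text.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_16)));
    }
}
```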

Encoding with sockets

I guess the single most important thing to understand is this: over the socket you always send BYTES, and everything depends on how those bytes are encoded.

For example, if I send input to the server (over a client-server socket) from my Windows command prompt, the data will be encoded using some encoding scheme (I really do not know which). If I send data to the server from another client program, I can specify the encoding scheme to use for my client socket's output stream; all the data is then converted to BYTES using that scheme and sent over the socket.

In the end I am still sending BYTES over the wire, but they are encoded using the scheme I specified. If the server uses a different encoding scheme while reading the socket's input stream, the expected results cannot be achieved; if it uses the same scheme as the client, everything works.

Answering this question

In Java, there are special "bridge" classes that sit between byte streams and character streams, and you can use them to specify the encoding of the stream.

PLEASE NOTE: in Java, InputStream and OutputStream are BYTE streams, so everything read from or written to them is BYTES. You cannot specify an encoding on InputStream or OutputStream objects themselves, so you use the bridge classes (InputStreamReader and OutputStreamWriter) instead.

Below are client and server code snippets showing how to specify the encoding on the client's output stream and on the server's input stream.

As long as I specify the same encoding at both ends, everything will be perfect.

Client side:

    Socket clientSocket = new Socket("abc.com", 25050);
    OutputStreamWriter clientSocketWriter = new OutputStreamWriter(clientSocket.getOutputStream(), "UTF8");

Server side:

    ServerSocket serverSocket = new ServerSocket(8001);
    Socket clientSocket = serverSocket.accept();
    // PLEASE NOTE: Java's InputStream is a BYTE stream, so the encoding cannot be set on it directly.
    // The bridge class InputStreamReader takes the charset and decodes the bytes
    // read from the client socket as UTF8.
    // If this charset differs from the one the client used for its output stream,
    // the data cannot be read properly and may come out as all "?".
    InputStreamReader clientSocketReader = new InputStreamReader(clientSocket.getInputStream(), "UTF8");
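To see both ends working together, here is a self-contained sketch that runs the client and the server in one process over the loopback interface. It is only an illustration under stated assumptions: the class and method names are hypothetical, the server binds to an ephemeral port (port 0) rather than the ports used above, and the client runs on a helper thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class Utf8SocketRoundTrip {
    // Sends one line from a client thread and returns what the server decoded.
    static String sendAndReceive(String message) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket serverSocket = new ServerSocket(0, 1, loopback)) {
            Thread client = new Thread(() -> {
                try (Socket socket = new Socket(loopback, serverSocket.getLocalPort());
                     Writer writer = new OutputStreamWriter(
                             socket.getOutputStream(), StandardCharsets.UTF_8)) {
                    writer.write(message + "\r\n"); // encoded as UTF-8 by the bridge
                    writer.flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            client.start();
            try (Socket accepted = serverSocket.accept();
                 BufferedReader reader = new BufferedReader(new InputStreamReader(
                         accepted.getInputStream(), StandardCharsets.UTF_8))) {
                String line = reader.readLine(); // decoded with the same charset
                client.join();
                return line;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String sent = "caf\u00e9 \u4f60\u597d";
        System.out.println(sent.equals(sendAndReceive(sent)));
    }
}
```

Using `StandardCharsets.UTF_8` instead of the string `"UTF8"` is a choice made for this sketch; it avoids the checked `UnsupportedEncodingException` and any typo in the charset name.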

Answer by Abdelmonem Mahmoud Amer

This worked for me. Server-side code:

    try {
        Scanner input = new Scanner(new File("myfile.txt"), "UTF-8");
        // Send the first line only.
        String line = input.nextLine();
        ServerSocket server = new ServerSocket(12345);
        Socket client = server.accept();
        // autoFlush is true, so println() also flushes the writer.
        PrintWriter out = new PrintWriter(
                new BufferedWriter(new OutputStreamWriter(
                        client.getOutputStream(), "UTF-8")), true);
        out.println(line);
        out.flush();
        input.close();
        // Close the accepted socket too; server.close() does not close it.
        client.close();
        server.close();
    } catch (Exception e) {
        e.printStackTrace();
    }

Client side:

Socket mysocket = new Socket(SERVER_ADDR, 12345);
BufferedReader bfr = new BufferedReader(
        new InputStreamReader(mysocket.getInputStream(), "UTF-8"));
String tmp = bfr.readLine();

The text file should itself be encoded as UTF-8, since the Scanner reads it with that charset.