Java: reading UTF-8 characters using Scanner

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11963563/

Date: 2020-10-31 07:06:20  Source: igfitidea

Reading UTF-8 characters using Scanner

Tags: java, utf-8, io

Asked by user962206

public boolean isValid(String username, String password) {
    boolean valid = false;

    // Prints every whitespace-separated token in the file.
    // FileReader decodes with the platform default charset.
    try (Scanner files = new Scanner(new BufferedReader(new FileReader("files/students.txt")))) {
        while (files.hasNext()) {
            System.out.println(files.next());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return valid;
}

Why is it that when I read a file that was written as UTF-8 (by another Java program), each entry is displayed with weird symbols followed by the string itself?

I wrote the file using this:

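    private static void addAccount(String username, String password) {
        File file = new File(file_name);
        // Appends one record; try-with-resources closes the stream.
        try (DataOutputStream dos = new DataOutputStream(new FileOutputStream(file, true))) {
            dos.writeUTF(username + "::" + password + "\n");
        } catch (Exception e) {
            e.printStackTrace(); // the original silently ignored the exception
        }
    }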

Answered by Jonathan Garcia Rey

Here is a simple way to do that:

File words = new File(path);
Scanner s = new Scanner(words,"utf-8");
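A fuller sketch of the same idea, in case a runnable version helps. The class name ScannerCharsetDemo, the writeSample helper, and the sample account data are made up for illustration. The only call taken from the answer is the Scanner(File, String) constructor.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerCharsetDemo {

    // Reads whitespace-separated tokens from a file, decoding its bytes as UTF-8.
    static List<String> readTokens(File file) {
        List<String> tokens = new ArrayList<>();
        // Scanner(File, String) throws FileNotFoundException if the file is missing.
        try (Scanner s = new Scanner(file, "UTF-8")) {
            while (s.hasNext()) {
                tokens.add(s.next());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    // Hypothetical helper: writes sample text to a temporary file as UTF-8.
    static File writeSample(String content) {
        try {
            File file = File.createTempFile("students", ".txt");
            file.deleteOnExit();
            Files.write(file.toPath(), content.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // \u00e9 and \u00eb are non-ASCII letters, so the charset matters here.
        File file = writeSample("jos\u00e9::secret\nzo\u00eb::hunter2\n");
        System.out.println(readTokens(file));
    }
}
```

Note that this only helps for files that contain plain UTF-8 text. It does not remove extra bytes that a writer such as DataOutputStream.writeUTF puts in front of the text.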

Answered by Louis Wasserman

From the FileReader Javadoc:

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

So perhaps something like new InputStreamReader(new FileInputStream(file), "UTF-8")
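As a rough illustration of that suggestion, the sketch below wraps a FileInputStream in an InputStreamReader with an explicit charset and reads the file line by line. The class name ExplicitReaderDemo, the writeSample helper, and the sample data are assumptions for the example. StandardCharsets.UTF_8 is used in place of the "UTF-8" name, which should select the same charset.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class ExplicitReaderDemo {

    // Reads all lines of a file, decoding the bytes explicitly as UTF-8
    // instead of relying on the platform default charset.
    static List<String> readLines(File file) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Hypothetical helper: writes sample text to a temporary file as UTF-8.
    static File writeSample(String content) {
        try {
            File file = File.createTempFile("students", ".txt");
            file.deleteOnExit();
            Files.write(file.toPath(), content.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File file = writeSample("jos\u00e9::secret\nzo\u00eb::hunter2\n");
        for (String line : readLines(file)) {
            System.out.println(line);
        }
    }
}
```

The same reader could also be passed to a Scanner, as in the question, since Scanner accepts any Readable.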

Answered by oldrinb

When using DataOutput.writeUTF/DataInput.readUTF, the first 2 bytes form an unsigned 16-bit big-endian integer denoting the size of the string.

First, two bytes are read and used to construct an unsigned 16-bit integer in exactly the manner of the readUnsignedShort method. This integer value is called the UTF length and specifies the number of additional bytes to be read. These bytes are then converted to characters by considering them in groups. The length of each group is computed from the value of the first byte of the group. The byte following a group, if any, is the first byte of the next group.
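To make the length prefix visible, here is a small sketch that writes one record to an in-memory buffer and inspects the raw bytes. The class name WriteUtfPrefixDemo, the method names, and the sample record "bob::pw\n" are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteUtfPrefixDemo {

    // Returns the raw bytes that DataOutputStream.writeUTF produces for a string.
    static byte[] encode(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buffer)) {
            dos.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the big-endian unsigned 16-bit "UTF length" from the first two bytes.
    static int utfLength(byte[] encoded) {
        return ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("bob::pw\n");               // 8 ASCII characters
        System.out.println("total bytes: " + bytes.length);      // prints 10
        System.out.println("UTF length:  " + utfLength(bytes));  // prints 8
    }
}
```

Here the two prefix bytes are 0x00 and 0x08. Decoded as text, they become control characters, which would explain stray symbols in front of each record.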

These length bytes are likely the cause of your issue. You'd need to skip the first 2 bytes and then tell your Scanner to use UTF-8 in order to read the text properly.
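A minimal sketch of that workaround, assuming the data holds a single writeUTF record. An in-memory byte array stands in for the file, and the names SkipPrefixDemo, writeRecord, and readTokensAfterPrefix are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SkipPrefixDemo {

    // Hypothetical helper: produces the bytes of one writeUTF record.
    static byte[] writeRecord(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buffer)) {
            dos.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Consumes the 2-byte length prefix, then scans the remaining bytes as UTF-8.
    static List<String> readTokensAfterPrefix(byte[] data) {
        List<String> tokens = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUnsignedShort(); // skip the UTF length written by writeUTF
            Scanner scanner = new Scanner(in, "UTF-8");
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        byte[] record = writeRecord("jos\u00e9::pw\n");
        System.out.println(readTokensAfterPrefix(record));
    }
}
```

If the file holds several records, as it would after repeated appends, each record carries its own prefix. In that case, calling DataInputStream.readUTF once per record is probably a better fit than skipping bytes by hand.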

That being said, I do not see any reason to use DataOutput/DataInput here. You can simply use FileReader and FileWriter instead. These will use the default system encoding.
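One way this could look is sketched below. It is a variation on the suggestion rather than the answerer's exact code: instead of FileWriter, it uses OutputStreamWriter with an explicit UTF-8 charset so the result does not depend on the system default. The class name PlainTextAccounts and the newTempFile and readAccounts helpers are made up for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PlainTextAccounts {

    // Appends "username::password" as one plain UTF-8 text line (no length prefix).
    static void addAccount(File file, String username, String password) {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file, true), StandardCharsets.UTF_8)) {
            out.write(username + "::" + password + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back line by line, decoding it as UTF-8.
    static List<String> readAccounts(File file) {
        List<String> accounts = new ArrayList<>();
        try (Scanner s = new Scanner(file, "UTF-8")) {
            while (s.hasNextLine()) {
                accounts.add(s.nextLine());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return accounts;
    }

    // Hypothetical helper: creates an empty temporary file for the demo.
    static File newTempFile() {
        try {
            File file = File.createTempFile("students", ".txt");
            file.deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File file = newTempFile();
        addAccount(file, "jos\u00e9", "pw1");
        addAccount(file, "ann", "pw2");
        System.out.println(readAccounts(file));
    }
}
```

Writing and reading with the same explicit charset keeps the two programs in agreement even when they run on machines with different default encodings.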