Java: reading UTF-8 characters with Scanner

Question

Asked by user962206

    public boolean isValid(String username, String password) {
        boolean valid = false;
        DataInputStream file = null;

        try {
            Scanner files = new Scanner(new BufferedReader(new FileReader("files/students.txt")));

            while (files.hasNext()) {
                System.out.println(files.next());
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
        return valid;
    }

Why, when I read a file that was written as UTF-8 (by another Java program), does each entry show up as weird symbols followed by the string I wrote?

I wrote the file using this:

    private static void  addAccount(String username,String password){
        File file = new File(file_name);
        try{
            DataOutputStream dos = new DataOutputStream(new FileOutputStream(file,true));
            dos.writeUTF((username+"::"+password+"\n"));
        }catch(Exception e){

        }
    }

Answer 1

Answered by Jonathan Garcia Rey

Here is a simple way to do that:

    File words = new File(path);
    Scanner s = new Scanner(words, "utf-8");

Answer 2

Answered by Louis Wasserman

From the FileReader Javadoc:

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

So perhaps something like new InputStreamReader(new FileInputStream(file), "UTF-8"), which you can then wrap in your BufferedReader and Scanner.
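A minimal sketch of that suggestion, assuming the file holds plain UTF-8 text. The class name Utf8ScannerExample, the helper readTokens and the sample data are made up for illustration and are not from the question:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class Utf8ScannerExample {

    // Hypothetical helper: reads whitespace-separated tokens, decoding the
    // bytes as UTF-8 instead of relying on the platform default encoding.
    public static List<String> readTokens(InputStream in) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8)))) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 sample file so the example is self-contained.
        Path file = Files.createTempFile("students", ".txt");
        Files.write(file, "zo\u00eb::p\u00e4ss\n".getBytes(StandardCharsets.UTF_8));

        try (InputStream in = new FileInputStream(file.toFile())) {
            System.out.println(readTokens(in));
        }
        Files.delete(file);
    }
}
```

Note that this only fixes the decoding. If the file was produced with DataOutputStream.writeUTF, the length prefix bytes are still in the stream and will still show up as stray characters.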

Answer 3

Answered by oldrinb

When using DataOutput.writeUTF/DataInput.readUTF, the first 2 bytes form an unsigned 16-bit big-endian integer giving the length, in bytes, of the encoded string that follows.

First, two bytes are read and used to construct an unsigned 16-bit integer in exactly the manner of the readUnsignedShort method. This integer value is called the UTF length and specifies the number of additional bytes to be read. These bytes are then converted to characters by considering them in groups. The length of each group is computed from the value of the first byte of the group. The byte following a group, if any, is the first byte of the next group.

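These length bytes are likely the cause of your issue. You'd need to skip the first 2 bytes and then tell your Scanner to use UTF-8 in order to read the text properly. Keep in mind that every writeUTF call writes its own 2-byte prefix, so a file with several entries has one prefix in front of each of them.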
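To see the prefix for yourself, here is a small in-memory sketch. The class name WriteUtfPrefixDemo and the helpers encode and decode are invented for this illustration, and the sample record just mimics the username::password format from the question:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteUtfPrefixDemo {

    // Hypothetical helper: encodes one string the way DataOutputStream.writeUTF does.
    public static byte[] encode(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hypothetical helper: reads one record back with the matching reader,
    // which consumes the length prefix instead of returning it as text.
    public static String decode(byte[] record) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] record = encode("bob::secret\n");

        // The first two bytes are the big-endian length of the encoded string.
        int length = ((record[0] & 0xFF) << 8) | (record[1] & 0xFF);
        System.out.println("length prefix = " + length);
        System.out.println("total bytes   = " + record.length);

        // readUTF strips the prefix and returns only the original text.
        System.out.print(decode(record));
    }
}
```

A Scanner reading the same bytes as text has no idea the first two bytes are a length, so it decodes them as characters. That matches the "weird symbols followed by the string" described in the question.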

That being said, I do not see any reason to use DataOutput/DataInput here. You can simply use FileReader and FileWriter instead. These will use the default system encoding.
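If you would rather not depend on the default system encoding, a variation on this advice is to write plain text with an explicit charset and read it back with the same one. This is only a sketch under that assumption: it swaps FileWriter/FileReader for Files.newBufferedWriter and a Scanner with a charset name, and PlainTextAccounts, addAccount (with an extra file parameter) and readAccounts are illustrative names:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PlainTextAccounts {

    // Appends one "username::password" line as plain UTF-8 text (no length prefix).
    public static void addAccount(Path file, String username, String password) {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(username + "::" + password);
            writer.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back token by token, decoding it as UTF-8.
    public static List<String> readAccounts(Path file) {
        List<String> accounts = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            while (scanner.hasNext()) {
                accounts.add(scanner.next());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return accounts;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("students", ".txt");
        addAccount(file, "alice", "s3cret");
        addAccount(file, "zo\u00eb", "p\u00e4ss");
        System.out.println(readAccounts(file));
        Files.delete(file);
    }
}
```

The try-with-resources blocks also close the writer and the scanner, which the original addAccount and isValid never do.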
