Java: Why is my String returned as "\ufffd\ufffdN a m e"?

Question

Asked by Xavier

This is my method

public void readFile3()throws IOException
{
    try
    {
        FileReader fr = new FileReader(Path3);
        BufferedReader br = new BufferedReader(fr);
        String s = br.readLine();
        int a =1;
        while( a != 2)
        {
            s = br.readLine();
            a ++; 

        }
        Storage.add(s);

        br.close();

    }
    catch(IOException e)
    {
        System.out.println(e.getMessage());
    }
}

For some reason I am unable to read the file which only contains this " Name Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz "

When I debug the code, the String s is returned as "\ufffd\ufffdN a m e", and I have no clue where those extra characters are coming from. This is preventing me from reading the file properly.

Answer 1

Accepted answer by Serge Ballesta

\ufffd is the Unicode replacement character. A decoder emits it when it meets bytes that are not valid in the charset it was told to use. I suppose you are on a Windows platform (or at least the file you read was created on Windows). Windows supports many formats for text files; the most common is ANSI, where each character is represented by its ANSI code.

But Windows can also use UTF-16 directly, where each character is represented by its Unicode code as a 16-bit integer, so with 2 bytes per character. Those files use a special marker (the Byte Order Mark in Windows dialect) to say:

that the file is encoded with 2 (or even 4) bytes per character
the encoding is little or big endian

(Reference: Using Byte Order Marks on MSDN)
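
To make the two points above concrete, here is a minimal sketch of how the first bytes of a file could be inspected for a byte order mark. The class name BomSniffer and the method detectBom are hypothetical names chosen for illustration, and the returned labels are plain strings rather than anything defined by the JDK. Note that FF FE 00 00 is ambiguous in principle (a UTF-16LE file could start with a NUL character), so the sketch simply assumes UTF-32LE in that case.

```java
class BomSniffer {
    // Returns a label for the byte order mark found at the start of the data,
    // or "none" when no known mark is present. (Hypothetical helper.)
    static String detectBom(byte[] head) {
        // The 4-byte marks must be tested before the 2-byte ones,
        // because FF FE is a prefix of FF FE 00 00.
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        return "none";
    }

    // Compares the leading bytes of data (as unsigned values) with prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // First bytes of a little-endian UTF-16 file that starts with "Na".
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 'N', 0, 'a', 0};
        System.out.println(detectBom(sample));
    }
}
```

In practice you rarely need to do this by hand in Java, since the UTF-16 charset consumes the mark itself when decoding; the sketch is only meant to show what the marker looks like on disk.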

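Since you see N a m e rather than Name after the first two replacement characters, I suppose you have a UTF-16 encoded text file: the two replacement characters are most likely the two bytes of the byte order mark, and the gaps between the letters are the zero high bytes of each 16-bit character. Notepad can transparently edit those files (without even telling you the actual format), but other tools do have problems with them. The excellent vim can read files with different encodings and convert between them.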
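
The symptom can be reproduced without any file at all. The sketch below builds the bytes a little-endian UTF-16 file with a byte order mark would contain for "Name", then decodes them twice: once as UTF-8 (an assumption about what the asker's default charset behaved like; FileReader uses the platform default) and once as UTF-16. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {
    // Bytes of the text as UTF-16LE, preceded by the byte order mark FF FE.
    static byte[] utf16LeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    // Wrong charset: FF and FE are never valid in UTF-8, so each one
    // is replaced by U+FFFD, and every zero high byte becomes a NUL char.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Right charset: the UTF-16 decoder reads the mark to pick the byte
    // order and does not include it in the decoded text.
    static String decodeAsUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] raw = utf16LeWithBom("Name");
        System.out.println(decodeAsUtf8(raw).length());  // more chars than "Name"
        System.out.println(decodeAsUtf16(raw));
    }
}
```

The NUL characters between the letters are what a debugger tends to render as gaps, which would explain "N a m e".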

If you want to use this kind of file directly in Java, you have to use the UTF-16 charset. From the Java SE 7 javadoc on Charset: "UTF-16: Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark".
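
As a rough sketch of what that could look like for the method in the question, the following reads a given line from a UTF-16 file with an explicit charset. The names Utf16LineReader, readLine and writeSample are hypothetical, and the sample file content (a "Name" header line followed by the CPU string) is an assumption about the asker's file, not something the question confirms.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf16LineReader {
    // Returns the 1-based line lineNumber of the file, decoded as UTF-16,
    // or null when the file has fewer lines.
    static String readLine(Path file, int lineNumber) {
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_16)) {
            String line = null;
            for (int i = 0; i < lineNumber; i++) {
                line = br.readLine();
                if (line == null) {
                    return null;
                }
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes an assumed sample file: BOM + two lines, little-endian UTF-16.
    static Path writeSample() {
        try {
            Path tmp = Files.createTempFile("cpu", ".txt");
            String content = "\uFEFFName\r\nIntel(R) Core(TM) i5-2500 CPU @ 3.30GHz\r\n";
            // U+FEFF encoded as UTF-16LE produces the bytes FF FE.
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_16LE));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path sample = writeSample();
        System.out.println(readLine(sample, 2));
    }
}
```

Passing the charset explicitly also removes the dependency on the platform default, so the same code behaves identically on Windows and elsewhere.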

Answer 2

Answered by Honesty

Check whether the file is .odt, .rtf, or something other than .txt. This may be what's causing the extra UTF-16 characters to appear. Also, make sure that (even if it is a .txt file) your file is encoded in UTF-8.

Perhaps you have UTF-16 characters such as '?' in your document.

Answer 3

Answered by alex.pulver

You must specify the encoding when reading the file; in your case it is probably UTF-16.

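// Decode the bytes as UTF-16 instead of the platform default charset;
// the byte order mark at the start of the file selects the byte order.
Reader reader = new InputStreamReader(new FileInputStream(fileName), "UTF-16");
BufferedReader br = new BufferedReader(reader);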

Check the documentation for more details: InputStreamReader class.
