Reading txt files (Unicode and UTF-8) with C#

Question

Asked by mtkachenko

I created two txt files in Windows Notepad with the same content, "thank you - спасибо", and saved one as UTF-8 and the other as Unicode. In Notepad they look fine. Then I tried to read them using .NET:

...File.ReadAllText(utf8FileFullName, Encoding.UTF8);

and

...File.ReadAllText(unicodeFileFullName, Encoding.Unicode);

But in both cases I got this "thank you - ???????". What's wrong?

Update: code for UTF-8:

using System;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        var encoding = Encoding.UTF8;
        var file = new FileInfo(@"D:\encodes\enc.txt");
        Console.OutputEncoding = encoding;
        var content = File.ReadAllText(file.FullName, encoding);
        Console.WriteLine("encoding: " + encoding);
        Console.WriteLine("content: " + content);
        Console.ReadLine();
    }
}

Result: thanks ?D?D°?D?D±D?

Answer 1

Answered by keyboardP

Edited, as UTF-8 should support these characters. It seems that you're outputting to a console or some other location whose encoding hasn't been set. If so, you need to set the encoding. For the console, you can do this:

string allText = File.ReadAllText(unicodeFileFullName, Encoding.UTF8);
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine(allText);

Answer 2

Answered by Warren Rox

When outputting Unicode or UTF-8 encoded multi-byte characters to the console, you need to set the output encoding and also make sure the console uses a font that supports those characters, so that the corresponding glyphs can be displayed. With your existing code, MessageBox.Show(content), or display on a Windows or Web Form, would show the text correctly.
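One way to check that the file was decoded correctly, independent of what the console can render, is to print the code points of the string instead of the characters themselves. The following is a minimal sketch; the class and method names (CodePointDump, Describe) are hypothetical and not part of the original code.

```csharp
using System;
using System.Linq;

static class CodePointDump
{
    // Formats each UTF-16 code unit of the string as U+XXXX,
    // so the result contains only ASCII and prints on any console.
    public static string Describe(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

Calling Console.WriteLine(CodePointDump.Describe(content)) after reading the file should show values in the U+0430 to U+044F range for the Cyrillic letters if decoding worked. A run of U+003F (the question mark) would instead suggest the text was already damaged before it reached the console.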

Have a look at http://msdn.microsoft.com/en-us/library/system.console.aspx for an explanation of setting fonts within the console window.

"Support for Unicode characters requires the encoder to recognize a particular Unicode character, and also requires a font that has the glyphs needed to render that character. To successfully display Unicode characters to the console, the console font must be set to a non-raster or TrueType font such as Consolas or Lucida Console."

As a side note, you can use the FileStream class to read the first three bytes of the file and look for the byte order mark to automatically choose the encoding when reading the file. For example, if byte[0] == 0xEF && byte[1] == 0xBB && byte[2] == 0xBF, then you have a UTF-8 encoded file. Refer to http://en.wikipedia.org/wiki/Byte_order_mark for more information.
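The byte order mark check described above could be sketched roughly as follows. The class and method names (BomSniffer, DetectFromBom, DetectFromFile) are hypothetical, and the sketch only recognizes the UTF-8 and UTF-16 marks. It returns null when no mark is found, so the caller has to pick a fallback.

```csharp
using System;
using System.IO;
using System.Text;

static class BomSniffer
{
    // Returns the encoding implied by a leading byte order mark,
    // or null if none of the marks checked here is present.
    // 'count' is the number of valid bytes in 'head'.
    // Limitation: a UTF-32 little-endian file also starts with FF FE
    // and would be reported as UTF-16 by this simple check.
    public static Encoding DetectFromBom(byte[] head, int count)
    {
        if (count >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;             // UTF-8 BOM
        if (count >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 little-endian BOM
        if (count >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 big-endian BOM
        return null;
    }

    // Reads up to three bytes from the start of the file and inspects them.
    public static Encoding DetectFromFile(string path)
    {
        var head = new byte[3];
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int count = stream.Read(head, 0, head.Length);
            return DetectFromBom(head, count);
        }
    }
}
```

A caller might then write var encoding = BomSniffer.DetectFromFile(path) ?? Encoding.UTF8; and pass the result to File.ReadAllText. StreamReader also has a constructor overload with a detectEncodingFromByteOrderMarks flag that performs a similar check, which may be simpler than doing it by hand.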

Answer 3

Answered by alireza amini

Use the Default encoding:

File.ReadAllText(unicodeFileFullName, Encoding.Default);

It will fix the ???? characters.
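Whether this helps depends on what Encoding.Default resolves to: on .NET Framework it is the system's ANSI code page, while on .NET Core and later it is UTF-8. The sketch below uses an in-memory byte array rather than a file to show what happens when UTF-8 bytes are decoded with a matching and a mismatched encoding. The class and method names (DecodingMismatch, RoundTrip) are hypothetical.

```csharp
using System;
using System.Text;

static class DecodingMismatch
{
    // Encodes the text as UTF-8 (no byte order mark is added by GetBytes)
    // and then decodes those bytes with the encoding supplied by the caller.
    public static string RoundTrip(string text, Encoding decodeWith)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return decodeWith.GetString(utf8Bytes);
    }
}
```

RoundTrip("спасибо", Encoding.UTF8) returns the original seven-character string. Passing Encoding.GetEncoding("iso-8859-1") instead (chosen here purely for illustration) returns fourteen unrelated characters, because each Cyrillic letter occupies two bytes in UTF-8 and each byte is decoded as a separate character.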
