Read txt files (in unicode and utf8) by means of C#
Source: http://stackoverflow.com/questions/18871603/
This page reproduces a Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep it under the same license.
Asked by mtkachenko
I created two txt files (Windows Notepad) with the same content, "thank you - спасибо", and saved one as UTF-8 and the other as Unicode. In Notepad they look fine. Then I tried to read them using .NET:
...File.ReadAllText(utf8FileFullName, Encoding.UTF8);
and
...File.ReadAllText(unicodeFileFullName, Encoding.Unicode);
But in both cases I got "thank you - ???????". What's wrong?
Update: code for UTF-8
using System;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        var encoding = Encoding.UTF8;
        var file = new FileInfo(@"D:\encodes\enc.txt");
        // Tell the console to emit UTF-8 before writing any text.
        Console.OutputEncoding = encoding;
        var content = File.ReadAllText(file.FullName, encoding);
        Console.WriteLine("encoding: " + encoding);
        Console.WriteLine("content: " + content);
        Console.ReadLine();
    }
}
Result: thanks ?D?D°?D?D±D?
Answered by keyboardP
Edited, as UTF8 should support the characters. It seems that you're outputting to a console or some other location that hasn't had its encoding set. If so, you need to set the encoding. For the console you can do this:
string allText = File.ReadAllText(unicodeFileFullName, Encoding.UTF8);
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine(allText);
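To check whether the file was decoded correctly regardless of what the console can render, you can print the code point of each character instead of the character itself. A minimal sketch, assuming the Cyrillic word from the question as sample text; the names `CodePointDump` and `ToCodePoints` are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text;

static class CodePointDump
{
    // Formats every UTF-16 code unit of the string as U+XXXX.
    public static string ToCodePoints(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main()
    {
        // "спасибо", written with escapes so the source file encoding does not matter.
        string sample = "\u0441\u043F\u0430\u0441\u0438\u0431\u043E";

        // Round-trip through UTF-8 bytes, the way a UTF-8 file would be decoded.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(sample);
        string decoded = Encoding.UTF8.GetString(utf8Bytes);

        // The output is plain ASCII, so it is readable on any console.
        Console.WriteLine(ToCodePoints(decoded));
    }
}
```

If the values fall in the Cyrillic range (U+0400 to U+04FF), the read itself succeeded and only the console output is at fault.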
Answered by Warren Rox
When outputting Unicode or UTF-8 encoded multi-byte characters to the console, you need to set the encoding and also make sure the console uses a font that supports those characters, so that the corresponding glyphs can be displayed. With your existing code, MessageBox.Show(content) or a display on a Windows Form or Web Form would show the text correctly.
Have a look at http://msdn.microsoft.com/en-us/library/system.console.aspx for an explanation of setting fonts within the console window.
"Support for Unicode characters requires the encoder to recognize a particular Unicode character, and also requires a font that has the glyphs needed to render that character. To successfully display Unicode characters to the console, the console font must be set to a non-raster or TrueType font such as Consolas or Lucida Console."
As a side note, you can use the FileStream class to read the first three bytes of the file and look for the byte order mark to choose the encoding automatically when reading the file. For example, if byte[0] == 0xEF && byte[1] == 0xBB && byte[2] == 0xBF, then you have a UTF-8 encoded file. Refer to http://en.wikipedia.org/wiki/Byte_order_mark for more information.
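The byte order mark check described above could look roughly like this. It is a sketch, not a complete detector: the names `BomSniffer`, `DetectFromBom` and `DetectFromFile` are hypothetical, it takes the file path from the first command-line argument, and falling back to UTF-8 when no BOM is found is an assumption rather than a rule.

```csharp
using System;
using System.IO;
using System.Text;

static class BomSniffer
{
    // Returns the encoding indicated by a leading byte order mark,
    // or null when no known BOM is present.
    public static Encoding DetectFromBom(byte[] head)
    {
        // UTF-32 LE (FF FE 00 00) must be checked before UTF-16 LE (FF FE).
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;            // UTF-16 little-endian
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode;   // UTF-16 big-endian
        return null;
    }

    // Reads up to the first four bytes of the file and inspects them.
    public static Encoding DetectFromFile(string path)
    {
        var head = new byte[4];
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read = stream.Read(head, 0, head.Length);
            Array.Resize(ref head, read);
        }
        return DetectFromBom(head);
    }

    static void Main(string[] args)
    {
        // Assumption: treat a file without a BOM as UTF-8.
        Encoding encoding = DetectFromFile(args[0]) ?? Encoding.UTF8;
        Console.WriteLine(encoding.WebName);
        Console.WriteLine(File.ReadAllText(args[0], encoding));
    }
}
```

StreamReader can also do this check itself when it is constructed with detectEncodingFromByteOrderMarks set to true, which avoids the manual byte comparison.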
Answered by alireza amini
Use the encoding type Default:
File.ReadAllText(unicodeFileFullName, Encoding.Default);
It will fix the ???? characters.
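Whether this helps depends on the runtime and on the file. As far as I know, Encoding.Default is the system ANSI code page on .NET Framework and UTF-8 on .NET Core and later, and File.ReadAllText still honors a byte order mark if the file starts with one. A sketch that checks this on a temporary file; the class name `DefaultEncodingCheck`, the `RoundTrip` helper and the temp file are illustrative only:

```csharp
using System;
using System.IO;
using System.Text;

static class DefaultEncodingCheck
{
    // Writes the text with the given encoding (including its BOM), reads it
    // back while passing Encoding.Default, and returns what was read.
    public static string RoundTrip(string text, Encoding writeEncoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, writeEncoding);
            return File.ReadAllText(path, Encoding.Default);
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main()
    {
        // "thank you - спасибо", with the Cyrillic part written as escapes.
        string sample = "thank you - \u0441\u043F\u0430\u0441\u0438\u0431\u043E";

        Console.WriteLine("Default encoding: " + Encoding.Default.WebName);
        // UTF-8 with BOM, then UTF-16 LE with BOM ("Unicode" in Notepad's terms).
        Console.WriteLine(RoundTrip(sample, new UTF8Encoding(true)) == sample);
        Console.WriteLine(RoundTrip(sample, Encoding.Unicode) == sample);
    }
}
```

Both comparisons should come out true when the BOM is honored. A file saved without a BOM gets no such help and is decoded with whatever Encoding.Default happens to be on that machine.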