C#: How to read text files with ANSI encoding and non-English letters?
Notice: This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/12130290/
How to read text files with ANSI encoding and non-English letters?
Asked by MichaelT
I have a file that contains non-English characters and was saved in ANSI encoding using a non-English code page. How can I read this file in C# and see its content correctly?
Not working:
StreamReader sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.ASCII);
var ags = sr.ReadToEnd();
sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.UTF8);
ags = sr.ReadToEnd();
sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.Unicode);
ags = sr.ReadToEnd();
Working, but I need to know the code page in advance, which is not possible:
sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.GetEncoding(1252));
ags = sr.ReadToEnd();
Accepted answer by L.B
var text = File.ReadAllText(file, Encoding.GetEncoding(codePage));
List of code pages: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx
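As a minimal sketch of the accepted answer, the following reads a file with a known code page number. The class and method names (`AnsiFileReader`, `ReadWithCodePage`) and the temporary sample file are hypothetical, introduced only for illustration. It assumes a runtime such as .NET Core 3.0+ or .NET 5+, where legacy code pages are only available after `CodePagesEncodingProvider` has been registered; on .NET Framework the code pages are built in.

```csharp
using System;
using System.IO;
using System.Text;

public static class AnsiFileReader
{
    // Reads a whole file using a Windows ("ANSI") code page number, e.g. 1251 or 1252.
    public static string ReadWithCodePage(string path, int codePage)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before Encoding.GetEncoding(int) can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return File.ReadAllText(path, Encoding.GetEncoding(codePage));
    }

    public static void Main()
    {
        // Hypothetical sample: the Russian word "Привет" stored as Windows-1251 bytes.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2 });

        string text = ReadWithCodePage(path, 1251);
        Console.WriteLine(text == "\u041F\u0440\u0438\u0432\u0435\u0442");

        File.Delete(path);
    }
}
```

The code page still has to be supplied by the caller; this sketch does not try to detect it.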
Answer by KF2
If I remember correctly, the XmlDocument.Load(string) method always assumes UTF-8, regardless of the XML encoding. You would have to create a StreamReader with the correct encoding and use that as the parameter.
xmlDoc.Load(new StreamReader(
    File.OpenRead("file.xml"), // File.Open has no single-argument overload
    Encoding.GetEncoding("iso-8859-15")));
I just stumbled across KB308061 from Microsoft. It contains an interesting passage: Specify the encoding declaration in the XML declaration section of the XML document. For example, the following declaration indicates that the document is in UTF-16 Unicode encoding format:
<?xml version="1.0" encoding="UTF-16"?>
Note that this declaration only specifies the encoding format of an XML document and does not modify or control the actual encoding format of the data.
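The StreamReader approach from this answer can be sketched as a small self-contained program. The names `XmlEncodingDemo` and `LoadXml` and the temporary sample file are hypothetical; the sketch also assumes .NET Core 3.0+ or .NET 5+, where "iso-8859-15" is only available once `CodePagesEncodingProvider` is registered.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlEncodingDemo
{
    // Loads an XML file by decoding its bytes with an explicitly named encoding.
    public static XmlDocument LoadXml(string path, string encodingName)
    {
        // Assumption: .NET Core / .NET 5+, where legacy encodings need this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        var doc = new XmlDocument();
        using (var reader = new StreamReader(path, Encoding.GetEncoding(encodingName)))
        {
            // The TextReader overload receives already-decoded characters.
            doc.Load(reader);
        }
        return doc;
    }

    public static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Hypothetical sample file containing "café €" encoded as ISO-8859-15.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<root>caf\u00E9 \u20AC</root>",
                          Encoding.GetEncoding("iso-8859-15"));

        XmlDocument doc = LoadXml(path, "iso-8859-15");
        Console.WriteLine(doc.DocumentElement.InnerText == "caf\u00E9 \u20AC");

        File.Delete(path);
    }
}
```

Wrapping the reader in a using block also closes the underlying file, which the one-line form above leaves open.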
Answer by Snizzle
You get the question-mark-diamond characters when your text file uses high-ANSI encoding, meaning it uses characters between 128 and 255. Those characters have the eighth (i.e. the most significant) bit set. When ASP.NET reads the text file it assumes UTF-8 encoding, in which that most significant bit has a special meaning.
You must force ASP.NET to interpret the textfile as high-ANSI encoding, by telling it the codepage is 1252:
String textFilePhysicalPath = System.Web.HttpContext.Current.Server.MapPath("~/textfiles/MyInputFile.txt");
String contents = File.ReadAllText(textFilePhysicalPath, System.Text.Encoding.GetEncoding(1252));
lblContents.Text = contents.Replace("\n", "<br />"); // change linebreaks to HTML
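To illustrate the effect described in this answer, here is a small sketch that decodes the same bytes once as UTF-8 and once as code page 1252. The class name `HighBitDemo` and the sample bytes are hypothetical, and the sketch assumes .NET Core 3.0+ or .NET 5+, where code page 1252 requires registering `CodePagesEncodingProvider`.

```csharp
using System;
using System.Text;

public static class HighBitDemo
{
    // Decodes bytes as UTF-8; invalid sequences become U+FFFD (the "question-mark diamond").
    public static string DecodeAsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Decodes bytes as Windows-1252, where every value 128-255 is a single character.
    public static string DecodeAs1252(byte[] bytes)
    {
        // Assumption: .NET Core / .NET 5+, where legacy code pages need this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetString(bytes);
    }

    public static void Main()
    {
        // "café" in Windows-1252: the last byte, 0xE9, has its most significant bit set.
        byte[] data = { 0x63, 0x61, 0x66, 0xE9 };

        // In UTF-8, 0xE9 starts a multi-byte sequence that never completes here.
        Console.WriteLine(DecodeAsUtf8(data) == "caf\uFFFD");
        Console.WriteLine(DecodeAs1252(data) == "caf\u00E9");
    }
}
```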
Answer by sebastin jiffin a j
// Append text to a file using code page 1250 (File.Create(path) could be used instead to overwrite).
using (StreamWriter writer = new StreamWriter(File.Open(@"E:\Sample.txt", FileMode.Append), Encoding.GetEncoding(1250)))
{
    writer.Write("Sample Text");
}

