C#: How to read a text file with ANSI encoding and non-English letters?

Question

Asked by MichaelT

I have a file that contains non-English chars and was saved in ANSI encoding using a non-English codepage. How can I read this file in C# and see the file content correctly?

Not working

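// ASCII is 7-bit only: every byte above 0x7F decodes as '?'
StreamReader sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.ASCII);
var ags = sr.ReadToEnd();
// UTF-8: a lone high byte is not a valid sequence and decodes as U+FFFD
sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.UTF8);
ags = sr.ReadToEnd();
// Unicode (UTF-16 LE) reads two bytes per character, producing unrelated characters
sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.Unicode);
ags = sr.ReadToEnd();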

This works, but I need to know the code page in advance, which is not possible:

sr = new StreamReader(@"C:\APPLICATIONS.xml", Encoding.GetEncoding(1252));
ags = sr.ReadToEnd();

Answer 1

Accepted answer by L.B

 var text = File.ReadAllText(file, Encoding.GetEncoding(codePage));
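
A minimal runnable sketch of the one-liner above, assuming .NET Core 3.0 or later (or .NET 5+). On those runtimes the Windows code pages are not available until `CodePagesEncodingProvider` is registered; on .NET Framework the registration line is not needed and can be removed. The `AnsiReader` class, the temp file name, and the choice of code page 1252 are assumptions made only for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class AnsiReader
{
    // Hypothetical helper wrapping File.ReadAllText with an explicit code page.
    public static string ReadAllText(string path, int codePage)
    {
        // Makes code pages such as 1250/1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return File.ReadAllText(path, Encoding.GetEncoding(codePage));
    }

    public static void Main()
    {
        // Hypothetical sample file holding "café €" encoded as Windows-1252
        // (é = 0xE9, € = 0x80).
        string path = Path.Combine(Path.GetTempPath(), "ansi-sample.txt");
        File.WriteAllBytes(path, new byte[] { 0x63, 0x61, 0x66, 0xE9, 0x20, 0x80 });

        string text = ReadAllText(path, 1252);
        Console.WriteLine(text == "caf\u00E9 \u20AC"); // True
    }
}
```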

List of code pages: http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx
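
The question notes that the code page is not known in advance. A plain ANSI file carries no marker of its code page, so it cannot be determined reliably from the bytes alone. One common heuristic, sketched below, is to try strict UTF-8 first and fall back to a code page you choose when the bytes are not valid UTF-8. This is not part of the original answer: `TextFileReader`, `ReadWithFallback`, and the fallback value 1252 are hypothetical, and the sketch assumes .NET Core 3.0 or later.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextFileReader
{
    // Hypothetical helper: decode as strict UTF-8; if that fails, use the
    // caller-supplied fallback code page. It does NOT detect which ANSI
    // code page was used, it only separates "valid UTF-8" from "not UTF-8".
    public static string ReadWithFallback(string path, int fallbackCodePage)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] bytes = File.ReadAllBytes(path);
        try
        {
            // throwOnInvalidBytes: true raises DecoderFallbackException
            // instead of silently substituting U+FFFD.
            var strictUtf8 = new UTF8Encoding(
                encoderShouldEmitUTF8Identifier: false,
                throwOnInvalidBytes: true);
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            return Encoding.GetEncoding(fallbackCodePage).GetString(bytes);
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "fallback-sample.txt");

        // "café" as Windows-1252: 0xE9 alone is not valid UTF-8, so the fallback is used.
        File.WriteAllBytes(path, new byte[] { 0x63, 0x61, 0x66, 0xE9 });
        Console.WriteLine(ReadWithFallback(path, 1252) == "caf\u00E9"); // True

        // "café" as UTF-8: é is the two-byte sequence 0xC3 0xA9.
        File.WriteAllBytes(path, new byte[] { 0x63, 0x61, 0x66, 0xC3, 0xA9 });
        Console.WriteLine(ReadWithFallback(path, 1252) == "caf\u00E9"); // True
    }
}
```

Note that `GetString` does not strip a UTF-8 byte order mark, so a file that starts with one would keep U+FEFF as its first character in this sketch.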

Answer 2

Answer by KF2

If I remember correctly, the XmlDocument.Load(string) method always assumes UTF-8, regardless of the XML encoding. You would have to create a StreamReader with the correct encoding and pass that as the parameter.

// File.Open requires a FileMode argument; File.OpenRead opens the file read-only.
xmlDoc.Load(new StreamReader(
                     File.OpenRead("file.xml"),
                     Encoding.GetEncoding("iso-8859-15")));

I just stumbled across KB308061 from Microsoft. There's an interesting passage: Specify the encoding declaration in the XML declaration section of the XML document. For example, the following declaration indicates that the document is in UTF-16 Unicode encoding format:

<?xml version="1.0" encoding="UTF-16"?>

Note that this declaration only specifies the encoding format of an XML document and does not modify or control the actual encoding format of the data.

Link Source:

XmlDocument.Load() method fails to decode (euro)
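
To illustrate the StreamReader approach from this answer, here is a small sketch in which the XML declaration claims UTF-8 but the bytes are actually Windows-1252. Because the document is loaded through a reader that already decodes with the chosen encoding, the text comes out intact. The class name, the temp file, the sample XML, and code page 1252 are assumptions for the example, and the sketch assumes .NET Core 3.0 or later.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlAnsiLoader
{
    // Hypothetical helper: load an XML file by decoding it with an explicit encoding.
    public static XmlDocument Load(string path, Encoding encoding)
    {
        var doc = new XmlDocument();
        using (var reader = new StreamReader(path, encoding))
        {
            doc.Load(reader);
        }
        return doc;
    }

    public static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding ansi = Encoding.GetEncoding(1252);

        // The declaration says UTF-8, but the file is written as Windows-1252.
        string xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><price>10 \u20AC</price>";
        string path = Path.Combine(Path.GetTempPath(), "ansi-sample.xml");
        File.WriteAllBytes(path, ansi.GetBytes(xml));

        XmlDocument doc = Load(path, ansi);
        Console.WriteLine(doc.DocumentElement.InnerText == "10 \u20AC");
    }
}
```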

Answer 3

Answer by Snizzle

You get the question-mark-diamond characters when your text file uses high-ANSI encoding, meaning it contains characters between 128 and 255. Those characters have the eighth (i.e. the most significant) bit set. When ASP.NET reads the text file it assumes UTF-8 encoding, and in UTF-8 that most significant bit has a special meaning: it marks the byte as part of a multi-byte sequence.

You must force ASP.NET to interpret the text file as high-ANSI encoding by telling it the code page is 1252:

String textFilePhysicalPath = System.Web.HttpContext.Current.Server.MapPath("~/textfiles/MyInputFile.txt");
String contents = File.ReadAllText(textFilePhysicalPath, System.Text.Encoding.GetEncoding(1252));
lblContents.Text = contents.Replace("\n", "<br />");  // change linebreaks to HTML
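
The effect described in this answer can be reproduced without ASP.NET. The sketch below decodes the same four bytes twice: as UTF-8, where the lone high byte becomes the replacement character U+FFFD (the question-mark diamond), and as code page 1252, where it becomes é. The class name and sample bytes are assumptions for illustration, and the sketch assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Text;

public static class HighBitDemo
{
    // "café" in Windows-1252; 0xE9 has its most significant bit set.
    public static readonly byte[] Sample = { 0x63, 0x61, 0x66, 0xE9 };

    public static string DecodeUtf8() => Encoding.UTF8.GetString(Sample);

    public static string Decode1252()
    {
        // Makes code page 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetString(Sample);
    }

    public static void Main()
    {
        // In UTF-8, 0xE9 announces a multi-byte sequence that never arrives,
        // so the decoder substitutes U+FFFD.
        Console.WriteLine(DecodeUtf8().Contains("\uFFFD")); // True

        // In code page 1252, 0xE9 is simply the letter é.
        Console.WriteLine(Decode1252() == "caf\u00E9");     // True
    }
}
```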

Answer 4

Answer by sebastin jiffin a j

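// Append text to the file using code page 1250.
// Use File.Create(path) instead of File.Open(...) to overwrite the file.
using (StreamWriter writer = new StreamWriter(File.Open(@"E:\Sample.txt", FileMode.Append), Encoding.GetEncoding(1250)))
{
    writer.Write("Sample Text");
}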
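
The snippet above only writes. As a hedged sketch of the full round trip, the example below writes a string with code page 1250 and reads it back with the same encoding, which is what keeps the non-English letters intact. The class name, the temp file, and the sample word are assumptions for illustration, and the sketch assumes .NET Core 3.0 or later.

```csharp
using System;
using System.IO;
using System.Text;

public static class RoundTrip1250
{
    // Hypothetical helper: write text with code page 1250, then read it back.
    public static string WriteThenRead(string path, string text)
    {
        // Makes code page 1250 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1250 = Encoding.GetEncoding(1250);

        using (var writer = new StreamWriter(File.Open(path, FileMode.Create), cp1250))
        {
            writer.Write(text);
        }

        // Reading must use the same code page that was used for writing.
        return File.ReadAllText(path, cp1250);
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "cp1250-sample.txt");
        string original = "\u0141\u00F3d\u017A"; // "Łódź"
        Console.WriteLine(WriteThenRead(path, original) == original); // True
    }
}
```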
