C#: How to change character encoding of XmlReader
Original question: http://stackoverflow.com/questions/961699/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by dstr
I have a simple XmlReader:
    XmlReader r = XmlReader.Create(fileName);
    while (r.Read())
    {
        Console.WriteLine(r.Value);
    }
The problem is that the XML file has ISO-8859-9 characters in it, which makes XmlReader throw an "Invalid character in the given encoding." exception. I can solve this by adding a <?xml version="1.0" encoding="ISO-8859-9" ?> line at the beginning, but I'd like to solve it another way in case I can't modify the source file. How can I change the encoding of XmlReader?
Accepted answer by Christian Hayter
To force .NET to read the file in as ISO-8859-9, just use one of the many XmlReader.Create overloads, e.g.
    using (XmlReader r = XmlReader.Create(new StreamReader(fileName, Encoding.GetEncoding("ISO-8859-9"))))
    {
        while (r.Read())
        {
            Console.WriteLine(r.Value);
        }
    }
However, that may not work because, IIRC, the W3C XML standard says that once the XML declaration line has been read, a compliant parser should immediately switch to the encoding specified in the declaration, regardless of what encoding it was using before. In your case, if the XML file has no XML declaration, the encoding will be UTF-8 and it will still fail. I may be talking nonsense here so try it and see. :-)
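One way to "try it and see" is a small self-contained test. The sketch below is not from the original answer: it writes a hypothetical temporary file containing ISO-8859-9 bytes and no XML declaration, then reads it back through a StreamReader with an explicit encoding. It assumes .NET Core 3.0 or later (or .NET 5+), where legacy code pages have to be registered through CodePagesEncodingProvider first; on .NET Framework that registration line would not be needed. As far as I know, a reader built on a TextReader consumes characters that are already decoded, so the concern about the parser falling back to UTF-8 may not apply here, but treat that as something to verify rather than a guarantee.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class Program
{
    static void Main()
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where ISO-8859-9
        // is only available after registering the code pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding turkish = Encoding.GetEncoding("ISO-8859-9");

        // Hypothetical sample file: no XML declaration, content stored as
        // ISO-8859-9 bytes (the escapes are the Turkish letters g-breve,
        // s-cedilla and dotless i).
        string fileName = Path.GetTempFileName();
        File.WriteAllBytes(fileName, turkish.GetBytes("<root>\u011F\u015F\u0131</root>"));

        try
        {
            // The StreamReader does the byte-to-character decoding;
            // XmlReader only sees the resulting characters.
            using (var sr = new StreamReader(fileName, turkish))
            using (XmlReader r = XmlReader.Create(sr))
            {
                while (r.Read())
                {
                    if (r.NodeType == XmlNodeType.Text)
                    {
                        Console.WriteLine(r.Value);
                    }
                }
            }
        }
        finally
        {
            File.Delete(fileName);
        }
    }
}
```

If the read loop finishes without an "Invalid character in the given encoding." exception, the explicit StreamReader encoding is being honored for a file without a declaration.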
Answer by Noldorin
The XmlTextReader class (which is what the static Create method actually returns, since XmlReader is the abstract base class) is designed to detect the encoding automatically from the XML file itself; there's no way to set it manually.
Simply ensure that you include the following XML declaration in the file you are reading:
<?xml version="1.0" encoding="ISO-8859-9"?>
Answer by ChrisF
If you can't ensure that the input file has the right header, you could look at one of the other 11 overloads to the XmlReader.Create method.
Some of these take an XmlReaderSettings variable or an XmlParserContext variable, or both. I haven't investigated these, but setting the appropriate values might help here.
There is the XmlReaderSettings.CheckCharacters property. The help for it states:
Instructs the reader to check characters and throw an exception if any characters are outside the range of legal XML characters. Character checking includes checking for illegal characters in the document, as well as checking the validity of XML names (for example, an XML name may not start with a numeral).
So setting this to false might help. However, the help also states:
If the XmlReader is processing text data, it always checks that the XML names and text content are valid, regardless of the property setting. Setting CheckCharacters to false turns off character checking for character entity references.
So further investigation is warranted.
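As one possible direction for that investigation, XmlParserContext has a constructor that accepts an Encoding, and XmlReader.Create has an overload taking a Stream, an XmlReaderSettings and an XmlParserContext. The sketch below is not from the original answer; it passes ISO-8859-9 as the context encoding for a hypothetical in-memory document with no XML declaration. It assumes .NET Core 3.0 or later (hence the code pages registration), and whether the reader honors the context encoding in every case (for example when the document does carry a conflicting declaration) is an assumption to test, not something the documentation quoted above confirms.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class Program
{
    static void Main()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where ISO-8859-9 must be
        // made available by registering the code pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding turkish = Encoding.GetEncoding("ISO-8859-9");

        // Hypothetical document: ISO-8859-9 bytes, no XML declaration.
        byte[] bytes = turkish.GetBytes("<root>\u011F\u015F\u0131</root>");

        var settings = new XmlReaderSettings();

        // Only the encoding is supplied; name table, namespace manager
        // and xml:lang are left null for this illustration.
        var context = new XmlParserContext(null, null, null, XmlSpace.None, turkish);

        using (var stream = new MemoryStream(bytes))
        using (XmlReader r = XmlReader.Create(stream, settings, context))
        {
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Text)
                {
                    Console.WriteLine(r.Value);
                }
            }
        }
    }
}
```

Unlike wrapping the file in a StreamReader, this keeps the input as a raw Stream, which may matter if the surrounding code already works with streams.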
Answer by Math
Use an XmlTextReader instead of an XmlReader:
System.Text.Encoding.UTF8.GetString(YourXmlTextReader.Encoding.GetBytes(YourXmlTextReader.Value))