C# XML Exception: Invalid Character(s)

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/854335/

Time: 2020-08-05 04:18:02  Source: igfitidea

XML Exception: Invalid Character(s)

Tags: c#, xml, linq-to-xml

Asked by Meiscooldude

I am working on a small project that receives XML data in string form from a long-running application. I am trying to load this string data into an XDocument (System.Xml.Linq.XDocument), and then from there do some XML magic and create an xlsx file for a report on the data.

On occasion, I receive data that contains invalid XML characters, and when I try to parse the string into an XDocument, I get this error.

[System.Xml.XmlException] Message: '?', hexadecimal value 0x1C, is an invalid character.

Since I have no control over the remote application, you could expect ANY kind of character.

I am well aware that XML has a way to embed such characters, using a character reference such as &#x1C; or something like that.

If at all possible I would SERIOUSLY like to keep ALL the data. If not, then so be it.



I have thought about editing the response string programmatically, then going back and trying to re-parse it should an exception be thrown, but I have tried a few methods and none of them seem to work.

Thank you for your thought.

The code is something along the lines of this:

TextReader tr;
XDocument doc;

string response; // XML string received from server.
...
tr = new StringReader(response);

try
{
    doc = XDocument.Load(tr);
}
catch (XmlException e)
{
    // handle here?
}

Accepted answer by great_llama

XML can handle just about any character, but there are some ranges, such as control codes, that it won't accept.

Your best bet, if you can't get them to fix their output, is to sanitize the raw data you're receiving. You need to replace illegal characters with the character reference format you noted.

(You can't even fall back on CDATA, since there is no way to escape these characters inside a CDATA section.)
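
The sanitizing step described in this answer could be sketched roughly as follows. This is only a minimal illustration, not code from the original answer: the names XmlSanitizer, EscapeIllegalChars and IsLegalXmlChar are hypothetical, and the character ranges are taken from the XML 1.0 "Char" production. One caveat worth stating: XML 1.0 also disallows a reference such as &#x1C;, so a strict parser will still reject the escaped output. The sketch assumes the result is read with character checking relaxed (XmlReaderSettings.CheckCharacters = false).

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
public static class XmlSanitizer
{
    // XML 1.0 "Char" production, restricted to a single UTF-16 code unit.
    private static bool IsLegalXmlChar(char c) =>
        c == 0x9 || c == 0xA || c == 0xD ||
        (c >= 0x20 && c <= 0xD7FF) ||
        (c >= 0xE000 && c <= 0xFFFD);

    // Replaces every illegal character with a hexadecimal character
    // reference (for example 0x1C becomes "&#x1C;") so no data is dropped.
    public static string EscapeIllegalChars(string raw)
    {
        if (raw == null) throw new ArgumentNullException(nameof(raw));

        var sb = new StringBuilder(raw.Length);
        for (int i = 0; i < raw.Length; i++)
        {
            char c = raw[i];
            if (char.IsHighSurrogate(c) && i + 1 < raw.Length
                && char.IsLowSurrogate(raw[i + 1]))
            {
                // Well-formed surrogate pair (U+10000 and above): copy both units.
                sb.Append(c).Append(raw[++i]);
            }
            else if (IsLegalXmlChar(c))
            {
                sb.Append(c);
            }
            else
            {
                // Control codes, lone surrogates, U+FFFE and U+FFFF end up here.
                sb.Append("&#x").Append(((int)c).ToString("X")).Append(';');
            }
        }
        return sb.ToString();
    }
}
```

For example, XmlSanitizer.EscapeIllegalChars("a\u001Cb") returns the string a&#x1C;b. Escaping rather than deleting keeps all of the data, which is what the question asks for, at the cost of needing a lenient reader afterwards.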

Answer by alamar

If your input is not XML, you should use something like Tidy or Tagsoup to clean the mess up.

They would take any input and try, hopefully, to make a useful DOM from it.

I don't know what the relevant libraries on the dark side are called.

Answer by Richard Morgan

Would something like the approach described in this blog post be helpful?

Basically, the author creates a sanitizing XML stream.

Answer by John Saunders

Garbage In, Garbage Out. If the remote application is sending you garbage, then that's all you'll get. If they think they're sending XML, then they need to be fixed. In this case, you're not doing them any favors by working around their bug.

You should also make sure of what they think they're sending. What did the %1C mean to them? What did they want it to be?

Answer by Darin Dimitrov

IMHO the best solution would be to modify the code/program/whatever produced the invalid XML that is being fed to your program. Unfortunately this is not always possible. In this case you need to escape all characters < 0x20 before trying to load the document.

Answer by Matthew Flaschen

If you really can't fix the source XML data, consider taking an approach like the one I described in this answer. Basically, you create a TextReader subclass (e.g. StripTextReader) that wraps an existing TextReader (tr) and discards invalid characters.
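
A wrapper of this kind might look roughly like the sketch below. It is not the code from the linked answer: only the class name StripTextReader comes from the text above, while the body and the IsLegalXmlChar helper are assumptions made for illustration. It relies on the base TextReader implementing its buffered reads in terms of Read(), so only Read() and Peek() are overridden. Note that, unlike escaping, this approach silently drops data.

```csharp
using System.IO;

// Illustrative sketch: wraps another TextReader and skips characters
// that XML 1.0 does not allow.
public sealed class StripTextReader : TextReader
{
    private readonly TextReader _inner;

    public StripTextReader(TextReader inner)
    {
        _inner = inner;
    }

    // Tab, LF, CR and everything from 0x20 to 0xFFFD pass through.
    // Surrogate code units are passed through too, leaving it to the
    // XML parser to validate the pairs.
    private static bool IsLegalXmlChar(int c) =>
        c == 0x9 || c == 0xA || c == 0xD || (c >= 0x20 && c <= 0xFFFD);

    public override int Peek()
    {
        int c;
        // Consume and discard illegal characters until a legal one
        // (or end of input, -1) is next in line.
        while ((c = _inner.Peek()) != -1 && !IsLegalXmlChar(c))
        {
            _inner.Read();
        }
        return c;
    }

    public override int Read()
    {
        int c;
        // Keep reading until a legal character or end of input is found.
        while ((c = _inner.Read()) != -1 && !IsLegalXmlChar(c))
        {
        }
        return c;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

With the question's variables, usage would be along the lines of doc = XDocument.Load(new StripTextReader(new StringReader(response)));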

Answer by paulselles

You can use an XmlReader and set the XmlReaderSettings.CheckCharacters property to false. This lets you read the XML file despite the invalid characters. From there you can pass it to an XmlDocument or XDocument object.

You can read a little more about it in my blog.

Loading the data into a System.Xml.Linq.XDocument will look something like this:

XDocument xDocument = null;
XmlReaderSettings xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false };
using (XmlReader xmlReader = XmlReader.Create(filename, xmlReaderSettings))
{
    xmlReader.MoveToContent();
    xDocument = XDocument.Load(xmlReader);
}
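
Since the question receives the XML as a string rather than a file, the same idea can be adapted to read from a StringReader. The sketch below is an assumption-laden variant, not the answerer's code: LenientXmlLoader and LoadFromString are hypothetical names. As far as I know, the documentation describes CheckCharacters = false as relaxing the check on character entity references (such as &#x1C;), so raw control characters in the text may still be rejected and might need to be escaped or stripped first.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for loading an XML string with relaxed character checks.
public static class LenientXmlLoader
{
    public static XDocument LoadFromString(string response)
    {
        // Turn off character checking so references like &#x1C; are accepted.
        var settings = new XmlReaderSettings { CheckCharacters = false };

        using (var stringReader = new StringReader(response))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            return XDocument.Load(xmlReader);
        }
    }
}
```

A call such as LenientXmlLoader.LoadFromString("<root>a&#x1C;b</root>") is the kind of input this is meant for. Writing the document back out would likely need a matching XmlWriterSettings with CheckCharacters = false.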

More information can be found here.
