.NET: Hexadecimal value 0x00 is an invalid character

Question

Asked by jb.

I am generating an XML document from a StringBuilder, basically something like:

string.Format("<text><row>{0}</row><col>{1}</col><textHeight>{2}</textHeight><textWidth>{3}</textWidth><data>{4}</data><rotation>{5}</rotation></text>

Later, something like:

XmlDocument document = new XmlDocument();
document.LoadXml(xml);
XmlNodeList labelSetNodes = document.GetElementsByTagName("labels");
for (int index = 0; index < labelSetNodes.Count; index++)
{
    //do something
}

All the data comes from a database. Recently I've had a few issues with the error:

Hexadecimal value 0x00 is a invalid character, line 1, position nnnnn

But it's not consistent. Sometimes some 'blank' data will work. The 'faulty' data works on some PCs, but not on others.

In the database, the data is always a blank string. It is never null, and in the XML file it comes out as <data></data>, i.e. with no characters between the opening and closing tags. (I'm not sure this can be relied on, though, as I am pulling it from the Immediate window in Visual Studio and pasting it into TextPad.)

There are possibly differences in the SQL Server versions (it fails on 2008 and works on 2005) and in the collation too. Are any of these likely causes?

But exactly the same code and data will sometimes fail. Any ideas where the problem lies?

Answer 1

Accepted answer by Dour High Arch

Without your actual data or source, it will be hard for us to diagnose what is going wrong. However, I can make a few suggestions:

Unicode NUL (0x00) is illegal in all versions of XML, and validating parsers must reject input that contains it.
Despite the above, real-world non-validated XML can contain any kind of garbage or ill-formed bytes imaginable.
XML 1.1 allows zero-width and nonprinting control characters (except NUL), so you cannot look at an XML 1.1 file in a text editor and tell what characters it contains.

Given what you wrote, I suspect whatever converts the database data to XML is broken; it's propagating non-XML characters.

Create some database entries with non-XML characters (NULs, DELs, control characters, et al.) and run your XML converter on them. Output the XML to a file and look at it in a hex editor. If the file contains non-XML characters, your converter is broken. Fix it or, if you cannot, create a preprocessor that rejects output with such characters.
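
A minimal sketch of such a preprocessor, written in Python purely for illustration (the code in this thread is .NET, and the names `find_invalid_xml_chars` and `reject_if_invalid` are hypothetical). It assumes the XML 1.0 character range and reports where the offending characters sit, so the converter can be fixed or its output rejected:

```python
import re

# Anything outside the XML 1.0 "Char" production is invalid:
# #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF
_INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in _INVALID_XML10.finditer(text)]


def reject_if_invalid(text):
    """Raise ValueError instead of passing bad text on to an XML parser."""
    bad = find_invalid_xml_chars(text)
    if bad:
        offset, code = bad[0]
        raise ValueError(f"invalid XML character 0x{code:02X} at offset {offset}")
    return text


sample = "<data>\x00</data>"
print(find_invalid_xml_chars(sample))  # [(6, 0)]
```

Reporting the offset rather than silently stripping the character makes it easier to trace which database field produced it.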

If the converter output looks good, the problem is in your XML consumer; it's inserting non-XML characters somewhere. You will have to break your consumption process into separate steps, examine the output at each step, and narrow down what is introducing the bad characters.

Check file encoding (for UTF-16)

Update: I just ran into an example of this myself! The producer was encoding the XML as UTF-16 and the consumer was expecting UTF-8. Since UTF-16 uses 0x00 as the high byte for all ASCII characters and UTF-8 doesn't, the consumer was seeing every second byte as a NUL. In my case I could change the encoding, but I suggested that all XML payloads start with a BOM.
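
The mismatch is easy to reproduce. The following Python sketch (Python is used only for illustration; `sniff_bom` is a hypothetical helper and deliberately ignores UTF-32) shows the NULs appearing when UTF-16 bytes are read as UTF-8, and how a BOM lets a consumer pick the right decoder:

```python
import codecs

xml_text = "<data></data>"

# Producer writes UTF-16 (little-endian, no BOM); consumer assumes UTF-8.
utf16_bytes = xml_text.encode("utf-16-le")
misread = utf16_bytes.decode("utf-8")
nul_count = misread.count("\x00")  # one NUL per ASCII character


def sniff_bom(raw):
    """Guess an encoding from a leading byte order mark; default to UTF-8."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec consumes the BOM and picks the byte order
    return "utf-8"


with_bom = codecs.BOM_UTF16_LE + utf16_bytes
decoded = with_bom.decode(sniff_bom(with_bom))
print(nul_count, decoded == xml_text)
```

Without the BOM the consumer has nothing to go on except the declared or assumed encoding, which is why the wrong guess goes unnoticed until the parser meets the first 0x00.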

Answer 2

Answered by sonjz

In my case it took some digging, but I found the cause.

My Context

I'm looking at exception/error logs from the website using Elmah. Elmah returns the state of the server at the time of the exception, in the form of a large XML document. For our reporting engine I pretty-print the XML with XmlWriter.

During a website attack, I noticed that some XML documents weren't parsing and I was receiving this exception: '.', hexadecimal value 0x00, is an invalid character.

NON-RESOLUTION: I converted the document to a byte[] and tried to sanitize it of 0x00 bytes, but the scan found none.

When I scanned the xml document, I found the following:

...
<form>
...
<item name="SomeField">
   <value
     string="C:\boot.ini&#x0;.htm" />
 </item>
...

There was the NUL byte, encoded as an HTML-style numeric character reference!

RESOLUTION: To fix the encoding, I replaced the value before loading it into my XmlDocument, because loading it would create the NUL byte, which would then be difficult to sanitize from the object. Here's my entire process:

XmlDocument xml = new XmlDocument();
details.Xml = details.Xml.Replace("&#x0;", "[0x00]");  // in my case I want to see it, otherwise just replace with ""
xml.LoadXml(details.Xml);

string formattedXml = null;

// I have this in a helper function, but for this example I have put it in-line
StringBuilder sb = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings {
    OmitXmlDeclaration = true,
    Indent = true,
    IndentChars = "\t",
    NewLineHandling = NewLineHandling.None,
};
using (XmlWriter writer = XmlWriter.Create(sb, settings)) {
    xml.Save(writer);
    formattedXml = sb.ToString();
}

LESSON LEARNED: If your incoming data is HTML-encoded on entry, sanitize for illegal bytes using the associated HTML entity.
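
The replacement above only catches the exact spelling `&#x0;`, but the same character can also be written `&#0;`, `&#x00;`, and so on. A hedged sketch in Python (for illustration only; `neutralize_nul_refs` is a hypothetical name, and Python's `xml.etree.ElementTree` stands in for `XmlDocument`) that handles those spellings before parsing:

```python
import re
import xml.etree.ElementTree as ET

# Numeric character references that resolve to code point 0:
# decimal (&#0;, &#00;) or hexadecimal (&#x0;, &#x00;).
_NUL_REF = re.compile(r"&#(?:0+|x0+);")


def neutralize_nul_refs(xml_text, marker="[0x00]"):
    """Replace NUL character references with a visible marker before parsing."""
    return _NUL_REF.sub(marker, xml_text)


raw = '<value string="C:\\boot.ini&#x0;.htm" />'

try:
    ET.fromstring(raw)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False  # the parser refuses the reference to NUL

cleaned = neutralize_nul_refs(raw)
value = ET.fromstring(cleaned).get("string")
print(parsed_raw, value)
```

Keeping a visible marker, as the answer does, preserves the evidence of the attack in the report; pass `marker=""` to drop the character instead.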

Answer 3

Answered by Mike-Monkey

To add to sonjz's answer above, the following worked for us.

// Instead of
XmlString = XmlString.Replace("&#x0;", "[0x00]");
// use this (Replace returns a new string, so assign the result)
XmlString = XmlString.Replace("\x00", "[0x00]");

Answer 4

Answered by DEEPAK SHARMA

I also got the same error in an ASP.NET application when I put some Unicode data (Hindi) in the Web.config file and saved it with "Unicode" encoding (which in Windows editors means UTF-16).

It fixed the error for me when I saved the Web.config file with "UTF-8" encoding.

Answer 5

Answered by Stefan Steiger

As kind of a late answer:

I've had this problem with SSRS ReportService2005.asmx when uploading a report.

    Public Shared Sub CreateReport(ByVal strFileNameAndPath As String, ByVal strReportName As String, ByVal strReportingPath As String, Optional ByVal bOverwrite As Boolean = True)
        Dim rs As SSRS_2005_Administration_WithFOA = New SSRS_2005_Administration_WithFOA
        rs.Credentials = ReportingServiceInterface.GetMyCredentials(strCredentialsURL)
        rs.Timeout = ReportingServiceInterface.iTimeout
        rs.Url = ReportingServiceInterface.strReportingServiceURL
        rs.UnsafeAuthenticatedConnectionSharing = True

        Dim btBuffer As Byte() = Nothing

        Dim rsWarnings As Warning() = Nothing
        Try
            Dim fstrStream As System.IO.FileStream = System.IO.File.OpenRead(strFileNameAndPath)
            btBuffer = New Byte(fstrStream.Length - 1) {}
            fstrStream.Read(btBuffer, 0, CInt(fstrStream.Length))
            fstrStream.Close()
        Catch ex As System.IO.IOException
            Throw New Exception(ex.Message)
        End Try

        Try
            rsWarnings = rs.CreateReport(strReportName, strReportingPath, bOverwrite, btBuffer, Nothing)

            If Not (rsWarnings Is Nothing) Then
                Dim warning As Warning
                For Each warning In rsWarnings
                    Log(warning.Message)
                Next warning
            Else
                Log("Report: {0} created successfully with no warnings", strReportName)
            End If

        Catch ex As System.Web.Services.Protocols.SoapException
            Log(ex.Detail.InnerXml.ToString())
        Catch ex As Exception
            Log("Error at creating report. Invalid server name/timeout?" + vbCrLf + vbCrLf + "Error Description: " + vbCrLf + ex.Message)
            Console.ReadKey()
            System.Environment.Exit(1)
        End Try
    End Sub ' End Function CreateThisReport

The problem occurs when you allocate a byte array that is at least 1 byte larger than the RDL (XML) file.

Specifically, I used a C#-to-VB.NET converter, which converted

  btBuffer = new byte[fstrStream.Length];

into

  btBuffer = New Byte(fstrStream.Length) {}

But in C# the number denotes the NUMBER OF ELEMENTS in the array, whereas in VB.NET it denotes the UPPER BOUND of the array. So I had one excess byte (a trailing 0x00), which caused this error.

So the problem's solution is simply:

  btBuffer = New Byte(fstrStream.Length - 1) {}
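
The effect of the extra element can be reproduced outside VB.NET. In this Python sketch (illustration only; `io.BytesIO` stands in for the file stream, `xml.etree.ElementTree` for the report server's parser, and the helper names are hypothetical), a buffer one byte longer than the file keeps a trailing 0x00 that the parser rejects:

```python
import io
import xml.etree.ElementTree as ET

rdl = b"<Report></Report>"


def read_into_buffer(data, size):
    """Read `data` into a zero-filled buffer of `size` bytes."""
    buffer = bytearray(size)
    io.BytesIO(data).readinto(buffer)
    return bytes(buffer)


def parses(raw):
    """Return True if `raw` is accepted as a well-formed XML document."""
    try:
        ET.fromstring(raw)
        return True
    except ET.ParseError:
        return False


exact = read_into_buffer(rdl, len(rdl))          # like New Byte(Length - 1) {}
oversized = read_into_buffer(rdl, len(rdl) + 1)  # like New Byte(Length) {}

print(oversized[-1], parses(exact), parses(oversized))
```

The unused last element is never overwritten by the read, so it stays at its initial value of zero and travels to the server as part of the document.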

Answer 6

Answered by BrunoJCM

I'm using IronPython here (it uses the same .NET API). Reading the file as UTF-8, so that the BOM is handled properly, fixed the problem for me:

xmlFile = Path.Combine(directory_str, 'file.xml')
doc = XPathDocument(XmlTextReader(StreamReader(xmlFile.ToString(), Encoding.UTF8)))

It would work as well with the XmlDocument:

doc = XmlDocument()
doc.Load(XmlTextReader(StreamReader(xmlFile.ToString(), Encoding.UTF8)))
