Error about invalid XML characters on Java
Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/2362302/
Asked by Giancarlo
When parsing an XML file in Java, I get this error:
An invalid XML character (Unicode: 0x0) was found in the element content of the document.
The XML comes from a web service.
The problem is that I get the error only when the web service is running on localhost (Windows + Tomcat), but not when it is online (Linux + Tomcat).
How can I replace the invalid character? Thanks.
Accepted answer by Giancarlo
Fixed with this code:
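import java.util.regex.Pattern;

// Matches one or more NUL (U+0000) characters, which are not legal in XML.
// The original "[\000]*" also matched the empty string, so find() was always true.
Pattern nulPattern = Pattern.compile("[\000]+");
// replaceAll returns the input unchanged when nothing matches,
// so no find() check is needed and the result is never null.
String cleanXMLString = nulPattern.matcher(dirtyXMLString).replaceAll("");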
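The snippet above removes only NUL. If other illegal characters might show up as well, the same idea can be widened to everything outside the XML 1.0 `Char` production. This is a minimal sketch, not part of the original answer: the class name `XmlCharFilter` and both method names are made up for illustration, and it assumes the document is XML 1.0 and is already available as a String.

```java
final class XmlCharFilter {

    // True if the code point is allowed by the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input with every invalid code point dropped.
    static String stripInvalidXmlChars(String dirty) {
        StringBuilder clean = new StringBuilder(dirty.length());
        dirty.codePoints()
             .filter(XmlCharFilter::isValidXmlChar)
             .forEach(clean::appendCodePoint);
        return clean.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input containing a NUL in the element content.
        String dirty = "<a>bad\0char</a>";
        System.out.println(stripInvalidXmlChars(dirty));
    }
}
```

Silently dropping characters hides whatever produced them, so this is probably best treated as a workaround while the real cause is tracked down.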
Answer by Karussell
This is an encoding issue. Either you are reading the input stream as UTF-8 when it isn't, or the other way around.
You should specify the encoding explicitly when you read the content, e.g. via
new InputStreamReader(getInputStream(), "UTF-8")
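To illustrate how an encoding mismatch can produce NUL characters, here is a small sketch. It is only one possible scenario, not a diagnosis of the asker's setup: the class `ExplicitEncodingReader`, the helper `readAll`, and the UTF-16LE sample payload are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitEncodingReader {

    // Reads the whole stream into a String, decoding with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical payload: a tiny document encoded as UTF-16LE.
        byte[] bytes = "<a/>".getBytes(StandardCharsets.UTF_16LE);

        // Decoding with the wrong charset turns every 0x00 byte into a NUL character.
        String wrong = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        // Decoding with the charset the bytes were written in recovers the text.
        String right = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_16LE);

        System.out.println("NUL present when misread: " + (wrong.indexOf('\0') >= 0));
        System.out.println("Round trip ok: " + right.equals("<a/>"));
    }
}
```

Passing a `Charset` constant rather than the string "UTF-8" avoids the checked `UnsupportedEncodingException`; either form selects the same decoder.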
Another problem could be Tomcat. Try adding URIEncoding="UTF-8" to the connector settings in Tomcat's server.xml file, because:
It turned out that the JSP specification says that if the page encoding of the JSP pages is not explicitly declared, then ISO-8859-1 should be used (!).
Taken from here.
Answer by Mark Davidson
A bit of looking around reveals that 0x0 is a null character. Someone else had the same problem with XML and null characters here: http://forums.sun.com/thread.jspa?threadID=579849. I'm not sure how you are parsing the XML, but if you get it as a string first, there is some discussion on how to replace the null here: http://forums.sun.com/thread.jspa?threadID=628189.
Answer by Buhake Sindi
Unicode character 0x0 represents NULL, meaning that the data you're pulling contains a NULL somewhere (which is not allowed in XML, hence your error).
Make sure that you find out what causes the NULL in the first place.
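One way to start looking for the cause is to check where the NUL bytes sit in the raw response before any decoding happens. The following is a rough sketch under assumptions: `NulLocator` and `nulOffsets` are hypothetical names, and the sample bytes stand in for a response you would capture from the web service yourself.

```java
import java.util.ArrayList;
import java.util.List;

final class NulLocator {

    // Returns the byte offsets at which a NUL (0x00) byte occurs.
    static List<Integer> nulOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Hypothetical raw response bytes with a single stray NUL.
        byte[] raw = {'<', 'a', '>', 0, '<', '/', 'a', '>'};
        System.out.println("NUL bytes at offsets: " + nulOffsets(raw));
    }
}
```

As a loose heuristic, a NUL at every second offset might suggest UTF-16 data being read as a single-byte encoding, while an isolated NUL is more likely to come from the data itself.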
Also, how are you interacting with the web service? If you're using Axis, make sure that the WSDL specifies an encoding for incoming and outgoing data.