UTF-8 issue in Java XML parsing
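Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/9696220/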
Asked by divz
I am using the following two code snippets to parse XML content as UTF-8, but neither works properly:
1.
InputStream is = new ByteArrayInputStream(strXMLAlert.getBytes("UTF-8"));
Document doc = db.parse(is);
2.
InputSource is = new InputSource(new ByteArrayInputStream(strXMLAlert.getBytes()));
is.setCharacterStream(new StringReader(strXMLAlert));
is.setEncoding("UTF-8");
Document doc = db.parse(is);
Accepted answer by Mike Mansell
We probably need a bit more information to answer the question properly. For example, what problem are you seeing? Which Java version are you running?
However, I expanded your first example to the following:
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
String strXMLAlert = "<a>永</a>";
InputStream is = new ByteArrayInputStream(strXMLAlert.getBytes("UTF-8"));
Document document = db.parse(is);
Node item = document.getDocumentElement().getChildNodes().item(0);
String nodeValue = item.getNodeValue();
System.out.println(nodeValue);
This example puts a Chinese character in the string, and it prints the character correctly:
永
Your second example should also work, although you are providing the content twice. Either provide it as a set of bytes and provide the encoding, or just provide it as characters (the StringReader) and you don't need the encoding (since as characters, it's already been decoded from bytes to characters).
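To make the two alternatives concrete, here is a minimal sketch of each. The class name XmlInputSourceSketch and the helper methods parseFromBytes and parseFromChars are hypothetical names chosen for illustration; they are not from the question. The sketch also assumes the default JDK DocumentBuilderFactory and compares against the expected character instead of printing it, so the result does not depend on the console encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class XmlInputSourceSketch {

    // Option A: give the parser bytes and state the encoding that produced them.
    // Note the explicit charset in getBytes: the no-argument getBytes() uses the
    // platform default charset, which may not be UTF-8.
    static String parseFromBytes(DocumentBuilder db, String xml) throws Exception {
        InputSource source = new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        source.setEncoding("UTF-8");
        Document doc = db.parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    // Option B: give the parser characters. The String is already decoded,
    // so no encoding needs to be declared.
    static String parseFromChars(DocumentBuilder db, String xml) throws Exception {
        InputSource source = new InputSource(new StringReader(xml));
        Document doc = db.parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        String xml = "<a>\u6c38</a>"; // the same character (U+6C38) as in the answer
        // Compare instead of printing the character, to sidestep console encoding.
        System.out.println(parseFromBytes(db, xml).equals("\u6c38"));
        System.out.println(parseFromChars(db, xml).equals("\u6c38"));
    }
}
```

Both calls should yield the same text node. Which option fits depends on where the XML comes from: if it is already a String in memory, the StringReader route avoids an encode/decode round trip.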