Java, XML DocumentBuilder - setting the encoding when parsing
Original question: http://stackoverflow.com/questions/3578395/
Notice: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep it under the same license.
Asked by Ralph D
I'm trying to save a tree (which extends JTree) that holds an XML document to a DOM object, having changed its structure.
I have created a new document object, successfully traversed the tree to retrieve its contents (including the original encoding of the XML document), and now have a ByteArrayInputStream which holds the tree contents (the XML document) with the correct encoding.
The problem is that when I parse the ByteArrayInputStream, the encoding is automatically changed to UTF-8 (in the XML document).
Is there a way to prevent this and use the correct encoding as provided in the ByteArrayInputStream?
It's also worth adding that I have already used the transformer.setOutputProperty(OutputKeys.ENCODING, encoding) method to set the right encoding.
Any help would be appreciated.
Answer by Ralph D
I solved it after a lot of trial and error.
I was using
OutputFormat format = new OutputFormat(document);
but changed it to
OutputFormat format = new OutputFormat(d, encoding, true);
and this solved my problem.
Here encoding is the encoding I set, and true refers to whether or not indenting is enabled.
Note to self: read more carefully. I had looked at the Javadoc hours ago; if only I had read it more carefully.
Answer by Andrey
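// Imports needed by this snippet (assuming OutputFormat and XMLSerializer come from Apache Xerces)
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
// Read XML
String xml = "xml"; // placeholder: replace with the actual XML text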
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
// Append formatting
OutputFormat format = new OutputFormat(document);
if (document.getXmlEncoding() != null) {
format.setEncoding(document.getXmlEncoding());
}
format.setLineWidth(100);
format.setIndenting(true);
format.setIndent(5);
Writer out = new StringWriter();
XMLSerializer serializer = new XMLSerializer(out, format);
serializer.serialize(document);
String result = out.toString();
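The snippet above relies on document.getXmlEncoding() to carry the original encoding over to the output. As a minimal JDK-only sketch (not part of the answer), the following shows what that method reports for a document with and without an encoding declaration. The class name DeclaredEncodingProbe and the sample XML are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, used only to illustrate Document.getXmlEncoding().
final class DeclaredEncodingProbe {

    // Parses the bytes and returns the encoding named in the XML declaration,
    // or null when the document carries no encoding declaration.
    static String declaredEncoding(byte[] xmlBytes) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return document.getXmlEncoding();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample input with an explicit ISO-8859-1 declaration (assumed test data).
        byte[] withDeclaration =
                "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>caf\u00e9</root>"
                        .getBytes(StandardCharsets.ISO_8859_1);
        // Sample input with no XML declaration at all.
        byte[] withoutDeclaration = "<root>cafe</root>".getBytes(StandardCharsets.UTF_8);

        System.out.println(declaredEncoding(withDeclaration));
        System.out.println(declaredEncoding(withoutDeclaration));
    }
}
```

This is why the null check before format.setEncoding(...) matters: a document without an encoding declaration gives no value to copy.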
Answer by Cyril N.
Here's an updated answer, since OutputFormat is deprecated:
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
String output = writer.getBuffer().toString().replaceAll("\n|\r", "");
The second part (the StringWriter and the transform call) returns the XML document as a String.
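One caveat worth sketching: a StringWriter holds characters, not bytes, so with a Writer target the ENCODING property can only influence the XML declaration. If the goal is bytes that are actually encoded as ISO-8859-1, a byte stream target is one option. The following is a rough JDK-only sketch under that assumption; the class EncodedXmlWriter and its helper methods are hypothetical names, not part of the answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class: serializes a DOM document to bytes in a chosen encoding.
final class EncodedXmlWriter {

    // Builds a one-element document so the example is self-contained.
    static Document newDocument(String rootName, String text) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = document.createElement(rootName);
            root.setTextContent(text);
            document.appendChild(root);
            return document;
        } catch (Exception e) {
            throw new IllegalStateException("Could not create document", e);
        }
    }

    // Writes the document to a byte array; the transformer does the character encoding.
    static byte[] toBytes(Document document, String encoding) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    // Parses the bytes again (the parser reads the declaration) and returns the root text.
    static String rootText(byte[] xmlBytes) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return document.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document document = newDocument("root", "caf\u00e9");
        byte[] bytes = toBytes(document, "ISO-8859-1");
        // Decode with the same charset only to display the serialized form.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
        System.out.println(rootText(bytes));
    }
}
```

Parsing the bytes back is a simple way to check that the declaration and the actual byte encoding agree.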
Answer by Matt
This worked for me and is very simple. No need for a transformer or output formatter:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource(inputStream);
is.setEncoding("ISO-8859-1"); // set your encoding here
Document document = builder.parse(is);
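To make the effect of setEncoding visible, here is a small self-contained sketch along the same lines. It assumes the input bytes are ISO-8859-1 and carry no XML declaration, which is the case where the parser has nothing else to go on. The class name ForcedEncodingParser and the sample data are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class: parses a byte stream using an externally supplied encoding.
final class ForcedEncodingParser {

    // Tells the parser which encoding to use for the byte stream, then returns the root text.
    static String rootText(byte[] xmlBytes, String encoding) {
        try {
            InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
            source.setEncoding(encoding); // encoding known from outside the document
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            return document.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // ISO-8859-1 bytes without an XML declaration (assumed test data).
        byte[] latin1 = "<root>caf\u00e9</root>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(latin1, "ISO-8859-1"));
    }
}
```

Note that setEncoding only applies when the InputSource wraps a byte stream; it has no effect on a character stream such as a StringReader.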