Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/21967222/

Time: 2020-08-13 11:43:43  Source: igfitidea

Java encoding from UTF-8 to ISO-8859-1 into a XML file

java xml encoding utf-8 iso-8859-1

Asked by Cyril N.

I have been trying to convert a UTF-8 String to its ISO-8859-1 equivalent in order to output it in an XML document, and no matter what I try, the output is always displayed incorrectly.

To simplify the question, I created a code snippet with all the tests I ran, followed by the generated document.

You can be sure I tried every possible combination of new String(xxx.getBytes("UTF-8"), "ISO-8859-1"), swapping UTF-8 and ISO-8859-1, and sometimes using the same charset on both sides. Nothing works!
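As a side note on the combinations described above, here is a minimal sketch (not part of the original question) of what they do. It assumes only standard JDK behavior: a Java String holds UTF-16 chars and has no charset of its own, so a charset only matters at the moment the String is turned into bytes. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class CharsetBytesSketch {
    /** Hex dump of the bytes produced when the string is encoded with the given charset. */
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** The pattern from the question: encode with one charset, decode with another. */
    static String reinterpret(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // "héllo", written with an escape so the source file's own encoding does not matter
        String s = "h\u00e9llo";

        System.out.println(hex(s, StandardCharsets.ISO_8859_1)); // 68 e9 6c 6c 6f
        System.out.println(hex(s, StandardCharsets.UTF_8));      // 68 c3 a9 6c 6c 6f

        // Same charset on both sides: the String comes back unchanged.
        System.out.println(reinterpret(s, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(s)); // true

        // Mismatched charsets: the two UTF-8 bytes of 'é' become two separate chars.
        System.out.println(reinterpret(s, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)
                .equals("h\u00c3\u00a9llo")); // true
    }
}
```

In other words, re-wrapping a String this way either changes nothing or corrupts its characters; it never yields a String that is "in" ISO-8859-1.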

Here's the snippet:

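// Standard-library imports used by the snippet below (Play Framework imports omitted)
import java.io.StringWriter;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// @see http://stackoverflow.com/questions/229015/encoding-conversion-in-java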
private static String changeEncoding(String input) throws Exception {
    // Create the encoder and decoder for ISO-8859-1
    Charset charset = Charset.forName("ISO-8859-1");
    CharsetDecoder decoder = charset.newDecoder();
    CharsetEncoder encoder = charset.newEncoder();

    // Convert a string to ISO-LATIN-1 bytes in a ByteBuffer
    // The new ByteBuffer is ready to be read.
    ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(input));

    // Convert ISO-LATIN-1 bytes in a ByteBuffer to characters in a CharBuffer and then to a string.
    // The new CharBuffer is ready to be read.
    CharBuffer cbuf = decoder.decode(bbuf);
    return cbuf.toString();
}

// @see http://stackoverflow.com/questions/655891/converting-utf-8-to-iso-8859-1-in-java-how-to-keep-it-as-single-byte
private static String byteEncoding(String input) throws Exception {
    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");

    ByteBuffer inputBuffer = ByteBuffer.wrap(input.getBytes());

    // decode UTF-8
    CharBuffer data = utf8charset.decode(inputBuffer);

    // encode ISO-8859-1
    ByteBuffer outputBuffer = iso88591charset.encode(data);
    byte[] outputData = outputBuffer.array();
    return new String(outputData, "ISO-8859-1");
}

public static Result home() throws Exception {
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

    //root elements
    Document doc = docBuilder.newDocument();
    doc.setXmlVersion("1.0");
    doc.setXmlStandalone(true);

    Element rootElement = doc.createElement("test");
    doc.appendChild(rootElement);

    rootElement.setAttribute("original", "héllo");

    rootElement.setAttribute("stringToString", new String("héllo".getBytes("UTF-8"), "ISO-8859-1"));

    rootElement.setAttribute("stringToBytes", changeEncoding("héllo"));

    rootElement.setAttribute("stringToBytes2", byteEncoding("héllo"));

    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer transformer = tf.newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

    StringWriter writer = new StringWriter();
    transformer.transform(new DOMSource(doc), new StreamResult(writer));
    String output = writer.getBuffer().toString().replaceAll("\n|\r", "");

    // The following is Play Framework-specific code for rendering the response, but I believe it is not the problem (I checked in the developer console: the document is correctly served as "ISO-8859-1").
    response().setHeader("Content-Type", "text/xml; charset=ISO-8859-1");
    return ok(output).as("text/xml");
}

And the result:

<?xml version="1.0" encoding="ISO-8859-1"?>
<test original="h??llo" stringToBytes="h??llo" stringToBytes2="h??llo" stringToString="h????llo"/>
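One possible reading of this result, offered as an assumption rather than a confirmed diagnosis: if the String body is encoded as UTF-8 somewhere on the way out while the client decodes it as ISO-8859-1 (as the header declares), then each `é` would turn into two characters, and the already-mangled stringToString value into four. That would be consistent with the `??` and `????` above. The sketch below only simulates that assumed pipeline; `seenByClient` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical simulation, for illustration only.
class ClientViewSketch {
    // ASSUMPTION: the server encodes the String body as UTF-8,
    // and the client decodes the bytes as ISO-8859-1.
    static String seenByClient(String body) {
        return new String(body.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo"; // "héllo"
        // The "stringToString" attribute from the question, already mangled once.
        String stringToString =
                new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        // 5 characters in, 6 out: 'é' is seen as 2 characters.
        System.out.println(seenByClient(original).length());       // 6
        // 'é' was already 2 characters, each of which is doubled again: 4 characters.
        System.out.println(seenByClient(stringToString).length()); // 8
    }
}
```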

How can I proceed?

Accepted answer by Cyril N.

For a reason I can't explain, writing the XML to a file and returning that file as the response fixed the encoding problem.

I decided to keep this question in case other people have a similar problem.

Here's the snippet:

TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

File file = new File("Path/to/file.xml");
transformer.transform(new DOMSource(doc), new StreamResult(file));

response().setHeader("Content-Disposition", "attachment;filename=" + file.getName());
response().setHeader("Content-Type", "text/xml; charset=ISO-8859-1");
return ok(file).as("text/xml");
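A possible explanation, which the answer itself does not confirm: when the StreamResult wraps a Writer, the transformer emits characters, so OutputKeys.ENCODING can only influence the XML declaration, and the actual conversion to bytes happens later, wherever the resulting String is written out. When the StreamResult wraps a File or an OutputStream, the transformer performs the byte encoding itself. The sketch below illustrates this with an in-memory ByteArrayOutputStream as a stand-in for the file; the class and method names are hypothetical, and it leaves out the Play-specific response handling.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
class XmlBytesSketch {
    /** Builds a small document similar to the one in the question. */
    static Document buildDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("test");
        root.setAttribute("original", "h\u00e9llo"); // "héllo"
        doc.appendChild(root);
        return doc;
    }

    /** Serializes to bytes, letting the transformer itself apply ISO-8859-1. */
    static byte[] toLatin1Bytes(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

        // An OutputStream target (like a File) receives bytes, not chars.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toLatin1Bytes(buildDoc());
        // Decoding with the declared charset should give back readable XML.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

The resulting byte array could then be sent as the response body instead of a String, avoiding the temporary file; how to do that depends on the framework.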