Original question: http://stackoverflow.com/questions/8074068/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Invalid byte 2 of 4-byte UTF-8 sequence, but only when executing JAR?
Asked by Daniel Montes de Oca
I have a Java program that uses TransformerFactory to transform an XML string retrieved from a SQL Server database, writes the result to a file, and then uses that file to generate a PDF.
It works fine when I run it from NetBeans, but if I execute the JAR from the project's dist folder I get an "Invalid byte 2 of 4-byte UTF-8 sequence" error.
After changing the encoding of the XML string to UTF-8, it now works from the JAR too.
So my question is, why would it work when running the project in NetBeans but not from the JAR file before changing the encoding?
I have only tried this on Windows.
Code:
Here is the SQL Server query (original):
SQLXML xml = null;
String xmlString = "";
while (rs.next()) {
    xml = rs.getSQLXML(1);
    xmlString = xml.getString();
}
return xmlString;
...and modified:
SQLXML xml = null;
String xmlString = "";
while (rs.next()) {
    xml = rs.getSQLXML(1);
    // Note explicit UTF-8 encoding specified
    xmlString = new String(xml.getString().getBytes(), "UTF8");
}
return xmlString;
And here is the transformation:
public static void serialize(Document doc, OutputStream out) throws Exception {
    TransformerFactory tfactory = TransformerFactory.newInstance();
    try {
        Transformer serializer = tfactory.newTransformer();
        serializer.setOutputProperty("indent", "yes");
        serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        serializer.transform(new DOMSource(doc), new StreamResult(out));
    } catch (TransformerException e) {
        e.printStackTrace();
        throw new RuntimeException(e);
    }
}
Accepted answer by Luciano
I tried a simple application in NetBeans that prints Charset.defaultCharset(), and it returns "UTF-8". The same application in Eclipse returns "MacRoman". I'm on a Mac; on Windows it would typically return "windows-1252" (Cp1252).
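To see which default a given launcher gives you, a small program like the one described above can be run both from the IDE and from the JAR. This is only a minimal sketch; the class name `ShowDefaultCharset` is made up for illustration. Note that on Java 18 and later the default charset is UTF-8 on every platform unless overridden, so the difference should only show up on older runtimes.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    // Name of the charset the JVM falls back to when none is given explicitly,
    // for example in String.getBytes() or new FileWriter(file).
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        // file.encoding is the system property a launcher can set with -Dfile.encoding=...
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

Launching the JAR with `java -Dfile.encoding=UTF-8 -jar yourapp.jar` (the JAR name is a placeholder) should make it report UTF-8 as well, which is a quick way to confirm that the launcher, not the code, accounts for the difference.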
So yes, when you run an application in NetBeans, the default encoding is UTF-8, which is why you didn't have any issues when parsing the XML.
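The failure itself can be reproduced without a database. The sketch below assumes the XML contains a non-ASCII character such as `ñ`, and that the bytes were produced with a single-byte charset while the parser reads them as UTF-8. Neither detail is shown in the question, so treat this as an illustration rather than a diagnosis. ISO-8859-1 stands in for windows-1252 here because both encode `ñ` as the byte 0xF1, which UTF-8 treats as the start of a 4-byte sequence. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class EncodingMismatchDemo {
    // The document declares UTF-8; \u00F1 is the character ñ.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>a\u00F1o</name>";

    // Encodes XML with the given charset and tries to parse the resulting bytes.
    // Returns true if parsing succeeds, false if the parser rejects the input.
    static boolean parses(Charset charset) {
        byte[] bytes = XML.getBytes(charset);
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) {
            // Print the parser's message so the mismatch is visible.
            System.out.println(charset + " -> " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 bytes parse:      " + parses(StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes parse: " + parses(StandardCharsets.ISO_8859_1));
    }
}
```

The practical takeaway is to name the charset wherever text becomes bytes, for example `xmlString.getBytes(StandardCharsets.UTF_8)`, instead of relying on the platform default.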