Windows: Creating UTF-8 files in Java from a runnable Jar
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/3033081/
Creating UTF-8 files in Java from a runnable Jar
Asked by RuntimeError
I have a little Java project where I've set the properties of the class files to UTF-8 (I use a lot of foreign characters not found on the default CP1252).
The goal is to create a text file (in Windows) containing a list of items. When I run the class files from Eclipse itself (hitting Ctrl+F11), it creates the file flawlessly, and when I open it in another editor (I'm using Notepad++) I can see the characters as I wanted.
┌──────────────────────────────────────────────────┐
│ Universidade2010 (18/18)│
│ hidden: 0│
├──────────────────────────────────────────────────┤
But when I export the project (using Eclipse) as a runnable Jar and run it using 'javaw -jar project.jar', the new file is a mess of question marks:
????????????????????????????????????????????????????
? Universidade2010 (19/19)?
? hidden: 0?
????????????????????????????????????????????????????
I've followed some tips on how to use UTF-8 (which seems to be broken by default on Java) to try to correct this, so now I'm using
Writer w = new OutputStreamWriter(fos, "UTF-8");
and writing the BOM header to the file, as in this already-answered question, but still without luck when exporting to a Jar.
Am I missing some property or command-line option so that Java knows I want to create UTF-8 files by default?
The problem is not in creating the file itself, because during development the file is output correctly (with the Unicode characters).
The class that creates the file is now (and following the suggestion of using the Charset class) like this:
public class Printer {
    File f;
    FileOutputStream fos;
    Writer w;
    final byte[] utf8_bom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

    public Printer(String filename){
        f = new File(filename);
        try {
            fos = new FileOutputStream(f);
            w = new OutputStreamWriter(fos, Charset.forName("UTF-8"));
            fos.write(utf8_bom);
        } catch (FileNotFoundException e) {
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void print(String s) {
        if(fos != null){
            try {
                fos.write(s.getBytes());
                fos.flush();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
}
And all characters being used are defined as such:
private final char pipe = '\u2502'; /* │ */
private final char line = '\u2500'; /* ─ */
private final char pipeleft = '\u251c'; /* ├ */
private final char piperight = '\u2524'; /* ┤ */
private final char cupleft = '\u250c'; /* ┌ */
private final char cupright = '\u2510'; /* ┐ */
private final char cdownleft = '\u2514'; /* └ */
private final char cdownright = '\u2518'; /* ┘ */
The problem remains: when outputting to a file simply by running the project in Eclipse, the file comes out perfect, but after deploying the project to a Jar and running it, the output file has its formatting destroyed (I've found that the box characters are replaced by the '?' char).
I've come to think this is not a problem with the code but with deploying it into a Jar file. I think Eclipse is compiling the source files as CP1252 or something, but even replacing all Unicode chars with their code constants didn't help.
Answered by McDowell
I've followed some tips on how to use UTF-8 (which seems to be broken by default on Java)
For historical reasons, Java's encoding defaults to the system encoding (something that made more sense back on Windows 95). This behaviour isn't likely to change. To my knowledge, there isn't anything broken about Java's encoder implementation.
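As a rough illustration of what "defaults to the system encoding" means in practice, the sketch below encodes one of the box-drawing characters three ways. The class name `DefaultCharsetDemo` and the `hex` helper are made up for this example, and ISO-8859-1 is only a stand-in for a legacy code page that lacks the character (it is used because every JVM is required to support it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Hypothetical helper: formats bytes as upper-case hex, e.g. "E2 94 8C".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String corner = "\u250c"; // box-drawing corner, as used in the question

        // Both of these depend on the platform and on how the JVM was started.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("getBytes():      " + hex(corner.getBytes()));

        // A charset without the character substitutes '?' (0x3F).
        System.out.println("ISO-8859-1:      "
                + hex(corner.getBytes(StandardCharsets.ISO_8859_1))); // 3F

        // An explicit UTF-8 encode gives the same bytes everywhere.
        System.out.println("UTF-8:           "
                + hex(corner.getBytes(StandardCharsets.UTF_8))); // E2 94 8C
    }
}
```

If the first two lines print different results in Eclipse and under 'javaw -jar', that would be consistent with the symptom described in the question, where a no-argument `getBytes()` call is used.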
private static final String BOM = "\ufeff";

public static void main(String[] args) throws IOException {
    String data = "\u250c\u2500\u2500\u2510\r\n\u251c\u2500\u2500\u2524";
    OutputStream out = new FileOutputStream("data.txt");
    Closeable resource = out;
    try {
        Writer writer = new OutputStreamWriter(out, Charset.forName("UTF-8"));
        resource = writer;
        writer.write(BOM);
        writer.write(data);
    } finally {
        resource.close();
    }
}
The above code will emit the following text prefixed with a byte order mark:
┌──┐
├──┤
Windows apps like Notepad can infer the encoding from the BOM and decode the file correctly.
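To see what such an application looks for, here is a small sketch that checks whether data starts with the UTF-8 byte order mark (the bytes EF BB BF, which is what U+FEFF becomes when encoded as UTF-8). The class and method names are invented for this illustration, and the temporary file is just a convenient place to write to.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomCheck {
    // True if the data begins with the UTF-8 BOM bytes EF BB BF.
    public static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bom-check", ".txt");
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write("\ufeff");                   // BOM, written as a character
            writer.write("\u250c\u2500\u2500\u2510"); // one row of box characters
        }
        System.out.println(startsWithUtf8Bom(Files.readAllBytes(file))); // prints "true"
        Files.delete(file);
    }
}
```

Writing the BOM through the same Writer as the text, rather than as raw bytes on the underlying stream, keeps all output on one encoding path.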
Without code, it isn't possible to spot any errors.
Am I missing some property or command-line command so Java knows I want to create UTF-8 files by default?
No - there is no such setting. Some might suggest setting file.encoding on the command line, but this is a bad idea.
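One way to avoid depending on any JVM-wide setting is to name the charset wherever characters are turned into bytes. The following is only a sketch of that idea: `ExplicitCharsetWriter` and `encode` are names made up here, and an in-memory buffer stands in for the file so the result is easy to inspect.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetWriter {
    // Encodes text through a Writer bound to the given charset.
    public static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(buffer, charset)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory buffer.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String row = "\u251c\u2500\u2524"; // three box-drawing characters

        // Three bytes per character in UTF-8, however the JVM was launched.
        System.out.println(encode(row, StandardCharsets.UTF_8).length); // prints 9

        // May differ between machines and launch configurations.
        System.out.println(encode(row, Charset.defaultCharset()).length);
    }
}
```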
I wrote a more comprehensive blog post on the subject here.
This is a reworking of your code:
import java.io.Closeable;
import java.io.IOException;
import java.io.PrintWriter;

public class Printer implements Closeable {
    private PrintWriter pw;
    private boolean error;

    public Printer(String name) {
        try {
            pw = new PrintWriter(name, "UTF-8");
            pw.print('\uFEFF'); // BOM
            error = false;
        } catch (IOException e) {
            error = true;
        }
    }

    public void print(String s) {
        if (pw == null) return;
        pw.print(s);
        pw.flush();
    }

    public boolean checkError() { return error || pw.checkError(); }

    @Override public void close() { if (pw != null) pw.close(); }
}
Most of the functionality you want already exists in PrintWriter. Note that you should provide some mechanism to check for underlying errors and close the stream (or you risk leaking file handles).
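The error checking and closing mentioned above could look roughly like the following when PrintWriter is used directly. This is a sketch, not part of the original answer: `PrintWriterUsage`, `writeLines`, and the output file name are invented, and try-with-resources is used as one way to guarantee the file handle is released.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;

public class PrintWriterUsage {
    // Writes the lines as UTF-8 with a leading BOM.
    // Returns false if the file could not be opened or a write failed.
    public static boolean writeLines(File file, String... lines) {
        try (PrintWriter pw = new PrintWriter(file, "UTF-8")) {
            pw.print('\uFEFF'); // BOM
            for (String line : lines) {
                pw.print(line);
                pw.print("\r\n");
            }
            // PrintWriter swallows IOExceptions; checkError() flushes and reports them.
            return !pw.checkError();
        } catch (IOException e) {
            return false; // for example, the path is not writable
        }
    }

    public static void main(String[] args) {
        File out = new File(System.getProperty("java.io.tmpdir"), "printer-demo.txt");
        boolean ok = writeLines(out, "\u250c\u2500\u2500\u2510", "\u2514\u2500\u2500\u2518");
        System.out.println(ok ? "written: " + out : "write failed");
    }
}
```

Because the writer is declared in the try header, it is closed on every path out of the block, including the early return.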