Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/24803733/

Time: 2020-08-14 14:55:25  Source: igfitidea

Default character encoding for java console output

Tags: java, windows, utf-8, character-encoding, console

Asked by michas

How does Java determine the encoding used for System.out?

Given the following class:

import java.io.File;
import java.io.PrintWriter;

public class Foo
{
    public static void main(String[] args) throws Exception
    {
        String s = "xxäñxx";
        System.out.println(s);
        PrintWriter out = new PrintWriter(new File("test.txt"), "UTF-8");
        out.println(s);
        out.close();
    }
}

It is saved as UTF-8 and compiled with javac -encoding UTF-8 Foo.java on a Windows system.

Afterwards on a git-bash console (using UTF-8 charset) I do:

$ java Foo
xxõ±xx
$ java -Dfile.encoding=UTF-8 Foo
xx├ñ├▒xx
$ cat test.txt
xxäñxx
$ java Foo | cat
xxäñxx
$ java -Dfile.encoding=UTF-8 Foo | cat
xxäñxx

What is going on here?

Apparently Java checks whether it is connected to a terminal and changes its encoding in that case. Is there a way to force Java to simply output plain UTF-8?

I tried the same with the cmd console, too. Redirecting STDOUT does not seem to make any difference there. Without the file.encoding parameter it outputs ANSI encoding; with the parameter it outputs UTF-8 encoding.

Accepted answer by McDowell

I'm assuming that your console still runs under cmd.exe. I doubt your console is really expecting UTF-8; I expect it is actually using an OEM DOS encoding (e.g. 850 or 437).

Java will encode bytes using the default encoding set during JVM initialization.
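
As a minimal sketch of how to inspect that default, the following prints the encoding-related values a JVM started with. The class name EncodingInfo is made up for illustration. One assumption to verify against your JDK version: newer JDKs are understood to default to UTF-8 and to expose a separate stdout.encoding property for System.out, while older ones simply return null for that property.

```java
import java.nio.charset.Charset;

class EncodingInfo {
    // Builds a short report of the encoding-related settings this JVM started with.
    static String describe() {
        return "defaultCharset=" + Charset.defaultCharset().name()
            + ", file.encoding=" + System.getProperty("file.encoding")
            // Assumption: this property may be null on older JDKs that do not define it.
            + ", stdout.encoding=" + System.getProperty("stdout.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once as java EncodingInfo and once as java -Dfile.encoding=UTF-8 EncodingInfo shows whether the switch changes the values on your system.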

Reproducing on my PC:

java Foo

Java encodes as windows-1252; console decodes as IBM850. Result: Mojibake

java -Dfile.encoding=UTF-8 Foo

Java encodes as UTF-8; console decodes as IBM850. Result: Mojibake
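
The two mojibake cases above can be reproduced without a console by encoding the string with one charset and decoding the bytes with another. This is only a sketch: the class name MojibakeDemo and the helper misread are made up for illustration, and it assumes the JRE ships the IBM850 charset, which is part of the extended charsets and may be missing from trimmed-down runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes text with one charset and decodes the resulting bytes with another,
    // which is what happens when Java and the console disagree about the encoding.
    static String misread(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        // The test string, written with escapes so the source file encoding does not matter.
        String s = "xx\u00e4\u00f1xx";
        // Assumption: IBM850 is available in this JRE; forName throws otherwise.
        Charset console = Charset.forName("IBM850");
        System.out.println(misread(s, Charset.forName("windows-1252"), console));
        System.out.println(misread(s, StandardCharsets.UTF_8, console));
    }
}
```

What the two println calls show on screen still depends on how the terminal decodes the program's own output, so comparing the returned strings in code is more reliable than reading them off the console.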

cat test.txt

cat decodes file as UTF-8; cat encodes as IBM850; console decodes as IBM850.

java Foo | cat

Java encodes as windows-1252; cat decodes as windows-1252; cat encodes as IBM850; console decodes as IBM850

java -Dfile.encoding=UTF-8 Foo | cat

Java encodes as UTF-8; cat decodes as UTF-8; cat encodes as IBM850; console decodes as IBM850

This implementation of cat must use heuristics to determine whether the character data is UTF-8 or not, and then transcode the data from either UTF-8 or ANSI (e.g. windows-1252) to the console encoding (e.g. IBM850).

This can be confirmed with the following commands:

$ java HexDump utf8.txt
78 78 c3 a4 c3 b1 78 78

$ cat utf8.txt
xxäñxx

$ java HexDump ansi.txt
78 78 e4 f1 78 78

$ cat ansi.txt
xxäñxx

The cat command can make this determination because e4 f1 is not a valid UTF-8 sequence.
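
A check of this kind can be sketched in Java with a strict UTF-8 decoder. This is not the actual implementation used by cat, only an illustration of the idea; the class name Utf8Guess, its helper methods, and the windows-1252 fallback are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Guess {
    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Decodes as UTF-8 when the bytes are valid UTF-8,
    // otherwise falls back to windows-1252 (an assumed ANSI code page).
    static String decodeGuessing(byte[] bytes) {
        Charset cs = isValidUtf8(bytes)
            ? StandardCharsets.UTF_8
            : Charset.forName("windows-1252");
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // The same byte sequences as in the hex dumps of utf8.txt and ansi.txt.
        byte[] utf8 = {0x78, 0x78, (byte) 0xc3, (byte) 0xa4, (byte) 0xc3, (byte) 0xb1, 0x78, 0x78};
        byte[] ansi = {0x78, 0x78, (byte) 0xe4, (byte) 0xf1, 0x78, 0x78};
        System.out.println(isValidUtf8(utf8)); // true
        System.out.println(isValidUtf8(ansi)); // false
    }
}
```

Both byte arrays decode to the same string through decodeGuessing, which matches the observation that cat prints the same text for both files.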

You can correct the Java output by:
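
One option, given here as a hedged sketch and not necessarily what the answer itself lists, is to wrap System.out in a PrintStream with an explicit charset so the bytes written no longer depend on file.encoding. The class name Utf8Out and the helper utf8 are made up for illustration. The console must still decode UTF-8 for the text to display correctly (on cmd.exe this is typically done with chcp 65001).

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Out {
    // Wraps a stream so that printed text is always encoded as UTF-8,
    // regardless of the JVM's default encoding.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        out.println("xx\u00e4\u00f1xx");
    }
}
```

Piping the result through a hex dump should then show c3 a4 c3 b1 for the two accented characters, whether or not -Dfile.encoding is given.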

HexDump is a trivial Java application:

import java.io.*;
class HexDump {
  public static void main(String[] args) throws IOException {
    try (InputStream in = new FileInputStream(args[0])) {
      int r;
      while((r = in.read()) != -1) {
        System.out.format("%02x ", 0xFF & r);
      }
      System.out.println();
    }
  }
}