Setting the default Java character encoding

Disclaimer: This page is a Chinese-English translation of a popular Stack Overflow question and is published under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/361975/

Date: 2020-08-11 13:50:20  Source: igfitidea

Tags: java, utf-8, character-encoding

Question

How do I properly set the default character encoding used by the JVM (1.5.x) programmatically?

I have read that -Dfile.encoding=whatever used to be the way to go for older JVMs. I don't have that luxury, for reasons I won't get into.

I have tried:

System.setProperty("file.encoding", "UTF-8");

And the property gets set, but it doesn't seem to cause the final getBytes call below to use UTF-8:

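System.setProperty("file.encoding", "UTF-8"); // set at runtime, after the JVM has already started

byte[] inbytes = new byte[1024];

FileInputStream fis = new FileInputStream("response.txt");
int n = Math.max(fis.read(inbytes), 0); // bytes actually read (read() returns -1 at end of file)
fis.close();
FileOutputStream fos = new FileOutputStream("response-2.txt");
String in = new String(inbytes, 0, n, "UTF-8");
fos.write(in.getBytes()); // no charset given: encodes with the JVM default, not necessarily UTF-8
fos.close();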

Accepted answer by erickson

Unfortunately, the file.encoding property has to be specified as the JVM starts up; by the time your main method is entered, the character encoding used by String.getBytes() and the default constructors of InputStreamReader and OutputStreamWriter has been permanently cached.

As Edward Grech points out, in a special case like this, the environment variable JAVA_TOOL_OPTIONS can be used to specify this property, but it's normally done like this:

java -Dfile.encoding=UTF-8 … com.x.Main
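
A small diagnostic class can help confirm whether a flag like the one above actually reached the JVM: it prints the file.encoding property next to the charset the runtime resolved. This is a sketch; the class name EncodingCheck is made up for illustration, and the printed values depend on the JVM version and the launch flags.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class: reports what the running JVM
// considers its default encoding.
final class EncodingCheck {

    // Builds a one-line summary of the encoding-related settings.
    static String describe() {
        String property = System.getProperty("file.encoding"); // value from -Dfile.encoding or the platform
        Charset resolved = Charset.defaultCharset();            // charset the runtime reports as default
        return "file.encoding=" + property + ", defaultCharset=" + resolved.name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

For example, run it once as java -Dfile.encoding=UTF-8 EncodingCheck and once without the flag, and compare the two lines.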

Charset.defaultCharset() will reflect changes to the file.encoding property, but most of the code in the core Java libraries that needs to determine the default character encoding does not use this mechanism.

When you are encoding or decoding, you can query the file.encoding property or Charset.defaultCharset() to find the current default encoding, and use the appropriate method or constructor overload to specify it.
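
A minimal sketch of that advice, assuming Java 8 or later (for try-with-resources and UncheckedIOException): query the default once, then pass the resulting Charset to the InputStreamReader and OutputStreamWriter constructors that accept one. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Sketch: resolve the charset once, then hand it to every reader/writer
// explicitly instead of using the no-charset constructors.
final class ExplicitCharsetIo {

    // Encodes text with the given charset via an OutputStreamWriter.
    static byte[] encode(String text, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // the writer has been closed, and therefore flushed
    }

    // Decodes bytes with the given charset via an InputStreamReader.
    static String decode(byte[] data, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cs = Charset.defaultCharset(); // the current default, queried explicitly
        System.out.println(cs.name() + " -> " + decode(encode("hello", cs), cs));
    }
}
```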

Answer by Marc Novakowski

I can't answer your original question, but I would like to offer you some advice: don't depend on the JVM's default encoding. It's always best to explicitly specify the desired encoding (e.g., "UTF-8") in your code. That way, you know it will work across different systems and JVM configurations.
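
As a sketch of what "explicitly specify" can look like, the helpers below name UTF-8 at every String/byte[] conversion. They assume Java 7 or later for StandardCharsets; on a 1.5 JVM the charset-name overloads such as getBytes("UTF-8") do the same job but throw the checked UnsupportedEncodingException. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Sketch: name the encoding at each String <-> byte[] boundary
// instead of relying on the JVM default.
final class ExplicitUtf8 {

    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String fromUtf8(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] b = toUtf8("\u20ac");                        // euro sign, U+20AC
        System.out.println(b.length);                       // 3, on any platform
        System.out.println(fromUtf8(b).equals("\u20ac"));  // true
    }
}
```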

Answer by Dov Wasserman

I think a better approach than setting the platform's default character set, especially since you seem to have restrictions on changing the application's deployment, let alone the platform, is to call the much safer String.getBytes("charsetName"). That way your application does not depend on things beyond its control.

I personally feel that String.getBytes() should be deprecated, as it has caused serious problems in a number of cases I have seen, where the developer did not account for the default charset possibly changing.
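
To illustrate the risk, the sketch below (hypothetical class name) encodes the same String under two charsets. A bare getBytes() returns whichever byte sequence matches the JVM's default, so its output can change from one machine to the next.

```java
import java.nio.charset.Charset;

// Sketch: the same String yields different bytes depending on the
// charset doing the encoding; getBytes() silently picks the default.
final class CharsetDependence {

    static int encodedLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String s = "na\u00efve"; // n, a, i-diaeresis (U+00EF), v, e
        System.out.println("UTF-8:      " + encodedLength(s, "UTF-8"));      // 6
        System.out.println("ISO-8859-1: " + encodedLength(s, "ISO-8859-1")); // 5
        System.out.println("default (" + Charset.defaultCharset().name() + "): " + s.getBytes().length);
    }
}
```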

Answer by Dov Wasserman

It's not clear what you do and don't have control over at this point. If you can interpose a different OutputStream class on the destination file, you could use a subtype of OutputStream that converts Strings to bytes under a charset you define, say UTF-8 by default. If modified UTF-8 is sufficient for your needs, you can use DataOutputStream.writeUTF(String):

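byte[] inbytes = new byte[1024];
FileInputStream fis = new FileInputStream("response.txt");
int n = Math.max(fis.read(inbytes), 0); // bytes actually read (read() returns -1 at end of file)
fis.close();
String in = new String(inbytes, 0, n, "UTF-8");
DataOutputStream out = new DataOutputStream(new FileOutputStream("response-2.txt"));
out.writeUTF(in); // no getBytes() here: writes a 2-byte length, then modified UTF-8
out.close();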
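
It is worth knowing what writeUTF actually produces before relying on it: a two-byte big-endian length followed by the string in modified UTF-8, which differs from standard UTF-8 for U+0000 and for supplementary characters and is limited to 65,535 encoded bytes. The file is therefore meant to be read back with DataInputStream.readUTF rather than as plain text. A minimal sketch, assuming Java 8 or later; the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch: inspect the bytes writeUTF emits and read them back with readUTF.
final class WriteUtfDemo {

    static byte[] writeUtf(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s); // 2-byte length prefix + modified UTF-8 payload
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static String readUtf(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = writeUtf("caf\u00e9");
        System.out.println(data.length);                        // 7: 2-byte prefix + 5 payload bytes
        System.out.println(readUtf(data).equals("caf\u00e9")); // true
    }
}
```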

If this approach is not feasible, it may help if you clarify here exactly what you can and can't control in terms of data flow and execution environment (though I know that's sometimes easier said than determined). Good luck.

Answer by Edward Grech

From the JVM™ Tool Interface documentation…

Since the command-line cannot always be accessed or modified, for example in embedded VMs or simply VMs launched deep within scripts, a JAVA_TOOL_OPTIONS variable is provided so that agents may be launched in these cases.

By setting the (Windows) environment variable JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF8, the (Java) System property will be set automatically every time a JVM is started. You will know that the parameter has been picked up because the following message will be posted to System.err:

Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
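
When the JVM is launched from another Java program rather than from a command line you control, the same variable can be placed in the child process's environment. This is a sketch, assuming Java 7 or later for ProcessBuilder.inheritIO() and a java executable on the PATH; the class name and the java -version command are only illustrations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Sketch: start a child JVM with JAVA_TOOL_OPTIONS set in its environment.
final class ToolOptionsLauncher {

    static ProcessBuilder withUtf8ToolOptions(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        Map<String, String> env = pb.environment(); // starts as a copy of this process's environment
        env.put("JAVA_TOOL_OPTIONS", "-Dfile.encoding=UTF8");
        pb.inheritIO(); // the child's "Picked up ..." line appears on our stderr
        return pb;
    }

    public static void main(String[] args) throws Exception {
        Process p = withUtf8ToolOptions(Arrays.asList("java", "-version")).start();
        System.exit(p.waitFor());
    }
}
```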

Answer by lizi

We set these two system properties together, and it makes the system handle everything as UTF-8:

file.encoding=UTF8
client.encoding.override=UTF-8

Answer by Emmanuel.B

Try this:

    new OutputStreamWriter(new FileOutputStream("Your_file_fullpath"), Charset.forName("UTF8"))
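
Expanded into a complete sketch (hypothetical class name, Java 7 or later for try-with-resources), the same idea writes a file through an OutputStreamWriter with an explicit charset and reads it back with an InputStreamReader using the same charset, so neither side depends on the JVM default:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Sketch: explicit-charset file I/O on both the write and the read side.
final class Utf8FileIo {

    static final Charset UTF8 = Charset.forName("UTF-8");

    static void writeText(File file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file), UTF8)) {
            w.write(text);
        }
    }

    static String readText(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new FileInputStream(file), UTF8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("utf8-demo", ".txt");
        f.deleteOnExit();
        writeText(f, "\u00fcber");
        System.out.println(readText(f).equals("\u00fcber")); // true
        System.out.println(f.length());                      // 5: U+00FC takes 2 bytes
    }
}
```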

Answer by naskoos

I have a hacky way that definitely works!!

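import java.lang.reflect.Field;
import java.nio.charset.Charset;

System.setProperty("file.encoding", "UTF-8");
Field charset = Charset.class.getDeclaredField("defaultCharset"); // private cache field inside the JDK
charset.setAccessible(true);
charset.set(null, null); // clear the cache so the next lookup re-reads file.encoding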

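This way you trick the JVM: with the cached default charset cleared, it thinks no charset has been set, so the next time the default is needed it computes it again from file.encoding, now UTF-8, at runtime! Note that this relies on a private field of java.nio.charset.Charset, so it may stop working on newer JVMs.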

Answer by D Bright

We were having the same issues. We methodically tried several suggestions from this article (and others), to no avail. We also tried adding -Dfile.encoding=UTF8, and nothing seemed to work.

For people who are having this issue, the following article is what finally helped us track it down. It describes how the locale setting can break Unicode/UTF-8 in Java/Tomcat:

http://www.jvmhost.com/articles/locale-breaks-unicode-utf-8-java-tomcat

Setting the locale correctly in the ~/.bashrc file worked for us.

~/.bashrc文件中正确设置语言环境对我们有用。

Answer by Lavixu

I have tried a lot of things, but the sample code on the linked page works perfectly.

The crux of the code is:

String s = "?? ??? ??? ?? ?????";
String out = new String(s.getBytes("UTF-8"), "ISO-8859-1");
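
A sketch of what those two lines do (hypothetical class name, Java 7 or later for StandardCharsets): the UTF-8 bytes are reinterpreted as ISO-8859-1, one char per byte. The result looks garbled as text, but because ISO-8859-1 maps each byte value 0-255 to exactly one char, encoding it back with ISO-8859-1 returns the original UTF-8 bytes. This is presumably why the trick appears to work when the output is later encoded with a Latin-1 default; on a JVM whose default is UTF-8 it would double-encode the text instead.

```java
import java.nio.charset.StandardCharsets;

// Sketch: reinterpret UTF-8 bytes as ISO-8859-1 chars, then recover them.
final class Latin1PassThrough {

    // One char per UTF-8 byte; looks garbled for non-ASCII input.
    static String utf8BytesAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 encoding restores the exact UTF-8 bytes, which decode cleanly.
    static String recover(String latin1View) {
        return new String(latin1View.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u00e9t\u00e9";                  // e-acute, t, e-acute
        String view = utf8BytesAsLatin1(s);
        System.out.println(view.length());           // 5: one char per UTF-8 byte
        System.out.println(recover(view).equals(s)); // true
    }
}
```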