Java reliance on default encoding: what should I use and why?

Question

Asked by Nikolas

FindBugs reports a bug:

Reliance on default encoding. Found a call to a method which will perform a byte to String (or String to byte) conversion, and will assume that the default platform encoding is suitable. This will cause the application behaviour to vary between platforms. Use an alternative API and specify a charset name or Charset object explicitly.

I used FileReader like this (just a piece of code):

public ArrayList<String> getValuesFromFile(File file){
    String line;
    StringTokenizer token;
    ArrayList<String> list = null;
    BufferedReader br = null;
    try {
        br = new BufferedReader(new FileReader(file));
        list = new ArrayList<String>();
        while ((line = br.readLine())!=null){
            token = new StringTokenizer(line);
            token.nextToken();
            list.add(token.nextToken());
    ...

To fix the bug, I need to change

br = new BufferedReader(new FileReader(file));

to

br = new BufferedReader(new InputStreamReader(new FileInputStream(file), Charset.defaultCharset()));

The same error occurs when I use PrintWriter. So I have two questions. First, when can (or should) I use FileReader and PrintWriter, if relying on the default encoding is not good practice? Second, is it proper to use Charset.defaultCharset()? I decided to use this method to automatically determine the charset of the user's OS.
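For context on the second question: `Charset.defaultCharset()` describes the JVM, not the file. It is typically derived from the OS locale (historically via the `file.encoding` system property), so the same code can pick different charsets on different machines; since JDK 18 (JEP 400) the default is UTF-8 on every platform, so it can also differ between Java versions on the same machine. A minimal sketch that just prints what the current JVM would use; the class name `DefaultCharsetCheck` is illustrative only.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Returns the name of the charset this JVM uses when no charset is passed explicitly
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        // The system property the default has traditionally been derived from (may be null)
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```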

Answer 1

Accepted answer by JB Nizet

If the file is under the control of your application, and you want it to be encoded in the platform's default encoding, then you can use the default platform encoding. Specifying it explicitly makes it clear, to you and to future maintainers, that this is your intention. This would be a reasonable default for a text editor, for example, which would then write files that any other editor on the same platform can read.
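A minimal sketch of that advice, assuming the platform encoding really is what you want: the charset is still passed explicitly, so anyone reading the code can see it is a deliberate choice rather than an accident. `PlatformEncodedText` and its methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class PlatformEncodedText {
    // The platform charset is chosen on purpose here; naming it documents that intent
    private static final Charset PLATFORM = Charset.defaultCharset();

    static void write(Path path, String text) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(path, PLATFORM)) {
            w.write(text);
        }
    }

    static String readFirstLine(Path path) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(path, PLATFORM)) {
            return r.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("platform", ".txt");
        write(tmp, "hello");
        System.out.println(readFirstLine(tmp)); // hello
        Files.delete(tmp);
    }
}
```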

If, on the other hand, you want to make sure that any possible character can be written to your file, you should use a universal encoding like UTF-8.
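To illustrate why a universal encoding matters, here is a minimal sketch that encodes and decodes the same text with ISO-8859-1 (standing in for any single-byte legacy encoding) and with UTF-8. The string literals use Unicode escapes so the source compiles identically regardless of javac's own default encoding; `RoundTrip` is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTrip {
    // Encodes text to bytes and decodes it back with the same charset
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe, \u65e5\u672c"; // German greeting plus two Japanese characters
        // ISO-8859-1 covers the German letters but not the Japanese ones,
        // which String.getBytes silently replaces with '?'
        String latin1 = roundTrip(text, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.endsWith("??")); // true
        // UTF-8 can encode every Unicode code point, so the text survives unchanged
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```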

And if the file comes from an external application, or is supposed to be compatible with an external application, then you should use the encoding that this external application expects.
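When the encoding is dictated by an external application, it can also help to decode strictly, so that input in an unexpected encoding fails loudly instead of being silently replaced with U+FFFD (which is what `new String(bytes, charset)` does). A sketch assuming, for illustration, that the external application produces ISO-8859-1; `StrictDecode` is a hypothetical helper built on `CharsetDecoder` with `CodingErrorAction.REPORT`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecode {
    // Decodes bytes with the given charset, throwing instead of substituting replacement characters
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        return cs.newDecoder()
                 .onMalformedInput(CodingErrorAction.REPORT)
                 .onUnmappableCharacter(CodingErrorAction.REPORT)
                 .decode(ByteBuffer.wrap(bytes))
                 .toString();
    }

    public static void main(String[] args) {
        // 0xE9 is U+00E9 (e with acute accent) in ISO-8859-1, but alone it is not valid UTF-8
        byte[] fromExternalApp = { (byte) 0xE9 };
        try {
            System.out.println(decodeStrict(fromExternalApp, StandardCharsets.ISO_8859_1));
            decodeStrict(fromExternalApp, StandardCharsets.UTF_8); // throws MalformedInputException
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}
```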

What you must realize is that if you write a file the way you are doing it on one machine, and read it the same way on another machine that doesn't have the same default encoding, you won't necessarily be able to read what you have written. Using a specific encoding such as UTF-8 for both writing and reading ensures that the file is always the same, whatever platform was used to write it.
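A minimal sketch of that failure mode: the text is written as UTF-8 on one machine and read back on a machine whose default is assumed, for illustration, to be ISO-8859-1. Each byte of the two-byte UTF-8 sequence for U+00E9 becomes its own character, producing the familiar "Ã©" garbage. `Mojibake` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

class Mojibake {
    // Simulates writing with one charset and reading with another
    static String writeUtf8ReadLatin1(String text) {
        byte[] written = text.getBytes(StandardCharsets.UTF_8);   // writer's charset
        return new String(written, StandardCharsets.ISO_8859_1);  // reader's charset
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // U+00E9 is e with acute accent
        String misread = writeUtf8ReadLatin1(original);
        System.out.println(misread.equals(original)); // false
        System.out.println(misread.length());         // 5: the two UTF-8 bytes became two characters
    }
}
```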

Answer 2

Answer by TwoThe

You should use the default encoding whenever you read a file that comes from outside your application and can be assumed to be in the user's local encoding, for example user-written text files. You might also want to use the default encoding when writing such files, depending on what the user is going to do with them later.

You should not use the default encoding for any other file, especially application-relevant files.

If your application, for example, writes configuration files in text format, you should always specify the encoding. In general, UTF-8 is a good choice, as it is compatible with almost everything. Not doing so might cause unexpected crashes for users in other countries.
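A sketch of a configuration file written and read with an explicit UTF-8 charset, using `java.util.Properties` with a `Writer`/`Reader` (the `OutputStream`/`InputStream` variants of `store`/`load` use ISO-8859-1 with `\uXXXX` escapes instead). The class name, the `greeting` key, and the comment text are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class ConfigFile {
    // Writes the configuration with an explicit charset so every platform produces the same bytes
    static void save(Properties props, Path path) throws IOException {
        try (Writer w = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            props.store(w, "application settings");
        }
    }

    // Reads it back with the same explicit charset
    static Properties load(Path path) throws IOException {
        Properties props = new Properties();
        try (Reader r = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            props.load(r);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("settings", ".properties");
        Properties props = new Properties();
        props.setProperty("greeting", "gr\u00fc\u00df dich");
        save(props, path);
        System.out.println(load(path).getProperty("greeting").equals("gr\u00fc\u00df dich")); // true
        Files.delete(path);
    }
}
```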

This is not limited to character encoding; it also applies to date/time, numeric, and other locale-specific formats. If, for example, you use the default encoding and default date/time strings on a US machine and then try to read that file on a German server, you might be surprised to find that half of it is gibberish and the other half has months and days swapped or is off by one hour because of daylight saving time.
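For the locale part of this advice, a minimal sketch: pass the `Locale` explicitly wherever formatting is locale-sensitive, and prefer ISO-8601 for dates in files that programs will read back. `LocaleNeutral` and its methods are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LocaleNeutral {
    // Locale-sensitive: the decimal separator depends on the Locale passed in
    static String price(double value, Locale locale) {
        return String.format(locale, "%.2f", value);
    }

    // Locale-neutral: ISO-8601 dates look the same on every machine
    static String isoDate(LocalDate date) {
        return date.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) {
        System.out.println(price(1234.5, Locale.US));          // 1234.50
        System.out.println(price(1234.5, Locale.GERMANY));     // 1234,50
        System.out.println(isoDate(LocalDate.of(2024, 3, 4))); // 2024-03-04
    }
}
```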

Answer 3

Answer by McDowell

Ideally, it should be:

try (InputStream in = new FileInputStream(file);
     Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
     BufferedReader br = new BufferedReader(reader)) {
    // read from br here; all three resources are closed automatically, in reverse order
}

...or:

try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
    // read from br here; path is a java.nio.file.Path
}

...assuming the file is encoded as UTF-8.
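Putting this together with the code from the question, a minimal sketch of `getValuesFromFile` reading the file as UTF-8 with try-with-resources. It takes a `Path` instead of a `File` and returns `List<String>`; unlike the original snippet, which would throw `NoSuchElementException` on such input, it skips lines with fewer than two tokens. Both changes are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class ValuesReader {
    // Collects the second whitespace-separated token of each line, reading the file as UTF-8
    static List<String> getValuesFromFile(Path path) throws IOException {
        List<String> list = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                StringTokenizer token = new StringTokenizer(line);
                // Assumption: lines with fewer than two tokens are skipped rather than failing
                if (token.countTokens() < 2) {
                    continue;
                }
                token.nextToken();           // discard the first column
                list.add(token.nextToken()); // keep the second column
            }
        }
        return list;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("values", ".txt");
        Files.write(path, Arrays.asList("a 1", "b 2", "lonely"), StandardCharsets.UTF_8);
        System.out.println(getValuesFromFile(path)); // [1, 2]
        Files.delete(path);
    }
}
```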

Pretty much every encoding that isn't a Unicode Transformation Format is obsolete for natural language data. There are languages you cannot support without Unicode.
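To check whether an encoding can represent a given text at all, `CharsetEncoder.canEncode` is one option. A sketch using Hindi text (Devanagari script) as the example; `CoverageCheck` is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CoverageCheck {
    // Reports whether every character of the text is representable in the given charset
    static boolean canEncode(Charset cs, String text) {
        return cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String hindi = "\u0928\u092e\u0938\u094d\u0924\u0947"; // "namaste" in Devanagari
        System.out.println(canEncode(StandardCharsets.ISO_8859_1, hindi)); // false
        System.out.println(canEncode(StandardCharsets.US_ASCII, hindi));   // false
        System.out.println(canEncode(StandardCharsets.UTF_8, hindi));      // true
    }
}
```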

Answer 4

Answer by prime

When you are using a PrintWriter, wrap an OutputStreamWriter that is given the encoding explicitly:

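File file = new File(file_path);
// Passing the Charset object (not its name) avoids the checked UnsupportedEncodingException,
// and try-with-resources closes the writer even if an exception is thrown
try (PrintWriter pw = new PrintWriter(
        new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_16))) {
    pw.println(content_to_write);
}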
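On Java 10 and later, `PrintWriter` also has a constructor that takes a `Charset` directly, and Java 11 added the same to `FileReader` and `FileWriter`, so those classes can be used without relying on the default encoding. A minimal sketch, with UTF-8 chosen to match the other answers (UTF-16, as used above, works the same way); `ExplicitCharsetIo` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetIo {
    // PrintWriter(File, Charset) requires Java 10+
    static void writeLine(File file, String content) throws IOException {
        try (PrintWriter pw = new PrintWriter(file, StandardCharsets.UTF_8)) {
            pw.println(content);
        }
    }

    // FileReader(File, Charset) requires Java 11+
    static String readLine(File file) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(file, StandardCharsets.UTF_8))) {
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("explicit", ".txt");
        writeLine(file, "na\u00efve");
        System.out.println(readLine(file).equals("na\u00efve")); // true
        file.delete();
    }
}
```

Note that `PrintWriter` never throws `IOException` from its print methods; call `checkError()` afterwards if write failures matter.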