Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/3832761/

Time: 2020-10-30 03:35:22  Source: igfitidea

File name charset problem in Java

java encoding jboss

Asked by Llistes Sugra

When I try to open a file whose name contains accented characters, Java reports that the file cannot be found, apparently because of a charset mismatch. I work with UTF-8 on a Linux system (/etc/locales sets UTF-8 as well). JBoss runs with -Dfile.encoding=UTF-8 and the environment variable JBOSS_ENCODING="UTF-8".

In a JSP I read the name of the file:

String fileName = element.getChildText("FileName");
out.println("File to be opened : " + fileName);

This displays:

File to be opened : aaaaaà.txt

But new File(fileName) does not work: file.exists() simply returns false.

When I list the directory instead:

File[] files = dir.listFiles();
for (int i = 0; i < files.length; i++) {
    // Print the name as Java read it from the directory
    out.println(files[i].getName());
}

I get: aaaaa?.txt
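
A printed "?" can hide several different characters, so it may help to dump the code points of each listed name. The following is only a sketch: the class NameDump and its codePoints helper are made-up names for illustration, and the directory is taken from the first command-line argument rather than from the JSP.

```java
import java.io.File;

// Hypothetical diagnostic helper, not part of the original JSP.
class NameDump {

    // Renders every code point of a name as U+XXXX, so a literal '?' (U+003F)
    // can be told apart from U+FFFD or from the expected U+00E0.
    static String codePoints(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] files = dir.listFiles();
        if (files == null) {
            System.err.println("Not a readable directory: " + dir);
            return;
        }
        for (File f : files) {
            System.out.println(f.getName() + " -> " + codePoints(f.getName()));
        }
    }
}
```

If the accented letter shows up as U+003F or U+FFFD, the information was already lost while the JVM decoded the directory entry, before any output encoding came into play.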

Why does Java treat the name of the file on disk as ISO-8859-1 when listing and opening it? Is it a JBoss setting? A Java setting? How can I force java.io.File to use UTF-8 as the charset for file names?
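
One way to narrow this down is to print the encoding-related settings from inside the running JVM. The sketch below is an illustration only: EncodingReport and collect are invented names, and sun.jnu.encoding is an internal property of Sun/Oracle-derived JVMs that, as far as I know, reflects the charset used for file names and may be absent elsewhere. Inside a JSP the same values could be written with out.println.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper; the names are made up for this example.
class EncodingReport {

    // Collects settings that commonly influence how the JVM converts
    // between Java strings and the bytes of a file name on disk.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<String, String>();
        report.put("file.encoding", System.getProperty("file.encoding"));
        // Internal property; may be null on JVMs that do not define it.
        report.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"));
        report.put("defaultCharset", Charset.defaultCharset().name());
        // Locale variables as seen by the JVM process, not by the login shell.
        report.put("LANG", System.getenv("LANG"));
        report.put("LC_ALL", System.getenv("LC_ALL"));
        report.put("LC_CTYPE", System.getenv("LC_CTYPE"));
        return report;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

The environment seen by the JBoss process can differ from the one in an interactive shell, which is why the values are read from within the JVM.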

I've used other tools and the name is always read fine, using UTF-8.

(Note that I'm always talking about the name of the file, never its content; it could be an empty file.)

Answered by Roland Illig

I am trying to track down the problem. Here is what I already have:

There is Exists.java:

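import java.io.File;

public class Exists {
  public static void main(String[] args) {
    // The results are ignored; each call only triggers a stat() that strace can log.
    new File("aaa").exists();              // plain ASCII name
    new File("aaa\u00E4").exists();        // "aaa" + a with diaeresis (U+00E4)
    new File("aaa\u00C3\u00A4").exists();  // UTF-8 bytes of U+00E4 misread as ISO-8859-1
  }
}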

And there is java -version:

java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

Now to the interesting part:

$ strace -f -o strace.out java Exists && grep 'stat("aaa' strace.out
31942 stat("aaa", 0x41464950)           = -1 ENOENT (No such file or directory)
31942 stat("aaa\303\244", 0x41464950)   = -1 ENOENT (No such file or directory)
31942 stat("aaa\303\203\302\244", 0x41464950) = -1 ENOENT (No such file or directory)

The nice thing is that strace works on the byte level, not on the character level like Java. So everything is OK in this case. I have the environment variable LANG set to en_US.UTF-8; all of the LC_* variables are unset.
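
Because strace prints non-ASCII bytes as backslash-octal escapes, the expected byte sequences can be computed directly in Java and compared with the trace. This is a sketch only: StraceStyle and escaped are hypothetical names, the escaping merely approximates strace's format, and StandardCharsets requires Java 7 or later (newer than the JVM used above).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that mimics strace's octal escaping of byte strings.
class StraceStyle {

    // Encodes the name with the given charset and renders every byte outside
    // printable ASCII as a backslash followed by its octal value.
    static String escaped(String name, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(cs)) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v);
            } else {
                sb.append('\\').append(Integer.toOctalString(v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = { "aaa", "aaa\u00E4", "aaa\u00C3\u00A4" };
        for (String name : names) {
            System.out.println(escaped(name, StandardCharsets.UTF_8));
        }
    }
}
```

With UTF-8, the second name encodes to aaa\303\244 and the third to aaa\303\203\302\244, which is what a correctly configured JVM should hand to stat().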

Now tracking down the problem to a minimal working example:

$ strace -f -o strace.out env - LC_ALL=en_US.UTF-8 /home/roland/bin/java Exists && grep 'stat("aaa' strace.out
31968 stat("aaa", 0x41a75950)           = -1 ENOENT (No such file or directory)
31968 stat("aaa\303\244", 0x41a75950)   = -1 ENOENT (No such file or directory)
31968 stat("aaa\303\203\302\244", 0x41a75950) = -1 ENOENT (No such file or directory)

That still works. So let's try another encoding:

$ strace -f -o strace.out env - LANG=en_US.ISO-8859-1 /home/roland/bin/java Exists && grep 'stat("aaa' strace.out
32070 stat("aaa", 0x407a3950)           = -1 ENOENT (No such file or directory)
32070 stat("aaa?", 0x407a3950)          = -1 ENOENT (No such file or directory)
32070 stat("aaa??", 0x407a3950)         = -1 ENOENT (No such file or directory)

So this doesn't work. One possible reason might be that I selected a locale that is not in the list printed by locale -a. But this shouldn't be the reason for Java to convert the letters to question marks.
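
For what it's worth, the question marks match what Java produces when a string is encoded into a charset that cannot represent some of its characters. The sketch below demonstrates only that general behavior, using US-ASCII as the target. Whether the JVM really falls back to an ASCII-like charset for an unknown locale is an assumption here, not something the traces prove. UnmappableDemo and roundTrip are invented names, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of how unmappable characters turn into '?'.
class UnmappableDemo {

    // Encodes the name to bytes and decodes it again with the same charset.
    // String.getBytes replaces characters the charset cannot represent with
    // the charset's replacement byte, which is '?' for US-ASCII.
    static String roundTrip(String name, Charset cs) {
        return new String(name.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // One unmappable character becomes one '?'.
        System.out.println(roundTrip("aaa\u00E4", StandardCharsets.US_ASCII));        // prints aaa?
        // Two unmappable characters become two '?'.
        System.out.println(roundTrip("aaa\u00C3\u00A4", StandardCharsets.US_ASCII));  // prints aaa??
        // ISO-8859-1 can represent U+00E4, so nothing is lost there.
        System.out.println("aaa\u00E4".equals(
                roundTrip("aaa\u00E4", StandardCharsets.ISO_8859_1)));               // prints true
    }
}
```

Since ISO-8859-1 itself can represent the character, a question mark in the trace suggests that some narrower charset was in effect, not ISO-8859-1.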

As soon as LANG points to a non-existent locale, setting the sun.jnu.encoding property no longer has any effect. So I'm out of ideas for now.
