Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/9974779/

Date: 2020-10-30 23:07:27  Source: igfitidea

Using Unicode characters for file names inside a zip archive

java file zip

Asked by Maddy

I am zipping a file whose name contains some special characters, like Péréquation LES HOPITAUX NEUFS.xls, to a different folder, say temp.

I am able to zip the file, but the file name changes automatically to P+?r+?quation LES HOPITAUX NEUFS.xls.

How can I support Unicode characters for file names inside a zip archive?

Answered by Adriano Repetti

It depends a little bit on what code you're using to create the archive. The old Java compression classes are not as flexible as you need.

You may use Apache Commons Compress. Michael Simons wrote this nice piece of code:

// Requires org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream
ZipArchiveOutputStream ostream = ...; // Your initialization code here
ostream.setEncoding("Cp437"); // This should handle your "special" characters
ostream.setFallbackToUTF8(true); // For "unknown" characters!
ostream.setUseLanguageEncodingFlag(true);
ostream.setCreateUnicodeExtraFields(
    ZipArchiveOutputStream.UnicodeExtraFieldPolicy.NOT_ENCODEABLE);

If you're using Java 7, then you finally have a Charset parameter (that can be UTF-8) on the ZipOutputStream constructor.
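
As a rough illustration of that Java 7 constructor, here is a minimal sketch that writes a one-entry archive into memory with UTF-8 entry names and reads the name back. The class name Utf8ZipNames and the helpers zipWithName and firstEntryName are made up for this example; only ZipOutputStream(OutputStream, Charset) and ZipInputStream(InputStream, Charset) come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
class Utf8ZipNames {

    // Writes a single-entry archive into memory, encoding the entry name as UTF-8.
    static byte[] zipWithName(String entryName, byte[] content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed at this point, so the archive is complete.
        return buffer.toByteArray();
    }

    // Reads the name of the first entry back, decoding it as UTF-8.
    static String firstEntryName(byte[] archive) {
        try (ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(archive), StandardCharsets.UTF_8)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of the compiler's encoding.
        String name = "P\u00e9r\u00e9quation LES HOPITAUX NEUFS.xls";
        byte[] archive = zipWithName(name, new byte[] {1, 2, 3});
        System.out.println(name.equals(firstEntryName(archive)));
    }
}
```

Whether another tool then shows the name correctly still depends on that tool understanding UTF-8 entry names.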

The big problem, anyway, is that many implementations don't understand Unicode encoding, because the original ZIP file format is ASCII and there is no official standard for Unicode. See this post for further details.

Answered by dharam

The Zip specification (historically) does not specify what character encoding is to be used for the embedded file names and comments; the original IBM PC character encoding set, commonly referred to as IBM Code Page 437, is supposed to be the only encoding supported. The Jar specification, meanwhile, explicitly specifies UTF-8 as the encoding used to encode and decode all file names and comments in Jar files. Our java.util.jar and java.util.zip implementation therefore strictly followed the Jar specification and used UTF-8 as the sole encoding when dealing with the file names and comments stored in Jar/Zip files.

Consequence? A ZIP file created by a "traditional" ZIP tool is not accessible to a java.util.jar/zip based tool, and vice versa, if the file name contains characters that are not compatible between Cp437 (or the default platform encoding, which some tools simply use instead) and UTF-8.
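
To make the mismatch concrete, here is a small sketch that encodes a name with one charset and decodes the same bytes with another. The class NameEncodingMismatch and its reinterpret method are invented for this example, and it assumes the JRE ships the IBM437 charset (common in full JDKs, but not guaranteed).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class NameEncodingMismatch {

    // Encodes a name with one charset and decodes the same bytes with another,
    // which is what happens when the writer and the reader disagree on the encoding.
    static String reinterpret(String name, Charset writtenAs, Charset readAs) {
        return new String(name.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // Assumption: the IBM437 charset is available in this JRE.
        Charset cp437 = Charset.forName("IBM437");
        String name = "P\u00e9r\u00e9quation.xls";

        // Written as UTF-8, read as Cp437: each accented letter turns into two characters.
        System.out.println(reinterpret(name, StandardCharsets.UTF_8, cp437));

        // Same charset on both sides: the name survives unchanged.
        System.out.println(reinterpret(name, cp437, cp437));
    }
}
```

Each "é" takes two bytes in UTF-8, so a Cp437 reader shows two unrelated characters in its place. That is at least consistent with the two-character garbage per accented letter in the question, although the exact tool chain there is not known.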

For most Europeans, you're "lucky" :-) in that you only need to avoid a "handful" of characters, such as the umlauts (OK, I'm just kidding), but for Japanese and Chinese, most of the characters are simply out of luck. This is why bug 4244499 had been No. 1 on the Top 25 Java Bugs for so many years. The bug is no longer on the list :-) it has finally been "fixed" in OpenJDK 7, b57. I still keep a snapshot as the record/kudos for myself :-)

The solution (I would say "solution" rather than "fix") in JDK7 b57 is to introduce a new set of ZipInputStream, ZipOutputStream and ZipFile constructors with a specific "charset" as the parameter, as shown below.

ZipFile(File, Charset)

ZipInputStream(InputStream, Charset)

ZipOutputStream(OutputStream, Charset)

With these new constructors, applications can now access those non-UTF-8 ZIP files via ZipInputStream or ZipFile objects created with the specific encoding, or, if necessary, create ZIP files encoded in non-UTF-8 via the new ZipOutputStream(os, charset) constructor.
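
A minimal sketch of that round trip, using a temporary file so that ZipFile(File, Charset) can be shown: the archive is written with a legacy charset and its entry names are listed with the same charset. The class LegacyCharsetZip and its methods are hypothetical names for this example, and the IBM437 charset is assumed to be available in the JRE.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
class LegacyCharsetZip {

    // Creates an archive with one empty entry whose name is encoded with the given charset.
    static void writeZip(File target, Charset charset, String entryName) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(target), charset)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.closeEntry();
        }
    }

    // Lists the entry names of an archive, decoding them with the given charset.
    static List<String> entryNames(File archive, Charset charset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive, charset)) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Writes a temporary archive and reads its entry names back with the same charset.
    static List<String> roundTrip(String entryName, Charset charset) {
        try {
            File tmp = File.createTempFile("names", ".zip");
            try {
                writeZip(tmp, charset, entryName);
                return entryNames(tmp, charset);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the IBM437 charset is available in this JRE.
        Charset cp437 = Charset.forName("IBM437");
        System.out.println(roundTrip("P\u00e9r\u00e9quation LES HOPITAUX NEUFS.xls", cp437));
    }
}
```

The charset passed on the reading side has to match what the writing tool actually used; the archive itself does not record a legacy code page.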

zip is a stripped-down version of the Jar tool with a "-encoding" option to support non-UTF-8 encodings for entry names and comments. It can serve as a demo of how to use the new APIs (I used it as a unit test). I'm still debating with myself whether it is a good idea to officially introduce "-encoding" into the Jar tool...
