Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/106367/

Time: 2020-08-11 08:22:22  Source: igfitidea

Add non-ASCII file names to zip in Java

Tags: java, encoding, zip

Asked by Micke

What is the best way to add non-ASCII file names to a zip file using Java, in such a way that the files can be properly read in both Windows and Linux?

Here is one attempt, adapted from https://truezip.dev.java.net/tutorial-6.html#Example, which works in Windows Vista but fails in Ubuntu Hardy. In Hardy the file name is shown as abc-ЖДФ.txt in file-roller.

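import java.io.IOException;
import java.io.PrintStream;

import de.schlichtherle.io.File;
import de.schlichtherle.io.FileOutputStream;

public class Main {

    public static void main(final String[] args) throws IOException {

        try {
            // TrueZIP treats "outer.zip" as a virtual directory, so this
            // creates the entry "abc-åäö.txt" inside the archive.
            PrintStream ps = new PrintStream(new FileOutputStream(
                    "outer.zip/abc-åäö.txt"));
            try {
                // The entry content is written with the platform default encoding.
                ps.println("The characters åäö works here though.");
            } finally {
                ps.close();
            }
        } finally {
            // Commit pending changes to the archive file.
            File.umount();
        }
    }
}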

Unlike java.util.zip, TrueZIP allows specifying the zip file encoding. Here's another sample, this time explicitly specifying the encoding. None of IBM437, UTF-8 or ISO-8859-1 works in Linux. IBM437 works in Windows.

import java.io.IOException;

import de.schlichtherle.io.FileOutputStream;
import de.schlichtherle.util.zip.ZipEntry;
import de.schlichtherle.util.zip.ZipOutputStream;

public class Main {

    public static void main(final String[] args) throws IOException {

        // Write one archive per candidate encoding, e.g. "UTF-8-example.zip".
        for (String encoding : new String[] { "IBM437", "UTF-8", "ISO-8859-1" }) {
            ZipOutputStream zipOutput = new ZipOutputStream(
                    new FileOutputStream(encoding + "-example.zip"), encoding);
            try {
                // The entry name is encoded with the charset given above.
                ZipEntry entry = new ZipEntry("abc-åäö.txt");
                zipOutput.putNextEntry(entry);
                zipOutput.closeEntry();
            } finally {
                // Close the stream even if writing the entry fails.
                zipOutput.close();
            }
        }
    }
}

Answered by stephbu

Did it actually fail, or was it just a font issue (e.g. the font having different glyphs for those character codes)? I've seen similar issues in Windows where rendering "broke" because the font didn't support the charset, but the data was actually intact and correct.

Answered by bobince

Non-ASCII file names are not reliable across ZIP implementations and are best avoided. There is no provision for storing a charset setting in ZIP files; clients tend to guess with 'the current system codepage', which is unlikely to be what you want. Many combinations of client and codepage can result in inaccessible files.

Sorry!
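The code page guessing described above can be reproduced in a few lines. This is only a sketch under stated assumptions: it assumes the question's file name was abc-åäö.txt, that the writer encoded it as IBM437, and that the reader decoded the raw bytes as IBM866 (a Cyrillic DOS code page). The class and method names are made up for illustration, and both charsets must be available in the running JVM.

```java
import java.nio.charset.Charset;

public class CodePageGuess {

    /** Encodes a name with one code page and decodes the same bytes with another. */
    static String reinterpret(String name, String writerCharset, String readerCharset) {
        byte[] raw = name.getBytes(Charset.forName(writerCharset));
        return new String(raw, Charset.forName(readerCharset));
    }

    public static void main(String[] args) {
        // Assumed original name "abc-åäö.txt", written with \u escapes so the
        // example does not depend on the encoding of this source file.
        String original = "abc-\u00e5\u00e4\u00f6.txt";

        // Assumption: writer used IBM437, reader guessed IBM866.
        System.out.println(reinterpret(original, "IBM437", "IBM866"));
    }
}
```

Under these assumptions the three bytes 0x86 0x84 0x94 are read back as ЖДФ, which would be consistent with the name reported in the question. The bytes in the archive are intact; only the reader's guess differs.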

Answered by McDowell

From a quick look at the TrueZIP manual, they recommend the JAR format:

It uses UTF-8 for file name encoding and comments - unlike ZIP, which only uses IBM437.

This probably means that the API is using the java.util.zip package for its implementation; that documentation states that it is still using a ZIP format from 1996. Unicode support wasn't added to the PKWARE .ZIP File Format Specification until 2006.

Answered by Mnementh

The encoding for file entries in ZIP was originally specified as IBM Code Page 437. Many characters used in other languages cannot be represented that way.

The PKWARE specification addresses the problem by adding a flag bit (bit 11 of the general purpose bit flag). That is a later addition, though (from 2007; thanks to Cheeso for clearing that up, see comments). If that bit is set, the file name and comment of the entry must be encoded in UTF-8. This extension is described in 'APPENDIX D - Language Encoding (EFS)', at the end of the linked document.
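To see whether an archive actually carries that flag, one can read the general purpose bit flag of the first local file header, which sits at byte offset 6 in little-endian order. The sketch below builds small archives in memory with java.util.zip (the Charset constructor requires Java 7 or later) and checks bit 11. The class and helper names are illustrative, the entry name abc-åäö.txt is assumed from the question, and whether a given writer sets the bit is an implementation detail you should verify for your JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class EfsFlagCheck {

    /** Bit 11 of the general purpose bit flag: name and comment are UTF-8. */
    static final int EFS_BIT = 1 << 11;

    /** Writes a one-entry archive in memory, encoding the name with the given charset. */
    static byte[] zipWithName(String name, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, charset)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Reads the flag field of the first local file header (offset 6, little-endian). */
    static boolean firstEntryHasEfs(byte[] archive) {
        int flags = (archive[6] & 0xFF) | ((archive[7] & 0xFF) << 8);
        return (flags & EFS_BIT) != 0;
    }

    public static void main(String[] args) throws IOException {
        String name = "abc-\u00e5\u00e4\u00f6.txt"; // assumed name from the question
        System.out.println("UTF-8 sets EFS: "
                + firstEntryHasEfs(zipWithName(name, StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1 sets EFS: "
                + firstEntryHasEfs(zipWithName(name, StandardCharsets.ISO_8859_1)));
    }
}
```

The same check can be run against archives produced by other tools by reading the first eight bytes of the file, as long as the file starts with a local file header.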

For Java, trouble with non-ASCII characters in file names is a known bug. See bug #4244499 and the large number of related bugs.

As a workaround, my colleague URL-encoded the file names before storing them in the ZIP and decoded them after reading. If you control both the writing and the reading side, that may work for you.
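A minimal sketch of that URL-encoding workaround, using only java.net.URLEncoder and java.net.URLDecoder. The class and method names are illustrative, not the colleague's actual code, and the sample name abc-åäö.txt is assumed from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class EntryNameCodec {

    /** Turns an arbitrary name into a pure-ASCII name that is safe to store in a ZIP. */
    static String toStoredName(String name) throws UnsupportedEncodingException {
        return URLEncoder.encode(name, "UTF-8");
    }

    /** Restores the original name from a stored entry name. */
    static String fromStoredName(String stored) throws UnsupportedEncodingException {
        return URLDecoder.decode(stored, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "abc-\u00e5\u00e4\u00f6.txt"; // assumed name from the question
        String stored = toStoredName(original);
        System.out.println(stored);                     // abc-%C3%A5%C3%A4%C3%B6.txt
        System.out.println(fromStoredName(stored).equals(original));
    }
}
```

Note that URLEncoder also escapes "/" and turns spaces into "+", so entry paths with directories would need to be encoded one path segment at a time. Users who open the archive with an ordinary zip tool will see the escaped names.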

EDIT: In the bug report, someone suggests using the ZipOutputStream from Apache Ant as a workaround. That implementation allows an encoding to be specified.

Answered by Cheeso

In Zip files, according to the spec owned by PKWare, the encoding of file names and file comments is IBM437. In 2007 PKWare extended the spec to also allow UTF-8. This says nothing about the encoding of the files contained within the zip. Only the encoding of the filenames.

I think all tools and libraries (Java and non Java) support IBM437 (which is a superset of ASCII), and fewer tools and libraries support UTF-8. Some tools and libs support other code pages. For example if you zip something using WinRar on a computer running in Shanghai, you will get the Big5 code page. This is not "allowed" by the zip spec but it happens anyway.

The DotNetZip library for .NET does Unicode, but of course that doesn't help you if you are using Java!

Using the Java built-in support for ZIP, you will always get IBM437. If you want an archive with something other than IBM437, then use a third party library, or create a JAR.
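For the "create a JAR" option, a rough sketch with the standard java.util.jar classes might look like the following. It writes one entry to an in-memory JAR and reads the name back. The class and helper names are illustrative, and the entry name abc-åäö.txt is assumed from the question. A successful round trip only shows that Java reads back what Java wrote; it says nothing about how a given Windows or Linux tool will display the name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

public class JarNameRoundTrip {

    /** Writes a JAR with a single entry (no manifest) into memory. */
    static byte[] writeJar(String entryName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes)) {
            jar.putNextEntry(new JarEntry(entryName));
            jar.write("hello".getBytes(StandardCharsets.UTF_8));
            jar.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Returns the name of the first entry, or null if the archive is empty. */
    static String firstEntryName(byte[] archive) throws IOException {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(archive))) {
            JarEntry entry = jar.getNextJarEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        String name = "abc-\u00e5\u00e4\u00f6.txt"; // assumed name from the question
        System.out.println(name.equals(firstEntryName(writeJar(name))));
    }
}
```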

Answered by Anton K

Miracles do happen, and Sun/Oracle really did fix the long-standing bug/RFE:

Now it's possible to set the file name encoding when creating the zip file/stream (requires Java 7).
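A short sketch of what that looks like with the Java 7 constructors ZipOutputStream(OutputStream, Charset) and ZipInputStream(InputStream, Charset). The class and helper names are illustrative, the archive is kept in memory only to keep the example self-contained, and the entry name abc-åäö.txt is assumed from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipCharsetExample {

    /** Writes a one-entry archive whose entry name is encoded with the given charset. */
    static byte[] writeZip(String entryName, Charset nameCharset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, nameCharset)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write("some content".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Reads the first entry name, decoding it with the given charset. */
    static String firstEntryName(byte[] archive, Charset nameCharset) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive), nameCharset)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        String name = "abc-\u00e5\u00e4\u00f6.txt"; // assumed name from the question
        byte[] archive = writeZip(name, StandardCharsets.UTF_8);
        System.out.println(name.equals(firstEntryName(archive, StandardCharsets.UTF_8)));
    }
}
```

To write to disk instead, wrap a java.io.FileOutputStream in the ZipOutputStream. Which charset other tools expect still depends on the tool, so it is worth testing the result with the unzip programs you need to support.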

Answered by Fengtan

You can still use the Apache Commons implementation of the zip stream: http://commons.apache.org/compress/apidocs/org/apache/commons/compress/archivers/zip/ZipArchiveOutputStream.html#setEncoding%28java.lang.String%29

Calling setEncoding("UTF-8") on your stream should be enough.
