Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/9681239/

Base64-encode a file and compress it

Tags: java, encoding, base64, apache-commons-codec

Asked by dmurali

My goal is to Base64-encode a file and zip it into a folder in Java. I have to use Apache's Commons Codec library. I am able to encode and zip the file, and that appears to work, but when I decode it back to its original form, it looks like the file has not been completely encoded: a few parts are missing. Can anybody tell me why this happens?

I am also attaching the part of my code for your reference so that you can guide me accordingly.

private void zip() {
    int BUFFER_SIZE = 4096;
    byte[] buffer = new byte[BUFFER_SIZE];

    try {
        // Create the ZIP file
        String outFilename = "H:\\OUTPUT.zip";
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(
                outFilename));

        // Compress the files
        for (int i : list.getSelectedIndices()) {
            System.out.println(vector.elementAt(i));
            FileInputStream in = new FileInputStream(vector.elementAt(i));
            File f = vector.elementAt(i);

            // Add ZIP entry to output stream.
            out.putNextEntry(new ZipEntry(f.getName()));

            // Transfer bytes from the file to the ZIP file
            int len;

            while ((len = in.read(buffer)) > 0) {
                buffer = org.apache.commons.codec.binary.Base64
                        .encodeBase64(buffer);
                out.write(buffer, 0, len);

            }

            // Complete the entry
            out.closeEntry();
            in.close();

        }

        // Complete the ZIP file
        out.close();
    } catch (IOException e) {
        System.out.println("caught exception");
        e.printStackTrace();
    }
}

Accepted answer by DRCB

Base64-encoded data is longer than its source, but you are using the length of the source data when you write the encoded bytes to the output stream.

You have to use the size of the generated array instead of your variable len.
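
To make the length mismatch concrete, here is a minimal sketch. It uses the JDK's java.util.Base64 (Java 8 and later) rather than Commons Codec, which is an assumption made only to keep the example dependency-free; the class and method names are hypothetical. With padding, Base64 turns every 3 input bytes (rounded up) into 4 output bytes, so a full 4096-byte read becomes 5464 encoded bytes, and writing only len of them drops the rest.

```java
import java.util.Base64;

final class Base64LengthDemo {
    // Padded Base64 maps every group of 3 input bytes (rounded up) to 4 output bytes.
    static int encodedLength(int rawLength) {
        return 4 * ((rawLength + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] chunk = new byte[4096];                        // one full read buffer
        byte[] encoded = Base64.getEncoder().encode(chunk);   // no line breaks, with padding
        System.out.println("raw bytes:     " + chunk.length);
        System.out.println("encoded bytes: " + encoded.length);
        System.out.println("predicted:     " + encodedLength(chunk.length));
    }
}
```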

Second note: do not reassign buffer each time you encode a chunk. Just write the result to the output.

 while ((len = in.read(buffer)) > 0)  {                         
     byte [] enc = Base64.encodeBase64(Arrays.copyOf(buffer, len));
     out.write(enc, 0, enc.length);
 }

UPDATE: Use Arrays.copyOf(...) to trim the input buffer to the number of bytes actually read before encoding it.
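
The effect of trimming the buffer can be seen in a small sketch. It again assumes java.util.Base64 in place of Commons Codec, and the class name, the method name, and the simulated 3-byte read are all made up for illustration. Encoding the whole buffer also encodes the unused bytes after position len, so the decoded result comes back longer than what was read.

```java
import java.util.Arrays;
import java.util.Base64;

final class PartialReadDemo {
    // Encodes only the first len bytes of buffer, as a read loop should.
    static byte[] encodeValidBytes(byte[] buffer, int len) {
        return Base64.getEncoder().encode(Arrays.copyOf(buffer, len));
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[8];   // stand-in for the 4096-byte read buffer
        buffer[0] = 'a';
        buffer[1] = 'b';
        buffer[2] = 'c';
        int len = 3;                   // pretend in.read(buffer) returned 3

        // Encoding the whole buffer drags the 5 unused zero bytes along.
        byte[] untrimmed = Base64.getDecoder().decode(Base64.getEncoder().encode(buffer));
        byte[] trimmed = Base64.getDecoder().decode(encodeValidBytes(buffer, len));

        System.out.println("decoded length without trimming: " + untrimmed.length);
        System.out.println("decoded length with trimming:    " + trimmed.length);
    }
}
```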

Answer by Robert

Your main problem is that Base64 encoding cannot be applied block-wise to arbitrarily sized blocks (especially not with the Apache Commons implementation). The problem is made worse by the fact that you don't even know how large your blocks are, since that depends on how many bytes in.read(..) returns.

Therefore you have two alternatives:

  1. Load the complete file into memory and then apply the Base64 encoding.
  2. Use an alternative Base64 encoder implementation that is stream-based (the Apache Batik project seems to contain such an implementation: org.apache.batik.util.Base64EncoderStream).
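
The stream-based alternative can be sketched as follows. This is not the Batik class named above: it assumes the JDK's java.util.Base64 (Java 8 and later), whose Encoder.wrap(OutputStream) returns an encoding stream, and all class and method names here are hypothetical. Commons Codec 1.4 and later appear to offer a comparable org.apache.commons.codec.binary.Base64OutputStream, but that is not shown. One detail needs care: closing the encoding stream is what writes the final padding, yet it also closes the stream underneath, so the ZIP stream is shielded by a small wrapper whose close() only flushes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class Base64ZipSketch {
    // Adds one ZIP entry whose content is the Base64 encoding of everything read from in.
    static void addEncodedEntry(ZipOutputStream zip, String name, InputStream in) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        // Shield the ZIP stream: the encoder's close() must not close it.
        OutputStream shield = new FilterOutputStream(zip) {
            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                out.write(b, off, len);   // bulk write instead of the byte-by-byte default
            }

            @Override
            public void close() throws IOException {
                flush();                  // keep the ZIP stream open for further entries
            }
        };
        try (OutputStream enc = Base64.getEncoder().wrap(shield)) {
            byte[] buffer = new byte[4096];
            int len;
            while ((len = in.read(buffer)) != -1) {
                enc.write(buffer, 0, len);   // the encoder carries leftover bytes between calls
            }
        }                                    // close() emits the last group and its padding
        zip.closeEntry();
    }

    // Convenience wrapper for trying the method out in memory.
    static byte[] zipEncoded(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            addEncodedEntry(zip, name, new ByteArrayInputStream(content));
        }
        return bytes.toByteArray();
    }
}
```

Because the encoder keeps its own state across write calls, the size of each read no longer matters, and the file never has to fit in memory.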

Answer by Roger Lindsjö

When you read the file content into buffer you get len bytes. When you Base64-encode them you get more than len bytes, but you still write only len bytes to the file. This means that the last part of each chunk you read is truncated.

Also, if a read does not fill the entire buffer, you should not Base64-encode more than len bytes; otherwise the unused tail of the buffer is encoded too, and you get trailing 0s after the last real bytes.

Combining the points above, you must Base64-encode the whole file (read it all into a byte[]) unless you can guarantee that each chunk you read fits exactly into a Base64-encoded message, that is, every chunk except the last has a length that is a multiple of 3 bytes. If your files are not very large, I would recommend reading the whole file.
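
The chunk-size condition can be checked with a short sketch, again assuming java.util.Base64 instead of Commons Codec and using hypothetical names. Chunks whose length is a multiple of 3 encode to exactly the same text as the whole input, while any other chunk size places padding characters in the middle of the output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class ChunkedBase64Demo {
    // Encodes data in independent chunks of chunkSize bytes and concatenates the results.
    static String encodeInChunks(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, data.length);
            sb.append(Base64.getEncoder().encodeToString(Arrays.copyOfRange(data, off, end)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "hello world".getBytes(StandardCharsets.US_ASCII);
        String whole = Base64.getEncoder().encodeToString(data);
        System.out.println("whole:       " + whole);
        System.out.println("chunks of 3: " + encodeInChunks(data, 3));  // same text as whole
        System.out.println("chunks of 4: " + encodeInChunks(data, 4));  // '=' padding mid-stream
    }
}
```

Note that in.read(buffer) may return fewer bytes than the buffer holds, so a buffer size that is a multiple of 3 is not enough on its own; the loop would have to keep reading until a chunk is full or the stream ends.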

A smaller problem is that the read loop should probably check for "> -1" rather than "> 0", but in this case it does not make a difference.