Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/31182657/

Time: 2020-08-11 10:45:56  Source: igfitidea

Failure encoding files in base64 java

java file base64

Question by JGG

I have this class to encode and decode a file. When I run the class with .txt files, the result is correct. But when I run the code with .jpg or .doc files, I cannot open the resulting file, or it is not equal to the original. I don't know why this is happening. I have modified this class http://myjeeva.com/convert-image-to-string-and-string-to-image-in-java.html. But I want to change this line

byte imageData[] = new byte[(int) file.length()];

to

byte example[] = new byte[1024];

and read the file as many times as needed. Thanks.

import java.io.*;
import java.util.*;

  public class Encode {

  // input = input file path, output = output file path, imageDataString = encoded string

  String input;
  String output;
  String imageDataString;


  public void setFileInput(String input){
    this.input=input;
  }

  public void setFileOutput(String output){
    this.output=output;
  }

  public String getFileInput(){
    return input;
  }

  public String getFileOutput(){
    return output;
  }

  public String getEncodeString(){
    return  imageDataString;
  }

  public String processCode(){
    StringBuilder sb= new StringBuilder();

    try{
        File fileInput= new File( getFileInput() );
        FileInputStream imageInFile = new FileInputStream(fileInput);

        // I have seen examples where people create a byte[] with the same length as the file.
        // I don't want this because I will not know the length of the file in advance.

        byte buff[] = new byte[1024];

        int r = 0;

        while ( ( r = imageInFile.read( buff)) > 0 ) {

          String imageData = encodeImage(buff);

          sb.append( imageData);

          if ( imageInFile.available() <= 0 ) {
            break;
          }
        }



    } catch (FileNotFoundException e) {
        System.out.println("File not found" + e);
    } catch (IOException ioe) {
        System.out.println("Exception while reading the file " + ioe);
    }

    imageDataString = sb.toString();

    return imageDataString;
  }


  public  void processDecode(String str) throws IOException{

      byte[] imageByteArray = decodeImage(str);
      File fileOutput= new File( getFileOutput());
      FileOutputStream imageOutFile = new FileOutputStream( fileOutput);

      imageOutFile.write(imageByteArray);
      imageOutFile.close();

  }

  public static String encodeImage(byte[] imageByteArray) {
    return Base64.getEncoder().withoutPadding().encodeToString(imageByteArray);
  }

  public static byte[] decodeImage(String imageDataString) {
    return Base64.getDecoder().decode(imageDataString);
  }


  public static void main(String[] args) throws IOException {

    Encode a = new Encode();

    a.setFileInput( "C://Users//xxx//Desktop//original.doc");
    a.setFileOutput("C://Users//xxx//Desktop//original-copied.doc");

    a.processCode( );

    a.processDecode( a.getEncodeString());

    System.out.println("C O P I E D");
  }
}

I tried changing

String imageData = encodeImage(buff);

to

String imageData = encodeImage(buff,r);

and the method encodeImage to:

public static String encodeImage(byte[] imageByteArray, int r) {

     byte[] aux = new byte[r];

     for ( int i = 0; i < aux.length; i++) {
       aux[i] = imageByteArray[i];

       if ( aux[i] <= 0 ) {
         break;
       }
     }
return  Base64.getDecoder().decode(  aux);
}

But I get this error:

Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits   

Accepted answer by RealSkeptic

You have two problems in your program.

The first, as mentioned by @Joop Eggen, is that you are not handling your input correctly.

In fact, Java does not promise you that even in the middle of the file, you'll be reading the entire 1024 bytes. It could just read 50 bytes, and tell you it read 50 bytes, and then the next time it will read 50 bytes more.
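To make this concrete, here is a minimal sketch (not from the original answer) that simulates such a stream. TrickleInputStream and readCounts are hypothetical names; the wrapper simply caps every read at 50 bytes, which the InputStream contract permits.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class ShortReadDemo {

    // Hypothetical wrapper that never hands out more than 50 bytes per read() call.
    // Any InputStream implementation is allowed to return fewer bytes than requested.
    static class TrickleInputStream extends FilterInputStream {
        TrickleInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 50));
        }
    }

    // Reads the whole stream with a 1024-byte buffer and records what each read() returned.
    static List<Integer> readCounts(InputStream in) throws IOException {
        List<Integer> counts = new ArrayList<>();
        byte[] buff = new byte[1024];
        int r;
        while ((r = in.read(buff)) != -1) {
            counts.add(r); // r, not buff.length, is the number of valid bytes in buff
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new TrickleInputStream(new ByteArrayInputStream(new byte[120]));
        System.out.println(readCounts(in)); // prints [50, 50, 20]
    }
}
```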

Suppose you read 1024 bytes in the previous round. And now, in the current round, you're only reading 50. Your byte array now contains 50 of the new bytes, and the rest are the old bytes from the previous read!
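A small sketch of the stale-bytes effect, using a deliberately tiny 4-byte buffer and an in-memory stream instead of a file (StaleBufferDemo and twoReads are illustrative names, not part of the question's code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Arrays;

class StaleBufferDemo {

    // Reads a 6-byte "file" twice into the same 4-byte buffer.
    // Returns { whole buffer after the second read, only the bytes the second read produced }.
    static byte[][] twoReads() throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5, 6});
        byte[] buff = new byte[4];

        in.read(buff);         // returns 4, buff is now [1, 2, 3, 4]
        int r = in.read(buff); // returns 2, only buff[0] and buff[1] are overwritten

        return new byte[][] { buff.clone(), Arrays.copyOf(buff, r) };
    }

    public static void main(String[] args) throws IOException {
        byte[][] result = twoReads();
        System.out.println(Arrays.toString(result[0])); // [5, 6, 3, 4]: 3 and 4 are stale
        System.out.println(Arrays.toString(result[1])); // [5, 6]: only the fresh bytes
    }
}
```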

So you always need to copy exactly the number of bytes that were actually read into a new array, and pass that on to your encoding function.

So, to fix this particular problem, you'll need to do something like:

 while ( ( r = imageInFile.read( buff)) > 0 ) {

      byte[] realBuff = Arrays.copyOf( buff, r );

      String imageData = encodeImage(realBuff);

      ...
 }


However, this is not the only problem here. Your real problem is with the Base64 encoding itself.
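As an illustration of why the first fix alone is not enough, the following sketch (hypothetical class ChunkedEncodingDemo, in-memory data instead of a file) encodes 2048 bytes in 1024-byte chunks, copying exactly the bytes of each chunk, and then decodes the concatenated string. No exception is thrown in this particular case, but the result is one byte too long and differs from the original, which would be consistent with a .jpg or .doc that no longer opens.

```java
import java.util.Arrays;
import java.util.Base64;

class ChunkedEncodingDemo {

    // Encodes data chunk by chunk like the question's loop, but already passing
    // only the bytes that belong to each chunk (the first fix applied).
    static String encodeInChunks(byte[] data, int chunkSize) {
        Base64.Encoder encoder = Base64.getEncoder().withoutPadding();
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            byte[] chunk = Arrays.copyOfRange(data, off, Math.min(off + chunkSize, data.length));
            sb.append(encoder.encodeToString(chunk));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] original = new byte[2048];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }
        String encoded = encodeInChunks(original, 1024);
        byte[] decoded = Base64.getDecoder().decode(encoded);
        // prints: original: 2048 bytes, decoded: 2049 bytes
        System.out.println("original: " + original.length + " bytes, decoded: " + decoded.length + " bytes");
        // prints: identical: false
        System.out.println("identical: " + Arrays.equals(original, decoded));
    }
}
```

With a chunk size that is a multiple of 3 (for example 1023), the concatenated chunks are identical to encoding the whole array at once.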

What Base64 does is take your bytes, break them into 6-bit chunks, and then treat each of those chunks as a number N between 0 and 63. Then it takes the Nth character from its character table to represent that chunk.
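The 6-bit slicing for one group of three bytes can be sketched by hand as follows. SixBitGroups and encodeGroup are illustrative names; the alphabet is the standard one from RFC 4648 that Base64.getEncoder() uses. The three bytes are the first three from the test program further down.

```java
import java.util.Base64;

class SixBitGroups {

    // Standard Base64 alphabet: chunk value N (0..63) is written as ALPHABET.charAt(N).
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes exactly three bytes (24 bits) as four characters, 6 bits at a time.
    static String encodeGroup(byte b0, byte b1, byte b2) {
        int bits = (Byte.toUnsignedInt(b0) << 16)
                 | (Byte.toUnsignedInt(b1) << 8)
                 |  Byte.toUnsignedInt(b2);
        StringBuilder sb = new StringBuilder(4);
        for (int shift = 18; shift >= 0; shift -= 6) {
            int n = (bits >> shift) & 0x3F; // next 6-bit chunk, a number between 0 and 63
            sb.append(ALPHABET.charAt(n));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] group = { 0b01010101, (byte) 0b11110000, (byte) 0b10101010 };
        System.out.println(encodeGroup(group[0], group[1], group[2])); // VfCq
        System.out.println(Base64.getEncoder().encodeToString(group)); // VfCq
    }
}
```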

But this means it can't just encode a single byte or two bytes on their own, because a byte contains 8 bits, which means one chunk of 6 bits and 2 leftover bits. Two bytes have 16 bits: that's 2 chunks of 6 bits and 4 leftover bits.

To solve this problem, Base64 always encodes groups of 3 consecutive bytes (24 bits, exactly four 6-bit chunks). If the input length is not divisible by three, it fills the last group with additional zero bits.
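The effect on output length can be checked with the standard encoder. In this sketch, paddedLength and unpaddedLength are hypothetical helpers derived from the rule above: every started 3-byte group yields 4 characters, and without padding only the characters that carry real bits are kept.

```java
import java.util.Base64;

class PaddingDemo {

    // Every started group of 3 input bytes yields 4 output characters; '=' fills the gap.
    static int paddedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    // Without padding, only characters carrying real bits remain: ceil(8 * n / 6).
    static int unpaddedLength(int n) {
        return (8 * n + 5) / 6;
    }

    public static void main(String[] args) {
        Base64.Encoder padded = Base64.getEncoder();
        Base64.Encoder unpadded = Base64.getEncoder().withoutPadding();
        for (int n = 1; n <= 4; n++) {
            byte[] zeros = new byte[n]; // n zero bytes, so the zero fill bits are easy to see
            System.out.println(n + " byte(s): " + padded.encodeToString(zeros)
                    + " / " + unpadded.encodeToString(zeros));
        }
        // 1 byte(s): AA== / AA
        // 2 byte(s): AAA= / AAA
        // 3 byte(s): AAAA / AAAA
        // 4 byte(s): AAAAAA== / AAAAAA
    }
}
```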

Here is a little program that demonstrates the problem:

package testing;

import java.util.Base64;

public class SimpleTest {

    public static void main(String[] args) {

        // An array containing six bytes to encode and decode.
        byte[] fullArray = { 0b01010101, (byte) 0b11110000, (byte)0b10101010, 0b00001111, (byte)0b11001100, 0b00110011 };

        // The same array broken into three chunks of two bytes.

        byte[][] threeTwoByteArrays = {
            {       0b01010101, (byte) 0b11110000 },
            { (byte)0b10101010,        0b00001111 },
            { (byte)0b11001100,        0b00110011 }
        };
        Base64.Encoder encoder = Base64.getEncoder().withoutPadding();

        // Encode the full array

        String encodedFullArray = encoder.encodeToString(fullArray);

        // Encode the three chunks consecutively 

        StringBuilder encodedStringBuilder = new StringBuilder();
        for ( byte [] twoByteArray : threeTwoByteArrays ) {
            encodedStringBuilder.append(encoder.encodeToString(twoByteArray));
        }
        String encodedInChunks = encodedStringBuilder.toString();

        System.out.println("Encoded full array: " + encodedFullArray);
        System.out.println("Encoded in chunks of two bytes: " + encodedInChunks);

        // Now  decode the two resulting strings

        Base64.Decoder decoder = Base64.getDecoder();

        byte[] decodedFromFull = decoder.decode(encodedFullArray);   
        System.out.println("Byte array decoded from full: " + byteArrayBinaryString(decodedFromFull));

        byte[] decodedFromChunked = decoder.decode(encodedInChunks);
        System.out.println("Byte array decoded from chunks: " + byteArrayBinaryString(decodedFromChunked));
    }

    /**
     * Convert a byte array to a string representation in binary
     */
    public static String byteArrayBinaryString( byte[] bytes ) {
        StringBuilder sb = new StringBuilder();
        sb.append('[');
        for ( byte b : bytes ) {
            sb.append(Integer.toBinaryString(Byte.toUnsignedInt(b))).append(',');
        }
        if ( sb.length() > 1) {
            sb.setCharAt(sb.length() - 1, ']');
        } else {
            sb.append(']');
        }
        return sb.toString();
    }
}

So, imagine my 6-byte array is your image file. And imagine that your buffer is not reading 1024 bytes but 2 bytes each time. This is going to be the output of the encoding:

Encoded full array: VfCqD8wz
Encoded in chunks of two bytes: VfAqg8zDM

As you can see, the encoding of the full array gave us 8 characters. Each group of three bytes is converted into four chunks of 6 bits, which in turn are converted into four characters.

But the encoding of the three two-byte arrays gives us a string of 9 characters. It's a completely different string! Each group of two bytes was extended to three chunks of 6 bits by padding with zeros. And since you asked for no padding, each group produces only 3 characters, without the extra "=" that usually marks that the number of bytes is not divisible by 3.

The output from the part of the program that decodes the 8-character, correct encoded string is fine:

Byte array decoded from full: [1010101,11110000,10101010,1111,11001100,110011]

But the result from attempting to decode the 9-character, incorrect encoded string is:

Exception in thread "main" java.lang.IllegalArgumentException: Last unit does not have enough valid bits
    at java.util.Base64$Decoder.decode0(Base64.java:734)
    at java.util.Base64$Decoder.decode(Base64.java:526)
    at java.util.Base64$Decoder.decode(Base64.java:549)
    at testing.SimpleTest.main(SimpleTest.java:34)

Not good! A correctly padded Base64 string always has a multiple of 4 characters, and even without padding a valid string can never be exactly one character past a multiple of 4. We have 9.

Since you chose a buffer size of 1024, which is not a multiple of 3, that problem will happen. You need to encode a multiple of 3 bytes each time to produce the proper string. So in fact, you need to create a buffer sized 3072 or something like that.
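One way to follow this advice, sketched below under the assumption that Java 9 or later is available: InputStream.readNBytes keeps reading until the buffer is full or the stream ends, so every chunk except possibly the last one is exactly 3072 bytes, a multiple of 3. The class name ChunkedBase64 is illustrative, and an in-memory stream stands in for the FileInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Base64;
import java.util.Random;

class ChunkedBase64 {

    // 3072 = 3 * 1024, so every full buffer is a whole number of 3-byte groups.
    static final int CHUNK = 3072;

    // readNBytes (Java 9+) blocks until the buffer is full or the stream ends,
    // so only the very last chunk can be shorter than CHUNK. It returns 0 at end of stream.
    static String encode(InputStream in) throws IOException {
        Base64.Encoder encoder = Base64.getEncoder().withoutPadding();
        StringBuilder sb = new StringBuilder();
        byte[] buff = new byte[CHUNK];
        int r;
        while ((r = in.readNBytes(buff, 0, buff.length)) > 0) {
            // copy only the r bytes just read, so no stale bytes are encoded
            sb.append(encoder.encodeToString(Arrays.copyOf(buff, r)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[10_000];
        new Random(42).nextBytes(original);
        String encoded = encode(new ByteArrayInputStream(original));
        byte[] decoded = Base64.getDecoder().decode(encoded);
        System.out.println("round trip ok: " + Arrays.equals(original, decoded)); // true
    }
}
```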

But because of the first problem, be very careful about what you pass to the encoder, because it can always happen that you read fewer than 3072 bytes. Then, if that number is not divisible by three, the same problem will occur.
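An alternative not mentioned in either answer is to let the JDK handle the grouping: the streams returned by Base64.Encoder.wrap(OutputStream) and Base64.Decoder.wrap(InputStream) (available since Java 8) buffer incomplete 3-byte groups internally, so chunks of any size can be written. This sketch also uses InputStream.transferTo and readAllBytes, which need Java 9 or later; WrappedBase64 is a hypothetical name, and unlike the question's withoutPadding() encoder it produces padded output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.Random;

class WrappedBase64 {

    // Encodes everything from 'in'. The wrapping stream keeps any incomplete 3-byte group
    // until more bytes arrive; close() writes the final group (with '=' padding).
    static String encode(InputStream in) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream b64 = Base64.getEncoder().wrap(sink)) {
            in.transferTo(b64); // copies in chunks of whatever size the JDK chooses
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Decodes through the matching streaming decoder.
    static byte[] decode(String encoded) throws IOException {
        byte[] ascii = encoded.getBytes(StandardCharsets.US_ASCII);
        try (InputStream b64 = Base64.getDecoder().wrap(new ByteArrayInputStream(ascii))) {
            return b64.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[10_000];
        new Random(42).nextBytes(original);
        String encoded = encode(new ByteArrayInputStream(original));
        System.out.println("round trip ok: " + Arrays.equals(original, decode(encoded))); // true
    }
}
```

For files, the same idea should work with a FileInputStream as the source and, for decoding, a FileOutputStream as the destination of the decoded bytes.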

Answer by Joop Eggen

Look at:

    while ( ( r = imageInFile.read( buff)) > 0 ) {
      String imageData = encodeImage(buff);

read returns -1 at end-of-file, or otherwise the actual number of bytes that were read.

So the last buff might not be completely filled, and may even contain garbage from a prior read. So you need to use r.
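A minimal sketch of using r directly, as an alternative to copying with Arrays.copyOf: Base64.Encoder.encode(ByteBuffer) encodes only the buffer's remaining bytes, and ByteBuffer.wrap(buff, 0, r) limits those to the bytes just read. UseReadCount and encodeFirst are illustrative names. This only addresses the stale bytes; when r is not a multiple of 3, the per-chunk strings are still not a valid encoding of the whole file, as the accepted answer explains.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class UseReadCount {

    // Encodes only buff[0..r-1]; any older bytes after index r-1 are ignored.
    static String encodeFirst(byte[] buff, int r) {
        ByteBuffer encoded = Base64.getEncoder().encode(ByteBuffer.wrap(buff, 0, r));
        return StandardCharsets.US_ASCII.decode(encoded).toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5, 6});
        byte[] buff = new byte[4];
        int r;
        while ((r = in.read(buff)) != -1) {
            // prints "4 bytes -> AQIDBA==" and then "2 bytes -> BQY="
            System.out.println(r + " bytes -> " + encodeFirst(buff, r));
        }
    }
}
```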

As this is an assignment, the rest is up to you.

By the way:

 byte[] array = new byte[1024];

is more conventional in Java. The syntax:

 byte array[] = ...

was for compatibility with C/C++.