Java GZIP decompress string and byte conversion

Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverFlow. Original question: http://stackoverflow.com/questions/10974941/

Time: 2020-10-31 03:16:59  Source: igfitidea

GZIP decompress string and byte conversion

Tags: java, gzip, gzipoutputstream

Asked by Alexandr Erofeev

I have a problem with this code:

private static String compress(String str)
{
    String str1 = null;
    ByteArrayOutputStream bos = null;
    try
    {
        bos = new ByteArrayOutputStream();
        BufferedOutputStream dest = null;

        byte b[] = str.getBytes();
        GZIPOutputStream gz = new GZIPOutputStream(bos,b.length);
        gz.write(b,0,b.length);
        bos.close();
        gz.close();

    }
    catch(Exception e) {
        System.out.println(e);
        e.printStackTrace();
    }
    byte b1[] = bos.toByteArray();
    return new String(b1);
}

private static String deCompress(String str)
{
    String s1 = null;

    try
    {
        byte b[] = str.getBytes();
        InputStream bais = new ByteArrayInputStream(b);
        GZIPInputStream gs = new GZIPInputStream(bais);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int numBytesRead = 0;
        byte [] tempBytes = new byte[6000];
        try
        {
            while ((numBytesRead = gs.read(tempBytes, 0, tempBytes.length)) != -1)
            {
                baos.write(tempBytes, 0, numBytesRead);
            }

            s1 = new String(baos.toByteArray());
            s1= baos.toString();
        }
        catch(ZipException e)
        {
            e.printStackTrace();
        }
    }
    catch(Exception e) {
        e.printStackTrace();
    }
    return s1;
}

public String test() throws Exception
{
    String str = "teststring";
    String cmpr = compress(str);
    String dcmpr = deCompress(cmpr);
    return dcmpr;
}

This code throws java.io.IOException: unknown format (magic number ef1f) at this line:

GZIPInputStream gs = new GZIPInputStream(bais);

It turns out that the bytes get "spoiled" by the round trip through new String(b1) and byte b[] = str.getBytes(). The string yields more bytes than the compressed output originally had. If I avoid the conversion to a string and work with the bytes directly, everything works. Sorry for my English.



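public String unZip(String zipped) throws DataFormatException, IOException {
    // Assumes the compressed bytes were turned into this string with the same single-byte charset.
    byte[] bytes = zipped.getBytes("WINDOWS-1251");
    Inflater decompressed = new Inflater();
    decompressed.setInput(bytes);

    byte[] result = new byte[100];
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Write only the bytes inflate() actually produced, not the whole buffer.
    int count;
    while ((count = decompressed.inflate(result)) != 0)
        buffer.write(result, 0, count);

    decompressed.end();

    // charset is not defined in this snippet; presumably a field holding the text encoding.
    return new String(buffer.toByteArray(), charset);
}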

I use this function to decompress the server response. Thanks for the help.

Answered by Jon Skeet

You have two problems:

  • You're using the default character encoding to convert the original string into bytes. That will vary by platform. It's better to specify an encoding - UTF-8 is usually a good idea.
  • You're trying to represent the opaque binary data of the result of the compression as a string by just calling the String(byte[]) constructor. That constructor is only meant for data which is encoded text... which this isn't. You should use base64 for this. There's a public domain base64 library which makes this easy. (Alternatively, don't convert the compressed data to text at all - just return a byte array.)
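
A minimal sketch of both points, assuming Java 8 or later so that the built-in java.util.Base64 can stand in for the public domain library mentioned above. The class and method names (GzipBase64, compressToBase64, decompressFromBase64) are hypothetical and not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; names are illustrative only.
final class GzipBase64 {

    // Encode the text as UTF-8, gzip it, then represent the binary result as base64 text.
    static String compressToBase64(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the GZIP stream writes the trailer before the bytes are read
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Reverse the steps: base64 text -> compressed bytes -> UTF-8 bytes -> string.
    static String decompressFromBase64(String base64) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(base64);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String packed = compressToBase64("teststring");
        System.out.println(decompressFromBase64(packed)); // prints "teststring"
    }
}
```

The base64 string is safe to store or send anywhere text is expected, at the cost of roughly a third more characters than the compressed byte count.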

Fundamentally, you need to understand how different text and binary data are - when you want to convert between the two, you should do so carefully. If you want to represent "non text" binary data (i.e. bytes which aren't the direct result of encoding text) in a string you should use something like base64 or hex. When you want to encode a string as binary data (e.g. to write some text to disk) you should carefully consider which encoding to use. If another program is going to read your data, you need to work out what encoding it expects - if you have full control over it yourself, I'd usually go for UTF-8.
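
To illustrate the difference, here is a small sketch that pushes the first three bytes of a gzip stream (1f 8b 08) through a String and through base64. It assumes UTF-8 as the charset; the class and method names are made up for this example. The byte 0x8b is not valid UTF-8 on its own, so decoding replaces it with U+FFFD, which encodes back as ef bf bd. A header that then starts with 1f ef would be consistent with the "magic number ef1f" in the question if the two-byte magic is read as a little-endian value, though that depends on the platform's default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; names are illustrative only.
final class BinaryAsTextDemo {

    // Format bytes as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Treat arbitrary bytes as UTF-8 text and encode them back (lossy for non-text data).
    static byte[] viaString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Represent the bytes as base64 text and decode them back (lossless).
    static byte[] viaBase64(byte[] bytes) {
        return Base64.getDecoder().decode(Base64.getEncoder().encodeToString(bytes));
    }

    public static void main(String[] args) {
        byte[] gzipStart = {(byte) 0x1f, (byte) 0x8b, 0x08}; // gzip magic + deflate method
        System.out.println(toHex(viaString(gzipStart))); // prints 1fefbfbd08 (corrupted, longer)
        System.out.println(toHex(viaBase64(gzipStart))); // prints 1f8b08 (unchanged)
    }
}
```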

Additionally, the exception handling in your code is poor:

  • You should almost never catch Exception; catch more specific exceptions
  • You shouldn't just catch an exception and continue as if it had never happened. If you can't really handle the exception and still complete your method successfully, you should let the exception bubble up the stack (or possibly catch it and wrap it in a more appropriate exception type for your abstraction)
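
One possible shape for this, as a sketch only: catch the specific IOException and wrap it instead of printing it and carrying on. Wrapping in UncheckedIOException is an assumption made for the example; letting IOException propagate through a throws clause would follow the same advice. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical example class; the name is illustrative only.
final class CompressWithHandling {

    static byte[] compress(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Catch only the specific type, and rethrow wrapped rather than swallowing it,
            // so the caller never receives a half-written result.
            throw new UncheckedIOException("GZIP compression failed", e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] out = compress("teststring");
        System.out.println(out.length + " compressed bytes");
    }
}
```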

Answered by Codo

When you GZIP compress data, you always get binary data. This data cannot be converted into a string as it is not valid character data (in any encoding).

So your compress method should return a byte array and your decompress method should take a byte array as its parameter.

Furthermore, I recommend you use an explicit encoding when you convert the string into a byte array before compression and when you turn the decompressed data into a string again.
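
The two suggestions above could look roughly like this. It is a sketch under the assumption that UTF-8 is the chosen encoding; the class name GzipBytes is made up, and the method names follow the ones used in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical class name; compress/decompress mirror the methods in the question.
final class GzipBytes {

    // Text in, binary out: the compressed data stays a byte array.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8)); // explicit encoding
        }
        return bos.toByteArray();
    }

    // Binary in, text out: decode with the same explicit encoding used in compress().
    static String decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decompress(compress("teststring"))); // prints "teststring"
    }
}
```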

Answered by dionkta

"When you GZIP compress data, you always get binary data. This data cannot be converted into a string as it is not valid character data (in any encoding)."

Codo is right, thanks a lot for enlightening me. I was trying to decompress a string (converted from the binary data). My fix was to use InflaterInputStream directly on the input stream returned by my HTTP connection. (My app was retrieving a large JSON document of strings.)
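
A rough sketch of that approach, assuming the response body is zlib/deflate data holding UTF-8 text. The HTTP connection is not shown in the answer, so an in-memory stream stands in for it here; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical example class; names are illustrative only.
final class InflateStreamDemo {

    // Inflate directly from the stream: the compressed bytes never pass through a String.
    static String readInflated(InputStream compressedStream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream in = new InflaterInputStream(compressedStream)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response body: deflate a small JSON array in memory.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (DeflaterOutputStream def = new DeflaterOutputStream(body)) {
            def.write("[\"a\",\"b\"]".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(readInflated(new ByteArrayInputStream(body.toByteArray()))); // prints ["a","b"]
    }
}
```

In a real client, the stream passed to readInflated would be the one returned by the connection, for example from HttpURLConnection.getInputStream().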