Java java.io.IOException: Not in GZIP format
Original question: http://stackoverflow.com/questions/25310193/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by TenG
I searched for an example of how to compress a string in Java.
I have a function to compress then uncompress. The compress seems to work fine:
public static String encStage1(String str)
{
    String format1 = "ISO-8859-1";
    String format2 = "UTF-8";
    if (str == null || str.length() == 0)
    {
        return str;
    }
    System.out.println("String length : " + str.length());
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    String outStr = null;
    try
    {
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        gzip.write(str.getBytes());
        gzip.close();
        outStr = out.toString(format2);
        System.out.println("Output String lenght : " + outStr.length());
    } catch (Exception e)
    {
        e.printStackTrace();
    }
    return outStr;
}
But the reverse is complaining about the string not being in GZIP format, even when I pass the return from encStage1 straight back into the decStage3:
public static String decStage3(String str)
{
    if (str == null || str.length() == 0)
    {
        return str;
    }
    System.out.println("Input String length : " + str.length());
    String outStr = "";
    try
    {
        String format1 = "ISO-8859-1";
        String format2 = "UTF-8";
        GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str.getBytes(format2)));
        BufferedReader bf = new BufferedReader(new InputStreamReader(gis, format2));
        String line;
        while ((line = bf.readLine()) != null)
        {
            outStr += line;
        }
        System.out.println("Output String lenght : " + outStr.length());
    } catch (Exception e)
    {
        e.printStackTrace();
    }
    return outStr;
}
I get this error when I call with a string return from encStage1:
public String encIDData(String idData)
{
    String tst = "A simple test string";
    System.out.println("Enc 0: " + tst);
    String stg1 = encStage1(tst);
    System.out.println("Enc 1: " + toHex(stg1));
    String dec1 = decStage3(stg1);
    System.out.println("unzip: " + toHex(dec1));
}
Output/Error:
Enc 0: A simple test string
String length : 20
Output String lenght : 40
Enc 1: 1fefbfbd0800000000000000735428efbfbdefbfbd2defbfbd495528492d2e51282e29efbfbdefbfbd4b07005aefbfbd21efbfbd14000000
Input String length : 40
java.io.IOException: Not in GZIP format
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:137)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
Answered by Joop Eggen
A small error is:
gzip.write(str.getBytes());
uses the platform default encoding, which on Windows will never be ISO-8859-1. Better:
gzip.write(str.getBytes(format1));
You could consider using "Cp1252" (Windows Latin-1, for some European languages) instead of "ISO-8859-1" (Latin-1). Cp1252 adds characters such as comma-like (curly) quotes.
The major error is converting the compressed bytes to a String. Java separates binary data (byte[], InputStream, OutputStream) from text (String, char, Reader, Writer), which internally is always kept in Unicode. An arbitrary byte sequence does not need to be valid UTF-8, so decoding it as UTF-8 can change it. You might get away with converting the bytes using a single-byte encoding (ISO-8859-1 for instance).
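To make the point concrete, here is a minimal sketch (the class and method names are made up for illustration) that compresses the test string from the question and checks whether the compressed bytes survive a bytes-to-String-to-bytes round trip. The second byte of the GZIP header, 0x8b, is not valid UTF-8 on its own, so the UTF-8 round trip should come back altered, while ISO-8859-1 maps every byte value to a character and back.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class LossyRoundTrip {

    // Compress text to raw GZIP bytes (hypothetical helper for this demo).
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Decode the bytes to a String and encode them back with the same charset,
    // then report whether the bytes are unchanged.
    public static boolean survives(byte[] data, Charset cs) {
        byte[] back = new String(data, cs).getBytes(cs);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("A simple test string");
        System.out.println("UTF-8 round trip intact:      "
                + survives(compressed, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 round trip intact: "
                + survives(compressed, StandardCharsets.ISO_8859_1));
    }
}
```

This matches the hex dump in the question, where the header starts with 1f followed by efbfbd (the UTF-8 form of the replacement character U+FFFD) instead of 1f 8b.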
The best way would be
gzip.write(str.getBytes(StandardCharsets.UTF_8));
That way you have full Unicode, and every script may be combined.
And uncompress to a ByteArrayOutputStream, then build the text with new String(baos.toByteArray(), StandardCharsets.UTF_8).
Using BufferedReader on an InputStreamReader with UTF-8 is okay too, but readLine throws away the newline characters, so they need to be added back:
outStr += line + "\r\n"; // Or so.
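If the original line terminators matter, one alternative to re-appending "\r\n" is to read characters in bulk instead of line by line. The following is only a sketch; the class name KeepNewlines and the helper readAll are hypothetical and not part of the original answer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class KeepNewlines {

    // Read everything from the reader into a String, keeping line
    // terminators exactly as they appear in the input.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "first line\r\nsecond line\n";
        String copy = readAll(new StringReader(text));
        System.out.println(text.equals(copy));
    }
}
```

In decStage3 the same helper could be given the InputStreamReader that wraps the GZIPInputStream.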
Clean answer:
public static byte[] encStage1(String str) throws IOException
{
    try (ByteArrayOutputStream out = new ByteArrayOutputStream())
    {
        try (GZIPOutputStream gzip = new GZIPOutputStream(out))
        {
            gzip.write(str.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
        //return out.toString(StandardCharsets.ISO_8859_1);
        // Some single byte encoding
    }
}
public static String decStage3(byte[] str) throws IOException
{
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str)))
    {
        int b;
        while ((b = gis.read()) != -1) {
            baos.write((byte) b);
        }
    }
    return new String(baos.toByteArray(), StandardCharsets.UTF_8);
}
Answered by twester
Using toString/getBytes to turn binary data into text and back is the wrong approach. Try something like Base64 encoding for this purpose (java.util.Base64 in JDK 1.8).
As proof, try this simple test:
import org.testng.annotations.Test;
import java.io.ByteArrayOutputStream;
import static org.testng.Assert.assertEquals;
public class SimpleTest {
    @Test
    public void test() throws Exception {
        final String CS = "utf-8";
        byte[] b0 = {(byte) 0xff};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(b0);
        out.close();
        byte[] b1 = out.toString(CS).getBytes(CS);
        // Fails: 0xff is not valid UTF-8, so it is decoded to U+FFFD
        // and re-encoded as the three bytes EF BF BD.
        assertEquals(b0, b1);
    }
}
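If the compressed data really has to travel as a String, the Base64 suggestion could look roughly like this. It is a sketch only: the class name GzipBase64 and both method names are invented for illustration, and it assumes Java 8 or later for java.util.Base64.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBase64 {

    // Compress the text and return the GZIP bytes as a Base64 string.
    public static String compressToBase64(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Decode the Base64 string back to GZIP bytes and decompress to text.
    public static String decompressFromBase64(String encoded) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gis =
                     new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gis.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String original = "A simple test string\nwith a second line";
        String encoded = compressToBase64(original);
        String restored = decompressFromBase64(encoded);
        System.out.println(original.equals(restored));
    }
}
```

Base64 makes the payload about a third larger than the raw bytes, so for short inputs like the test string the result can be longer than the original text.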