Java: PDF to byte array and vice versa
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/1131116/
Asked by
I need to convert a PDF to a byte array and vice versa. Can anyone help me?
This is how I am converting to a byte array:
public static byte[] convertDocToByteArray(String sourcePath) {
    byte[] byteArray=null;
    try {
        InputStream inputStream = new FileInputStream(sourcePath);
        String inputStreamToString = inputStream.toString();
        byteArray = inputStreamToString.getBytes();
        inputStream.close();
    } catch (FileNotFoundException e) {
        System.out.println("File Not found"+e);
    } catch (IOException e) {
        System.out.println("IO Ex"+e);
    }
    return byteArray;
}
If I use the following code to convert it back to a document, the PDF file gets created, but opening it fails with 'Bad Format. Not a pdf'.
public static void convertByteArrayToDoc(byte[] b) {
    OutputStream out;
    try {
        out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
        out.close();
        System.out.println("write success");
    }catch (Exception e) {
        System.out.println(e);
    }
}
Answered by plinth
PDFs may contain binary data, and chances are it's getting mangled when you call toString(). It seems to me that you want this:
FileInputStream inputStream = new FileInputStream(sourcePath);
// Note: available() is only an estimate, and a single read() may fill less than the whole array
int numberBytes = inputStream.available();
byte[] bytearray = new byte[numberBytes];
inputStream.read(bytearray);
Answered by Mark
The problem is that you are calling toString() on the InputStream object itself. This will return a String representation of the InputStream object, not the actual PDF document.
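To make this concrete, here is a minimal sketch of what toString() on a stream returns. The class name StreamToStringDemo and the helper describe are made up for illustration, and a ByteArrayInputStream stands in for the FileInputStream so the example runs without a file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamToStringDemo {
    // Returns whatever toString() gives for the stream object itself.
    // The standard stream classes do not override toString(), so this is
    // Object's default "ClassName@hexHash" form, not the stream's content.
    static String describe(InputStream in) {
        return in.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a PDF: a stream over the bytes "%PDF-1.4"
        InputStream in = new ByteArrayInputStream("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        // Prints something of the form java.io.ByteArrayInputStream@<hash>
        System.out.println(describe(in));
    }
}
```

Calling getBytes() on that string therefore yields the bytes of the description, which is why the written file is not a valid PDF.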
You want to read the PDF only as bytes, as PDF is a binary format. You will then be able to write out that same byte array and it will be a valid PDF, as it has not been modified.
For example, to read a file as bytes:
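File file = new File(sourcePath);
InputStream inputStream = new FileInputStream(file);
// file.length() returns a long, so it must be cast to int for the array size
byte[] bytes = new byte[(int) file.length()];
// Note: a single read() is not guaranteed to fill the whole array
inputStream.read(bytes);
inputStream.close();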
Answered by Eric Petroelje
Calling toString() on an InputStream doesn't do what you think it does. Even if it did, a PDF contains binary data, so you wouldn't want to convert it to a string first.
What you need to do is read from the stream, write the results into a ByteArrayOutputStream, then convert the ByteArrayOutputStream into an actual byte array by calling toByteArray():
InputStream inputStream = new FileInputStream(sourcePath);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
int data;
while( (data = inputStream.read()) >= 0 ) {
    outputStream.write(data);
}
inputStream.close();
return outputStream.toByteArray();
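The point about not converting binary data to a string first can be illustrated with a small sketch. The class name BinaryThroughStringDemo, the helper throughString, and the sample bytes are made up for illustration, and UTF-8 is chosen here only as an example charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryThroughStringDemo {
    // Decodes the bytes as UTF-8 text and encodes the text back to bytes,
    // which is what a detour through String amounts to.
    static byte[] throughString(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up sample: 0xE2 followed by 0xFF is not a valid UTF-8 sequence
        byte[] binary = {(byte) 0x25, (byte) 0x50, (byte) 0xE2, (byte) 0xFF};
        byte[] roundTripped = throughString(binary);
        // The invalid bytes were replaced during decoding, so the arrays differ
        System.out.println(Arrays.equals(binary, roundTripped)); // false
    }
}
```

Bytes that happen to be plain ASCII survive such a round trip, which is why the damage often only shows up on real binary content.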
Answered by Jon Skeet
You basically need a helper method to read a stream into memory. This works pretty well:
public static byte[] readFully(InputStream stream) throws IOException
{
    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    int bytesRead;
    while ((bytesRead = stream.read(buffer)) != -1)
    {
        baos.write(buffer, 0, bytesRead);
    }
    return baos.toByteArray();
}
Then you'd call it with:
public static byte[] loadFile(String sourcePath) throws IOException
{
    InputStream inputStream = null;
    try
    {
        inputStream = new FileInputStream(sourcePath);
        return readFully(inputStream);
    }
    finally
    {
        if (inputStream != null)
        {
            inputStream.close();
        }
    }
}
Don't mix up text and binary data - it only leads to tears.
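As a side note that is not part of the original answer: on Java 9 and later, InputStream.readAllBytes() performs the same read-until-end-of-stream job as a hand-written helper. The sketch below assumes Java 9 or newer; the class name ReadAllBytesDemo is made up, and a ByteArrayInputStream stands in for a file stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadAllBytesDemo {
    // Reads the remaining content of the stream into memory (Java 9+)
    static byte[] readFully(InputStream stream) throws IOException {
        return stream.readAllBytes();
    }

    public static void main(String[] args) throws IOException {
        // Sample data larger than one 8192-byte buffer, to show nothing is cut off
        byte[] sample = new byte[20000];
        sample[19999] = 42;
        try (InputStream in = new ByteArrayInputStream(sample)) {
            byte[] copy = readFully(in);
            System.out.println(copy.length); // 20000
        }
    }
}
```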
Answered by David
Aren't you creating the PDF file but not actually writing the byte array back? That is why you cannot open the PDF.
out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
// Write the whole array before closing; Java streams use write(), length and close()
out.write(b, 0, b.length);
out.close();
This is in addition to correctly reading the PDF into a byte array.
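Put together, the write side might look like the following sketch. It keeps the question's method name convertByteArrayToDoc but adds a targetPath parameter, which is an assumption made here so the example does not depend on a fixed drive path; the class name ByteArrayToFile and the file name copy.pdf are also made up.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class ByteArrayToFile {
    // Writes the whole array to targetPath, replacing any existing file.
    // try-with-resources closes the stream even if write() throws.
    static void convertByteArrayToDoc(byte[] b, String targetPath) throws IOException {
        try (OutputStream out = new FileOutputStream(targetPath)) {
            out.write(b);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder content; in practice this would be the bytes read from a real PDF
        byte[] content = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        convertByteArrayToDoc(content, "copy.pdf");
    }
}
```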
Answered by Narendra
You can do it by using Apache Commons IO without worrying about internal details.
Use org.apache.commons.io.FileUtils.readFileToByteArray(File file), which returns data of type byte[].
Answered by Sridhar
This works for me:
try (InputStream pdfin = new FileInputStream("input.pdf");
     OutputStream pdfout = new FileOutputStream("output.pdf")) {
    byte[] buffer = new byte[1024];
    int bytesRead;
    while ((bytesRead = pdfin.read(buffer)) != -1) {
        pdfout.write(buffer, 0, bytesRead);
    }
}
But Jon's answer doesn't work for me if used in the following way:
try (InputStream pdfin = new FileInputStream("input.pdf");
     OutputStream pdfout = new FileOutputStream("output.pdf")) {
    int k = readFully(pdfin).length;
    System.out.println(k);
}
It outputs zero as the length. Why is that?
Answered by gorbysbm
None of these worked for us, possibly because our input stream was bytes from a REST call, and not from a locally hosted PDF file. What worked was using RestAssured to read the PDF as an input stream, then using the Tika PDF reader to parse it, and then calling the toString() method.
import java.io.InputStream;
import com.jayway.restassured.RestAssured;
import com.jayway.restassured.response.Response;
import com.jayway.restassured.response.ResponseBody;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.parser.Parser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// 'response' is not defined in this snippet; presumably it is the RestAssured Response of the call that fetched the PDF
InputStream stream = response.asInputStream();
Parser parser = new AutoDetectParser(); // Should auto-detect!
ContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
try {
    parser.parse(stream, handler, metadata, context);
} finally {
    stream.close();
}
// Print every metadata entry Tika extracted
for (int i = 0; i < metadata.names().length; i++) {
    String item = metadata.names()[i];
    System.out.println(item + " -- " + metadata.get(item));
}
System.out.println("!!Printing pdf content: \n" + handler.toString());
System.out.println("content type: " + metadata.get(Metadata.CONTENT_TYPE));
Answered by Chris Clark
Java 7 introduced Files.readAllBytes(), which can read a PDF into a byte[] like so:
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;
Path pdfPath = Paths.get("/path/to/file.pdf");
byte[] pdf = Files.readAllBytes(pdfPath);
EDIT:
Thanks Farooque for pointing out: this will work for reading any kind of file, not just PDFs. All files are ultimately just a bunch of bytes, and as such can be read into a byte[].
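For the "vice versa" direction, Files.write() (also available since Java 7) writes a byte[] back to disk. The sketch below is one way to combine the two calls; the class name NioRoundTrip, the method copyViaByteArray, and the temporary file names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class NioRoundTrip {
    // Reads the source file into memory, then writes those bytes to the target,
    // creating the target or replacing its content if it already exists.
    static void copyViaByteArray(Path source, Path target) throws IOException {
        byte[] content = Files.readAllBytes(source);
        Files.write(target, content);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for real PDFs so the example is self-contained
        Path source = Files.createTempFile("source", ".pdf");
        Path target = Files.createTempFile("target", ".pdf");
        Files.write(source, new byte[] {(byte) 0x25, (byte) 0x50, (byte) 0xFF});
        copyViaByteArray(source, target);
        System.out.println(Arrays.equals(Files.readAllBytes(source), Files.readAllBytes(target))); // true
    }
}
```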
Answered by Sami Yousif
public static void main(String[] args) throws FileNotFoundException, IOException {
    File file = new File("java.pdf");
    FileInputStream fis = new FileInputStream(file);
    //System.out.println(file.exists() + "!!");
    //InputStream in = resource.openStream();
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    try {
        for (int readNum; (readNum = fis.read(buf)) != -1;) {
            // Writes readNum bytes from buf, starting at offset 0, to the byte array output stream
            bos.write(buf, 0, readNum);
            System.out.println("read " + readNum + " bytes,");
        }
    } catch (IOException ex) {
        // genJpeg is presumably the answerer's enclosing class; substitute your own class name
        Logger.getLogger(genJpeg.class.getName()).log(Level.SEVERE, null, ex);
    } finally {
        fis.close();
    }
    byte[] bytes = bos.toByteArray();
    //below is the different part
    File someFile = new File("java2.pdf");
    FileOutputStream fos = new FileOutputStream(someFile);
    fos.write(bytes);
    fos.flush();
    fos.close();
}