Java PDF to byte array and vice versa

Question

Asked by

I need to convert pdf to byte array and vice versa.

Can any one help me?

This is how I am converting it to a byte array:

public static byte[] convertDocToByteArray(String sourcePath) {

    byte[] byteArray=null;
    try {
        InputStream inputStream = new FileInputStream(sourcePath);


        String inputStreamToString = inputStream.toString();
        byteArray = inputStreamToString.getBytes();

        inputStream.close();
    } catch (FileNotFoundException e) {
        System.out.println("File Not found"+e);
    } catch (IOException e) {
        System.out.println("IO Ex"+e);
    }
    return byteArray;
}

If I use the following code to convert it back to a document, the PDF file is created, but opening it says 'Bad Format. Not a pdf'.

public static void convertByteArrayToDoc(byte[] b) {          

    OutputStream out;
    try {       
        out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
        out.close();
        System.out.println("write success");
    }catch (Exception e) {
        System.out.println(e);
    }
}

Answer 1

Answered by plinth

PDFs may contain binary data, and chances are it's getting mangled when you call toString(). It seems to me that you want this:

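        FileInputStream inputStream = new FileInputStream(sourcePath);

        // available() is only an estimate; for a local file it is normally the remaining size
        int numberBytes = inputStream.available();
        byte[] bytearray = new byte[numberBytes];

        // a single read() may return fewer bytes than requested;
        // DataInputStream.readFully() keeps reading until the array is full
        new DataInputStream(inputStream).readFully(bytearray);
        inputStream.close();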
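To see why going through a String mangles binary data, here is a minimal sketch (the class and method names are hypothetical) that decodes some bytes as UTF-8 text and encodes the text back to bytes. Plain ASCII survives, but byte sequences that are not valid UTF-8, like the high-bit bytes in the binary comment line that often follows the %PDF header, are replaced with U+FFFD and the original bytes are lost.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringRoundTripDemo {

    // Decodes the bytes as UTF-8 text, encodes the text back to bytes,
    // and returns true only if the result is identical to the input.
    public static boolean survivesUtf8RoundTrip(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        byte[] back = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, back);
    }
}
```

For example, survivesUtf8RoundTrip("%PDF-1.4".getBytes(StandardCharsets.US_ASCII)) returns true, while survivesUtf8RoundTrip(new byte[] {0x25, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3}) returns false. Using a different charset does not make a String a safe container for arbitrary bytes in general, which is why the answers keep the data as byte[] throughout.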

Answer 2

Answered by Mark

The problem is that you are calling toString() on the InputStream object itself. This returns a String representation of the InputStream object, not the actual PDF document.

You want to read the PDF only as bytes, because PDF is a binary format. You will then be able to write out that same byte array, and it will be a valid PDF because it has not been modified.

For example, to read a file as bytes:

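File file = new File(sourcePath);
InputStream inputStream = new FileInputStream(file);
// file.length() returns a long, so it has to be cast to int for the array size
byte[] bytes = new byte[(int) file.length()];
// read() may return before the array is full; readFully() loops until it is
new DataInputStream(inputStream).readFully(bytes);
inputStream.close();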
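A quick way to confirm what the question's inputStream.toString() actually produces: standard streams such as ByteArrayInputStream (and FileInputStream) inherit Object.toString(), which gives the class name and a hash code, not the data. A minimal sketch using an in-memory stream so it runs without a file (the class and method names are hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamToStringDemo {

    // Returns what the question's code effectively turned into bytes:
    // the stream object's own description, not the contents it would read.
    public static String describe(InputStream in) {
        return in.toString();
    }

    // A stand-in for a PDF file: a stream whose contents start with a PDF header.
    public static InputStream samplePdfStream() {
        return new ByteArrayInputStream("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
    }
}
```

describe(samplePdfStream()) returns a string of the form java.io.ByteArrayInputStream@ followed by a hexadecimal hash that varies between runs; none of the "%PDF-1.4" bytes appear in it, so getBytes() on that string cannot produce a PDF.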

Answer 3

Answered by Eric Petroelje

Calling toString() on an InputStream doesn't do what you think it does. Even if it did, a PDF contains binary data, so you wouldn't want to convert it to a string first.

What you need to do is read from the stream, write the results into a ByteArrayOutputStream, then convert the ByteArrayOutputStream into an actual byte array by calling toByteArray():

InputStream inputStream = new FileInputStream(sourcePath);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

int data;
while( (data = inputStream.read()) >= 0 ) {
    outputStream.write(data);
}

inputStream.close();
return outputStream.toByteArray();
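On Java 9 or later, InputStream.readAllBytes() performs the same read-until-end-of-stream loop internally, so the manual ByteArrayOutputStream copy can be replaced by one call. A minimal sketch assuming Java 9+ (the class name is hypothetical; the method name mirrors the question's), using try-with-resources so the stream is closed even if reading fails:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadAllBytesExample {

    // Reads the whole file into memory; requires Java 9+ for InputStream.readAllBytes().
    // The stream is closed automatically when the try block exits.
    public static byte[] convertDocToByteArray(String sourcePath) throws IOException {
        try (InputStream inputStream = new FileInputStream(sourcePath)) {
            return inputStream.readAllBytes();
        }
    }
}
```

Unlike the question's version, this propagates IOException to the caller instead of printing it and returning null.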

Answer 4

Answered by Jon Skeet

You basically need a helper method to read a stream into memory. This works pretty well:

public static byte[] readFully(InputStream stream) throws IOException
{
    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream();

    int bytesRead;
    while ((bytesRead = stream.read(buffer)) != -1)
    {
        baos.write(buffer, 0, bytesRead);
    }
    return baos.toByteArray();
}

Then you'd call it with:

public static byte[] loadFile(String sourcePath) throws IOException
{
    InputStream inputStream = null;
    try 
    {
        inputStream = new FileInputStream(sourcePath);
        return readFully(inputStream);
    } 
    finally
    {
        if (inputStream != null)
        {
            inputStream.close();
        }
    }
}

Don't mix up text and binary data; it only leads to tears.
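Since Java 7, the try/finally in loadFile can be written with try-with-resources, which closes the stream automatically. A minimal sketch keeping the readFully helper from this answer unchanged (the enclosing class name PdfBytes is hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PdfBytes {

    // Copies the stream into memory 8 KB at a time until read() signals end of stream (-1).
    public static byte[] readFully(InputStream stream) throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int bytesRead;
        while ((bytesRead = stream.read(buffer)) != -1) {
            baos.write(buffer, 0, bytesRead);
        }
        return baos.toByteArray();
    }

    // Equivalent to the try/finally version: the stream is closed when the block exits,
    // whether readFully returns normally or throws.
    public static byte[] loadFile(String sourcePath) throws IOException {
        try (InputStream inputStream = new FileInputStream(sourcePath)) {
            return readFully(inputStream);
        }
    }
}
```

Because readFully only ever deals in bytes, a file of any size or content comes back exactly as stored, including files larger than one 8 KB buffer.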

Answer 5

Answered by David

Aren't you creating the PDF file without actually writing the byte array to it? That is why you cannot open the PDF.

out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
out.write(b, 0, b.length);
out.close();

This is in addition to correctly reading the PDF into a byte array.
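Putting the write step together, a corrected version of the question's convertByteArrayToDoc might look like the following sketch. The targetPath parameter is a hypothetical addition (the original hard-coded D:/ABC_XYZ/1.pdf), as is the class name; try-with-resources closes the stream even if write() throws.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ByteArrayToDoc {

    // Writes every byte of b to targetPath, creating the file or replacing its contents.
    public static void convertByteArrayToDoc(byte[] b, String targetPath) throws IOException {
        try (OutputStream out = new FileOutputStream(targetPath)) {
            out.write(b, 0, b.length);
        }
    }
}
```

Usage would be convertByteArrayToDoc(bytes, "D:/ABC_XYZ/1.pdf"). Letting IOException propagate, instead of printing it and then reporting "write success", means the caller notices when the write actually fails.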

Answer 6

Answered by Narendra

You can do it by using Apache Commons IO without worrying about the internal details.

Use org.apache.commons.io.FileUtils.readFileToByteArray(File file), which returns the file's contents as a byte[].

Answer 7

Answered by Sridhar

This works for me:

try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){
    byte[] buffer = new byte[1024];
    int bytesRead;
    while((bytesRead = pdfin.read(buffer))!=-1){
        pdfout.write(buffer,0,bytesRead);
    }
}

But Jon's answer doesn't work for me if used in the following way:

try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){

    int k = readFully(pdfin).length;
    System.out.println(k);
}

It outputs zero as the length. Why is that?
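For a plain file-to-file copy like the first snippet in this answer, Java 9 added InputStream.transferTo(OutputStream), which runs an equivalent buffered read/write loop and returns the number of bytes copied. A minimal sketch assuming Java 9+ (the class and method names are hypothetical):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class PdfCopy {

    // Streams source to target without holding the whole file in memory and
    // returns the number of bytes copied. Requires Java 9+ for transferTo().
    public static long copy(String source, String target) throws IOException {
        try (InputStream pdfin = new FileInputStream(source);
             OutputStream pdfout = new FileOutputStream(target)) {
            return pdfin.transferTo(pdfout);
        }
    }
}
```

Comparing the returned count with the source file's size is a cheap sanity check that the whole file was read.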

Answer 8

Answered by gorbysbm

None of these worked for us, possibly because our input stream contained bytes from a REST call rather than from a locally hosted PDF file. What worked was using RestAssured to read the PDF as an input stream, then using the Tika PDF parser to parse it, and then calling toString() on the content handler.

import java.io.InputStream;

import com.jayway.restassured.RestAssured;
import com.jayway.restassured.response.Response;
import com.jayway.restassured.response.ResponseBody;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.parser.Parser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// `response` is a RestAssured Response from an earlier request (not shown in the original answer)
InputStream stream = response.asInputStream();
Parser parser = new AutoDetectParser(); // detects the document type (here PDF) from the content
ContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();

try {
    parser.parse(stream, handler, metadata, context);
} finally {
    stream.close();
}
// print every metadata field Tika extracted
for (String item : metadata.names()) {
    System.out.println(item + " -- " + metadata.get(item));
}

// the handler has collected the extracted body text
System.out.println("!!Printing pdf content: \n" + handler.toString());
System.out.println("content type: " + metadata.get(Metadata.CONTENT_TYPE));

Answer 9

Answered by Chris Clark

Java 7 introduced Files.readAllBytes(), which can read a PDF into a byte[] like so:

import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;

Path pdfPath = Paths.get("/path/to/file.pdf");
byte[] pdf = Files.readAllBytes(pdfPath);

EDIT:

Thanks Farooque for pointing out: this will work for reading any kind of file, not just PDFs. All files are ultimately just a bunch of bytes, and as such can be read into a byte[].
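The reverse direction is covered by Files.write(Path, byte[]), also part of java.nio.file.Files since Java 7, so a full round trip needs no manual stream handling. A minimal sketch (the class and method names are hypothetical) that reads a file into a byte[], writes that array to a new file, and checks that the two match:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class FilesRoundTrip {

    // Reads source into a byte[], writes that array to target, and reports
    // whether the bytes read back from target match the original exactly.
    public static boolean copyViaByteArray(Path source, Path target) throws IOException {
        byte[] pdf = Files.readAllBytes(source);
        Files.write(target, pdf); // creates target, or truncates it if it already exists
        return Arrays.equals(pdf, Files.readAllBytes(target));
    }
}
```

For a real document you would call it as copyViaByteArray(Paths.get("/path/to/file.pdf"), Paths.get("/path/to/copy.pdf")); the copy opens as a valid PDF because no byte was reinterpreted as text along the way.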

Answer 10

Answered by Sami Yousif

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// class name taken from the Logger call in the original answer
public class genJpeg {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        File file = new File("java.pdf");

        FileInputStream fis = new FileInputStream(file);
        //System.out.println(file.exists() + "!!");
        //InputStream in = resource.openStream();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            for (int readNum; (readNum = fis.read(buf)) != -1;) {
                // write the readNum bytes just read, starting at offset 0 of buf
                bos.write(buf, 0, readNum);
                System.out.println("read " + readNum + " bytes,");
            }
        } catch (IOException ex) {
            Logger.getLogger(genJpeg.class.getName()).log(Level.SEVERE, null, ex);
        } finally {
            fis.close();
        }
        byte[] bytes = bos.toByteArray();

        //below is the different part
        File someFile = new File("java2.pdf");
        FileOutputStream fos = new FileOutputStream(someFile);
        fos.write(bytes);
        fos.flush();
        fos.close();
    }
}
