Convert Word doc, docx and Excel xls, xlsx to PDF with PHP

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/5538584/

Time: 2020-08-25 21:45:29  Source: igfitidea

Tags: phpexcel, ms-word, pdf-generation

Asked by saulposel

I am looking for a way to convert Word and Excel files to PDF using PHP.

The reason is that I need to combine files of various formats into one document. If I can convert everything to PDF, I can then merge the PDFs into one file using PDFMerger (which uses fpdf).

I am already able to create PDFs from other file types and images, but am stuck with Word docs. (I think I could convert the Excel files using the PHPExcel library, which I already use to create Excel files from HTML code.)

I do not use the Zend Framework, so am hoping that someone will be able to point me in the right direction.

Alternatively, if there is a way to create image (jpg) files from the Word documents, that would be workable.

Thanks for any help!

Accepted answer by saulposel

I found a solution to my issue and after a request, will post it here to help others. Apologies if I missed any details, it's been a while since I worked on this solution.

The first requirement is to install OpenOffice.org on the server. I asked my hosting provider to install the OpenOffice RPM on my VPS. This can be done directly through WHM.

Now that the server can handle MS Office files, you can convert them by executing command-line instructions via PHP. To handle this, I found PyODConverter: https://github.com/mirkonasato/pyodconverter

I created a directory on the server and placed the PyODConverter python file within it. I also created a plain text file above the web root (I named it "adocpdf"), with the following command line instructions in it:

directory=$1
filename=$2
extension=$3
SERVICE='soffice'
if [ "`ps ax|grep -v grep|grep -c $SERVICE`" -lt 1 ]; then 
unset DISPLAY
/usr/bin/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard & 
sleep 5s
fi
python /home/website/python/DocumentConverter.py /home/website/$directory$filename$extension /home/website/$directory$filename.pdf

This checks that the openoffice.org service is running and then calls the PyODConverter script to process the file and output it as a PDF. The 3 variables on the first three lines are provided when the script is executed from within a PHP file. The delay ("sleep 5s") ensures that openoffice.org has enough time to start if required. I have used this for months now and the 5s gap seems to give enough breathing room.
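
The fixed 5-second sleep could also be replaced by polling the listener port until openoffice.org accepts connections. A minimal sketch in PHP, assuming the same host and port (127.0.0.1:8100) as the script above; the function name waitForPort and the timeout values are hypothetical and not part of the original solution:

```php
<?php
// Hypothetical helper: polls a TCP port until it accepts a connection or the
// timeout elapses. Host and port are assumed to match the soffice -accept string.
function waitForPort(string $host, int $port, float $timeoutSeconds): bool
{
    $deadline = microtime(true) + $timeoutSeconds;
    do {
        // Suppress the warning fsockopen() raises while the port is still closed.
        $socket = @fsockopen($host, $port, $errno, $errstr, 1.0);
        if ($socket !== false) {
            fclose($socket);
            return true;
        }
        usleep(200000); // wait 0.2 s before trying again
    } while (microtime(true) < $deadline);

    return false;
}
```

Calling waitForPort('127.0.0.1', 8100, 10.0) before the conversion would return as soon as the service is reachable, and false if it never comes up within 10 seconds.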

The script will create a PDF version of the document in the same directory as the original.

Finally, the conversion of a Word / Excel file is initiated from within PHP (I have this inside a function that checks whether the file we are dealing with is a Word / Excel document):

//use openoffice.org
$output = array();
$return_var = 0;
exec("/opt/adocpdf {$directory} {$filename} {$extension}", $output, $return_var);

This PHP function is called once the Word / Excel file has been uploaded to the server. The 3 variables in the exec() call relate directly to the 3 at the start of the plain text script above. Note that the $directory variable requires no leading forward slash if the file for conversion is within the web root.
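
Because the three values are interpolated straight into a shell command, it may be worth escaping them and checking the result. A minimal sketch, assuming $extension includes the leading dot and that the script writes the PDF next to the source under /home/website/ as shown above; the helper names buildConvertCommand and convertToPdf are hypothetical:

```php
<?php
// Hypothetical helper: builds the command line for the wrapper script with
// every argument shell-escaped, so unusual characters in uploaded filenames
// cannot break or inject into the command.
function buildConvertCommand(string $script, string $directory, string $filename, string $extension): string
{
    $args = array_map('escapeshellarg', [$directory, $filename, $extension]);
    return escapeshellcmd($script) . ' ' . implode(' ', $args);
}

// Hypothetical helper: runs the conversion and reports success only when the
// script exits with status 0 and the expected PDF exists afterwards.
function convertToPdf(string $script, string $siteRoot, string $directory, string $filename, string $extension): bool
{
    exec(buildConvertCommand($script, $directory, $filename, $extension), $output, $status);
    return $status === 0 && is_file($siteRoot . $directory . $filename . '.pdf');
}
```

For example, convertToPdf('/opt/adocpdf', '/home/website/', 'uploads/', 'report', '.docx') would return true only if /home/website/uploads/report.pdf exists after the script has run.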

OK, that's it! Hopefully this will be useful to someone and save them the difficulties and learning curve I faced.

Answer by dbf

Well, my 2 cents on the topic of Word 2007 docx, Word 97-2004 doc, pdf and all the other MS Office types wishing to be "converted from y to z, but in reality they don't want to be". In my experience so far, conversion with LibreOffice or OpenOffice can't be relied on, though .doc documents tend to be better supported than Word 2007's .docx. In general it's very hard to convert .docx to .doc without breaking anything.

.docx also tends to be extremely useful for templating, where .doc is not because it is a binary format.

The conversion from .doc to PDF was quite reliable most of the time. If you can still influence the design or content of the Word document then this might be satisfactory, but in my situation documents were supplied by foreign companies where, even after generating the .docx templates, in some scenarios the generated .docx had to be slightly modified with supplementary text before it was converted to a PDF.



WINDOWS BASED!

All these hiccups made me come to the conclusion that the only truly reliable conversion method I found was to use the COM class in PHP and let the MS Word or Excel application do all the work for you. I'll just give an example of converting .docx to .doc and/or PDF. If you do not have MS Office installed, you can download a 60-day trial version, which should give you enough room for testing purposes.

The COM/.NET extension is commented out by default in php.ini; just search for the line php_com_dotnet.dll and uncomment it like so:

  extension=php_com_dotnet.dll

Restart the web server (IIS is not a prerequisite; Apache will work just as well).
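
After the restart, a quick check can confirm that the extension was actually picked up. A minimal sketch; the function name comIsAvailable is hypothetical, while com_dotnet is the name under which PHP registers the extension:

```php
<?php
// Returns true when the COM/.NET extension is loaded and the COM class exists.
function comIsAvailable(): bool
{
    return extension_loaded('com_dotnet') && class_exists('COM');
}

echo comIsAvailable()
    ? "COM extension is loaded\n"
    : "COM extension is missing: check php.ini and restart the web server\n";
```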

The code below is a demonstration on how easy it is.

  $word = new COM("Word.Application") or die ("Could not initialise Object.");
  // set it to 1 to see the MS Word window (the actual opening of the document)
  $word->Visible = 0;
  // recommend to set to 0, disables alerts like "Do you want MS Word to be the default .. etc"
  $word->DisplayAlerts = 0;
  // open the word 2007-2013 document 
  $word->Documents->Open('yourdocument.docx');
  // save it as word 2003
  $word->ActiveDocument->SaveAs('newdocument.doc');
  // convert word 2007-2013 to PDF
  $word->ActiveDocument->ExportAsFixedFormat('yourdocument.pdf', 17, false, 0, 0, 0, 0, 7, true, true, 2, true, true, false);
  // quit the Word process
  $word->Quit(false);
  // clean up
  unset($word);

This is just a small demonstration. I can only say that when it comes to conversion, this was the only really reliable option I could use and even recommend.
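
The same approach should carry over to Excel, which is mentioned above but not shown. A minimal sketch, assuming Excel is installed on the server and that both paths are absolute; the function name excelToPdf is hypothetical, and 0 is the value of Excel's xlTypePDF constant:

```php
<?php
const XL_TYPE_PDF = 0; // value of Excel's XlFixedFormatType.xlTypePDF

// Hypothetical helper: exports a workbook to PDF through Excel's COM interface.
// Returns false when COM is unavailable, the source is unreadable, or Excel fails.
function excelToPdf(string $source, string $target): bool
{
    if (!class_exists('COM') || !is_readable($source)) {
        return false;
    }
    $excel = null;
    try {
        $excel = new COM('Excel.Application');
        $excel->Visible = 0;        // keep the Excel window hidden
        $excel->DisplayAlerts = 0;  // suppress dialogs that would block the script
        $workbook = $excel->Workbooks->Open($source);
        $workbook->ExportAsFixedFormat(XL_TYPE_PDF, $target);
        $workbook->Close(false);    // close without saving changes
        return is_file($target);
    } catch (Throwable $e) {
        return false;
    } finally {
        if ($excel !== null) {
            $excel->Quit();         // always release the Excel process
        }
    }
}
```

The try/finally is there so that a failed conversion does not leave an orphaned Excel process running on the server.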

Answer by Vineesh Kalarickal

1) I am using WAMP.

2) I have installed Open Office (from apache http://www.openoffice.org/download/).

3) $output_dir = "C:/wamp/www/projectfolder/"; this is my project folder, where I want to create the output file.

4) I have already placed my input file here: C:/wamp/www/projectfolder/wordfile.docx

Then I run my code (given below).

<?php
    set_time_limit(0);

    function MakePropertyValue($name, $value, $osm) {
        $oStruct = $osm->Bridge_GetStruct("com.sun.star.beans.PropertyValue");
        $oStruct->Name = $name;
        $oStruct->Value = $value;
        return $oStruct;
    }

    function word2pdf($doc_url, $output_url) {
        //Invoke the OpenOffice.org service manager
        $osm = new COM("com.sun.star.ServiceManager") or die ("Please be sure that OpenOffice.org is installed.\n");
        //Set the application to remain hidden to avoid flashing the document onscreen
        $args = array(MakePropertyValue("Hidden", true, $osm));
        //Launch the desktop
        $oDesktop = $osm->createInstance("com.sun.star.frame.Desktop");
        //Load the .doc file, and pass in the "Hidden" property from above
        $oWriterDoc = $oDesktop->loadComponentFromURL($doc_url, "_blank", 0, $args);
        //Set up the arguments for the PDF output
        $export_args = array(MakePropertyValue("FilterName", "writer_pdf_Export", $osm));
        //Write out the PDF
        $oWriterDoc->storeToURL($output_url, $export_args);
        $oWriterDoc->close(true);
    }

    $output_dir = "C:/wamp/www/projectfolder/";
    $doc_file = "C:/wamp/www/projectfolder/wordfile.docx";
    $pdf_file = "outputfile_name.pdf";

    $output_file = $output_dir . $pdf_file;
    $doc_file = "file:///" . $doc_file;
    $output_file = "file:///" . $output_file;
    word2pdf($doc_file,$output_file);
    ?>
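
The code above builds its file:/// URLs by plain concatenation, which works for simple paths but not for backslashes or spaces, and it always uses the Writer export filter. A minimal sketch of two helpers that could cover those cases; both function names are hypothetical, and the Excel handling assumes spreadsheets need OpenOffice's calc_pdf_Export filter rather than writer_pdf_Export:

```php
<?php
// Hypothetical helper: turns a local path into a file:// URL, normalising
// backslashes and percent-encoding each path segment (spaces become %20).
function pathToFileUrl(string $path): string
{
    $path = str_replace('\\', '/', $path);
    $encoded = implode('/', array_map('rawurlencode', explode('/', $path)));
    // Restore the colon of a Windows drive letter such as "C:".
    $encoded = preg_replace('/^([A-Za-z])%3A/', '$1:', $encoded);
    // Unix paths already start with "/", Windows paths need an extra one.
    return (strncmp($encoded, '/', 1) === 0 ? 'file://' : 'file:///') . $encoded;
}

// Hypothetical helper: picks the OpenOffice PDF export filter by extension.
function pdfFilterFor(string $extension): string
{
    $spreadsheets = ['xls', 'xlsx', 'ods'];
    return in_array(strtolower($extension), $spreadsheets, true)
        ? 'calc_pdf_Export'
        : 'writer_pdf_Export';
}
```

With these, word2pdf could be called with pathToFileUrl($doc_file) and pathToFileUrl($output_file), and the FilterName property could be set from pdfFilterFor(pathinfo($doc_file, PATHINFO_EXTENSION)).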

Answer by Robert Hyatt

I successfully put a portable version of LibreOffice on my host's webserver, which I call with PHP to do a command-line conversion from .docx, etc. to PDF on the fly. I do not have admin rights on my host's webserver. Here is my blog post about what I did:

http://geekswithblogs.net/robertphyatt/archive/2011/11/19/converting-.docx-to-pdf-or-.doc-to-pdf-or-.doc.aspx

Yay! Convert directly from .docx or .odt to .pdf using PHP with LibreOffice (OpenOffice's successor)!

Answer by Jeroen Ritmeijer

Open Office / LibreOffice based solutions will do an OK job, but don't expect your PDFs to resemble your source files if they were created in MS-Office. A PDF that looks 90% like the original is not considered to be acceptable in many fields.

The only way to make sure your PDFs look exactly like the originals is to use a solution that uses the official MS-Office DLLs under the hood. If you are running your PHP solution on non-Windows based servers then it requires an additional Windows Server. This may be a showstopper, but if you really care about the look and feel of your PDFs you may not have an option.

Have a look at this blog post. It shows how to use PHP to convert MS-Office files with a high level of fidelity.

Disclaimer: I wrote this blog post and worked on a related commercial product, so consider me biased. However, it appears to be a great solution for the PHP people I work with.

Answer by Sandip Patel

Step 1. Install "Apache_OpenOffice_4.1.2" on your system.

Step 2. Download the "unoconv" library from GitHub or anywhere else.

-> C:\Program Files (x86)\OpenOffice 4\program\python.exe = Path of open office install directory

-> D:\wamp\www\doc_to_pdf\libobasis4.4-pyuno\unoconv = Path of library folder

-> D:/wamp/www/doc_to_pdf/files/'.$pdf_File_name.' = path and file name of pdf

-> D:/wamp/www/doc_to_pdf/files/'.$doc_file_name = Path of your document file.

If the PDF is not created, the last step is: go to Control Panel\All Control Panel Items\Administrative Tools -> Services -> find "wampapache" -> right-click and choose Properties -> open the Log On tab, then tick the "Allow service to interact with desktop" checkbox.

Create a sample .php file, put the code below in it, and run it on a WAMP or XAMPP server.

$result = exec('"C:\Program Files (x86)\OpenOffice 4\program\python.exe" D:\wamp\www\doc_to_pdf\libobasis4.4-pyuno\unoconv -f pdf -o D:/wamp/www/doc_to_pdf/files/'.$pdf_File_name.' D:/wamp/www/doc_to_pdf/files/'.$doc_file_name);

This code works for me on the Windows 8 operating system.

Answer by Sunil kumar

I found a solution after a lot of googling. You can try it too if you are tired of searching for a good one.

For general use: the SOAP API

You need a username and password to make SOAP requests to https://www.livedocx.com

Register at https://www.livedocx.com/user/account_registration.aspx and follow the steps accordingly.

Use the code below in your .php file.

ini_set ('soap.wsdl_cache_enabled', 0);

// you will get this username and pass while register
define ('USERNAME', 'Username'); 
define ('PASSWORD', 'Password');

// SOAP WSDL endpoint
define ('ENDPOINT', 'https://api.livedocx.com/2.1/mailmerge.asmx?wsdl');

// Define timezone
date_default_timezone_set('Europe/Berlin');
$soap = new SoapClient(ENDPOINT);
$soap->LogIn(
    array(
        'username' => USERNAME,
        'password' => PASSWORD
    )
);
$data = file_get_contents('test.doc');
$soap->SetLocalTemplate(
    array(
        'template' => base64_encode($data),
        'format'   => 'doc'
    )
);
$soap->CreateDocument();
$result = $soap->RetrieveDocument(
    array(
        'format' => 'pdf'
    )
);
$data = $result->RetrieveDocumentResult;
file_put_contents('tree.pdf', base64_decode($data));
$soap->LogOut();
unset($soap);

Follow this link for more information: http://www.phplivedocx.org/

For Ubuntu

OpenOffice and unoconv must be installed.

From the command prompt:

apt-get remove --purge unoconv
git clone https://github.com/dagwieers/unoconv
cd unoconv
sudo make install

Now add the code below to your PHP script and make sure the file is executable.

shell_exec('/usr/bin/unoconv -f pdf  folder/test.docx');
shell_exec('/usr/bin/unoconv -f pdf  folder/sachin.png');

Hope this solution helps you.

Answer by Chris Whyley

For a PHP-specific solution you could try PHPWord. This library is written in pure PHP and provides a set of classes to write to and read from different document file formats (including .doc and .docx). The main drawback is that the quality of converted files can be quite variable.

Alternatively, if you want a higher quality option you could use a file conversion API like Zamzar. You can use it to convert a wide range of office formats (and others) into PDF, and you can call it from any platform (Windows, Linux, OS X etc).

PHP code to convert a file would look like this:

<?php
$endpoint = "https://api.zamzar.com/v1/jobs";
$apiKey = "API_KEY";
$sourceFilePath = "/my.doc"; // Or docx/xls/xlsx etc
$targetFormat = "pdf";

// Wrap the path in a CURLFile so cURL uploads the file contents
$sourceFile = curl_file_create($sourceFilePath);

$postData = array(
  "source_file" => $sourceFile,
  "target_format" => $targetFormat
);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $endpoint);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST');
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERPWD, $apiKey . ":");
$body = curl_exec($ch);
curl_close($ch);

$response = json_decode($body, true);
print_r($response);
?>

Full disclosure: I'm the lead developer for the Zamzar API.

Answer by Marcelo A

Another way to do this is to pass a parameter directly to the libreoffice command:

libreoffice --convert-to pdf /path/to/file.{doc,docx}
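
Called from PHP, this command could be wrapped as follows. A minimal sketch, assuming the libreoffice binary is on the PATH and that adding --headless and --outdir is acceptable for your setup; the helper names buildLibreOfficeCommand and expectedPdfPath are hypothetical:

```php
<?php
// Hypothetical helper: builds a headless LibreOffice conversion command with
// the input file and output directory shell-escaped.
function buildLibreOfficeCommand(string $inputFile, string $outputDir, string $binary = 'libreoffice'): string
{
    return sprintf(
        '%s --headless --convert-to pdf --outdir %s %s',
        escapeshellcmd($binary),
        escapeshellarg($outputDir),
        escapeshellarg($inputFile)
    );
}

// Hypothetical helper: the path LibreOffice is expected to write, since it
// keeps the input's base name and swaps the extension for .pdf.
function expectedPdfPath(string $inputFile, string $outputDir): string
{
    return rtrim($outputDir, '/') . '/' . pathinfo($inputFile, PATHINFO_FILENAME) . '.pdf';
}
```

A caller would pass the command to exec() with the $output and $return_var arguments, then check that the status is 0 and that expectedPdfPath() points at an existing file.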

Answer by A br

The easiest way to do this, in my experience, is with the Cloudmersive free native PHP library; just call convertDocumentDocxToPdf:

<?php
require_once(__DIR__ . '/vendor/autoload.php');

// Configure API key authorization: Apikey
$config = Swagger\Client\Configuration::getDefaultConfiguration()->setApiKey('Apikey', 'YOUR_API_KEY');

$apiInstance = new Swagger\Client\Api\ConvertDocumentApi(
    new GuzzleHttp\Client(),
    $config
);
$input_file = "/path/to/file.docx"; // \SplFileObject | Input file to perform the operation on.

try {
    $result = $apiInstance->convertDocumentDocxToPdf($input_file);
    print_r($result);
} catch (Exception $e) {
    echo 'Exception when calling ConvertDocumentApi->convertDocumentDocxToPdf: ', $e->getMessage(), PHP_EOL;
}
?>

Be sure to replace $input_file with the appropriate file path. You can also configure it to use a byte array if you prefer to do it that way. The result will be the bytes of the converted PDF file.
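
Since the result is the raw bytes of the PDF, it still has to be written to disk. A minimal sketch, assuming the result arrives as a PHP string; the function name savePdfBytes is hypothetical and not part of the Cloudmersive library:

```php
<?php
// Hypothetical helper: writes converted bytes to disk after a basic sanity
// check that they start with the "%PDF-" signature every PDF file begins with.
function savePdfBytes(string $bytes, string $targetPath): bool
{
    if (strncmp($bytes, '%PDF-', 5) !== 0) {
        return false; // not a PDF, e.g. an error message from the API
    }
    return file_put_contents($targetPath, $bytes) === strlen($bytes);
}
```

In the example above, print_r($result) could then be replaced with a call such as savePdfBytes($result, '/path/to/output.pdf').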
