How to convert webpage into PDF by using Python

Notice: this page is a translation of a popular StackOverFlow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/23359083/

Time: 2020-08-19 02:49:23  Source: igfitidea

python pdf webpage qprinter

Asked by Mark K

I was looking for a way to print a webpage to a local PDF file using Python. One good solution is to use Qt, found here: https://bharatikunal.wordpress.com/2010/01/.

It didn't work at first because I had a problem with the installation of PyQt4: it gave error messages such as 'ImportError: No module named PyQt4.QtCore'.

This was because PyQt4 was not installed properly. I used to keep my libraries in C:\Python27\Lib, but that is not the right location for PyQt4.

In fact, you simply need to download it from http://www.riverbankcomputing.com/software/pyqt/download (mind the correct Python version you are using) and install it to C:\Python27 (in my case). That's it.
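
As a quick check after installing, the import error above can be reproduced or ruled out without running the full script. This is only a minimal sketch: the helper name `module_available` is made up for illustration, and the module names are the three that the scripts on this page import.

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported, False if it raises ImportError."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # The three PyQt4 modules used by the scripts on this page
    for name in ("PyQt4.QtCore", "PyQt4.QtGui", "PyQt4.QtWebKit"):
        status = "ok" if module_available(name) else "missing"
        print("%s: %s" % (name, status))
```

If any of the modules is reported as missing, the interpreter you are running is probably not the one PyQt4 was installed for.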

Now the script runs fine, so I want to share it. For more options when using QPrinter, please refer to http://qt-project.org/doc/qt-4.8/qprinter.html#Orientation-enum.

Accepted answer by Mark K

Thanks to the posts below, I am able to add the address of the printed webpage and the current time to the generated PDF, no matter how many pages it has.

Add text to Existing PDF using Python

https://github.com/disflux/django-mtr/blob/master/pdfgen/doc_overlay.py

The script is shared below:

# Python 2 script (it relies on pyPdf and StringIO)
import sys
import time
import StringIO
from pyPdf import PdfFileWriter, PdfFileReader
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

url = 'http://www.yahoo.com'
# Raw strings, so that "\t" in the path is not read as a tab character
tem_pdf = r"c:\tem_pdf.pdf"
final_file = r"c:\younameit.pdf"

app = QApplication(sys.argv)
web = QWebView()
# Read the URL given
web.load(QUrl(url))
printer = QPrinter()
# Setting format
printer.setPageSize(QPrinter.A4)
printer.setOrientation(QPrinter.Landscape)
printer.setOutputFormat(QPrinter.PdfFormat)
# Export file as c:\tem_pdf.pdf
printer.setOutputFileName(tem_pdf)

def convertIt():
    web.print_(printer)
    QApplication.exit()

# Print the page once it has finished loading
QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)

# Blocks until convertIt() calls QApplication.exit()
app.exec_()

# Below is to add the web link as text and the present date & time to the generated PDF

packet = StringIO.StringIO()
# Create a new PDF with Reportlab
can = canvas.Canvas(packet, pagesize=letter)
can.setFont("Helvetica", 9)
# Writing the new line
oknow = time.strftime("%a, %d %b %Y %H:%M")
can.drawString(5, 2, url)
can.drawString(605, 2, oknow)
can.save()

# Move to the beginning of the StringIO buffer
packet.seek(0)
new_pdf = PdfFileReader(packet)
# Read your existing PDF
existing_file = open(tem_pdf, "rb")
existing_pdf = PdfFileReader(existing_file)
pages = existing_pdf.getNumPages()
output = PdfFileWriter()
# Add the "watermark" (which is the new pdf) on each existing page
for x in range(pages):
    page = existing_pdf.getPage(x)
    page.mergePage(new_pdf.getPage(0))
    output.addPage(page)
# Finally, write "output" to a real file
with open(final_file, "wb") as outputStream:
    output.write(outputStream)
existing_file.close()

print("%s is ready." % final_file)
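
The script prints the page as A4 landscape but builds the overlay on a letter-size canvas and places the timestamp at a hard-coded x of 605. As a rough sketch of how those coordinates could be derived from the page size instead, the snippet below works out the A4 landscape dimensions in PDF points using only the standard library. The helper names `a4_landscape_points` and `footer_anchors` and the 5-point margin are assumptions made for illustration, not part of the original answer.

```python
MM_TO_PT = 72.0 / 25.4  # one inch is 25.4 mm and 72 PDF points


def a4_landscape_points():
    """Return (width, height) of an A4 landscape page in PDF points."""
    return (297 * MM_TO_PT, 210 * MM_TO_PT)


def footer_anchors(page_width, margin=5.0):
    """Return (left_x, right_x): where left-aligned text starts and
    where right-aligned text should end."""
    return (margin, page_width - margin)


width, height = a4_landscape_points()
left_x, right_x = footer_anchors(width)
print("page: %.2f x %.2f pt" % (width, height))
print("url starts at x=%.2f, timestamp ends at x=%.2f" % (left_x, right_x))
```

With reportlab, these values could be passed as `pagesize=(width, height)` when creating the canvas, with `can.drawRightString(right_x, 2, oknow)` used to right-align the timestamp, so the footer would follow the page size rather than a fixed number.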

Answer by Mark K

Here is one that works fine:

import sys 
from PyQt4.QtCore import *
from PyQt4.QtGui import * 
from PyQt4.QtWebKit import * 

app = QApplication(sys.argv)
web = QWebView()
web.load(QUrl("http://www.yahoo.com"))
printer = QPrinter()
printer.setPageSize(QPrinter.A4)
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setOutputFileName("fileOK.pdf")

def convertIt():
    web.print_(printer)
    print("Pdf generated")
    QApplication.exit()

QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
sys.exit(app.exec_())

Answer by NorthCat

You can also use pdfkit:

Usage

import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')

Install

MacOS: brew install Caskroom/cask/wkhtmltopdf

Debian/Ubuntu: apt-get install wkhtmltopdf

See the official documentation for MacOS/Ubuntu/other OSes: https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf
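
To get the same A4/landscape layout that the Qt scripts set through QPrinter, pdfkit accepts an `options` dictionary whose keys are wkhtmltopdf command-line option names. The following is a minimal sketch rather than a definitive recipe: the helper names `build_options` and `url_to_pdf` are hypothetical, and pdfkit is imported lazily so that the option-building part can be tried without it installed.

```python
def build_options(page_size="A4", landscape=False):
    """Map the settings onto wkhtmltopdf option names (page-size, orientation)."""
    return {
        "page-size": page_size,
        "orientation": "Landscape" if landscape else "Portrait",
    }


def url_to_pdf(url, output_path, page_size="A4", landscape=False):
    """Render `url` to `output_path` with pdfkit, using the options above."""
    import pdfkit  # needs pdfkit and a wkhtmltopdf binary to be installed
    options = build_options(page_size=page_size, landscape=landscape)
    return pdfkit.from_url(url, output_path, options=options)
```

For example, `url_to_pdf('http://google.com', 'out.pdf', landscape=True)` would request a landscape A4 page.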

Answer by Jim Paul

Here is a simple solution using Qt. I found it as part of an answer to a different question on StackOverFlow. I tested it on Windows.

import sys
from PyQt4.QtGui import QTextDocument, QPrinter, QApplication

app = QApplication(sys.argv)

doc = QTextDocument()
location = "c://apython//Jim//html//notes.html"
# Load the local HTML file into the document
with open(location) as f:
    html = f.read()
doc.setHtml(html)

printer = QPrinter()
printer.setOutputFileName("foo.pdf")
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setPageSize(QPrinter.A4)
printer.setPageMargins(15, 15, 15, 15, QPrinter.Millimeter)

doc.print_(printer)
print("done!")

Answer by JohnMudd

WeasyPrint

pip install weasyprint  # No longer supports Python 2.x.

python
>>> import weasyprint
>>> pdf = weasyprint.HTML('http://www.google.com').write_pdf()
>>> len(pdf)
92059
>>> with open('google.pdf', 'wb') as f:
...     f.write(pdf)
...
92059

Answer by Mark K

I tried @NorthCat's answer using pdfkit.

It requires wkhtmltopdf to be installed. The installer can be downloaded from here: https://wkhtmltopdf.org/downloads.html

Install the executable file. Then add a line indicating where wkhtmltopdf is, as below (referenced from "Can't create pdf using python PDFKIT Error: 'No wkhtmltopdf executable found:'").

import pdfkit

# Raw string, so that the backslashes in the Windows path are kept literally
path_wkhtmltopdf = r"C:\Folder\where\wkhtmltopdf.exe"
config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)

pdfkit.from_url("http://google.com", "out.pdf", configuration=config)
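
Rather than hard-coding the path, the executable could also be looked up at run time. The sketch below is one possible approach, assuming Python 3.3 or later for `shutil.which`. The helper name `find_executable` and the idea of a fallback list are illustrative and not part of pdfkit.

```python
import os
import shutil


def find_executable(name, candidates=()):
    """Return the path of `name` if it is on PATH, otherwise the first
    existing file among `candidates`, otherwise None."""
    found = shutil.which(name)
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

A call such as `find_executable("wkhtmltopdf", [r"C:\Folder\where\wkhtmltopdf.exe"])` returns a path that can be passed to `pdfkit.configuration(wkhtmltopdf=...)`, or `None` if nothing was found.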