Export Pandas DataFrame into a PDF file using Python

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/33155776/

Date: 2020-08-19 12:53:21  Source: igfitidea

Tags: python, pdf, pandas, reportlab, pypdf

Asked by b8con

What is an efficient way to generate PDF for data frames in Pandas?

Accepted answer by wgwz

One way is to go through Markdown. df.to_html() converts the DataFrame into an HTML table. You can put the generated HTML into a Markdown file (.md) (see http://daringfireball.net/projects/markdown/basics), and then use a utility that converts Markdown into a PDF (for example https://www.npmjs.com/package/markdown-pdf).
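
The first step of this approach can be sketched as follows. This is only a minimal illustration, not the answerer's own code: the sample DataFrame, the "# Report" heading, the helper name dataframe_to_markdown_file and the output file name report.md are all assumptions made for the example. The Markdown-to-PDF conversion itself is left to an external tool such as the one linked above.

```python
import os
import tempfile

import pandas as pd


def dataframe_to_markdown_file(df, path):
    """Write df as an HTML table inside a Markdown file and return the HTML.

    Hypothetical helper for illustration; Markdown allows raw HTML blocks,
    so the table can be embedded as-is.
    """
    html_table = df.to_html()
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Report\n\n")  # assumed heading, not from the original answer
        f.write(html_table)
        f.write("\n")
    return html_table


# Sample data (assumption) and an output path in the system temp directory
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
md_path = os.path.join(tempfile.gettempdir(), "report.md")
html_table = dataframe_to_markdown_file(df, md_path)
```

The resulting report.md can then be handed to whichever Markdown-to-PDF converter you prefer.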

One all-in-one tool for this method is the Atom text editor (https://atom.io/). There you can install an extension (search for "markdown to pdf") that will do the conversion for you.

Note: When I used to_html() recently, I had to remove extra '\n' characters for some reason. I chose to use Atom -> Find -> '\n' -> Replace "".
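
If you would rather not do the find-and-replace in an editor, the same cleanup can probably be done in Python before writing the file. This is a sketch under the assumption that no cell in the table contains a meaningful line break, since str.replace removes every newline, including any inside cell text. The sample DataFrame is made up for the example.

```python
import pandas as pd

# Sample data (assumption)
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

# to_html() returns the table as a multi-line string
html_table = df.to_html()

# Drop every newline so the table becomes a single line of HTML
html_single_line = html_table.replace("\n", "")
```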

Overall this should do the trick!

Answer by Dalibor

Here is how I do it from an SQLite database using sqlite3, pandas and pdfkit:

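import sqlite3

import pandas as pd
import pdfkit as pdf

# Read the whole table into a DataFrame, then release the connection
con = sqlite3.connect("baza.db")
df = pd.read_sql_query("select * from dobit", con)
con.close()

# Write the DataFrame to an intermediate HTML file
df.to_html('/home/linux/izvestaj.html')

# Convert the HTML file to a PDF file
nazivFajla = '/home/linux/pdfPrintOut.pdf'
pdf.from_file('/home/linux/izvestaj.html', nazivFajla)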
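
To try the database-to-HTML half of this without an existing baza.db, something like the following may help. It is a self-contained sketch that uses an in-memory SQLite database; the columns godina and iznos and the two rows are invented for the example and are not taken from the original answer. The PDF step is left out here, since it needs pdfkit and its external converter to be installed.

```python
import sqlite3

import pandas as pd

# In-memory database standing in for baza.db (assumption for the example)
con = sqlite3.connect(":memory:")
con.execute("create table dobit (godina integer, iznos real)")  # hypothetical columns
con.executemany(
    "insert into dobit values (?, ?)",
    [(2019, 1200.5), (2020, 1350.0)],  # made-up rows
)
con.commit()

# Same query as in the answer above
df = pd.read_sql_query("select * from dobit", con)
con.close()

# index=False leaves the DataFrame's row index out of the HTML table
html = df.to_html(index=False)
```

The html string can be written to a file and passed to the PDF conversion step shown above.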

Answer by mit

This is a solution with an intermediate HTML file.

The table is pretty-printed with some minimal CSS.

The PDF conversion is done with weasyprint, which you need to install first: pip install weasyprint

import pandas as pd
import weasyprint

# Page skeleton and minimal CSS placed before the table
HTML_TEMPLATE1 = '''
<html>
<head>
<style>
  h2 {
    text-align: center;
    font-family: Helvetica, Arial, sans-serif;
  }
  table {
    margin-left: auto;
    margin-right: auto;
  }
  table, th, td {
    border: 1px solid black;
    border-collapse: collapse;
  }
  th, td {
    padding: 5px;
    text-align: center;
    font-family: Helvetica, Arial, sans-serif;
    font-size: 90%;
  }
  table tbody tr:hover {
    background-color: #dddddd;
  }
  .wide {
    width: 90%;
  }
</style>
</head>
<body>
'''

# Closing tags placed after the table
HTML_TEMPLATE2 = '''
</body>
</html>
'''


# This is the table pretty printer; it is defined before it is used below.
def to_html_pretty(df, filename='/tmp/out.html', title=''):
    '''
    Write an entire dataframe to an HTML file
    with nice formatting.
    Thanks to @stackoverflowuser2010 for the
    pretty printer see https://stackoverflow.com/a/47723330/362951
    '''
    ht = ''
    if title != '':
        ht += '<h2> %s </h2>\n' % title
    ht += df.to_html(classes='wide', escape=False)

    with open(filename, 'w') as f:
        f.write(HTML_TEMPLATE1 + ht + HTML_TEMPLATE2)


# Create a pandas dataframe with demo data:
demodata_csv = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
df = pd.read_csv(demodata_csv)

# Pretty print the dataframe as an html table to a file
intermediate_html = '/tmp/intermediate.html'
to_html_pretty(df, intermediate_html, 'Iris Data')
# if you do not want pretty printing, just use pandas:
# df.to_html(intermediate_html)

# Convert the html file to a pdf file using weasyprint
out_pdf = '/tmp/demo.pdf'
weasyprint.HTML(intermediate_html).write_pdf(out_pdf)

Thanks to @stackoverflowuser2010 for the pretty printer; see that answer at https://stackoverflow.com/a/47723330/362951

I did not use pdfkit because I had some problems with it on a headless machine. But weasyprint is great.
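
If the intermediate file is not needed, weasyprint can also be given the HTML as a string. The following is a hedged sketch of that variation, not part of the original answer: the helper names build_report_html and write_pdf_from_string, the sample DataFrame and the stripped-down page skeleton are assumptions for illustration. weasyprint is imported inside the function so that the HTML-building part can be tried even where weasyprint is not installed.

```python
import pandas as pd


def build_report_html(df, title=""):
    """Return a complete HTML document containing df as a table.

    Hypothetical helper; the CSS from the answer above is omitted for brevity.
    """
    heading = "<h2>%s</h2>\n" % title if title else ""
    table = df.to_html(classes="wide")
    return "<html>\n<body>\n" + heading + table + "\n</body>\n</html>\n"


def write_pdf_from_string(html, out_pdf):
    """Render an HTML string straight to a PDF file with weasyprint."""
    import weasyprint  # imported lazily; requires `pip install weasyprint`

    weasyprint.HTML(string=html).write_pdf(out_pdf)


# Sample data (assumption)
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
report_html = build_report_html(df, "Demo")

# Uncomment once weasyprint is installed:
# write_pdf_from_string(report_html, "/tmp/demo.pdf")
```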

Answer by user3226167

First plot the table with matplotlib, then generate the PDF:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Demo data: 10 rows of random numbers in 3 columns
df = pd.DataFrame(np.random.random((10, 3)), columns=("col 1", "col 2", "col 3"))

# Draw only a table, with the axes hidden:
# https://stackoverflow.com/questions/32137396/how-do-i-plot-only-a-table-in-matplotlib
fig, ax = plt.subplots(figsize=(12, 4))
ax.axis('tight')
ax.axis('off')
the_table = ax.table(cellText=df.values, colLabels=df.columns, loc='center')

# Save the figure to a PDF with tight margins:
# https://stackoverflow.com/questions/4042192/reduce-left-and-right-margins-in-matplotlib-plot
pp = PdfPages("foo.pdf")
pp.savefig(fig, bbox_inches='tight')
pp.close()

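References:

How do I plot only a table in Matplotlib? https://stackoverflow.com/questions/32137396/how-do-i-plot-only-a-table-in-matplotlib

Reduce left and right margins in matplotlib plot https://stackoverflow.com/questions/4042192/reduce-left-and-right-margins-in-matplotlib-plot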
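
For DataFrames with more rows than fit comfortably on one page, the matplotlib approach above could be extended to write one table per page. This is a sketch only: the helper name dataframe_to_pdf, the default of 10 rows per page, the output file name and the use of the non-interactive Agg backend are assumptions for the example, not part of the original answer.

```python
import math
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def dataframe_to_pdf(df, pdf_path, rows_per_page=10):
    """Write df to a multi-page PDF, one table of rows_per_page rows per page.

    Hypothetical helper; returns the number of pages written.
    """
    n_pages = math.ceil(len(df) / rows_per_page)
    with PdfPages(pdf_path) as pp:
        for page in range(n_pages):
            # Slice out the rows that belong on this page
            chunk = df.iloc[page * rows_per_page:(page + 1) * rows_per_page]
            fig, ax = plt.subplots(figsize=(12, 4))
            ax.axis('off')
            ax.table(cellText=chunk.values, colLabels=list(chunk.columns), loc='center')
            pp.savefig(fig, bbox_inches='tight')
            plt.close(fig)  # free the figure once it has been saved
    return n_pages


# 25 rows of random demo data spread over pages of 10 rows each
df = pd.DataFrame(np.random.random((25, 3)), columns=["col 1", "col 2", "col 3"])
pdf_path = os.path.join(tempfile.gettempdir(), "foo_multipage.pdf")
pages_written = dataframe_to_pdf(df, pdf_path, rows_per_page=10)
```

Closing each figure after saving keeps memory use flat when the DataFrame spans many pages.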