Export Pandas DataFrame into a PDF file using Python
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/33155776/
Asked by b8con
What is an efficient way to generate a PDF for data frames in Pandas?
Accepted answer by wgwz
Well, one way is to use markdown. You can use df.to_html(). This converts the dataframe into an HTML table. From there you can put the generated HTML into a markdown file (.md) (see http://daringfireball.net/projects/markdown/basics). Then there are utilities to convert markdown into a pdf (https://www.npmjs.com/package/markdown-pdf).
One all-in-one tool for this method is the Atom text editor (https://atom.io/). There you can install an extension (search for "markdown to pdf") that will do the conversion for you.
Note: When using to_html() recently, I had to remove extra '\n' characters for some reason. I chose to use Atom -> Find -> '\n' -> Replace "".
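The first steps of this approach can also be scripted. The following is only a minimal sketch, not the answer author's own code: the demo data and the output file name table.md are hypothetical, and it assumes the extra '\n' characters mentioned in the note are ordinary newlines in the generated HTML.

```python
import os
import tempfile

import pandas as pd

# Hypothetical demo data; any dataframe would do.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# to_html() returns the table as a string when no output file is given.
html_table = df.to_html()

# Remove newline characters in code instead of by hand in an editor
# (assumption: the extra '\n' characters are ordinary newlines).
html_one_line = html_table.replace("\n", "")

# Markdown allows inline HTML, so the table can go into a .md file as-is.
md_path = os.path.join(tempfile.gettempdir(), "table.md")  # hypothetical name
with open(md_path, "w", encoding="utf-8") as f:
    f.write("# Report\n\n" + html_one_line + "\n")
```

The resulting .md file can then be passed to whichever markdown-to-pdf utility you prefer.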
Overall this should do the trick!
Answer by Dalibor
Here is how I do it from an sqlite database using sqlite3, pandas and pdfkit:
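import sqlite3
import pandas as pd
import pdfkit as pdf

# Read the whole "dobit" table from the sqlite database into a dataframe
con = sqlite3.connect("baza.db")
df = pd.read_sql_query("select * from dobit", con)
con.close()
# Write the dataframe to an intermediate HTML file
df.to_html('/home/linux/izvestaj.html')
# Convert the HTML file to a PDF (pdfkit relies on wkhtmltopdf being installed)
nazivFajla = '/home/linux/pdfPrintOut.pdf'
pdf.from_file('/home/linux/izvestaj.html', nazivFajla)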
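To try the same idea without an existing database, here is a hedged, self-contained sketch. The in-memory database, the column names (year, amount), and the helper names query_to_html and html_to_pdf are all made up for illustration. It uses pdfkit.from_string so that no intermediate HTML file is needed, and the PDF step is left commented out because it only works where pdfkit and wkhtmltopdf are installed.

```python
import sqlite3

import pandas as pd


def query_to_html(con, sql):
    """Run a query and return the result as an HTML table string."""
    df = pd.read_sql_query(sql, con)
    return df.to_html(index=False)


def html_to_pdf(html, pdf_path):
    """Render an HTML string to a PDF file (needs pdfkit and wkhtmltopdf)."""
    import pdfkit  # imported lazily so the rest runs without pdfkit

    pdfkit.from_string(html, pdf_path)


# Hypothetical in-memory table standing in for the "dobit" table above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dobit (year INTEGER, amount REAL)")
con.executemany("INSERT INTO dobit VALUES (?, ?)", [(2020, 10.5), (2021, 12.0)])
report_html = query_to_html(con, "select * from dobit")
con.close()

# Uncomment on a machine where pdfkit and wkhtmltopdf are available:
# html_to_pdf(report_html, "pdfPrintOut.pdf")
```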
Answer by mit
This is a solution with an intermediate HTML file.
The table is pretty printed with some minimal CSS.
The pdf conversion is done with weasyprint. You need to pip install weasyprint.
# Create a pandas dataframe with demo data:
import pandas as pd
demodata_csv = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
df = pd.read_csv(demodata_csv)
# Pretty print the dataframe as an html table to a file
intermediate_html = '/tmp/intermediate.html'
to_html_pretty(df,intermediate_html,'Iris Data')
# if you do not want pretty printing, just use pandas:
# df.to_html(intermediate_html)
# Convert the html file to a pdf file using weasyprint
import weasyprint
out_pdf= '/tmp/demo.pdf'
weasyprint.HTML(intermediate_html).write_pdf(out_pdf)
# This is the table pretty printer used above. Define it, together with
# HTML_TEMPLATE1 and HTML_TEMPLATE2 below, before running the code above.
def to_html_pretty(df, filename='/tmp/out.html', title=''):
    '''
    Write an entire dataframe to an HTML file
    with nice formatting.
    Thanks to @stackoverflowuser2010 for the
    pretty printer see https://stackoverflow.com/a/47723330/362951
    '''
    ht = ''
    if title != '':
        ht += '<h2> %s </h2>\n' % title
    ht += df.to_html(classes='wide', escape=False)
    with open(filename, 'w') as f:
        f.write(HTML_TEMPLATE1 + ht + HTML_TEMPLATE2)
HTML_TEMPLATE1 = '''
<html>
<head>
<style>
  h2 {
    text-align: center;
    font-family: Helvetica, Arial, sans-serif;
  }
  table {
    margin-left: auto;
    margin-right: auto;
  }
  table, th, td {
    border: 1px solid black;
    border-collapse: collapse;
  }
  th, td {
    padding: 5px;
    text-align: center;
    font-family: Helvetica, Arial, sans-serif;
    font-size: 90%;
  }
  table tbody tr:hover {
    background-color: #dddddd;
  }
  .wide {
    width: 90%;
  }
</style>
</head>
<body>
'''
HTML_TEMPLATE2 = '''
</body>
</html>
'''
Thanks to @stackoverflowuser2010 for the pretty printer, see stackoverflowuser2010's answer https://stackoverflow.com/a/47723330/362951
I did not use pdfkit, because I had some problems with it on a headless machine. But weasyprint is great.
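If the intermediate file is not wanted, weasyprint can also take the HTML as a string through weasyprint.HTML(string=...). The sketch below is an assumption-laden variation, not the answer author's code: the helper names build_html and write_pdf, the shortened CSS, and the demo data are hypothetical. The PDF step is left commented out because it needs weasyprint installed.

```python
import pandas as pd

# Shortened, hypothetical stylesheet in the spirit of the template above.
CSS = """
table, th, td { border: 1px solid black; border-collapse: collapse; }
th, td { padding: 5px; text-align: center; }
.wide { width: 90%; }
"""


def build_html(df, title=""):
    """Return a complete HTML page containing the dataframe as a table."""
    heading = "<h2>%s</h2>\n" % title if title else ""
    return (
        "<html><head><style>" + CSS + "</style></head><body>\n"
        + heading
        + df.to_html(classes="wide")
        + "\n</body></html>"
    )


def write_pdf(html, out_pdf):
    """Render an HTML string to a PDF file (needs weasyprint installed)."""
    import weasyprint  # imported lazily so the rest runs without weasyprint

    weasyprint.HTML(string=html).write_pdf(out_pdf)


# Hypothetical demo data.
demo = pd.DataFrame({"species": ["setosa", "virginica"], "petal_length": [1.4, 5.1]})
page_html = build_html(demo, "Demo Data")

# Uncomment on a machine where weasyprint is available:
# write_pdf(page_html, "/tmp/demo.pdf")
```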
Answer by user3226167
First plot the table with matplotlib, then generate the pdf:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
df = pd.DataFrame(np.random.random((10, 3)), columns=("col 1", "col 2", "col 3"))
# https://stackoverflow.com/questions/32137396/how-do-i-plot-only-a-table-in-matplotlib
fig, ax = plt.subplots(figsize=(12, 4))
ax.axis('tight')
ax.axis('off')
the_table = ax.table(cellText=df.values, colLabels=df.columns, loc='center')
# https://stackoverflow.com/questions/4042192/reduce-left-and-right-margins-in-matplotlib-plot
pp = PdfPages("foo.pdf")
pp.savefig(fig, bbox_inches='tight')
pp.close()
Reference: How do I plot only a table in Matplotlib? (https://stackoverflow.com/questions/32137396/how-do-i-plot-only-a-table-in-matplotlib)
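For a dataframe with more rows than fit comfortably in one figure, the same matplotlib approach can probably be extended to one table per page. This is a rough sketch under assumptions: the function name dataframe_to_pdf, the rows_per_page parameter, and the output file name are hypothetical, and the Agg backend is selected so it runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def dataframe_to_pdf(df, path, rows_per_page=25):
    """Write df to a PDF, one table of at most rows_per_page rows per page.

    Returns the number of pages written.
    """
    pages = 0
    with PdfPages(path) as pp:
        for start in range(0, len(df), rows_per_page):
            chunk = df.iloc[start:start + rows_per_page]
            fig, ax = plt.subplots(figsize=(12, 4))
            ax.axis("off")
            ax.table(cellText=chunk.values, colLabels=list(chunk.columns), loc="center")
            pp.savefig(fig, bbox_inches="tight")
            plt.close(fig)  # free the figure once it is saved
            pages += 1
    return pages


# Hypothetical demo data: 60 rows split into pages of 25 rows.
demo = pd.DataFrame(np.random.random((60, 3)), columns=["col 1", "col 2", "col 3"])
out_path = os.path.join(tempfile.gettempdir(), "foo_paged.pdf")
page_count = dataframe_to_pdf(demo, out_path, rows_per_page=25)
```

Closing each figure after saving keeps memory use flat when the dataframe spans many pages.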