Original question: http://stackoverflow.com/questions/24574976/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Save the "Out[]" table of a pandas dataframe as a figure
Asked by user262536
This may seem to be a useless feature, but it would be very helpful for me. I would like to save the output I get inside the Canopy IDE. I do not think this is specific to Canopy, but for the sake of clarity, that is what I use. For example, my console Out[2] is what I would want from this:


I think that the formatting is quite nice, and reproducing it each time instead of just saving the output would be a waste of time. So my question is: how can I get a handle on this figure? Ideally the implementation would be similar to standard methods, so that it could be done like this:
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
pp = PdfPages('Output.pdf')
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df.plot(how='table')  # hypothetical call illustrating the desired API
pp.savefig()
pp.close()
NOTE: I realize that a very similar question has been asked before (How to save the Pandas dataframe/series data as a figure?), but it never received an answer and I think I have stated the question more clearly.
Accepted answer by Keith
Here is a somewhat hackish solution but it gets the job done. You wanted a .pdf but you get a bonus .png. :)
import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
from PySide.QtGui import QImage
from PySide.QtGui import QPainter
from PySide.QtCore import QSize
from PySide.QtWebKit import QWebPage

# Build a sample DataFrame with two-level column headers
arrays = [np.hstack([['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
df = pd.DataFrame(np.zeros((3, 6)), columns=columns, index=pd.date_range('20000103', periods=3))

# Wrap the DataFrame's HTML table in a minimal page
h = "<!DOCTYPE html> <html> <body> <p> " + df.to_html() + " </p> </body> </html>"

# Load the page into an off-screen QtWebKit frame
page = QWebPage()
page.setViewportSize(QSize(5000, 5000))
frame = page.mainFrame()
frame.setHtml(h, "text/html")

# Paint the rendered frame into an image and save it as a PNG
img = QImage(1000, 700, QImage.Format(5))
painter = QPainter(img)
frame.render(painter)
painter.end()
a = img.save("html.png")

# Show the PNG on a matplotlib axis and write that figure to a PDF
pp = PdfPages('html.pdf')
fig = plt.figure(figsize=(8, 6), dpi=1080)
ax = fig.add_subplot(1, 1, 1)
img2 = plt.imread("html.png")
plt.axis('off')
ax.imshow(img2)
pp.savefig()
pp.close()
Edits welcome.
Answer by Laurence Billingham
It is, I believe, an HTML table that your IDE is rendering. This is what the IPython notebook does.
You can get a handle on it like this:
from IPython.display import HTML
import pandas as pd
data = pd.DataFrame({'spam': ['ham', 'green', 'five', 0, 'kitties'],
                     'eggs': [0, 1, 2, 3, 4]})
h = HTML(data.to_html())
h
You can then save it to an HTML file:
my_file = open('some_file.html', 'w')
my_file.write(h.data)
my_file.close()
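As a minimal sketch of the same idea outside a notebook: the IPython wrapper is only needed for display, so the string returned by to_html() can be written to disk directly. The variable name table_html is an assumption made for this illustration; the file name is the one used above.

```python
import pandas as pd

data = pd.DataFrame({'spam': ['ham', 'green', 'five', 0, 'kitties'],
                     'eggs': [0, 1, 2, 3, 4]})

# to_html() returns the table markup as a plain Python string
table_html = data.to_html()

# The with-block closes the file even if the write raises an error
with open('some_file.html', 'w') as my_file:
    my_file.write(table_html)
```

Opening some_file.html in a browser should then show the same table that the notebook renders.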
Answer by J Richard Snape
I think what is needed here is a consistent way of writing a table to a pdf file alongside graphs written to the same pdf.
My first thought is not to use the matplotlib backend, i.e.
from matplotlib.backends.backend_pdf import PdfPages
because it seemed somewhat limited in formatting options and leaned towards formatting the table as an image (thus rendering the text of the table in a non-selectable format).
If you want to mix dataframe output and matplotlib plots in a pdf without using the matplotlib pdf backend, I can think of two ways.
- Generate your pdf of matplotlib figures as before and then insert pages containing the dataframe table afterwards. I view this as a difficult option.
- Use a different library to generate the pdf. I illustrate one option to do this below.
First, install the xhtml2pdf library. It seems a little patchily supported, but it is active on GitHub and has some basic usage documentation. You can install it via pip, i.e. pip install xhtml2pdf
Once you've done that, here is a barebones example embedding a matplotlib figure, then the table (all text selectable), then another figure. You can play around with CSS etc to alter the formatting to your exact specifications, but I think this fulfils the brief:
from xhtml2pdf import pisa  # this is the module that will do the work
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Utility function
def convertHtmlToPdf(sourceHtml, outputFilename):
    # open output file for writing (truncated binary)
    resultFile = open(outputFilename, "w+b")
    # convert HTML to PDF
    pisaStatus = pisa.CreatePDF(
        sourceHtml,       # the HTML to convert
        dest=resultFile,  # file handle to receive result
        path='.')         # this path is needed so relative paths for
                          # temporary image sources work
    # close output file
    resultFile.close()
    # return the error status reported by pisa (falsy on success)
    return pisaStatus.err

# Main program
if __name__ == '__main__':
    arrays = [np.hstack([['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.zeros((3, 6)), columns=columns, index=pd.date_range('20000103', periods=3))

    # Define your data
    sourceHtml = '<html><head>'
    # add some table CSS in head
    sourceHtml += '''<style>
    table, td, th {
        border-style: double;
        border-width: 3px;
    }
    td,th {
        padding: 5px;
    }
    </style>'''
    sourceHtml += '</head><body>'

    # Add a matplotlib figure(s)
    plt.plot(range(20))
    plt.savefig('tmp1.jpg')
    plt.close()  # start the next plot on a fresh figure
    sourceHtml += '\n<p><img src="tmp1.jpg"></p>'

    # Add the dataframe
    sourceHtml += '\n<p>' + df.to_html() + '</p>'

    # Add another matplotlib figure(s)
    plt.plot(range(70, 100))
    plt.savefig('tmp2.jpg')
    plt.close()
    sourceHtml += '\n<p><img src="tmp2.jpg"></p>'

    sourceHtml += '</body></html>'
    outputFilename = 'test.pdf'
    convertHtmlToPdf(sourceHtml, outputFilename)
Note: There seems to be a bug in xhtml2pdf at the time of writing which means that some CSS is not respected. Particularly pertinent to this question is that it seems impossible to get double borders around the table.
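If you do experiment with the CSS, one option is to give the table its own class so that the rules do not touch anything else on the page. The following is only a sketch: the classes argument of to_html() is standard pandas, but the class name report-table, the sample data, and the border style are assumptions chosen for illustration, and xhtml2pdf may not honour every rule.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# 'report-table' is a hypothetical class name chosen for this sketch;
# pandas adds it to the class attribute of the <table> tag
table_html = df.to_html(classes='report-table')

# CSS rules scoped to that class only
css = '''<style>
table.report-table td, table.report-table th {
    border: 1px solid black;
    padding: 5px;
}
</style>'''

source_html = ('<html><head>' + css + '</head><body>'
               + table_html + '</body></html>')
```

The resulting source_html string can be passed to the same conversion function as before.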
EDIT
In response to comments, it became obvious that some users (well, at least @Keith, who both answered and awarded a bounty!) want the table selectable, but definitely on a matplotlib axis. This is somewhat more in keeping with the original method. Hence, here is a method using only the pdf backend for matplotlib and matplotlib objects. I do not think the table looks as good, in particular the display of hierarchical column headers, but that's a matter of choice, I guess. I'm indebted to this answer and its comments for the way to format axes for table display.
import numpy as np
import pandas as pd
from pandas.plotting import table  # was pd.tools.plotting.table in old pandas versions
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt

# Main program
if __name__ == '__main__':
    pp = PdfPages('Output.pdf')
    arrays = [np.hstack([['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.zeros((3, 6)), columns=columns, index=pd.date_range('20000103', periods=3))

    plt.plot(range(20))
    pp.savefig()
    plt.close()

    # Calculate some sizes for formatting - constants are arbitrary - play around
    nrows, ncols = len(df) + 1, len(df.columns) + 10
    hcell, wcell = 0.3, 1.
    hpad, wpad = 0, 0

    # put the table on a correctly sized figure
    fig = plt.figure(figsize=(ncols*wcell + wpad, nrows*hcell + hpad))
    plt.gca().axis('off')
    matplotlib_tab = table(plt.gca(), df, loc='center')
    pp.savefig()
    plt.close()

    # Add another matplotlib figure(s)
    plt.plot(range(70, 100))
    pp.savefig()
    plt.close()

    pp.close()
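One possible way to tidy the hierarchical column headers mentioned above is to join the two levels into a single label per column before drawing the table. This is a sketch, not part of the original answer: it calls matplotlib's own Axes.table directly, and the ' / ' separator, the figure size, and the output file name flat_headers.pdf are arbitrary assumptions.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

columns = pd.MultiIndex.from_arrays(
    [['one'] * 3 + ['two'] * 3, ['Dog', 'Bird', 'Cat'] * 2],
    names=['foo', 'bar'])
df = pd.DataFrame(np.zeros((3, 6)), columns=columns,
                  index=pd.date_range('20000103', periods=3))

# Join the two header levels into one label per column
flat_labels = [' / '.join(col) for col in df.columns]
# Format the dates and cell values as plain strings for display
row_labels = [stamp.strftime('%Y-%m-%d') for stamp in df.index]
cell_text = df.astype(str).values.tolist()

# Draw the table on an otherwise empty axis
fig, ax = plt.subplots(figsize=(10, 2))
ax.axis('off')
ax.table(cellText=cell_text, rowLabels=row_labels,
         colLabels=flat_labels, loc='center')

# Write the figure to a one-page PDF
with PdfPages('flat_headers.pdf') as pp:
    pp.savefig(fig)
plt.close(fig)
```

Because the table is drawn as text on the axis, it should stay selectable in the PDF, as in the method above.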

