Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/35634238/

Date: 2020-08-19 16:43:23  Source: igfitidea

How to save a pandas DataFrame table as a png

python pandas

Question by Shatnerz

I constructed a pandas DataFrame of results. This DataFrame acts as a table. There are MultiIndexed columns, and each row represents a name, i.e. index=['name1','name2',...] when creating the DataFrame. I would like to display this table and save it as a png (or any graphic format, really). At the moment, the closest I can get is converting it to HTML, but I would like a png. Similar questions seem to have been asked, such as How to save the Pandas dataframe/series data as a figure?

However, the marked solution converts the DataFrame into a line plot (not a table), and the other solution relies on PySide, which I would like to stay away from simply because I cannot pip install it on Linux. I would like this code to be easily portable. I really was expecting rendering a table to png to be easy with Python. All help is appreciated.

Accepted answer by bunji

Pandas allows you to plot tables using matplotlib (see the pandas plotting documentation for details). Usually this draws the table directly onto a plot (with axes and everything), which is not what you want. However, these can be removed first:

import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import table  # EDIT: see deprecation warnings below (older pandas: pandas.tools.plotting)

df = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]}, index=['name1', 'name2'])  # example data frame

ax = plt.subplot(111, frame_on=False)  # no visible frame
ax.xaxis.set_visible(False)  # hide the x axis
ax.yaxis.set_visible(False)  # hide the y axis

table(ax, df)  # where df is your data frame

plt.savefig('mytable.png')

The output might not be the prettiest, but you can find additional arguments for the table() function here (pandas passes them on to matplotlib's table()). Also thanks to this post for info on how to remove axes in matplotlib.
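As an illustration of those extra arguments, here is a sketch (not part of the original answer) that also crops the empty margin around the table. It assumes a recent pandas and matplotlib; the example data, the file name, and the styling values are arbitrary, and bbox_inches='tight' is a matplotlib savefig() option rather than something pandas provides:

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import table

# Illustrative data; replace with your own DataFrame.
df = pd.DataFrame({"score": [0.91234, 0.5], "rank": [1, 2]}, index=["name1", "name2"])

fig, ax = plt.subplots(figsize=(4, 1.2))
ax.axis("off")  # hides both axes and the frame in one call
tbl = table(ax, df.round(2), loc="center", cellLoc="center",
            colWidths=[0.3] * len(df.columns))  # extra kwargs go to matplotlib's table()
tbl.auto_set_font_size(False)  # otherwise matplotlib shrinks the font to fit
tbl.set_fontsize(10)

out = Path("mytable_tight.png")
fig.savefig(out, dpi=150, bbox_inches="tight")  # crop the unused white margin
plt.close(fig)
```

Rounding with df.round(2) before plotting is optional; it just keeps long floats from widening the cells.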



EDIT:

Here is an (admittedly quite hacky) way of simulating multi-indexes when plotting using the method above. If you have a multi-indexed data frame called df that looks like:

first  second
bar    one       1.991802
       two       0.403415
baz    one      -1.024986
       two      -0.522366
foo    one       0.350297
       two      -0.444106
qux    one      -0.472536
       two       0.999393
dtype: float64

First, reset the index so the index levels become normal columns:

df = df.reset_index() 
df
    first second       0
0   bar    one  1.991802
1   bar    two  0.403415
2   baz    one -1.024986
3   baz    two -0.522366
4   foo    one  0.350297
5   foo    two -0.444106
6   qux    one -0.472536
7   qux    two  0.999393

Remove all duplicates from the higher order multi-index columns by setting them to an empty string (in my example I only have duplicate indexes in "first"):

df.loc[df.duplicated('first'), 'first'] = ''  # originally df.ix; see deprecation warnings below
df
  first second         0
0   bar    one  1.991802
1          two  0.403415
2   baz    one -1.024986
3          two -0.522366
4   foo    one  0.350297
5          two -0.444106
6   qux    one -0.472536
7          two  0.999393

Change the column names above your former "index" columns to the empty string:

new_cols = list(df.columns)  # copy the labels instead of mutating the Index's underlying array
new_cols[:2] = ['', '']  # since my index columns are the two left-most on the table
df.columns = new_cols

Now call the table function but set all the row labels in the table to the empty string (this makes sure the actual indexes of your plot are not displayed):

table(ax, df, rowLabels=['']*df.shape[0], loc='center')

et voila:

Your not-so-pretty but totally functional multi-indexed table.
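The manual steps above can be bundled into one helper. This is a sketch, not part of the original answer: the name flatten_multiindex_for_table is hypothetical, and it blanks a label only when it repeats the row directly above (instead of using duplicated(), which also matches non-adjacent repeats), so the index is assumed to be sorted.

```python
import pandas as pd


def flatten_multiindex_for_table(obj):
    """Return a flat DataFrame suitable for pandas.plotting.table().

    Index levels become ordinary columns, repeated outer-level labels are
    blanked, and the former index columns get empty headers.
    """
    flat = obj.reset_index()
    n_levels = obj.index.nlevels
    level_cols = list(flat.columns[:n_levels])
    for col in level_cols:
        flat[col] = flat[col].astype(str)  # so '' can be stored without dtype clashes

    # Compute every mask first, so deeper levels compare against the original labels.
    masks = []
    for depth in range(n_levels - 1):  # the innermost level is never blanked
        outer = flat[level_cols[: depth + 1]]
        masks.append((outer == outer.shift()).all(axis=1))
    for depth, mask in enumerate(masks):
        flat.loc[mask, level_cols[depth]] = ""

    flat.columns = [""] * n_levels + list(flat.columns[n_levels:])
    return flat
```

The result can then be drawn as before, e.g. flat = flatten_multiindex_for_table(df) followed by table(ax, flat, rowLabels=[''] * len(flat), loc='center').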

EDIT: DEPRECATION WARNINGS

As pointed out in the comments, the import statement for table:

from pandas.tools.plotting import table

is now deprecated in newer versions of pandas in favour of:

from pandas.plotting import table 
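If the same script has to run on both old and new pandas versions, one option (a sketch, not from the original answer) is to try the new location first and fall back to the old one:

```python
try:
    from pandas.plotting import table  # pandas 0.20 and later
except ImportError:
    from pandas.tools.plotting import table  # older pandas releases
```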

EDIT: DEPRECATION WARNINGS 2

The ix indexer has now been fully deprecated (and was removed in pandas 1.0), so we should use the loc indexer instead. Replace:

df.ix[df.duplicated('first') , 'first'] = ''

with

df.loc[df.duplicated('first') , 'first'] = ''

Answer by jcdoming

Although I am not sure if this is the result you expect, you can save your DataFrame as a png by plotting it as a Seaborn heatmap with annotations turned on, like this:

http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.heatmap.html#seaborn.heatmap

Example of Seaborn heatmap with annotations on

It works directly with a pandas DataFrame. You can look at this example: Efficiently plotting a table in csv format using Python

You might want to change the colormap so it displays a white background only.
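One way to get a white-only background is a single-colour ListedColormap. This is a sketch assuming seaborn and matplotlib are installed; the data, file name, and styling below are illustrative, not from the original answer:

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.colors import ListedColormap

df = pd.DataFrame(np.arange(12).reshape(4, 3) / 3, columns=list("ABC"))

fig, ax = plt.subplots(figsize=(4, 2.5))
sns.heatmap(
    df,
    annot=True, fmt=".2f",              # write each value into its cell
    cmap=ListedColormap(["white"]),     # one-colour map: every cell is white
    cbar=False,                         # a colour bar is meaningless here
    linewidths=0.5, linecolor="black",  # cell borders, like a table grid
    ax=ax,
)
ax.xaxis.tick_top()       # column labels above the table
ax.tick_params(length=0)  # hide tick marks but keep the labels

out_path = Path("heatmap_table.png")
fig.savefig(out_path, bbox_inches="tight")
plt.close(fig)
```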

Hope this helps.

Answer by Colin Dickie

The following would need extensive customisation to format the table correctly, but the bones of it work:

import numpy as np
from PIL import Image, ImageDraw, ImageFont
import pandas as pd

df = pd.DataFrame({ 'A' : 1.,
                     'B' : pd.Series(1,index=list(range(4)),dtype='float32'),
                     'C' : np.array([3] * 4,dtype='int32'),
                     'D' : pd.Categorical(["test","train","test","train"]),
                     'E' : 'foo' })


class DrawTable():
    def __init__(self,_df):
        self.rows,self.cols = _df.shape
        img_size = (300,200)
        self.border = 50
        self.bg_col = (255,255,255)
        self.div_w = 1
        self.div_col = (128,128,128)
        self.head_w = 2
        self.head_col = (0,0,0)
        self.image = Image.new("RGBA", img_size,self.bg_col)
        self.draw = ImageDraw.Draw(self.image)
        self.draw_grid()
        self.populate(_df)
        self.image.show()
    def draw_grid(self):
        width,height = self.image.size
        row_step = (height-self.border*2)/(self.rows)
        col_step = (width-self.border*2)/(self.cols)
        for row in range(1,self.rows+1):
            self.draw.line((self.border-row_step//2,self.border+row_step*row,width-self.border,self.border+row_step*row),fill=self.div_col,width=self.div_w)
            for col in range(1,self.cols+1):
                self.draw.line((self.border+col_step*col,self.border-col_step//2,self.border+col_step*col,height-self.border),fill=self.div_col,width=self.div_w)
        self.draw.line((self.border-row_step//2,self.border,width-self.border,self.border),fill=self.head_col,width=self.head_w)
        self.draw.line((self.border,self.border-col_step//2,self.border,height-self.border),fill=self.head_col,width=self.head_w)
        self.row_step = row_step
        self.col_step = col_step
    def populate(self,_df2):
        # use the ImageFont object itself (its .font attribute is a low-level object)
        font = ImageFont.load_default()
        for row in range(self.rows):
            self.draw.text((self.border-self.row_step//2,self.border+self.row_step*row),str(_df2.index[row]),font=font,fill=(0,0,128))
            for col in range(self.cols):
                text = str(_df2.iloc[row,col])
                x_pos = self.border+self.col_step*(col+1)-self.text_width(text,font)  # right-align in the cell
                y_pos = self.border+self.row_step*row
                self.draw.text((x_pos,y_pos),text,font=font,fill=(0,0,128))
        for col in range(self.cols):
            text = str(_df2.columns[col])
            x_pos = self.border+self.col_step*(col+1)-self.text_width(text,font)
            y_pos = self.border - self.row_step//2
            self.draw.text((x_pos,y_pos),text,font=font,fill=(0,0,128))
    def text_width(self,text,font):
        # font.getsize() was removed in Pillow 10; measure the text with textbbox() instead
        left, top, right, bottom = self.draw.textbbox((0,0),text,font=font)
        return right-left
    def save(self,filename):
        try:
            self.image.save(filename)
            print(filename," Saved.")
        except OSError as err:  # e.g. the target directory does not exist
            print("Error saving:",filename,err)




table1 = DrawTable(df)
table1.save('C:/Users/user/Pictures/table1.png')

The output looks like this:


Answer by jrovegno

The solution of @bunji works for me, but the default options don't always give a good result. I added some useful parameters to tweak the appearance of the table.

import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import table  # pandas.tools.plotting in old pandas versions
import numpy as np

dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))

df.index = [item.strftime('%Y-%m-%d') for item in df.index] # Format date

fig, ax = plt.subplots(figsize=(12, 2)) # set size frame
ax.xaxis.set_visible(False)  # hide the x axis
ax.yaxis.set_visible(False)  # hide the y axis
ax.set_frame_on(False)  # no visible frame; comment this out while tuning figsize
tabla = table(ax, df, loc='upper right', colWidths=[0.17]*len(df.columns))  # where df is your data frame
tabla.auto_set_font_size(False) # Activate set fontsize manually
tabla.set_fontsize(12) # if ++fontsize is necessary ++colWidths
tabla.scale(1.2, 1.2) # change size table
plt.savefig('table.png', transparent=True)

The result: Table

Answer by norok2

The best solution to your problem is probably to first export your dataframe to HTML and then convert it using an HTML-to-image tool. The final appearance could be tweaked via CSS.

Popular options for HTML-to-image rendering include WeasyPrint and wkhtmltopdf/wkhtmltoimage.



Let us assume we have a dataframe named df. We can generate one with the following code:

import string
import numpy as np
import pandas as pd


np.random.seed(0)  # just to get reproducible results from `np.random`
rows, cols = 5, 10
labels = list(string.ascii_uppercase[:cols])
df = pd.DataFrame(np.random.randint(0, 100, size=(rows, cols)), columns=labels)
print(df)
#     A   B   C   D   E   F   G   H   I   J
# 0  44  47  64  67  67   9  83  21  36  87
# 1  70  88  88  12  58  65  39  87  46  88
# 2  81  37  25  77  72   9  20  80  69  79
# 3  47  64  82  99  88  49  29  19  19  14
# 4  39  32  65   9  57  32  31  74  23  35


Using WeasyPrint

This approach uses a pip-installable package, which allows you to do everything within the Python ecosystem. One shortcoming of weasyprint is that it does not seem to provide a way of adapting the image size to its content. Anyway, removing some background from an image is relatively easy in Python / PIL, and it is implemented in the trim() function below (adapted from here). One also needs to make sure that the image will be large enough, and this can be done with CSS's @page size property. Note that write_png() was removed in WeasyPrint 53, so this code needs an older release (e.g. pip install "weasyprint<53").

The code follows:

import weasyprint as wsp
from PIL import Image, ImageChops  # `import PIL` alone does not load these submodules


def trim(source_filepath, target_filepath=None, background=None):
    # crop away the uniform border around the rendered table
    if not target_filepath:
        target_filepath = source_filepath
    img = Image.open(source_filepath)
    if background is None:
        background = img.getpixel((0, 0))  # assume the top-left pixel is background
    border = Image.new(img.mode, img.size, background)
    diff = ImageChops.difference(img, border)
    bbox = diff.getbbox()
    img = img.crop(bbox) if bbox else img
    img.save(target_filepath)


img_filepath = 'table1.png'
css = wsp.CSS(string='''
@page { size: 2048px 2048px; padding: 0px; margin: 0px; }
table, td, tr, th { border: 1px solid black; }
td, th { padding: 4px 8px; }
''')
html = wsp.HTML(string=df.to_html())
html.write_png(img_filepath, stylesheets=[css])
trim(img_filepath)



Using wkhtmltopdf/wkhtmltoimage

This approach uses an external open-source tool, which needs to be installed before the image can be generated. There is also a Python package, pdfkit, that serves as a front-end to it (it does not spare you from installing the core software yourself), but I will not use it.

wkhtmltoimage can simply be called using subprocess (or any other similar means of running an external program in Python). One also needs to write the HTML file to disk.

The code follows:

import subprocess


df.to_html('table2.html')
subprocess.run(
    ['wkhtmltoimage', '-f', 'png', '--width', '0', 'table2.html', 'table2.png'],
    check=True)  # argument list: no shell needed; raises if the tool fails

Its appearance can be further tweaked with CSS, as with the other approach.
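For instance, the CSS can be embedded in a standalone HTML page before conversion. This is a sketch, not norok2's code: write_styled_html and the stylesheet are illustrative, and the wkhtmltoimage call (with the same flags as above) only runs if the binary is found on PATH.

```python
import shutil
import subprocess
from pathlib import Path

import pandas as pd

# Illustrative stylesheet; adjust to taste.
CSS = """
table { border-collapse: collapse; font-family: sans-serif; }
th, td { border: 1px solid black; padding: 4px 8px; text-align: right; }
th { background: #eee; }
"""


def write_styled_html(df, html_path, css=CSS):
    """Write df as a standalone HTML page with an embedded <style> block."""
    page = (
        "<html><head><meta charset='utf-8'><style>" + css + "</style></head>"
        "<body>" + df.to_html() + "</body></html>"
    )
    Path(html_path).write_text(page, encoding="utf-8")
    return page


df = pd.DataFrame({"A": [1, 2], "B": [3.5, 4.25]})
page = write_styled_html(df, "table3.html")

# Only attempt the conversion if wkhtmltoimage is installed.
if shutil.which("wkhtmltoimage"):
    subprocess.run(
        ["wkhtmltoimage", "-f", "png", "--width", "0", "table3.html", "table3.png"],
        check=True,
    )
```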



Answer by pythonomicon

If you're okay with the formatting as it appears when you call the DataFrame in your coding environment, then the absolute easiest way is to just use print screen and crop the image using basic image editing software.

Here's how it turned out for me using Jupyter Notebook, and Pinta Image Editor (Ubuntu freeware).

Answer by Alon Lavian

As jcdoming suggested, use Seaborn's heatmap():

import seaborn as sns
import matplotlib.pyplot as plt

fig = plt.figure(facecolor='w', edgecolor='k')
sns.heatmap(df.head(), annot=True, cmap='viridis', cbar=False)
plt.savefig('DataFrame.png')

DataFrame as a heat map

Answer by Carlo Carandang

The easiest and fastest way to convert a pandas DataFrame into a png image uses the Anaconda Spyder IDE: just double-click the DataFrame in the Variable Explorer, and the IDE table will appear, nicely packaged with automatic formatting and a color scheme. Then use a snipping tool to capture the table for use in your reports, saved as a png:

2020 Blue Chip Ratio

This saves me lots of time, and is still elegant and professional.
