How to format IPython html display of Pandas dataframe?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/18876022/

Date: 2020-08-19 12:09:11  Source: igfitidea

Tags: python, html, pandas, ipython

Asked by behzad.nouri

How can I format IPython html display of pandas dataframes so that

  1. numbers are right justified
  2. numbers have commas as thousands separator
  3. large floats have no decimal places

I understand that numpy has the facility of set_printoptions, where I can do:

import numpy as np

int_frmt = lambda x: '{:,}'.format(x)
np.set_printoptions(formatter={'int_kind': int_frmt})

and similarly for other data types.
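
To make the NumPy behaviour concrete, here is a minimal sketch. The sample arrays and the float_frmt helper are assumptions added for illustration, and np.printoptions (the context-manager form of set_printoptions) is used so the global settings are restored afterwards:

```python
import numpy as np

int_frmt = lambda x: '{:,}'.format(x)
float_frmt = lambda x: '{:,.2f}'.format(x)  # hypothetical float counterpart

# Apply the formatters only inside the with-block; np.set_printoptions
# would instead change the print settings for the rest of the session.
with np.printoptions(formatter={'int_kind': int_frmt, 'float_kind': float_frmt}):
    int_text = str(np.array([1000, 2000000]))
    float_text = str(np.array([1234.5, 0.25]))

print(int_text)
print(float_text)
```

This only affects how NumPy arrays are printed; pandas uses its own formatting machinery.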

But IPython does not pick up these formatting options when displaying dataframes in html. I still need to have

pd.set_option('display.notebook_repr_html', True)

but with points 1, 2, and 3 from the list above.

Edit: Below is my solution for 2 & 3 (not sure this is the best way), but I still need to figure out how to make number columns right justified.

import numpy as np
from IPython.display import HTML

# df is the existing DataFrame to display
int_frmt = lambda x: '{:,}'.format(x)
float_frmt = lambda x: '{:,.0f}'.format(x) if x > 1e3 else '{:,.2f}'.format(x)
# Pick a formatter per column based on its dtype
frmt_map = {np.dtype('int64'): int_frmt, np.dtype('float64'): float_frmt}
frmt = {col: frmt_map[df.dtypes[col]] for col in df.columns if df.dtypes[col] in frmt_map}
HTML(df.to_html(formatters=frmt))
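
For anyone who wants to try this outside a notebook, here is a self-contained sketch of the same idea. The sample frame and the lookup by dtype kind ('i' for integers, 'f' for floats) are assumptions added for illustration, not part of the original post:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'count': [1234567, 42],
    'big': [1234.5678, 98765.4321],
    'small': [12.3456, 0.5],
})

int_frmt = lambda x: '{:,}'.format(x)
float_frmt = lambda x: '{:,.0f}'.format(x) if x > 1e3 else '{:,.2f}'.format(x)

# Choose a formatter by dtype kind, so int32 and float32 columns are covered too
frmt_by_kind = {'i': int_frmt, 'f': float_frmt}
frmt = {col: frmt_by_kind[df[col].dtype.kind]
        for col in df.columns if df[col].dtype.kind in frmt_by_kind}

# to_html applies each formatter to every value in the matching column
html = df.to_html(formatters=frmt)
print(html)
```

In a notebook, the resulting string can be passed to IPython.display.HTML as in the snippet above.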

Accepted answer by Viktor Kerkez

HTML receives a custom string of html data. Nobody forbids you to pass in a style tag with the custom CSS style for the .dataframe class (which the to_html method adds to the table).

So the simplest solution would be to just add a style and concatenate it with the output of df.to_html:

style = '<style>.dataframe td { text-align: right; }</style>'
HTML( style + df.to_html( formatters=frmt ) )

But I would suggest defining a custom class for the DataFrame instead, since styling .dataframe changes all the tables in your notebook (the style is "global").

style = '<style>.right_aligned_df td { text-align: right; }</style>'
HTML(style + df.to_html(formatters=frmt, classes='right_aligned_df'))
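
To see what the classes parameter actually does, here is a small sketch that builds the HTML string without IPython. The sample frame is an assumption for illustration:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'A': [1.5, 2500.25], 'B': [10, 20000]})

style = '<style>.right_aligned_df td { text-align: right; }</style>'

# The custom class is added next to the default "dataframe" class
table_html = df.to_html(classes='right_aligned_df')
page_html = style + table_html

# Show the opening <table> tag to inspect the class attribute
print(table_html.splitlines()[0])
```

Because the CSS selector targets only .right_aligned_df, other tables in the notebook keep their default alignment.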

You can also define the style in one of the previous cells, and then just set the classes parameter of the to_html method:

# Some cell at the beginning of the notebook
In [2]: HTML('''<style>
                    .right_aligned_df td { text-align: right; }
                    .left_aligned_df td { text-align: left; }
                    .pink_df { background-color: pink; }
                </style>''')

...

# Much later in your notebook
In [66]: HTML(df.to_html(classes='pink_df'))

Answer by kynan

On the OP's point 2:

numbers have commas as thousands separator

pandas (as of 0.20.1) does not allow overriding the default integer format in an easy way. It is hard coded in pandas.io.formats.format.IntArrayFormatter (the lambda function):

class IntArrayFormatter(GenericArrayFormatter):

    def _format_strings(self):
        formatter = self.formatter or (lambda x: '% d' % x)
        fmt_values = [formatter(x) for x in self.values]
        return fmt_values

I'm assuming that what you're actually asking for is how to override the format for all integers: replace ("monkey patch") the IntArrayFormatter to print integer values with thousands separated by commas, as follows:

import pandas

class _IntArrayFormatter(pandas.io.formats.format.GenericArrayFormatter):

    def _format_strings(self):
        formatter = self.formatter or (lambda x: ' {:,}'.format(x))
        fmt_values = [formatter(x) for x in self.values]
        return fmt_values

pandas.io.formats.format.IntArrayFormatter = _IntArrayFormatter

Note:

  • before 0.20.0, the formatters were in pandas.formats.format.
  • before 0.18.1, the formatters were in pandas.core.format.

Aside

For floats you do not need to jump through those hoops since there is a configuration option for it:

display.float_format: The callable should accept a floating point number and return a string with the desired format of the number. This is used in some places like SeriesFormatter. See core.format.EngFormatter for an example.
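
A minimal sketch of that option in use. The sample frame is an assumption for illustration, and pd.option_context is used so the setting only applies inside the with-block:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'x': [1234567.891, 0.5]})

# Any callable that takes a float and returns a string will do;
# here the bound str.format method of a format string is used.
with pd.option_context('display.float_format', '{:,.2f}'.format):
    text = df.to_string()

print(text)
```

Outside the with-block the default float formatting is back in effect; use pd.set_option instead to change it for the whole session.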

Answer by Julien Marrec

This question was asked a long time ago. Back then, pandas didn't yet include pd.Styler. It was added in version 0.17.1.

Here's how you would use this to achieve your desired goal and some more:

  • Center the header
  • right-align any number columns
  • left-align the other columns.
  • Add a formatter for the numeric columns like you want
  • make it so that each column has the same width.

Here's some example data:

In [1]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10,3)*2000, columns=['A','B','C'])
df['D'] = np.random.randint(0,10000,size=10)
df['TextCol'] = np.random.choice(['a','b','c'], 10)
df.dtypes

Out[1]:
A          float64
B          float64
C          float64
D            int64
TextCol     object
dtype: object

Let's format this using df.style:

# Construct a mask of which columns are numeric
numeric_col_mask = df.dtypes.apply(lambda d: issubclass(np.dtype(d).type, np.number))

# Dict used to center the table headers
d = dict(selector="th",
    props=[('text-align', 'center')])

# Style
df.style.set_properties(subset=df.columns[numeric_col_mask], # right-align the numeric columns and set their width
                        **{'width':'10em', 'text-align':'right'})\
        .set_properties(subset=df.columns[~numeric_col_mask], # left-align the non-numeric columns and set their width
                        **{'width':'10em', 'text-align':'left'})\
        .format(lambda x: '{:,.0f}'.format(x) if x > 1e3 else '{:,.2f}'.format(x), # format the numeric values
                subset=pd.IndexSlice[:,df.columns[numeric_col_mask]])\
        .set_table_styles([d]) # center the header

Result using pd.Styler
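
As a side note, the numeric/non-numeric column split used above can also be obtained with DataFrame.select_dtypes. This is a sketch of an alternative, not what the answer itself uses, and the small sample frame is an assumption for illustration:

```python
import pandas as pd

# Hypothetical sample data with the same mix of column types
df = pd.DataFrame({'A': [1.5, 2.5], 'D': [1, 2], 'TextCol': ['a', 'b']})

# Columns whose dtype is any numeric type (ints and floats)
numeric_cols = df.select_dtypes(include='number').columns

# Everything else, e.g. text columns
other_cols = df.columns.difference(numeric_cols)

print(list(numeric_cols), list(other_cols))
```

Either Index can then be passed as the subset argument of set_properties or format.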

Note that instead of calling .format on the subset columns, you can just as well set the global default pd.options.display.float_format:

pd.options.display.float_format = lambda x: '{:,.0f}'.format(x) if x > 1e3 else '{:,.2f}'.format(x)