How to use Python Pandas Stylers for coloring an entire row based on a given column?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/43596579/

Time: 2020-09-14 03:28:02  Source: igfitidea

Tags: python, pandas

Asked by Dread Pirate Roberts

I've been trying to render a Pandas dataframe to HTML with entire rows highlighted whenever the value in one specific column is over a threshold. I've looked through the Pandas Styler slicing documentation and tried to adapt the highlight_max function for this, but without success. If I replace is_max with a check for whether a given row's value is above the threshold, e.g. something like

is_x = df['column_name'] >= threshold

it isn't apparent how to pass such a thing properly or what to return.

I've also tried to simply define it elsewhere using df.loc, but that hasn't worked too well either.

Another concern also came up: If I drop that column (currently the criterion) afterwards, will the styling still hold? I am wondering if a df.loc would prevent such a thing from being a problem.

Answered by Scott Boston

This solution lets you pass a column label or a list of column labels; the entire row is highlighted if the value in any of those columns is greater than or equal to the threshold.

import pandas as pd
import numpy as np

# Build a reproducible 10 x 5 demo frame with one missing value.
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})

df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
               axis=1)
df.iloc[0, 2] = np.nan

def highlight_greaterthan(s, threshold, column):
    # s is one row (axis=1); column is a label or a list of labels.
    is_max = pd.Series(data=False, index=s.index)
    is_max[column] = s.loc[column] >= threshold
    # Return one CSS string per cell so the whole row gets the same style.
    return ['background-color: yellow' if is_max.any() else '' for v in is_max]


df.style.apply(highlight_greaterthan, threshold=1.0, column=['C', 'B'], axis=1)

Or for a single column:

df.style.apply(highlight_greaterthan, threshold=1.0, column='E', axis=1)
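
To see what "what to return" means in practice, the function can be called directly on a single row, without rendering anything. The sketch below reuses highlight_greaterthan from this answer on a small made-up frame (the name demo and its values are assumptions for illustration only) and shows that each call returns one CSS string per cell of the row.

```python
import numpy as np
import pandas as pd

def highlight_greaterthan(s, threshold, column):
    # Same function as in the answer: s is one row of the frame.
    is_max = pd.Series(data=False, index=s.index)
    is_max[column] = s.loc[column] >= threshold
    return ['background-color: yellow' if is_max.any() else '' for v in is_max]

# Hypothetical demo data, not taken from the original post.
demo = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                     'B': [0.5, 1.5, np.nan],
                     'C': [2.0, 0.1, 0.2]})

# Styles the Styler would receive for the first row (C = 2.0 >= 1.0).
row0 = highlight_greaterthan(demo.loc[0], threshold=1.0, column=['C', 'B'])

# True for every row that would be highlighted; NaN never passes the check.
highlighted = [
    bool(highlight_greaterthan(row, threshold=1.0, column=['C', 'B'])[0])
    for _, row in demo.iterrows()
]
print(row0)
print(highlighted)
```

With axis=1, Styler.apply makes the same per-row calls and expects exactly this kind of list back, one entry per column.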

Answered by Steven

Here is a simpler approach:

  1. Assume you have a 100 x 10 dataframe, df, and you want to highlight every row whose value in one column, say "duration", is greater than 5.

  2. First define a function that produces the highlighting. The real trick is that it must return styles for a whole row, not for a single cell. For example,

    def highlight(s):
        # s is one row of df; return one CSS string per column (10 here)
        if s.duration > 5:
            return ['background-color: yellow']*10
        else:
            return ['background-color: white']*10
    
Note that the returned value must be a list of 10 elements, one per column. This is the key part.

  3. Now you can apply this to the dataframe style as:

    df.style.apply(highlight, axis=1)
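
The steps above can be sketched as a self-contained example. As a variation on the answer's code, this version uses len(s) instead of the hard-coded 10, so the returned list always matches the number of columns. The column and threshold parameters and the three-row demo frame are assumptions added for illustration.

```python
import pandas as pd

def highlight(s, column='duration', threshold=5):
    # s is one row; pick a color from the value in the criterion column.
    color = 'yellow' if s[column] > threshold else 'white'
    # One CSS string per cell, whatever the number of columns is.
    return [f'background-color: {color}'] * len(s)

# Hypothetical demo data standing in for the 100 x 10 frame.
df = pd.DataFrame({'task': ['a', 'b', 'c'], 'duration': [3, 7, 5]})

# Preview the per-cell styles as a plain DataFrame, without rendering.
styles = df.apply(highlight, axis=1, result_type='expand')
styles.columns = df.columns
print(styles)
```

Passing the same function to df.style.apply(highlight, axis=1) should apply these styles row by row in the rendered table.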
    