Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/41534428/

Time: 2020-09-14 02:44:50  Source: igfitidea

How do I perform a math operation on a Python Pandas dataframe column, but only if a certain condition is met?

python, pandas

Asked by ScottP

I'm working with a Pandas DataFrame and need to divide every value in one column that is greater than 800 by 100. In other words, if a value in the 'credit_score' column is greater than 800, I can assume it was entered with two extra digits to the left of the decimal point. For example...

id    credit_score    column_b    column_c
0     750             ...         ...
1     653             ...         ...
2     741             ...         ...
3     65100           ...         ...
4     73500           ...         ...
5     565             ...         ...
6     480             ...         ...
7     78900           ...         ...
8     699             ...         ...
9     71500           ...         ...

So I basically want to divide the credit scores for row indexes 3, 4, 7, and 9 by 100, but not the others. I want the new, valid values to replace the old, invalid ones. Alternatively, a new column such as 'credit_score_fixed' would work too. I'm fairly new to Python and Pandas, so any help is much appreciated.
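To try the answers below, the sample data can be rebuilt as a DataFrame. This is a minimal sketch: it keeps only the id and credit_score columns from the table (column_b and column_c are omitted because their values are elided), and the variable name df is an assumption chosen to match the answers.

```python
import pandas as pd

# Sample data from the question; column_b and column_c are omitted
# because their values are elided ("...") in the original table.
df = pd.DataFrame({
    'id': range(10),
    'credit_score': [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500],
})

# Rows whose score exceeds 800 are the mis-entered ones
print(df[df['credit_score'] > 800])
```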

Accepted answer by jezrael

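You can use Series.mask, which replaces values where the condition is True. The examples below store the results in new columns, col and col1, so they can be compared with the original; assign to df['credit_score'] instead to overwrite the invalid values:

df['col'] = df['credit_score'].mask(df['credit_score'] > 800, df['credit_score'] / 100)

Or numpy.where:

import numpy as np

df['col1'] = np.where(df['credit_score'] > 800, df['credit_score'] / 100, df['credit_score'])

print(df)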
   id  credit_score    col   col1
0   0           750  750.0  750.0
1   1           653  653.0  653.0
2   2           741  741.0  741.0
3   3         65100  651.0  651.0
4   4         73500  735.0  735.0
5   5           565  565.0  565.0
6   6           480  480.0  480.0
7   7         78900  789.0  789.0
8   8           699  699.0  699.0
9   9         71500  715.0  715.0
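A minimal sketch, assuming the sample data from the question, that writes the corrected values straight back into credit_score. It also shows Series.where, the inverse of mask: where keeps values where the condition is True, so the condition has to be negated. The names too_big, fixed_mask, fixed_where and fixed_np are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'credit_score': [750, 653, 741, 65100, 73500,
                                    565, 480, 78900, 699, 71500]})

too_big = df['credit_score'] > 800

# mask: replace values where the condition is True
fixed_mask = df['credit_score'].mask(too_big, df['credit_score'] / 100)

# where: keep values where the condition is True, so negate it
fixed_where = df['credit_score'].where(~too_big, df['credit_score'] / 100)

# np.where returns a plain NumPy array rather than a Series
fixed_np = np.where(too_big, df['credit_score'] / 100, df['credit_score'])

# Overwrite the original column with the corrected values
df['credit_score'] = fixed_mask
print(df['credit_score'].tolist())
# [750.0, 653.0, 741.0, 651.0, 735.0, 565.0, 480.0, 789.0, 699.0, 715.0]
```

Because the replacement values come from a division, the resulting column is float64 even for the rows that were left unchanged.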

Answer by MaxU

I'd use Pandas boolean indexing:

In [193]: df.loc[df.credit_score > 800, 'credit_score'] /= 100

In [194]: df
Out[194]:
    credit_score
id
0          750.0
1          653.0
2          741.0
3          651.0
4          735.0
5          565.0
6          480.0
7          789.0
8          699.0
9          715.0
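The question also allows a separate credit_score_fixed column. A minimal sketch of the same boolean-indexing approach that leaves the original column untouched, assuming the sample data from the question. Casting the copy to float first lets the column hold fractional results (e.g. a score entered as 65150) without relying on pandas to upcast an integer column in place, which newer pandas versions warn about.

```python
import pandas as pd

df = pd.DataFrame({'credit_score': [750, 653, 741, 65100, 73500,
                                    565, 480, 78900, 699, 71500]})

# Start from a float copy so the new column can hold divided values
df['credit_score_fixed'] = df['credit_score'].astype(float)

# Boolean indexing: divide only the rows above 800, only in the new column
rows = df['credit_score_fixed'] > 800
df.loc[rows, 'credit_score_fixed'] /= 100

print(df)
```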

Answer by DeepSpace

You can use Series.apply. It accepts a function and applies it to every element of the Series. Note that it does not modify the Series in place, so you need to assign the Series it returns, either to a new column or back to the same column.

def fix_scores(score):
    return score / 100 if score > 800 else score
    # same as
    # if score > 800:
    #      return score / 100
    # return score

df['credit_score_fixed'] = df['credit_score'].apply(fix_scores)
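Series.apply calls fix_scores once per value in Python, so on large columns it is usually slower than the vectorized mask or np.where approaches, although it produces the same values. A minimal sketch cross-checking the two, assuming the sample data from the question; via_apply and via_mask are illustrative names.

```python
import pandas as pd

def fix_scores(score):
    # Same rule as the answer: scores above 800 had two extra digits
    return score / 100 if score > 800 else score

df = pd.DataFrame({'credit_score': [750, 653, 741, 65100, 73500,
                                    565, 480, 78900, 699, 71500]})

# Element-wise: one Python call per value
via_apply = df['credit_score'].apply(fix_scores)

# Vectorized equivalent operating on the whole column at once
via_mask = df['credit_score'].mask(df['credit_score'] > 800, df['credit_score'] / 100)

# Compare element by element
print((via_apply == via_mask).all())
```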