How do I perform a math operation on a Python Pandas dataframe column, but only if a certain condition is met?
Original question: http://stackoverflow.com/questions/41534428/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by ScottP
I have a Pandas dataframe that I'm working with and I simply need to divide all values in a certain column that are greater than 800 by 100. In other words, if the value in the 'credit_score' column is greater than 800, it can be assumed that the data were entered with two extra places to the left of the decimal place. For example...
id  credit_score  column_b  column_c
0   750           ...       ...
1   653           ...       ...
2   741           ...       ...
3   65100         ...       ...
4   73500         ...       ...
5   565           ...       ...
6   480           ...       ...
7   78900         ...       ...
8   699           ...       ...
9   71500         ...       ...
So I basically want to divide the credit scores for row indexes 3, 4, 7, and 9 by 100, but not the others. I want the new, valid values to replace the old, invalid ones. Alternatively, a new column such as 'credit_score_fixed' would work too. I'm fairly new to Python and Pandas, so any help is much appreciated.
Accepted answer by jezrael
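You can use mask:
# Written to a new column 'col' so the result matches the output shown;
# assign to df['credit_score'] instead to overwrite the original values.
df['col'] = df['credit_score'].mask(df['credit_score'] > 800, df['credit_score'] / 100)
Or numpy.where:
import numpy as np
df['col1'] = np.where(df['credit_score'] > 800, df['credit_score'] / 100, df['credit_score'])
print(df)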
   id  credit_score    col   col1
0   0           750  750.0  750.0
1   1           653  653.0  653.0
2   2           741  741.0  741.0
3   3         65100  651.0  651.0
4   4         73500  735.0  735.0
5   5           565  565.0  565.0
6   6           480  480.0  480.0
7   7         78900  789.0  789.0
8   8           699  699.0  699.0
9   9         71500  715.0  715.0
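For readers who want to run both approaches end to end, here is a minimal, self-contained sketch. The sample frame is rebuilt from the question's data (the elided column_b and column_c are left out), and the column names via_mask and via_where are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; column_b / column_c are omitted.
df = pd.DataFrame({
    "id": range(10),
    "credit_score": [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500],
})

# Boolean Series marking the rows that look mis-entered.
too_big = df["credit_score"] > 800

# Series.mask replaces values where the condition is True.
df["via_mask"] = df["credit_score"].mask(too_big, df["credit_score"] / 100)

# numpy.where picks from the second argument where True, else from the third.
df["via_where"] = np.where(too_big, df["credit_score"] / 100, df["credit_score"])

print(df)
```

Both new columns should hold the same values; which one to prefer is largely a matter of taste, though mask keeps everything inside pandas.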
Answer by MaxU
I'd use Pandas boolean indexing:
In [193]: df.loc[df.credit_score > 800, 'credit_score'] /= 100

In [194]: df
Out[194]:
    credit_score
id
0          750.0
1          653.0
2          741.0
3          651.0
4          735.0
5          565.0
6          480.0
7          789.0
8          699.0
9          715.0
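A self-contained version of the boolean-indexing approach might look like the sketch below. The sample data comes from the question, with id used as the index as in the output above. Casting the column to float first is an extra step that is not part of this answer; it is added on the assumption that newer pandas versions can be stricter about writing float results into an integer column.

```python
import pandas as pd

# Sample data from the question, with 'id' as the index.
df = pd.DataFrame(
    {"credit_score": [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500]},
    index=pd.Index(range(10), name="id"),
)

# Assumption: cast to float up front so the column dtype does not depend on
# how a given pandas version handles float results in an integer column.
df["credit_score"] = df["credit_score"].astype(float)

# Select only the rows above 800 and divide them in place.
df.loc[df["credit_score"] > 800, "credit_score"] /= 100

print(df)
```

Because the assignment goes through df.loc, only the selected rows are touched and the rest of the column is left as it was.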
Answer by DeepSpace
You can use Series.apply. It accepts a function and applies it to every element in the series. Note that it does not work in place, so you will need to assign the series it returns either to a new column or back to the same column.
def fix_scores(score):
    return score / 100 if score > 800 else score
    # same as
    # if score > 800:
    #     return score / 100
    # return score

df['credit_score_fixed'] = df['credit_score'].apply(fix_scores)
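To check that the element-wise function gives the same result as a vectorized expression, something like the following could be run. The sample data comes from the question; Series.where is used for the comparison as a swapped-in alternative, not something this answer proposes, and the variable name vectorized is only illustrative.

```python
import pandas as pd


def fix_scores(score):
    """Divide by 100 only when the score is implausibly large."""
    return score / 100 if score > 800 else score


# Sample data from the question.
df = pd.DataFrame(
    {"credit_score": [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500]}
)

# apply calls fix_scores once per element and returns a new Series.
df["credit_score_fixed"] = df["credit_score"].apply(fix_scores)

# Vectorized comparison: keep values where the condition holds,
# otherwise take the scaled value.
scores = df["credit_score"]
vectorized = scores.where(scores <= 800, scores / 100)

# Check that both approaches agree on every row.
print((df["credit_score_fixed"] == vectorized).all())
```

On large frames the vectorized form is usually faster, since apply makes one Python function call per element.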