How do I perform a math operation on a Python Pandas dataframe column, but only if a certain condition is met?

Question

Asked by ScottP

I have a Pandas dataframe that I'm working with, and I simply need to divide all values in a certain column that are greater than 800 by 100. In other words, if the value in the 'credit_score' column is greater than 800, it can be assumed that the data were entered with two extra digits to the left of the decimal point. For example...

id    credit_score    column_b    column_c
0     750             ...         ...
1     653             ...         ...
2     741             ...         ...
3     65100           ...         ...
4     73500           ...         ...
5     565             ...         ...
6     480             ...         ...
7     78900           ...         ...
8     699             ...         ...
9     71500           ...         ...

So I basically want to divide the credit scores for row indexes 3, 4, 7, and 9 by 100, but not the others. I want the new, valid values to replace the old, invalid ones. Alternatively, a new column such as 'credit_score_fixed' would work too. I'm fairly new to Python and Pandas, so any help is much appreciated.

Answer 1

Accepted answer by jezrael

You can use mask:

df.credit_score = df.credit_score.mask(df.credit_score > 800, df.credit_score / 100)

Or numpy.where:

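import numpy as np

df.credit_score = np.where(df.credit_score > 800, df.credit_score / 100, df.credit_score)

The output below appears to come from a run that stored the two results in new columns (col and col1) instead of overwriting credit_score, so the original values stay visible for comparison:

print(df)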
   id  credit_score    col   col1
0   0           750  750.0  750.0
1   1           653  653.0  653.0
2   2           741  741.0  741.0
3   3         65100  651.0  651.0
4   4         73500  735.0  735.0
5   5           565  565.0  565.0
6   6           480  480.0  480.0
7   7         78900  789.0  789.0
8   8           699  699.0  699.0
9   9         71500  715.0  715.0
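
As a self-contained sketch of the two approaches above, the following rebuilds the sample data from the question and checks that both give the same values. The DataFrame construction and the names too_large, fixed_mask, and fixed_where are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the relevant column).
df = pd.DataFrame(
    {"credit_score": [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500]}
)

# Boolean Series marking the rows assumed to have two extra digits.
too_large = df["credit_score"] > 800

# Series.mask replaces values where the condition is True.
fixed_mask = df["credit_score"].mask(too_large, df["credit_score"] / 100)

# numpy.where takes from the second argument where the condition is True,
# otherwise from the third. It returns a NumPy array, so wrap it in a Series.
fixed_where = pd.Series(
    np.where(too_large, df["credit_score"] / 100, df["credit_score"]),
    index=df.index,
)

# Both approaches should agree element by element.
print((fixed_mask == fixed_where).all())
```

Either result can be assigned back to df["credit_score"] or to a new column, depending on whether the raw values should be kept.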

Answer 2

Answer by MaxU

I'd use Pandas boolean indexing:

In [193]: df.loc[df.credit_score > 800, 'credit_score'] /= 100

In [194]: df
Out[194]:
    credit_score
id
0          750.0
1          653.0
2          741.0
3          651.0
4          735.0
5          565.0
6          480.0
7          789.0
8          699.0
9          715.0
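
If the raw values should be preserved, the same boolean indexing can target a copy of the column instead. This is a minimal sketch using the sample data from the question; the column name credit_score_fixed comes from the question, while needs_fix and the up-front conversion to float are assumptions made here so the divided values fit the column's dtype.

```python
import pandas as pd

# Sample data copied from the question (only the relevant column).
df = pd.DataFrame(
    {"credit_score": [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500]}
)

# Work on a float copy so the original column stays untouched.
df["credit_score_fixed"] = df["credit_score"].astype(float)

# Select only the rows above 800 and divide them in place.
needs_fix = df["credit_score_fixed"] > 800
df.loc[needs_fix, "credit_score_fixed"] /= 100

print(df)
```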

Answer 3

Answer by DeepSpace

You can use Series.apply. It accepts a function and applies it to every element in the series. Note that it does not work in place, so you will need to reassign the series that it returns, either to a new column or to the same column.

def fix_scores(score):
    return score / 100 if score > 800 else score
    # same as
    # if score > 800:
    #      return score / 100
    # return score

df['credit_score_fixed'] = df['credit_score'].apply(fix_scores)
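
To try the function above end to end, here is a small runnable sketch on the sample data from the question. The DataFrame construction is an assumption added for illustration; fix_scores is repeated so the snippet stands alone.

```python
import pandas as pd


def fix_scores(score):
    # Scores above 800 are assumed to carry two extra digits.
    return score / 100 if score > 800 else score


# Sample data copied from the question (only the relevant column).
df = pd.DataFrame(
    {"credit_score": [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500]}
)

# apply returns a new Series; assign it to keep the result.
df["credit_score_fixed"] = df["credit_score"].apply(fix_scores)

print(df)
```

Because apply calls the Python function once per element, it is generally slower on large frames than the vectorized mask, numpy.where, or boolean indexing approaches, though it is handy when the per-row logic is more involved.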
