Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/33769860/

Date: 2020-08-19 13:57:23  Source: igfitidea

Pandas apply but only for rows where a condition is met

Tags: python, pandas

Asked by mgoldwasser

I would like to use Pandas df.apply, but only for certain rows.

As an example, I want to do something like this, but my actual issue is a little more complicated:

import pandas as pd
import math
z = pd.DataFrame({'a':[4.0,5.0,6.0,7.0,8.0],'b':[6.0,0,5.0,0,1.0]})
z.where(z['b'] != 0, z['a'] / z['b'].apply(lambda l: math.log(l)), 0)

What I want in this example is the value in 'a' divided by the log of the value in 'b' for each row, and for rows where 'b' is 0, I simply want to return 0.
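
A minimal sketch of why the attempt above does not work as written: the replacement values passed to where() are computed for every row before any selection happens, so math.log is still called on the rows where 'b' is 0. The variable name failed is only there for illustration.

```python
import math
import pandas as pd

z = pd.DataFrame({'a': [4.0, 5.0, 6.0, 7.0, 8.0],
                  'b': [6.0, 0, 5.0, 0, 1.0]})

# apply() runs on the whole column, including the rows where b == 0,
# and math.log(0) raises ValueError before where() can filter anything.
try:
    z['b'].apply(lambda l: math.log(l))
    failed = False
except ValueError as exc:
    failed = True
    print("apply failed:", exc)
```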

Accepted answer by jakevdp

The other answers are excellent, but I thought I'd add one other approach that can be faster in some circumstances – using broadcasting and masking to achieve the same result:

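import numpy as np

# Boolean mask selecting the rows where 'b' is nonzero
mask = (z['b'] != 0)
z_valid = z[mask]

# Default to 0.0 (a float, so the column can hold the ratios), then fill the valid rows
z['c'] = 0.0
z.loc[mask, 'c'] = z_valid['a'] / np.log(z_valid['b'])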

Especially with very large dataframes, this approach will generally be faster than solutions based on apply().
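
To check the speed claim on your own data, a rough sketch along these lines may help. The frame big, its size, and the helper names with_mask and with_apply are all made up for illustration; the timings depend on the machine and the pandas version, so none are quoted here.

```python
import math
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000  # arbitrary size chosen for illustration
big = pd.DataFrame({
    'a': rng.uniform(1.0, 10.0, n),
    # values are 0 or greater than 1, so log(b) is never zero on the kept rows
    'b': rng.choice([0.0, 2.0, 3.0, 5.0, 9.0], n),
})

def with_mask(df):
    # Vectorized: compute the ratio only on rows where b != 0
    mask = df['b'] != 0
    valid = df[mask]
    out = pd.Series(0.0, index=df.index)
    out.loc[mask] = valid['a'] / np.log(valid['b'])
    return out

def with_apply(df):
    # Row by row: one Python-level lambda call per row
    return df.apply(
        lambda row: 0.0 if row['b'] == 0 else row['a'] / math.log(row['b']),
        axis=1,
    )

same = np.allclose(with_mask(big), with_apply(big))
t_mask = timeit.timeit(lambda: with_mask(big), number=3)
t_apply = timeit.timeit(lambda: with_apply(big), number=3)
print("same result:", same)
print("mask: %.4fs  apply: %.4fs" % (t_mask, t_apply))
```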

Answer by Liam Foley

You can just use an if statement in a lambda function.

z['c'] = z.apply(lambda row: 0 if row['b'] in (0,1) else row['a'] / math.log(row['b']), axis=1)

I also excluded 1, because log(1) is zero, so the division would be a division by zero.

Output:

   a  b         c
0  4  6  2.232443
1  5  0  0.000000
2  6  5  3.728010
3  7  0  0.000000
4  8  1  0.000000
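
The same exclusion of 0 and 1 can be combined with a boolean mask instead of a row-wise lambda. This is only a sketch of one possible combination, not something either answer states; isin is used here as one way to express "b is 0 or 1".

```python
import numpy as np
import pandas as pd

z = pd.DataFrame({'a': [4.0, 5.0, 6.0, 7.0, 8.0],
                  'b': [6.0, 0, 5.0, 0, 1.0]})

# Rows where b is 0 or 1 keep the default 0.0; all others get a / log(b)
mask = ~z['b'].isin([0, 1])
z['c'] = 0.0
z.loc[mask, 'c'] = z.loc[mask, 'a'] / np.log(z.loc[mask, 'b'])
print(z)
```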

Answer by bananafish

You can use a lambda with a conditional to return 0 if the input value is 0 and skip the whole where clause:

z['c'] = z.apply(lambda x: math.log(x.b) if x.b > 0 else 0, axis=1)

You also have to assign the results to a new column (z['c']).
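
Note that the lambda above returns log(b) on its own rather than the a / log(b) ratio the question asks for. A possible adaptation, assuming b == 1 should also map to 0 as in Liam Foley's answer, might look like this:

```python
import math
import pandas as pd

z = pd.DataFrame({'a': [4.0, 5.0, 6.0, 7.0, 8.0],
                  'b': [6.0, 0, 5.0, 0, 1.0]})

# Divide a by log(b) only when the log is defined and nonzero; otherwise return 0
z['c'] = z.apply(
    lambda x: x.a / math.log(x.b) if x.b > 0 and x.b != 1 else 0,
    axis=1,
)
print(z)
```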

Answer by maswadkar

Hope this helps. It is easy and readable.

df['c'] = df['b'].apply(lambda x: 0 if x == 0 else math.log(x))
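
For comparison, a vectorized sketch of the same column-only computation. It assumes a small df built here purely for illustration, and relies on Series.where turning the zero rows into NaN so the log is never taken of 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'b': [6.0, 0, 5.0, 0, 1.0]})

# where() keeps b where the condition holds and puts NaN elsewhere;
# log(NaN) stays NaN, and fillna(0) turns those rows into 0.
df['c'] = np.log(df['b'].where(df['b'] != 0)).fillna(0)
print(df)
```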