Original source: http://stackoverflow.com/questions/23482304/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
python pandas data frame if else without iterating through data frame
Asked by cryp
I want to add a column to a df. The values of this new column will depend on the values of the other columns, e.g.
import pandas as pd

dc = {'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]}
df = pd.DataFrame(dc)

   A   B   C
0  0   6   1
1  9   0   3
2  4  10  15
3  5  12  18
Now I want to add another column D whose values will depend on the values of A, B and C. So, for example, if I were iterating through the df I would just do:
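# iterrows() yields (index, row) pairs; write through df.loc so the result lands in df
for idx, row in df.iterrows():
    if row['A'] != 0 and row['B'] != 0:
        df.loc[idx, 'D'] = (float(row['A']) / float(row['B'])) * row['C']
    elif row['C'] == 0 and row['A'] != 0 and row['B'] == 0:
        df.loc[idx, 'D'] = 250.0
    else:
        df.loc[idx, 'D'] = 20.0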
Is there a way to do this without the for loop, or by using the where() or apply() functions?
Thanks
Answered by TomAugspurger
apply should work well for you:
In [20]: def func(row):
   ....:     if (row == 0).all():
   ....:         return 250.0
   ....:     elif (row[['A', 'B']] != 0).all():
   ....:         return (float(row['A']) / row['B']) * row['C']
   ....:     else:
   ....:         return 20
   ....:

In [21]: df['D'] = df.apply(func, axis=1)

In [22]: df
Out[22]:
   A   B   C     D
0  0   6   1  20.0
1  9   0   3  20.0
2  4  10  15   6.0
3  5  12  18   7.5

[4 rows x 4 columns]
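Note that func above returns 250.0 only when every value in the row is zero, whereas the question's second branch asks for C == 0, A != 0 and B == 0. On the sample data both give the same result. As a minimal sketch, a variant that follows the question's branches literally might look like this; the name func_literal is hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]})

def func_literal(row):
    # Hypothetical variant of func: mirrors the three branches of the question's loop.
    if row['A'] != 0 and row['B'] != 0:
        return (float(row['A']) / float(row['B'])) * row['C']
    elif row['C'] == 0 and row['A'] != 0 and row['B'] == 0:
        return 250.0
    else:
        return 20.0

# axis=1 passes each row to the function as a Series indexed by column name.
df['D'] = df.apply(func_literal, axis=1)
print(df)
```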
Answered by fantabolous
.where can be much faster than .apply, so if all you're doing is if/else branching then I'd aim for .where. Because you're returning scalars in some cases, np.where will be easier to use than pandas' own .where.
import pandas as pd
import numpy as np

df['D'] = np.where((df.A != 0) & (df.B != 0), (df.A / df.B) * df.C,
                   np.where((df.C == 0) & (df.A != 0) & (df.B == 0), 250, 20))

   A   B   C     D
0  0   6   1  20.0
1  9   0   3  20.0
2  4  10  15   6.0
3  5  12  18   7.5
For a tiny df like this, you wouldn't need to worry about speed. However, on a 10000-row df of randn, this is almost 2000 times faster than the .apply solution above: 3ms vs 5850ms. That said, if speed isn't a concern, then .apply can often be easier to read.
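To check the speed difference on your own machine, a rough timing sketch along these lines may help. It assumes 10,000 rows of seeded standard-normal data and uses the question's conditions in both versions; the names big, with_apply and with_where are hypothetical, and the timings you see will depend on hardware and library versions, so they need not match the figures quoted above.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: 10,000 rows of standard-normal data, seeded for repeatability.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.standard_normal((10_000, 3)), columns=['A', 'B', 'C'])

def with_apply(frame):
    # Row-by-row version: the function is called once per row.
    def func(row):
        if row['A'] != 0 and row['B'] != 0:
            return (row['A'] / row['B']) * row['C']
        elif row['C'] == 0 and row['A'] != 0 and row['B'] == 0:
            return 250.0
        return 20.0
    return frame.apply(func, axis=1)

def with_where(frame):
    # Vectorized version: each condition is evaluated on whole columns at once.
    return np.where((frame.A != 0) & (frame.B != 0), (frame.A / frame.B) * frame.C,
                    np.where((frame.C == 0) & (frame.A != 0) & (frame.B == 0), 250, 20))

# Both approaches should agree before their timings are compared.
assert np.allclose(with_apply(big), with_where(big))

t_apply = timeit.timeit(lambda: with_apply(big), number=1)
t_where = timeit.timeit(lambda: with_where(big), number=10) / 10
print(f"apply:    {t_apply * 1000:.1f} ms")
print(f"np.where: {t_where * 1000:.1f} ms")
```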
Answered by acushner
Here's a start:
df['D'] = np.nan
mask = (df.A != 0) & (df.B != 0)  # rows where both A and B are non-zero
df.loc[mask, 'D'] = df.A / df.B.astype(float) * df.C  # builtin float: np.float was removed in NumPy 1.24
Edit: you should probably just go ahead and cast the whole thing to floats, unless you really care about integers for some reason:
df = df.astype(float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent
and then you don't have to keep converting in the calculation itself.
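The "start" above only fills the first branch and leaves the remaining rows as NaN. One way it could be completed, as a sketch rather than the answerer's own code, is to set the default first and then overwrite the rows matched by each boolean mask. The names both_nonzero and special are hypothetical, and the conditions are taken from the question.

```python
import pandas as pd

# Cast everything to float up front, as the answer suggests.
df = pd.DataFrame({'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]}).astype(float)

both_nonzero = (df.A != 0) & (df.B != 0)
special = (df.C == 0) & (df.A != 0) & (df.B == 0)

df['D'] = 20.0                                  # default: the else branch
df.loc[special, 'D'] = 250.0                    # second branch
df.loc[both_nonzero, 'D'] = df.A / df.B * df.C  # first branch; only the masked rows are written
print(df)
```

Because the two masks are mutually exclusive (one requires B != 0, the other B == 0), the order of the last two assignments does not matter here.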

