pandas: if/else on a DataFrame without iterating through the DataFrame

Question

Asked by cryp

I want to add a column to a df. The values of this new column will depend on the values of the other columns, e.g.:

import pandas as pd

dc = {'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]}
df = pd.DataFrame(dc)
print(df)
   A   B   C
0  0   6   1
1  9   0   3
2  4  10  15
3  5  12  18

Now I want to add another column D whose values depend on the values of A, B and C. For example, if I were iterating through the df, I would do:

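# iterrows() yields (index, row) pairs; write back through df.loc,
# because assigning to `row` does not modify df
for idx, row in df.iterrows():
    if row['A'] != 0 and row['B'] != 0:
        df.loc[idx, 'D'] = (float(row['A']) / float(row['B'])) * row['C']
    elif row['C'] == 0 and row['A'] != 0 and row['B'] == 0:
        df.loc[idx, 'D'] = 250.0
    else:
        df.loc[idx, 'D'] = 20.0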

Is there a way to do this without the for loop, e.g. using the where() or apply() functions?

Thanks

Answer 1

Answered by TomAugspurger

apply should work well for you:

def func(row):
    # Both A and B non-zero: scale A/B by C
    if row['A'] != 0 and row['B'] != 0:
        return (float(row['A']) / row['B']) * row['C']
    # A non-zero, B and C zero: fixed value, as in the question
    elif row['C'] == 0 and row['A'] != 0 and row['B'] == 0:
        return 250.0
    else:
        return 20.0

# axis=1 passes each row to func as a Series
df['D'] = df.apply(func, axis=1)
print(df)

   A   B   C     D
0  0   6   1  20.0
1  9   0   3  20.0
2  4  10  15   6.0
3  5  12  18   7.5

Answer 2

Answered by fantabolous

.where can be much faster than .apply, so if all you're doing is if/else logic, I'd aim for .where. Since you're returning scalars in some cases, np.where will be easier to use than pandas' own .where.
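The remark about scalars is easiest to see side by side. pandas' Series.where keeps the caller's values where the condition is True and substitutes `other` elsewhere, so the scalar fallbacks first have to be wrapped in a Series and the conditions applied from the inside out. A minimal sketch on the question's data (the mask names ratio_mask and special_mask are illustrative, not from the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]})

ratio_mask = (df.A != 0) & (df.B != 0)
special_mask = (df.C == 0) & (df.A != 0) & (df.B == 0)

# pandas' Series.where keeps values where the condition is True and uses
# `other` elsewhere, so the scalar fallback must become a Series first.
fallback = pd.Series(20.0, index=df.index).where(~special_mask, 250.0)
df['D'] = (df.A / df.B * df.C).where(ratio_mask, fallback)

# The same logic with np.where reads in "if cond: x else: y" order.
d_np = np.where(ratio_mask, df.A / df.B * df.C,
                np.where(special_mask, 250.0, 20.0))
print(df)
```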

import pandas as pd
import numpy as np
df['D'] = np.where((df.A!=0) & (df.B!=0), ((df.A/df.B)*df.C),
          np.where((df.C==0) & (df.A!=0) & (df.B==0), 250,
          20))

   A   B   C     D
0  0   6   1  20.0
1  9   0   3  20.0
2  4  10  15   6.0
3  5  12  18   7.5
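With more branches, nested np.where calls get harder to read. NumPy's np.select (a different function, swapped in here as an alternative) takes a list of conditions and a matching list of choices, using the first condition that is True for each row. A minimal sketch on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]})

# Conditions are checked in order; the first True one wins for each row.
conditions = [
    (df.A != 0) & (df.B != 0),
    (df.C == 0) & (df.A != 0) & (df.B == 0),
]
choices = [
    df.A / df.B * df.C,  # ratio branch
    250.0,               # special-case constant
]
# Rows matching no condition get the default.
df['D'] = np.select(conditions, choices, default=20.0)
print(df)
```

Each condition/choice pair corresponds to one if/elif branch of the question's loop, and `default` plays the role of the final else.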

For a tiny df like this, you wouldn't need to worry about speed. However, on a 10,000-row df of randn values, this is almost 2000 times faster than the .apply solution above: 3 ms vs 5850 ms. That said, if speed isn't a concern, .apply can often be easier to read.
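A rough way to reproduce the comparison yourself: a sketch assuming a 10,000-row DataFrame of standard-normal values (as described above) and hypothetical helper names with_apply and with_where. Absolute timings depend heavily on hardware and pandas version, so no numbers are claimed here:

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: 10,000 rows of standard-normal values.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.standard_normal((10_000, 3)), columns=['A', 'B', 'C'])

def func(row):
    # Row-wise version with the same branches as the question.
    if row['A'] != 0 and row['B'] != 0:
        return row['A'] / row['B'] * row['C']
    elif row['C'] == 0 and row['A'] != 0 and row['B'] == 0:
        return 250.0
    return 20.0

def with_apply(df):
    return df.apply(func, axis=1)

def with_where(df):
    return pd.Series(
        np.where((df.A != 0) & (df.B != 0), df.A / df.B * df.C,
                 np.where((df.C == 0) & (df.A != 0) & (df.B == 0), 250.0, 20.0)),
        index=df.index)

# Both approaches must agree before timing them.
assert np.allclose(with_apply(big), with_where(big))

for name, fn in [('apply', with_apply), ('np.where', with_where)]:
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f'{name}: {seconds * 1000:.1f} ms per call')
```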

Answer 3

Answered by acushner

Here's a start:

import numpy as np

df['D'] = np.nan
mask = (df.A != 0) & (df.B != 0)
df.loc[mask, 'D'] = df.A / df.B.astype(float) * df.C

Edit: you should probably just cast the whole thing to floats, unless you really need integers for some reason:

df = df.astype(float)

Then you don't have to keep converting in the calculation itself.
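Carrying that start through to all three branches of the question, a minimal sketch with one boolean mask per branch and .loc assignment (the mask names are illustrative, not from the original answer):

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 9, 4, 5], 'B': [6, 0, 10, 12], 'C': [1, 3, 15, 18]})
df = df.astype(float)  # cast once, as suggested above

ratio_mask = (df.A != 0) & (df.B != 0)
special_mask = (df.C == 0) & (df.A != 0) & (df.B == 0)

# Start every row at the fallback value, then overwrite the rows each
# mask selects (the two masks never overlap, so the order is irrelevant).
df['D'] = 20.0
df.loc[special_mask, 'D'] = 250.0
df.loc[ratio_mask, 'D'] = df.A / df.B * df.C
print(df)
```

Assigning through df.loc[mask, 'D'] in a single call avoids the chained df['D'].loc[...] pattern, which newer pandas versions warn about or ignore.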
