Pandas 中的 Sumifs 有两个条件
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/48748804/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Sumifs in Pandas with two conditions
提问by Irwin Mooketsi Chelenyane
I want the pandas equivalent of the Excel's sumifs
for example
我想Pandas相当于Excel中的sumifs
例如
=SUMIFS($D4:D7,$D7,$G4:G7)
I have three columns, the contract
, the amount
and transaction_type_tla
. For each contract
, I would like to sum the amount
if the transaction type is CBP
. The following formula is not working:
我有三列, the contract
, theamount
和transaction_type_tla
。对于每个contract
,我想总结amount
交易类型是否为CBP
. 以下公式不起作用:
data['Var']=(data.groupby('contract',"transaction_type_tla=='CBP'")['amount'].cumsum())
回答by YOBEN_S
Borrow jp'data :-)
借用 jp'data :-)
df['New']=df.groupby('contract').apply(lambda x : x['amount'][x['type']=='CBP'].cumsum()).reset_index(level=0,drop=True)
df
Out[258]:
contract amount type New
0 A 123 ABC NaN
1 A 341 ABC NaN
2 A 652 CBP 652.0
3 A 150 CBP 802.0
4 B 562 DEF NaN
5 B 674 ABC NaN
6 B 562 CBP 562.0
7 B 147 CBP 709.0
回答by ZaxR
Edit: I think @Wen's answer is more in line with what you're looking for, but in case you wanted the result as a series:
编辑:我认为@Wen 的答案更符合您的要求,但如果您希望将结果作为一个系列:
An easy way to do this is to first filter the list of transactions by the transaction_type_tla you're looking for and then apply the groupby and whatever aggregation method you want:
一种简单的方法是首先按您要查找的 transaction_type_tla 过滤交易列表,然后应用 groupby 和您想要的任何聚合方法:
ans = data[data['transaction_type_tla'] == 'CBP']
ans.groupby('contract')['amount'].cumsum()
This will result in a series with your answer.
这将导致您的答案系列。
回答by jpp
This is one way. I've set up some imaginary data to test.
这是一种方式。我已经设置了一些假想数据进行测试。
Output is dataframe in same format, but with CBP
transactions summed.
输出是相同格式的数据帧,但CBP
总和交易。
import pandas as pd
df = pd.DataFrame([['A', 123, 'ABC'],
['A', 341, 'ABC'],
['A', 652, 'CBP'],
['A', 150, 'CBP'],
['B', 562, 'DEF'],
['B', 674, 'ABC'],
['B', 562, 'CBP'],
['B', 147, 'CBP']],
columns=['contract', 'amount', 'type'])
s = df.groupby(['contract', 'type'])['amount'].sum()
df = df.set_index(['contract', 'type']).join(s, rsuffix='_group')
df.loc[pd.IndexSlice[:, 'CBP'], 'amount'] = df.loc[pd.IndexSlice[:, 'CBP'], 'amount_group']
df = df.drop('amount_group', 1).reset_index().drop_duplicates()
# contract type amount
# 0 A ABC 123
# 1 A ABC 341
# 2 A CBP 802
# 4 B ABC 674
# 5 B CBP 709
# 7 B DEF 562