Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11012981/

Date: 2020-09-13 15:44:30  Source: igfitidea

Create Excel-like SUMIFS in Pandas

Tags: python, pandas

Asked by Julio Guzman

I recently learned about pandas and was happy to see its analytics functionality. I am trying to convert Excel array formulas into their Pandas equivalents in order to automate spreadsheets that I built for producing performance attribution reports. In this example, I created a new column in Excel based on conditions within other columns:

={SUMIFS($F:$F18,$A:$A18,$C,$B:$B18,0,$C:$C18," ",$D:$D18,$D10,$E:$E18,$E10)}

The formula sums the values in the "F" array (security weights) for the rows that meet all of these conditions: the "A" array (portfolio ID) equals a certain number, the "B" array (security ID) is zero, the "C" array (group description) is " ", the "D" array (start date) equals the start date of the row that I am on, and the "E" array (end date) equals the end date of the row that I am on.

In Pandas, I am using a DataFrame. Creating a new column on a DataFrame with the first three conditions is straightforward, but I am having difficulty with the last two conditions.

reportAggregateDF['PORT_WEIGHT'] = reportAggregateDF['SEC_WEIGHT_RATE']
          [(reportAggregateDF['PORT_ID'] == portID) &
           (reportAggregateDF['SEC_ID'] == 0) &
           (reportAggregateDF['GROUP_LIST'] == " ") & 
           (reportAggregateDF['START_DATE'] == reportAggregateDF['START_DATE'].ix[:]) & 
           (reportAggregateDF['END_DATE'] == reportAggregateDF['END_DATE'].ix[:])].sum()

Obviously the .ix[:] in the last two conditions is not doing anything for me, but is there a way to make the sum conditional on the row that I am on without looping? My goal is to not do any loops, but instead use purely vector operations.

Answered by guyrt

You want to use the apply function and a lambda:

>> df
     A    B    C    D     E
0  mitfx  0  200  300  0.25
1     gs  1  150  320  0.35
2    duk  1    5    2  0.45
3    bmo  1  145   65  0.65

Let's say I want to sum column C times column E, but only for rows where column B == 1 and column D is greater than 5:

df['matches'] = df.apply(lambda x: x['C'] * x['E'] if x['B'] == 1 and x['D'] > 5 else 0, axis=1)
df.matches.sum()
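
As a hedged aside, the same per-row result can be sketched without `apply` by computing the product for every row and then zeroing out the rows that fail the condition with `Series.where`. The DataFrame below is rebuilt from the sample shown above; the name `condition` is only illustrative.

```python
import pandas as pd

# Sample data copied from the table printed above.
df = pd.DataFrame({
    "A": ["mitfx", "gs", "duk", "bmo"],
    "B": [0, 1, 1, 1],
    "C": [200, 150, 5, 145],
    "D": [300, 320, 2, 65],
    "E": [0.25, 0.35, 0.45, 0.65],
})

# Boolean mask: True for rows where B == 1 and D > 5.
condition = (df["B"] == 1) & (df["D"] > 5)

# Compute C * E for every row, then keep it only where the mask is True
# and use 0 elsewhere. This mirrors the lambda's if/else branch.
df["matches"] = (df["C"] * df["E"]).where(condition, 0)

total = df["matches"].sum()
print(total)
```

Only the rows for `gs` and `bmo` pass both conditions here, so the total is the sum of their two products.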

It might be cleaner to split this into two steps:

df_subset = df[(df.B == 1) & (df.D > 5)]
df_subset.apply(lambda x: x.C * x.E, axis=1).sum()

or simply use vectorized multiplication for speed:

df_subset = df[(df.B == 1) & (df.D > 5)]
print((df_subset.C * df_subset.E).sum())

You are absolutely right to want to do this problem without loops.

Answered by Julio Guzman

I'm sure there is a better way, but this did it in a loop:

# Row-by-row version. .ix and iteritems() were removed from recent pandas, so this uses .loc.
for idx in reportAggregateDF.index:
    mask = (
        (reportAggregateDF['PORT_ID'] == portID) &
        (reportAggregateDF['SEC_ID'] == 0) &
        (reportAggregateDF['GROUP_LIST'] == " ") &
        (reportAggregateDF['START_DATE'] == reportAggregateDF.loc[idx, 'START_DATE']) &
        (reportAggregateDF['END_DATE'] == reportAggregateDF.loc[idx, 'END_DATE'])
    )
    reportAggregateDF.loc[idx, 'PORT_WEIGHT'] = reportAggregateDF.loc[mask, 'SEC_WEIGHT_RATE'].sum()
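
One possible loop-free alternative, offered as a sketch rather than as the approach used in this thread: because the last two conditions match each row against its own START_DATE and END_DATE, the per-row sum equals a group total over those two columns. The filtered rows can be grouped once and the totals merged back onto every row. The column names follow the question; the sample values, `report`, `port_id`, `keys`, and `totals` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
report = pd.DataFrame({
    "PORT_ID": [1, 1, 1, 1, 2],
    "SEC_ID": [0, 0, 5, 0, 0],
    "GROUP_LIST": [" ", " ", " ", " ", " "],
    "START_DATE": pd.to_datetime(
        ["2012-01-01", "2012-01-01", "2012-01-01", "2012-02-01", "2012-01-01"]),
    "END_DATE": pd.to_datetime(
        ["2012-01-31", "2012-01-31", "2012-01-31", "2012-02-29", "2012-01-31"]),
    "SEC_WEIGHT_RATE": [0.25, 0.5, 0.125, 1.0, 0.75],
})
port_id = 1  # assumed portfolio ID to match

# The first three SUMIFS conditions do not depend on the current row.
mask = (
    (report["PORT_ID"] == port_id)
    & (report["SEC_ID"] == 0)
    & (report["GROUP_LIST"] == " ")
)

# The last two conditions compare against the current row's dates, so sum
# the filtered weights once per (START_DATE, END_DATE) pair.
keys = ["START_DATE", "END_DATE"]
totals = (
    report[mask]
    .groupby(keys, as_index=False)["SEC_WEIGHT_RATE"]
    .sum()
    .rename(columns={"SEC_WEIGHT_RATE": "PORT_WEIGHT"})
)

# Attach each row's group total; date pairs with no matching rows get 0.
result = report.merge(totals, on=keys, how="left")
result["PORT_WEIGHT"] = result["PORT_WEIGHT"].fillna(0.0)
print(result[["START_DATE", "END_DATE", "PORT_WEIGHT"]])
```

As in the Excel formula, every row receives a total, including rows that do not themselves satisfy the first three conditions. The left merge keeps the original row order but returns a new DataFrame with a fresh index, which matters if the original index carries meaning.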