Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/20995196/
Python Pandas counting and summing specific conditions
Asked by user3084006
Are there single functions in pandas that perform the equivalents of Excel's SUMIF, which sums the values that meet a specific condition, and COUNTIF, which counts the values that meet a specific condition?
I know that there are many multi-step approaches that can be used for this.
For example, for sumif I can use (df.map(lambda x: condition), or df.size()) and then use .sum(),
and for countif I can use groupby functions and look for my answer, or use a filter and then .count().
Is there a simple one-step process for these operations, where you pass in the condition and the data frame and get back the sum or the count?
Accepted answer by Jimmy C
You can first make a conditional selection, and then sum the results of that selection using the sum function.
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3]})
>>> df[df.a > 1].sum()
a    5
dtype: int64
Having more than one condition:
>>> df[(df.a > 1) & (df.a < 3)].sum()
a    2
dtype: int64
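As a minimal sketch, the same selection can be wrapped into one-step helpers. The names `sumif` and `countif` below are hypothetical and are not pandas functions; they only apply a boolean mask with `.loc` and then call `sum()` or `count()` on the chosen column.

```python
import pandas as pd


def sumif(df, mask, column):
    """Sum `column` over the rows where the boolean `mask` is True (hypothetical helper)."""
    return df.loc[mask, column].sum()


def countif(df, mask, column):
    """Count non-null values of `column` where `mask` is True (hypothetical helper)."""
    return df.loc[mask, column].count()


df = pd.DataFrame({'a': [1, 2, 3]})
total = sumif(df, df['a'] > 1, 'a')      # adds 2 and 3
matches = countif(df, df['a'] > 1, 'a')  # two rows satisfy the condition
print(total, matches)
```

Passing the mask rather than a condition string keeps the helpers thin: any expression that yields a boolean Series, including ones combined with `&` and `|`, can be used.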
Answer by Thorsten Kranz
You didn't mention the fancy indexing capabilities of dataframes, e.g.:
>>> df = pd.DataFrame({"class":[1,1,1,2,2], "value":[1,2,3,4,5]})
>>> df[df["class"]==1].sum()
class    3
value    6
dtype: int64
>>> df[df["class"]==1].sum()["value"]
6
>>> df[df["class"]==1].count()["value"]
3
You could replace df["class"]==1 with another condition.
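If the sum or count is wanted for every class rather than a single one, a groupby may be more convenient. The following is only a sketch using the same example data; the variable names `value_sum` and `per_class` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"class": [1, 1, 1, 2, 2], "value": [1, 2, 3, 4, 5]})

# Select the column together with the condition, so only "value" is aggregated.
value_sum = df.loc[df["class"] == 1, "value"].sum()

# Aggregate every class in one pass: one row per class, with "sum" and "count" columns.
per_class = df.groupby("class")["value"].agg(["sum", "count"])

print(value_sum)
print(per_class)
```

Selecting with `.loc[mask, "value"]` avoids summing the `class` column as well, which the whole-frame `sum()` does.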
Answer by dan12345
I usually use numpy sum over the logical condition column:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'Age' : [20,24,18,5,78]})
>>> np.sum(df['Age'] > 20)
2
This seems to me slightly shorter than the solutions presented above.
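The same count can also be obtained without numpy, because a boolean Series has its own `sum()` method and each True counts as 1. The sketch below additionally shows one possible way to get the conditional sum with `Series.where`; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Age': [20, 24, 18, 5, 78]})

older = df['Age'] > 20                       # boolean Series: True where Age exceeds 20
count_older = older.sum()                    # COUNTIF: each True contributes 1
sum_older = df['Age'].where(older, 0).sum()  # SUMIF: non-matching ages are replaced by 0

print(count_older, sum_older)
```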

