How do I sum values in a column that match a given condition using pandas?
Original question: http://stackoverflow.com/questions/28236305/
This content is from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by adijo
Suppose I have a column like so:
a b
1 5
1 7
2 3
1 3
2 5
I want to sum up the values for b where a = 1, for example. This would give me 5 + 7 + 3 = 15.
How do I do this in pandas?
Accepted answer by Alex Riley
The essential idea here is to select the data you want to sum, and then sum them. This selection of data can be done in several different ways, a few of which are shown below.
Boolean indexing
Arguably the most common way to select the values is to use Boolean indexing.
With this method, you find out where column 'a' is equal to 1 and then sum the corresponding rows of column 'b'. You can use loc to handle the indexing of rows and columns:
>>> df.loc[df['a'] == 1, 'b'].sum()
15
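For a self-contained version of the line above, here is a minimal sketch that rebuilds the example frame from the question. The names mask and total are mine, chosen for illustration; they do not appear in the original answer.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({'a': [1, 1, 2, 1, 2], 'b': [5, 7, 3, 3, 5]})

# Boolean Series: True where column 'a' equals 1
mask = df['a'] == 1

# Select column 'b' for the rows where the mask is True, then sum
total = df.loc[mask, 'b'].sum()
print(total)  # 15
```

Naming the mask separately is only a readability choice; passing the comparison straight into loc, as in the answer, does the same thing.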
The Boolean indexing can be extended to other columns. For example, if df also contained a column 'c' and we wanted to sum the rows in 'b' where 'a' was 1 and 'c' was 2, we'd write:
df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum()
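To try the combined condition, here is a small sketch. The values in column 'c' are made up for illustration (the question has no such column), and the names both, either, total_and and total_or are my own.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 1, 2],
    'b': [5, 7, 3, 3, 5],
    'c': [2, 1, 2, 2, 1],  # hypothetical values, not from the question
})

# Each comparison needs its own parentheses because & binds
# more tightly than == in Python
both = (df['a'] == 1) & (df['c'] == 2)
total_and = df.loc[both, 'b'].sum()
print(total_and)  # 8 (rows 0 and 3: 5 + 3)

# Use | instead of & to match rows where either condition holds
either = (df['a'] == 1) | (df['c'] == 2)
total_or = df.loc[either, 'b'].sum()
print(total_or)  # 18 (rows 0, 1, 2 and 3: 5 + 7 + 3 + 3)
```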
Query
Another way to select the data is to use query to filter the rows you're interested in, select column 'b' and then sum:
>>> df.query("a == 1")['b'].sum()
15
Again, the method can be extended to make more complicated selections of the data:
df.query("a == 1 and c == 2")['b'].sum()
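If the value to match lives in a Python variable, query can refer to it with an @ prefix. A short sketch, again assuming made-up values for column 'c' and a variable name (target) of my own choosing:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 1, 2],
    'b': [5, 7, 3, 3, 5],
    'c': [2, 1, 2, 2, 1],  # hypothetical values, not from the question
})

target = 1  # illustrative variable holding the value to match

# @target refers to the Python variable inside the query string
total_a = df.query("a == @target")['b'].sum()
print(total_a)  # 15

# Conditions can still be combined with and / or
total_ac = df.query("a == @target and c == 2")['b'].sum()
print(total_ac)  # 8
```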
Note this is a little more concise than the Boolean indexing approach.
Groupby
The alternative approach is to use groupby to split the DataFrame into parts according to the value in column 'a'. You can then sum each part and pull out the value that the 1s added up to:
>>> df.groupby('a')['b'].sum()[1]
15
This approach is likely to be slower than using Boolean indexing, but it is useful if you want to check the sums for other values in column 'a':
>>> df.groupby('a')['b'].sum()
a
1 15
2 8
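A runnable sketch of the groupby route follows. Here I use .loc for the lookup to make it explicit that the 1 is an index label rather than a position, and Series.get to supply a fallback for a value that never occurs in 'a'. Both are my own variations, not part of the original answer.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({'a': [1, 1, 2, 1, 2], 'b': [5, 7, 3, 3, 5]})

# One sum of 'b' per distinct value of 'a'; the result is a Series indexed by 'a'
sums = df.groupby('a')['b'].sum()

print(sums.loc[1])  # 15
print(sums.loc[2])  # 8

# A value with no matching rows is absent from the index,
# so .get lets you pick a default instead of raising KeyError
print(sums.get(3, 0))  # 0
```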
Answer by Himanshu Poddar
You can also do this without using groupby or loc, simply by including the condition in the indexing expression. Let the name of the DataFrame be df. Then you can try:
df[df['a']==1]['b'].sum()
Or you can also try:
sum(df[df['a']==1]['b'])
Another way could be to use the NumPy library:
import numpy as np
print(np.where(df['a']==1, df['b'],0).sum())
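To check that these three variants agree on the question's data, here is a small sketch; the variable names are mine and purely illustrative.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({'a': [1, 1, 2, 1, 2], 'b': [5, 7, 3, 3, 5]})

# Filter the rows first, then pick column 'b' and sum it
chained_total = df[df['a'] == 1]['b'].sum()

# Same selection, summed with Python's built-in sum
builtin_total = sum(df[df['a'] == 1]['b'])

# np.where keeps 'b' where the condition holds and substitutes 0 elsewhere
where_total = np.where(df['a'] == 1, df['b'], 0).sum()

print(chained_total, builtin_total, where_total)  # 15 15 15
```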