Python: How do I sum values in a column that match a given condition using pandas?

Question

Asked by adijo

Suppose I have a column like so:

I want to sum up the values for b where a = 1, for example. This would give me 5 + 7 + 3 = 15.

How do I do this in pandas?

Answer 1

Accepted answer by Alex Riley

The essential idea here is to select the data you want to sum, and then sum them. This selection of data can be done in several different ways, a few of which are shown below.

Boolean indexing

Arguably the most common way to select the values is to use Boolean indexing.

With this method, you find out where column 'a' is equal to 1 and then sum the corresponding rows of column 'b'. You can use loc to handle the indexing of rows and columns:

>>> df.loc[df['a'] == 1, 'b'].sum()
15
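
As a self-contained sketch of the loc approach, the snippet below builds a small DataFrame and applies the mask step by step. The data is hypothetical: the values were chosen only so that the sums match the figures quoted in this thread, and they are not the asker's original frame.

```python
import pandas as pd

# Hypothetical data, chosen so that b sums to 15 where a == 1.
df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [5, 7, 6, 3, 2]})

# Boolean mask: True for every row where column 'a' equals 1.
mask = df["a"] == 1

# Select column 'b' for the masked rows, then sum them.
total = df.loc[mask, "b"].sum()
print(total)  # 15
```

Naming the mask separately is optional, but it makes the two steps (select, then sum) easier to see.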

The Boolean indexing can be extended to other columns. For example, if df also contained a column 'c' and we wanted to sum the rows in 'b' where 'a' was 1 and 'c' was 2, we'd write:

df.loc[(df['a'] == 1) & (df['c'] == 2), 'b'].sum()
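
A runnable sketch of combining conditions follows. The column 'c' and all values are hypothetical. The parentheses around each comparison are needed because & binds more tightly than == in Python; the | operator works the same way for an "or" condition.

```python
import pandas as pd

# Hypothetical data with an extra column 'c'.
df = pd.DataFrame({
    "a": [1, 1, 2, 1, 2],
    "b": [5, 7, 6, 3, 2],
    "c": [2, 1, 2, 2, 1],
})

# Rows where a == 1 AND c == 2 (element-wise and).
both = (df["a"] == 1) & (df["c"] == 2)
total_both = df.loc[both, "b"].sum()

# Rows where a == 1 OR c == 2 (element-wise or).
either = (df["a"] == 1) | (df["c"] == 2)
total_either = df.loc[either, "b"].sum()

print(total_both, total_either)  # 8 21
```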

Query

Another way to select the data is to use query to filter the rows you're interested in, select column 'b' and then sum:

>>> df.query("a == 1")['b'].sum()
15

Again, the method can be extended to make more complicated selections of the data:

df.query("a == 1 and c == 2")['b'].sum()

Note this is a little more concise than the Boolean indexing approach.
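
If the value to match lives in a Python variable, query can refer to it with an @ prefix. The sketch below assumes the same hypothetical data as before; the variable name target is only for illustration.

```python
import pandas as pd

# Hypothetical data, including the extra column 'c'.
df = pd.DataFrame({
    "a": [1, 1, 2, 1, 2],
    "b": [5, 7, 6, 3, 2],
    "c": [2, 1, 2, 2, 1],
})

target = 1  # hypothetical variable holding the value to match

# '@target' refers to the local Python variable inside the query string.
total = df.query("a == @target")["b"].sum()

# The same works inside a compound condition.
total_multi = df.query("a == @target and c == 2")["b"].sum()

print(total, total_multi)  # 15 8
```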

Groupby

An alternative approach is to use groupby to split the DataFrame into parts according to the value in column 'a'. You can then sum each part and pull out the value that the 1s added up to:

>>> df.groupby('a')['b'].sum()[1]
15

This approach is likely to be slower than using Boolean indexing, but it is useful if you want to check the sums for other values in column 'a':

>>> df.groupby('a')['b'].sum()
a
1    15
2     8
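
The grouped result is an ordinary Series indexed by the values of 'a', so individual sums can be looked up by label. A minimal sketch, again on hypothetical data; using .loc makes it explicit that 1 is a label rather than a position, and .get supplies a default for a group that does not exist.

```python
import pandas as pd

# Hypothetical data matching the sums shown above (15 and 8).
df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [5, 7, 6, 3, 2]})

# One sum of 'b' per distinct value of 'a'.
sums = df.groupby("a")["b"].sum()

sum_for_1 = sums.loc[1]       # label-based lookup
sum_for_3 = sums.get(3, 0)    # no rows have a == 3, so fall back to 0
as_dict = sums.to_dict()      # plain mapping of group value to sum

print(sum_for_1, sum_for_3, as_dict)
```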

Answer 2

Answered by Himanshu Poddar

You can also do this without using groupby or loc, simply by including the condition in the indexing expression. Let the name of the DataFrame be df. Then you can try:

df[df['a']==1]['b'].sum()

or you can also try:

sum(df[df['a']==1]['b'])

Another way is to use the NumPy library:

import numpy as np
print(np.where(df['a']==1, df['b'],0).sum())
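
To check that these variants agree, here is a small sketch on hypothetical data. Note that np.where does not filter rows: it returns an array of the same length with 0 substituted where the condition is False, and summing that array happens to give the same total. Python's built-in sum iterates element by element, so it is likely to be slower than the .sum() method on larger frames.

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen so that b sums to 15 where a == 1.
df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [5, 7, 6, 3, 2]})
mask = df["a"] == 1

via_chained = df[mask]["b"].sum()             # filter rows, then pick 'b'
via_builtin = sum(df[mask]["b"])              # Python's built-in sum
replaced = np.where(mask, df["b"], 0)         # keeps length, zeros elsewhere
via_where = replaced.sum()
via_loc = df.loc[mask, "b"].sum()             # single-step loc selection

print(replaced.tolist())  # [5, 7, 0, 3, 0]
print(via_chained, via_builtin, via_where, via_loc)  # 15 15 15 15
```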
