Selecting specific columns to compute row-wise totals in Pandas

Question

Asked by owise

Is there a way to sum across columns after grouping in a pandas DataFrame? For example, I have the following data frame:

ID   W_1       W_2     W_3 
1    0.1       0.2     0.3
1    0.2       0.4     0.5
2    0.3       0.3     0.2
2    0.1       0.3     0.4
2    0.2       0.0     0.5
1    0.5       0.3     0.2
1    0.4       0.2     0.1

I want an extra column called "my_sum" that holds, for each row, the sum of the columns W_1, W_2, and W_3. The output would be something like this:

ID   W_1       W_2     W_3     my_sum
1    0.1       0.2     0.3      0.6
1    0.2       0.4     0.5      1.1
2    0.3       0.3     0.2      0.8
2    0.1       0.3     0.4      0.8
2    0.2       0.0     0.5      0.7
1    0.5       0.3     0.2      1.0
1    0.4       0.2     0.1      0.7

I tried the following:

df['my_sum'] =   df.groupby('ID')['W_1','W_1','W_1'].transform(sum,axis=1)

but this sums all entries of just W_1. The documentation mentions the axis parameter, but I am not sure why it has no effect.

I looked into this question and also this one, but they are different from what I want.

Answer 1

Answered by cs95

The one thing that stays the same across all of these is the .sum(1). Here are some inventive alternatives to the answers already posted.

`df.select_dtypes`

df['my_sum'] = df.select_dtypes(float).sum(1)
df
   ID  W_1  W_2  W_3  my_sum
0   1  0.1  0.2  0.3     0.6
1   1  0.2  0.4  0.5     1.1
2   2  0.3  0.3  0.2     0.8
3   2  0.1  0.3  0.4     0.8
4   2  0.2  0.0  0.5     0.7
5   1  0.5  0.3  0.2     1.0
6   1  0.4  0.2  0.1     0.7
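
The select_dtypes approach depends on ID being stored as an integer while the W_ columns are floats. Here is a minimal self-contained sketch of that. The DataFrame construction is an assumption (the question only shows the printed table); the values are taken from the post.

```python
import pandas as pd

# Rebuild the frame from the question (construction assumed, values from the post).
df = pd.DataFrame({
    "ID":  [1, 1, 2, 2, 2, 1, 1],
    "W_1": [0.1, 0.2, 0.3, 0.1, 0.2, 0.5, 0.4],
    "W_2": [0.2, 0.4, 0.3, 0.3, 0.0, 0.3, 0.2],
    "W_3": [0.3, 0.5, 0.2, 0.4, 0.5, 0.2, 0.1],
})

# ID is an integer column, so selecting float columns leaves only W_1..W_3.
float_cols = df.select_dtypes(include=float).columns.tolist()

# Sum across columns (axis=1) to get one total per row.
df["my_sum"] = df[float_cols].sum(axis=1)
print(df.round(2))
```

If ID were loaded as a float (for example because the column once contained missing values), it would be selected too and end up in the total.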

`df.iloc`

df['my_sum'] = df.iloc[:, 1:].sum(1)
df
   ID  W_1  W_2  W_3  my_sum
0   1  0.1  0.2  0.3     0.6
1   1  0.2  0.4  0.5     1.1
2   2  0.3  0.3  0.2     0.8
3   2  0.1  0.3  0.4     0.8
4   2  0.2  0.0  0.5     0.7
5   1  0.5  0.3  0.2     1.0
6   1  0.4  0.2  0.1     0.7

Boolean Indexing

This obnoxiously exploits a quirk of your data: every W_ value is below 1, while every ID is 1 or more, so the mask df < 1 keeps exactly the W_ columns.

df['my_sum'] = df[df < 1].sum(1)
df
   ID  W_1  W_2  W_3  my_sum
0   1  0.1  0.2  0.3     0.6
1   1  0.2  0.4  0.5     1.1
2   2  0.3  0.3  0.2     0.8
3   2  0.1  0.3  0.4     0.8
4   2  0.2  0.0  0.5     0.7
5   1  0.5  0.3  0.2     1.0
6   1  0.4  0.2  0.1     0.7
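
Here is a sketch of why the mask works, with the frame rebuilt under the assumption that it matches the question's table. Indexing with a boolean frame turns every entry where the mask is False into NaN, and sum skips NaN by default.

```python
import pandas as pd

# Rebuild the frame from the question (construction assumed, values from the post).
df = pd.DataFrame({
    "ID":  [1, 1, 2, 2, 2, 1, 1],
    "W_1": [0.1, 0.2, 0.3, 0.1, 0.2, 0.5, 0.4],
    "W_2": [0.2, 0.4, 0.3, 0.3, 0.0, 0.3, 0.2],
    "W_3": [0.3, 0.5, 0.2, 0.4, 0.5, 0.2, 0.1],
})

mask = df < 1                     # True for every W_ value, False for every ID
masked = df[mask]                 # entries where the mask is False become NaN
row_totals = masked.sum(axis=1)   # NaN is skipped because skipna defaults to True

print(masked.head(2))
print(row_totals.round(2))
```

The trick breaks as soon as a W_ value reaches 1 or more. Re-running it after my_sum has been added would also fold any total below 1 back into the sum.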

`DataFrame.sum`, or `numpy.sum`

Filter on the column names using str.contains:

df['my_sum'] = df.iloc[:, df.columns.str.contains('W_')].sum(1)
df
   ID  W_1  W_2  W_3  my_sum
0   1  0.1  0.2  0.3     0.6
1   1  0.2  0.4  0.5     1.1
2   2  0.3  0.3  0.2     0.8
3   2  0.1  0.3  0.4     0.8
4   2  0.2  0.0  0.5     0.7
5   1  0.5  0.3  0.2     1.0
6   1  0.4  0.2  0.1     0.7

Alternatively try summing on a numpy array directly, for performance:

df['my_sum'] = df.values[:, 1:].sum(1)
df
   ID  W_1  W_2  W_3  my_sum
0   1  0.1  0.2  0.3     0.6
1   1  0.2  0.4  0.5     1.1
2   2  0.3  0.3  0.2     0.8
3   2  0.1  0.3  0.4     0.8
4   2  0.2  0.0  0.5     0.7
5   1  0.5  0.3  0.2     1.0
6   1  0.4  0.2  0.1     0.7
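
The positional slice [:, 1:] assumes ID is the first column and that my_sum has not been added yet. A slightly more defensive sketch selects the columns by name before dropping to NumPy. The column list w_cols and the frame construction are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (construction assumed, values from the post).
df = pd.DataFrame({
    "ID":  [1, 1, 2, 2, 2, 1, 1],
    "W_1": [0.1, 0.2, 0.3, 0.1, 0.2, 0.5, 0.4],
    "W_2": [0.2, 0.4, 0.3, 0.3, 0.0, 0.3, 0.2],
    "W_3": [0.3, 0.5, 0.2, 0.4, 0.5, 0.2, 0.1],
})

w_cols = ["W_1", "W_2", "W_3"]     # illustrative name for the columns to total
values = df[w_cols].to_numpy()     # 2-D float array, one row per DataFrame row
df["my_sum"] = values.sum(axis=1)  # NumPy sums along axis 1, one total per row
print(df.round(2))
```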

Answer 2

Answered by MaxU

In [7]: df['my_sum'] = df.drop(columns='ID').sum(axis=1)

In [8]: df
Out[8]:
   ID  W_1  W_2  W_3  my_sum
0   1  0.1  0.2  0.3     0.6
1   1  0.2  0.4  0.5     1.1
2   2  0.3  0.3  0.2     0.8
3   2  0.1  0.3  0.4     0.8
4   2  0.2  0.0  0.5     0.7
5   1  0.5  0.3  0.2     1.0
6   1  0.4  0.2  0.1     0.7

or:

In [9]: df['my_sum'] = df.filter(regex=r'^W_\d+').sum(axis=1)

In [10]: df
Out[10]:
   ID  W_1  W_2  W_3  my_sum
0   1  0.1  0.2  0.3     0.6
1   1  0.2  0.4  0.5     1.1
2   2  0.3  0.3  0.2     0.8
3   2  0.1  0.3  0.4     0.8
4   2  0.2  0.0  0.5     0.7
5   1  0.5  0.3  0.2     1.0
6   1  0.4  0.2  0.1     0.7
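
The anchored pattern matters when other column names merely contain "W_". This small sketch compares filter(regex=...) with a plain str.contains check. The OLD_W_1 column is hypothetical and not part of the question's data.

```python
import pandas as pd

# Small frame for illustration; OLD_W_1 is a hypothetical extra column.
df = pd.DataFrame({
    "ID":      [1, 1, 2],
    "W_1":     [0.1, 0.2, 0.3],
    "W_2":     [0.2, 0.4, 0.3],
    "OLD_W_1": [9.0, 9.0, 9.0],
})

# filter(regex=...) keeps columns whose label matches the pattern via re.search;
# the ^ anchor requires the label to start with "W_".
anchored = df.filter(regex=r"^W_\d+").columns.tolist()

# str.contains matches "W_" anywhere in the label, so OLD_W_1 is kept too.
loose = df.columns[df.columns.str.contains("W_")].tolist()

print(anchored)
print(loose)
```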

Answer 3

Answered by piRSquared

You don't need to group by anything if you just want to sum across rows. Just use axis=1 in your sum.

The crux is figuring out how to identify which columns to sum over. In your case, we can pick the appropriate columns in many ways. We don't actually know what your "real" data looks like.

@MaxU covered the more practical solutions. This one should be fast.

import numpy as np

df.assign(
    my_sum=np.column_stack([df[c].values for c in df if c.startswith('W_')]).sum(1)
)

   ID  W_1  W_2  W_3  my_sum
0   1  0.1  0.2  0.3     0.6
1   1  0.2  0.4  0.5     1.1
2   2  0.3  0.3  0.2     0.8
3   2  0.1  0.3  0.4     0.8
4   2  0.2  0.0  0.5     0.7
5   1  0.5  0.3  0.2     1.0
6   1  0.4  0.2  0.1     0.7

Or, if it really is just ['W_1', 'W_2', 'W_3']:

df.assign(my_sum=df[['W_1', 'W_2', 'W_3']].sum(1))

   ID  W_1  W_2  W_3  my_sum
0   1  0.1  0.2  0.3     0.6
1   1  0.2  0.4  0.5     1.1
2   2  0.3  0.3  0.2     0.8
3   2  0.1  0.3  0.4     0.8
4   2  0.2  0.0  0.5     0.7
5   1  0.5  0.3  0.2     1.0
6   1  0.4  0.2  0.1     0.7
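
To see why grouping is unnecessary here, this sketch contrasts a grouped transform with a plain row-wise sum. The frame is rebuilt under the assumption that it matches the question's table. The grouped version totals each column within an ID group and broadcasts it back, which is not the per-row total the question asks for.

```python
import pandas as pd

# Rebuild the frame from the question (construction assumed, values from the post).
df = pd.DataFrame({
    "ID":  [1, 1, 2, 2, 2, 1, 1],
    "W_1": [0.1, 0.2, 0.3, 0.1, 0.2, 0.5, 0.4],
    "W_2": [0.2, 0.4, 0.3, 0.3, 0.0, 0.3, 0.2],
    "W_3": [0.3, 0.5, 0.2, 0.4, 0.5, 0.2, 0.1],
})
cols = ["W_1", "W_2", "W_3"]

# Grouped transform: each column is summed within its ID group and the group
# total is repeated on every row of that group (same shape as df[cols]).
per_group = df.groupby("ID")[cols].transform("sum")

# Row-wise total: one value per row, no grouping involved.
row_totals = df[cols].sum(axis=1)

print(per_group.round(2))
print(row_totals.round(2))
```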

Answer 4

Answered by sameagol

You can also index the DataFrame with a list of the columns to sum over. This is helpful because column names can readily be collected in a list.

sum_list = ['W_1', 'W_2', 'W_3']
df['my_sum'] = df[sum_list].sum(1)
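
When there are many such columns, the list can be built from the column labels instead of being typed out. This is a sketch that assumes every column to total starts with "W_" and that the frame matches the question's table.

```python
import pandas as pd

# Rebuild the frame from the question (construction assumed, values from the post).
df = pd.DataFrame({
    "ID":  [1, 1, 2, 2, 2, 1, 1],
    "W_1": [0.1, 0.2, 0.3, 0.1, 0.2, 0.5, 0.4],
    "W_2": [0.2, 0.4, 0.3, 0.3, 0.0, 0.3, 0.2],
    "W_3": [0.3, 0.5, 0.2, 0.4, 0.5, 0.2, 0.1],
})

# Collect the labels to total; the "W_" prefix rule is an assumption.
sum_list = [c for c in df.columns if c.startswith("W_")]

df["my_sum"] = df[sum_list].sum(axis=1)
print(df.round(2))
```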
