Original source: http://stackoverflow.com/questions/45948418/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Selecting specific columns for calculating row wise total in pandas
Asked by owise
Is there any way of making a sum on the columns after grouping in pandas data frame? For example I have the following data frame:
ID W_1 W_2 W_3
1 0.1 0.2 0.3
1 0.2 0.4 0.5
2 0.3 0.3 0.2
2 0.1 0.3 0.4
2 0.2 0.0 0.5
1 0.5 0.3 0.2
1 0.4 0.2 0.1
I want an extra column called "my_sum" that holds, for each row, the sum across the columns W_1, W_2 and W_3. The output would look like this:
ID W_1 W_2 W_3 my_sum
1 0.1 0.2 0.3 0.6
1 0.2 0.4 0.5 1.1
2 0.3 0.3 0.2 0.8
2 0.1 0.3 0.4 0.8
2 0.2 0.0 0.5 0.7
1 0.5 0.3 0.2 1.0
1 0.4 0.2 0.1 0.7
I tried the following:
df['my_sum'] = df.groupby('ID')['W_1','W_1','W_1'].transform(sum,axis=1)
but this only sums all the entries of W_1. The documentation mentions the axis parameter, but I am not sure why it has no effect here.
I looked into this question and also this one, but they are different from what I want.
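For anyone who wants to reproduce the examples in the answers, the sample frame can be rebuilt roughly as follows. This is only a sketch: the question does not state the dtypes, so it assumes ID is an integer column and the W_ columns are floats, which is what the printed table suggests.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: ID is integer-typed and the W_ columns are float-typed.
df = pd.DataFrame({
    "ID":  [1, 1, 2, 2, 2, 1, 1],
    "W_1": [0.1, 0.2, 0.3, 0.1, 0.2, 0.5, 0.4],
    "W_2": [0.2, 0.4, 0.3, 0.3, 0.0, 0.3, 0.2],
    "W_3": [0.3, 0.5, 0.2, 0.4, 0.5, 0.2, 0.1],
})

print(df.dtypes)
```

Several of the answers rely on exactly this layout (ID first, ID as the only non-float column), so it is worth checking df.dtypes on real data before borrowing them.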
Answered by cs95
The part that stays the same in every variant is the .sum(1) call, which sums across each row. Here are some inventive alternatives to the answers already posted.
df.select_dtypes
df['my_sum'] = df.select_dtypes(float).sum(1)
df
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
df.iloc
df['my_sum'] = df.iloc[:, 1:].sum(1)
df
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
Boolean Indexing
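This shamelessly exploits a quirk of your data: every W_ value is below 1 while every ID is at least 1, so masking with df < 1 turns the ID column into NaN, and sum skips NaN by default.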
df['my_sum'] = df[df < 1].sum(1)
df
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
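To make the trick above less mysterious, here is a small sketch of what the mask does step by step, and of how it goes wrong once a weight reaches 1. The second frame (other) is made up purely for illustration and is not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID":  [1, 1, 2, 2, 2, 1, 1],
    "W_1": [0.1, 0.2, 0.3, 0.1, 0.2, 0.5, 0.4],
    "W_2": [0.2, 0.4, 0.3, 0.3, 0.0, 0.3, 0.2],
    "W_3": [0.3, 0.5, 0.2, 0.4, 0.5, 0.2, 0.1],
})

# Entries that fail the condition (here: every ID) are replaced by NaN.
masked = df[df < 1]
print(masked["ID"].isna().all())  # True

# DataFrame.sum skips NaN by default, so only the W_ values are added up.
df["my_sum"] = masked.sum(axis=1)

# Hypothetical counter-example: a weight of 1.5 is masked out as well,
# so the row total silently loses it.
other = pd.DataFrame({"ID": [1], "W_1": [1.5], "W_2": [0.2]})
wrong = other[other < 1].sum(axis=1)
print(wrong.iloc[0])  # 0.2, not 1.7
```

In other words, this works for the posted sample only because of the value ranges; selecting the columns by name or dtype is the safer choice on real data.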
DataFrame.sum, or numpy.sum
Filter on the column names using str.contains:
df['my_sum'] = df.iloc[:, df.columns.str.contains('W_')].sum(1)
df
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
Alternatively try summing on a numpy array directly, for performance:
df['my_sum'] = df.values[:, 1:].sum(1)
df
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
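The slice df.values[:, 1:] depends on ID being the first column. A variant that picks the columns by name before dropping to NumPy might look like the sketch below; the helper name w_cols is my own, and whether it is actually faster on your data is something to measure rather than assume.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID":  [1, 1, 2, 2, 2, 1, 1],
    "W_1": [0.1, 0.2, 0.3, 0.1, 0.2, 0.5, 0.4],
    "W_2": [0.2, 0.4, 0.3, 0.3, 0.0, 0.3, 0.2],
    "W_3": [0.3, 0.5, 0.2, 0.4, 0.5, 0.2, 0.1],
})

# Select by name so the result does not depend on column order.
w_cols = [c for c in df.columns if c.startswith("W_")]

# to_numpy() returns a plain 2-D float array; axis=1 sums across each row.
df["my_sum"] = df[w_cols].to_numpy().sum(axis=1)
print(df)
```

One behavioural difference to keep in mind: a NumPy sum propagates NaN, whereas DataFrame.sum skips NaN by default, so the two only agree when the selected columns have no missing values.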
Answered by MaxU
In [7]: df['my_sum'] = df.drop('ID', axis=1).sum(axis=1)
In [8]: df
Out[8]:
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
or:
In [9]: df['my_sum'] = df.filter(regex=r'^W_\d+').sum(axis=1)
In [10]: df
Out[10]:
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
Answered by piRSquared
You don't need to group by anything if you just want to sum across rows. Just use axis=1 in your sum.
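To illustrate the difference, here is a sketch that puts the row-wise sum next to what a grouped sum computes. The names w_cols, row_totals and group_totals are mine, chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "ID":  [1, 1, 2, 2, 2, 1, 1],
    "W_1": [0.1, 0.2, 0.3, 0.1, 0.2, 0.5, 0.4],
    "W_2": [0.2, 0.4, 0.3, 0.3, 0.0, 0.3, 0.2],
    "W_3": [0.3, 0.5, 0.2, 0.4, 0.5, 0.2, 0.1],
})
w_cols = ["W_1", "W_2", "W_3"]

# Row-wise: one total per row; the ID column plays no part.
row_totals = df[w_cols].sum(axis=1)

# Group-wise: each column is summed within its ID group and the group
# total is broadcast back to every row of that group.
group_totals = df.groupby("ID")[w_cols].transform("sum")

print(row_totals)
print(group_totals)
```

The grouped version adds values down each column within an ID, which is why it cannot produce the per-row totals the question asks for.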
The crux is figuring out how to identify which columns to sum over. In your case, we can pick the appropriate columns in many ways. We don't actually know what your "real" data looks like.
@MaxU covered the more practical solutions. This one should be fast.
import numpy as np
# Stack the W_ columns into a 2-D array, then sum across each row.
df.assign(
    my_sum=np.column_stack([df[c].values for c in df if c.startswith('W_')]).sum(1)
)
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
Or, if the columns really are just ['W_1', 'W_2', 'W_3']:
df.assign(my_sum=df[['W_1', 'W_2', 'W_3']].sum(1))
ID W_1 W_2 W_3 my_sum
0 1 0.1 0.2 0.3 0.6
1 1 0.2 0.4 0.5 1.1
2 2 0.3 0.3 0.2 0.8
3 2 0.1 0.3 0.4 0.8
4 2 0.2 0.0 0.5 0.7
5 1 0.5 0.3 0.2 1.0
6 1 0.4 0.2 0.1 0.7
Answered by sameagol
Also, you can pass a list to the dataframe, indicating which columns to sum over. This is helpful because columns can be readily put in list form.
sum_list = ['W_1', 'W_2', 'W_3']
df['my_sum'] = df[sum_list].sum(1)
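If the list is too long to type by hand, it can be derived from the column names instead. The sketch below assumes every column to be summed shares the W_ prefix, as in the question; the variable via_index is only there to show a second way of building the same list.

```python
import pandas as pd

df = pd.DataFrame({
    "ID":  [1, 1, 2, 2, 2, 1, 1],
    "W_1": [0.1, 0.2, 0.3, 0.1, 0.2, 0.5, 0.4],
    "W_2": [0.2, 0.4, 0.3, 0.3, 0.0, 0.3, 0.2],
    "W_3": [0.3, 0.5, 0.2, 0.4, 0.5, 0.2, 0.1],
})

# Build the list from the column names (assumes a shared "W_" prefix).
sum_list = [c for c in df.columns if c.startswith("W_")]

# The same selection using the string methods on the column Index.
via_index = df.columns[df.columns.str.startswith("W_")].tolist()

df["my_sum"] = df[sum_list].sum(axis=1)
print(df)
```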