Group by value of sum of columns with Pandas

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/21584434/

Date: 2020-09-13 21:39:47  Source: igfitidea

Tags: python, group-by, pandas, dataframe

Asked by mazieres

I got lost in the pandas docs and features trying to figure out a way to group a DataFrame by the values of the sums of its columns.

For instance, let's say I have the following data:

In [1]: import pandas as pd

In [2]: dat = {'a':[1,0,0], 'b':[0,1,0], 'c':[1,0,0], 'd':[2,3,4]}

In [3]: df = pd.DataFrame(dat)

In [4]: df
Out[4]: 
   a  b  c  d
0  1  0  1  2
1  0  1  0  3
2  0  0  0  4

I would like columns a, b and c to be grouped, since each of them sums to 1. The resulting DataFrame would have column labels equal to the sum of the columns that were added together. Like this:

   1  9
0  2  2
1  1  3
2  0  4

Any ideas to point me in the right direction? Thanks in advance!

Answered by TomAugspurger

Here you go:

In [57]: df.groupby(df.sum(), axis=1).sum()
Out[57]: 
   1  9
0  2  2
1  1  3
2  0  4

[3 rows x 2 columns]

df.sum() is your grouper. It sums over the 0 axis (the index), giving you the two groups: 1 (columns a, b, and c) and 9 (column d). You want to group the columns (axis=1), and take the sum of each group.
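
To see what the grouper looks like, and as a fallback for newer pandas releases, here is a minimal sketch. It assumes that the `axis=1` argument of `groupby` is deprecated in your pandas version (as far as I know this happened in 2.1), in which case transposing, grouping the rows, and transposing back should give the same result. The name `grouper` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 0], 'b': [0, 1, 0], 'c': [1, 0, 0], 'd': [2, 3, 4]})

# The grouper: one total per column (a, b and c sum to 1; d sums to 9).
grouper = df.sum()
print(grouper)

# Transpose so the columns become rows, group those rows by their totals,
# sum each group, then transpose back to the original orientation.
result = df.T.groupby(grouper).sum().T
print(result)
```

The grouper is a Series indexed by column name, so after the transpose it lines up with the row labels of `df.T`.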

Answered by LondonRob

Because pandas is designed with database concepts in mind, it really expects related information to be stored together in rows, not in columns. Because of this, it's usually more elegant to do things row-wise. Here's how to solve your problem row-wise:

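import pandas as pd

dat = {'a':[1,0,0], 'b':[0,1,0], 'c':[1,0,0], 'd':[2,3,4]}
df = pd.DataFrame(dat)

# Turn the columns into rows, then store each row's total in a helper column
df = df.transpose()
df['totals'] = df.sum(axis=1)

# Group the rows by their total, sum each group, and flip back
print(df.groupby('totals').sum().transpose())
#totals  1  9
#0       2  2
#1       1  3
#2       0  4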
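
The row-wise version above overwrites `df` with its transpose and adds a `totals` column to it. If you'd rather leave the original frame untouched, something like the following sketch might do. `group_columns_by_sum` is a hypothetical helper name, not part of pandas, and it assumes none of the row labels is already called `totals`.

```python
import pandas as pd

def group_columns_by_sum(frame):
    """Sum together the columns of `frame` that share the same column total."""
    # Hypothetical helper for illustration; not a pandas function.
    # assign() returns a new frame, so `frame` itself is not modified.
    by_row = frame.T.assign(totals=frame.sum())
    return by_row.groupby('totals').sum().T

df = pd.DataFrame({'a': [1, 0, 0], 'b': [0, 1, 0], 'c': [1, 0, 0], 'd': [2, 3, 4]})
result = group_columns_by_sum(df)

print(result)
print(list(df.columns))  # the original columns are still there
```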