pandas: Group by a column and sum the contents of another column with Python
Note: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and author information, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/39116735/
Asked by Poisson
I have a dataframe merged_df_energy:
+------------------------+------------------------+------------------------+--------------+
| ACT_TIME_AERATEUR_1_F1 | ACT_TIME_AERATEUR_1_F3 | ACT_TIME_AERATEUR_1_F5 | class_energy |
+------------------------+------------------------+------------------------+--------------+
| 63.333333 | 63.333333 | 63.333333 | low |
| 0 | 0 | 0 | high |
| 45.67 | 0 | 55.94 | high |
| 0 | 0 | 23.99 | low |
| 0 | 20 | 23.99 | medium |
+------------------------+------------------------+------------------------+--------------+
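To make the example reproducible, the table above can be rebuilt as a dataframe. This is only a minimal sketch: the values are copied from the table, and it assumes the numeric columns are floats and class_energy holds plain strings.

```python
import pandas as pd

# Values copied row by row from the table above
merged_df_energy = pd.DataFrame({
    'ACT_TIME_AERATEUR_1_F1': [63.333333, 0, 45.67, 0, 0],
    'ACT_TIME_AERATEUR_1_F3': [63.333333, 0, 0, 0, 20],
    'ACT_TIME_AERATEUR_1_F5': [63.333333, 0, 55.94, 23.99, 23.99],
    'class_energy': ['low', 'high', 'high', 'low', 'medium'],
})

print(merged_df_energy)
```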
I would like to create, for each ACT_TIME_AERATEUR_1_Fx column (ACT_TIME_AERATEUR_1_F1, ACT_TIME_AERATEUR_1_F3 and ACT_TIME_AERATEUR_1_F5), a dataframe which contains these columns: class_energy and sum_time
For example, the dataframe corresponding to ACT_TIME_AERATEUR_1_F1 would be:
+-----------------+-----------+
| class_energy | sum_time |
+-----------------+-----------+
| low | 63.333333 |
| medium | 0 |
| high | 45.67 |
+-----------------+-----------+
I think I should use groupby like this:
data.groupby(by=['class_energy'])['sum_time'].sum()
Does anyone have an idea that could help me?
Accepted answer by jezrael
You can list all the columns you want to aggregate inside []:
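# Pass the column names as a list (double brackets); recent pandas versions reject a bare tuple of names
print (df.groupby(by=['class_energy'])[['ACT_TIME_AERATEUR_1_F1', 'ACT_TIME_AERATEUR_1_F3', 'ACT_TIME_AERATEUR_1_F5']].sum())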
ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3 \
class_energy
high 45.670000 0.000000
low 63.333333 63.333333
medium 0.000000 20.000000
ACT_TIME_AERATEUR_1_F5
class_energy
high 55.940000
low 87.323333
medium 23.990000
You can also use the parameter as_index=False, which keeps class_energy as a regular column instead of the index:
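# Pass the column names as a list (double brackets); recent pandas versions reject a bare tuple of names
print (df.groupby(by=['class_energy'], as_index=False)[['ACT_TIME_AERATEUR_1_F1', 'ACT_TIME_AERATEUR_1_F3', 'ACT_TIME_AERATEUR_1_F5']].sum())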
class_energy ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3 \
0 high 45.670000 0.000000
1 low 63.333333 63.333333
2 medium 0.000000 20.000000
ACT_TIME_AERATEUR_1_F5
0 55.940000
1 87.323333
2 23.990000
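The difference between the two calls can be checked directly. The sketch below rebuilds the sample data from the question under the name df (an assumption, since the answer does not show how df was created) and compares as_index=False with calling reset_index() on the default result.

```python
import pandas as pd

# Sample data copied from the question; the name df follows the answer
df = pd.DataFrame({
    'ACT_TIME_AERATEUR_1_F1': [63.333333, 0, 45.67, 0, 0],
    'ACT_TIME_AERATEUR_1_F3': [63.333333, 0, 0, 0, 20],
    'ACT_TIME_AERATEUR_1_F5': [63.333333, 0, 55.94, 23.99, 23.99],
    'class_energy': ['low', 'high', 'high', 'low', 'medium'],
})
cols = ['ACT_TIME_AERATEUR_1_F1', 'ACT_TIME_AERATEUR_1_F3', 'ACT_TIME_AERATEUR_1_F5']

# Default: class_energy becomes the index of the result
by_index = df.groupby('class_energy')[cols].sum()

# as_index=False: class_energy stays a regular column
flat = df.groupby('class_energy', as_index=False)[cols].sum()

print(by_index.index.name)   # name of the index in the default result
print(list(flat.columns))    # class_energy appears among the columns here

# Moving the index back into a column should give the same layout as as_index=False
same_layout = list(by_index.reset_index().columns) == list(flat.columns)
print(same_layout)
```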
If you need to aggregate only the first 3 columns:
print (df.groupby(by=['class_energy'], as_index=False)[df.columns[:3]].sum())
class_energy ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3 \
0 high 45.670000 0.000000
1 low 63.333333 63.333333
2 medium 0.000000 20.000000
ACT_TIME_AERATEUR_1_F5
0 55.940000
1 87.323333
2 23.990000
...or all columns except the last:
print (df.groupby(by=['class_energy'], as_index=False)[df.columns[:-1]].sum())
class_energy ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3 \
0 high 45.670000 0.000000
1 low 63.333333 63.333333
2 medium 0.000000 20.000000
ACT_TIME_AERATEUR_1_F5
0 55.940000
1 87.323333
2 23.990000
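The question asked for one dataframe per ACT_TIME_AERATEUR_1_Fx column, each with the columns class_energy and sum_time. One possible way to get that layout from the same groupby is sketched below. The dictionary name per_column and the list value_cols are illustrative choices, not part of the original answer, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'ACT_TIME_AERATEUR_1_F1': [63.333333, 0, 45.67, 0, 0],
    'ACT_TIME_AERATEUR_1_F3': [63.333333, 0, 0, 0, 20],
    'ACT_TIME_AERATEUR_1_F5': [63.333333, 0, 55.94, 23.99, 23.99],
    'class_energy': ['low', 'high', 'high', 'low', 'medium'],
})

# Every column except the grouping key
value_cols = [c for c in df.columns if c != 'class_energy']

# One dataframe per measurement column, keyed by the original column name
per_column = {
    col: (df.groupby('class_energy', as_index=False)[[col]]
            .sum()
            .rename(columns={col: 'sum_time'}))
    for col in value_cols
}

print(per_column['ACT_TIME_AERATEUR_1_F1'])
```

Note that groupby sorts the group keys by default, so the rows come out in the order high, low, medium rather than the order shown in the question.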