pandas: set a column name for the result of apply over groupby

Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/29802034/

Date: 2020-09-13 23:14:58  Source: igfitidea

Set column name for apply result over groupby

python pandas

Asked by MrT

This is a fairly trivial problem, but it's triggering my OCD and I haven't been able to find a suitable solution for the past half hour.

For background, I'm looking to calculate a value (let's call it F) for each group in a DataFrame, derived from different aggregated measures of columns in the existing DataFrame.

Here's a toy example of what I'm trying to do:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                'C': [69, 83, 28, 25, 11, 31, 14, 37, 14,  0],
                'D': [ 0.3,  0.1,  0.1,  0.8,  0.8,  0. ,  0.8,  0.8,  0.1,  0.8],
                'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]
                })

df_grp = df.groupby(['A','B'])
df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())

What I'd like to do is assign a name to the result of apply (or the lambda). Is there any way to do this without moving the lambda to a named function or renaming the column after running the last line?

Answered by Alexander

Have the lambda function return a new Series:

df_grp.apply(lambda x: pd.Series({'new_name':
                    x['C'].sum() * x['D'].mean() / x['E'].max()}))
# or df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max()).to_frame('new_name')

     new_name
A B          
X N  5.583333
Y M  2.975000
  N  3.845455
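
For reference, here is a self-contained sketch of both variants from this answer, checking that they produce the same named column. The explicit `[['C', 'D', 'E']]` selection is an addition that is not in the original answer; it is assumed here only to keep the grouping columns out of each group passed to the lambda.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
    'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
    'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
    'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
    'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12],
})

# Assumption: select the value columns explicitly (not in the original answer).
df_grp = df.groupby(['A', 'B'])[['C', 'D', 'E']]

# Variant 1: the lambda returns a one-element Series whose key becomes the column name.
named = df_grp.apply(
    lambda x: pd.Series({'new_name': x['C'].sum() * x['D'].mean() / x['E'].max()})
)

# Variant 2: the lambda returns a scalar, and the resulting Series is named via to_frame.
framed = df_grp.apply(
    lambda x: x['C'].sum() * x['D'].mean() / x['E'].max()
).to_frame('new_name')

# Both are DataFrames indexed by (A, B) with a single 'new_name' column.
same_values = np.allclose(named['new_name'], framed['new_name'])
```

Variant 1 keeps the name inside the lambda, which is what the question asks for; variant 2 names the result in a chained call after apply.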

Answered by Zero

You could convert your Series to a DataFrame using reset_index() and provide name='your_col_name', which sets the name of the column corresponding to the Series values.

(df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())
      .reset_index(name='your_col_name'))

   A  B  your_col_name
0  X  N       5.583333
1  Y  M       2.975000
2  Y  N       3.845455
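
A self-contained sketch of this approach, for anyone who wants to run it directly. As an assumption that is not in the original answer, the value columns are selected explicitly with `[['C', 'D', 'E']]` so that the grouping columns are not passed into the lambda.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
    'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
    'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
    'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
    'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12],
})

out = (
    df.groupby(['A', 'B'])[['C', 'D', 'E']]  # assumption: explicit column selection
      .apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())
      # Series.reset_index(name=...) moves A and B into columns and names the values.
      .reset_index(name='your_col_name')
)
```

Unlike to_frame, this flattens the (A, B) index into ordinary columns, which may or may not be what you want downstream.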