pandas: applying functions to groups in a pandas DataFrame

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/18137341/

Time: 2020-09-13 21:04:32  Source: igfitidea

applying functions to groups in pandas dataframe

python numpy pandas dataframe

Asked by Justin

I'm trying to apply simple functions to groups in pandas. I have this dataframe which I can group by type:

import numpy as np
import pandas
df = pandas.DataFrame({"id": ["a", "b", "c", "d"], "v": [1, 2, 3, 4], "type": ["X", "Y", "Y", "Y"]}).set_index("id")
df.groupby("type").mean()  # gets the mean per type

I want to apply a function like np.log2 only to the groups before taking the mean of each group. This does not work, since apply is element-wise and type (as well as potentially other columns in df in a real scenario) is not numeric:

# fails
df.apply(np.log2).groupby("type").mean()

Is there a way to apply np.log2 only to the groups prior to taking the mean? I thought transform would be the answer, but the problem is that it returns a dataframe that does not have the original type column:

df.groupby("type").transform(np.log2)
           v
id          
a   0.000000
b   1.000000
c   1.584963
d   2.000000

Variants like grouping and then applying do not work: df.groupby("type").apply(np.log2). What is the correct way to do this?

Answered by Justin

The problem is that np.log2 cannot deal with the non-numeric type column. Instead, you need to pass just your numeric column. You can do this as suggested in the comments, or define a lambda:

df.groupby('type').apply(lambda x: np.mean(np.log2(x['v'])))
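An equivalent way to get the same per-group numbers, sketched here under the assumption that only the column v needs transforming, is to take the logarithm of that column first and then group the result by the type column. The names log_v and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"id": ["a", "b", "c", "d"], "v": [1, 2, 3, 4], "type": ["X", "Y", "Y", "Y"]}
).set_index("id")

# Transform only the numeric column, so np.log2 never sees the string column.
log_v = np.log2(df["v"])

# Group the transformed Series by the original "type" column (aligned on the index).
result = log_v.groupby(df["type"]).mean()
print(result)
```

Because the grouping key is passed as a separate Series, the type labels survive as the index of the result even though they were never passed through np.log2.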


As per comments, I would define a function:

df['w'] = [5, 6, 7, 8]

def foo(x):
    # Keep only the numeric columns, take log2 of each, then average per column
    return x.select_dtypes(include='number').apply(np.log2).mean()

df.groupby('type').apply(foo)

#              v         w
# type                    
# X     0.000000  2.321928
# Y     1.528321  2.797439
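The same table can probably also be produced without a custom function, using only public pandas methods. The following is a minimal sketch, assuming the frame with the extra w column from above: select_dtypes picks out the numeric columns, np.log2 is applied to them as a whole, and the result is grouped by the original type column. The names numeric and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["a", "b", "c", "d"],
        "v": [1, 2, 3, 4],
        "w": [5, 6, 7, 8],
        "type": ["X", "Y", "Y", "Y"],
    }
).set_index("id")

# Drop the non-numeric columns before applying the element-wise function.
numeric = df.select_dtypes(include="number")

# Take log2 of every numeric value, then average within each "type" group.
result = np.log2(numeric).groupby(df["type"]).mean()
print(result)
```

This avoids calling a Python function once per group, which may matter when there are many groups.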