pandas: apply a function to groups in a DataFrame

Question

Asked by Justin

I'm trying to apply simple functions to groups in pandas. I have this DataFrame, which I can group by type:

import numpy as np
import pandas
df = pandas.DataFrame({"id": ["a", "b", "c", "d"], "v": [1, 2, 3, 4], "type": ["X", "Y", "Y", "Y"]}).set_index("id")
df.groupby("type").mean()  # gets the mean per type

I want to apply a function like np.log2 only to the groups before taking the mean of each group. This does not work, since apply is element-wise and type (as well as potentially other columns in df in a real scenario) is not numeric:

# fails
df.apply(np.log2).groupby("type").mean()

Is there a way to apply np.log2 only to the groups prior to taking the mean? I thought transform would be the answer, but the problem is that it returns a DataFrame that does not have the original type column:

df.groupby("type").transform(np.log2)
           v
id          
a   0.000000
b   1.000000
c   1.584963
d   2.000000

Variants like grouping and then applying do not work either: df.groupby("type").apply(np.log2). What is the correct way to do this?

Answer 1

Answered by Justin

The problem is that np.log2 cannot deal with the non-numeric type column. Instead, you need to pass just your numeric column. You can do this as suggested in the comments, or define a lambda:

df.groupby('type').apply(lambda x: np.mean(np.log2(x['v'])))
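A variant of the same idea, sketched under the assumption that only the column v is needed: selecting the column on the grouped object first means the function only ever sees numeric data. The name result is illustrative and not part of the original answer.

```python
import numpy as np
import pandas

df = pandas.DataFrame(
    {"id": ["a", "b", "c", "d"], "v": [1, 2, 3, 4], "type": ["X", "Y", "Y", "Y"]}
).set_index("id")

# Select the numeric column on the grouped object, then take log2
# of each group's values and average them.
result = df.groupby("type")["v"].apply(lambda s: np.log2(s).mean())
print(result)
```

The result is a Series indexed by type, with one mean of log2(v) per group.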

As per the comments, I would define a function:

df['w'] = [5, 6, 7, 8]

def foo(x):
    # keep only the numeric columns, take log2 of each, then average per column
    return x.select_dtypes(include="number").apply(np.log2).mean()

df.groupby('type').apply(foo)

#              v         w
# type                    
# X     0.000000  2.321928
# Y     1.528321  2.797439
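Because np.log2 is element-wise, another option is to take the logarithm of the numeric columns first and then group by the type column passed in as a Series. This is a sketch of an alternative, not the method from the answer; the names numeric, logged and means are illustrative.

```python
import numpy as np
import pandas

df = pandas.DataFrame(
    {"id": ["a", "b", "c", "d"], "v": [1, 2, 3, 4], "w": [5, 6, 7, 8],
     "type": ["X", "Y", "Y", "Y"]}
).set_index("id")

# Keep only the numeric columns so np.log2 never sees the string column.
numeric = df.select_dtypes(include="number")
logged = np.log2(numeric)

# Group the transformed values by the original type column (aligned on the index).
means = logged.groupby(df["type"]).mean()
print(means)
```

This avoids calling a Python function once per group, which may matter when there are many groups.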
