Aggregating lambda functions in pandas and numpy

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/30718231/

Time: 2020-08-19 08:52:02  Source: igfitidea

Tags: python, numpy, pandas, lambda

Asked by user2524994

I have an aggregation statement below:

data = data.groupby(['type', 'status', 'name']).agg({
    'one': np.mean,
    'two': lambda value: 100 * ((value > 32).sum() / reading.mean()),
    'test2': lambda value: 100 * ((value > 45).sum() / value.mean()),
})

I keep getting KeyErrors. I have been able to make it work with one lambda function, but not with two.

Accepted answer by unutbu

You need to specify the column in data whose values are to be aggregated. For example,

data = data.groupby(['type', 'status', 'name'])['value'].agg(...)

instead of

data = data.groupby(['type', 'status', 'name']).agg(...)

If you don't mention the column (e.g. 'value'), then the keys in the dict passed to agg are taken to be column names. The KeyErrors are pandas' way of telling you that it can't find columns named one, two or test2 in the DataFrame data.
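The behaviour described above can be reproduced with a small sketch. The frame df and its columns are made up for illustration and are not the asker's data; the exact exception type may also vary between pandas versions, so the sketch catches a broad Exception and prints what was raised.

```python
import pandas as pd

# Hypothetical toy data: one grouping column and one numeric column.
df = pd.DataFrame({
    'type': ['a', 'a', 'b', 'b'],
    'value': [10, 40, 50, 60],
})

# Dict keys that match real columns work: 'value' exists in df.
ok = df.groupby('type').agg({'value': 'mean'})
print(ok)

# Dict keys that are not columns fail: df has no column named 'one'.
try:
    df.groupby('type').agg({'one': 'mean'})
    raised = False
except Exception as exc:  # a KeyError in recent pandas; older versions may differ
    raised = True
    print(type(exc).__name__, exc)
```

Selecting the column first, as in groupby(...)['value'], removes the ambiguity, because the aggregation then applies to that one column only.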

Note: Passing a dict to groupby(...)[column].agg in order to rename the output columns has been deprecated, and pandas 1.0 and later raise a SpecificationError for it. Going forward, pass a list of tuples instead. Each tuple is expected to be of the form ('new_column_name', callable).
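As a minimal sketch of the list-of-tuples form, the snippet below aggregates a single column into two named outputs. The data, the output names avg and pct_over_32, and the percentage formula are illustrative assumptions, not part of the original question. It also shows the keyword form ("named aggregation"), which I believe is available from pandas 0.25 onward and gives the same result.

```python
import pandas as pd

# Hypothetical toy data, not the asker's DataFrame.
df = pd.DataFrame({
    'type': ['a', 'a', 'b', 'b'],
    'value': [10, 40, 50, 60],
})
grouped = df.groupby('type')['value']

# List of (new_column_name, callable-or-name) tuples.
by_tuples = grouped.agg([
    ('avg', 'mean'),
    ('pct_over_32', lambda v: 100 * (v > 32).mean()),
])

# Keyword form: each keyword becomes an output column name.
by_keywords = grouped.agg(
    avg='mean',
    pct_over_32=lambda v: 100 * (v > 32).mean(),
)

print(by_tuples)
print(by_keywords)
```

Because every lambda gets an explicit output name in both forms, several lambdas can be used in the same aggregation without their names colliding.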



Here is a runnable example:

import numpy as np
import pandas as pd

N = 100
data = pd.DataFrame({
    'type': np.random.randint(10, size=N),
    'status': np.random.randint(10, size=N),
    'name': np.random.randint(10, size=N),
    'value': np.random.randint(10, size=N),
})

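# Stand-in for the `reading` array referenced in the question
reading = np.random.random(10)

# Select the 'value' column, then pass (new_column_name, callable) tuples
data = data.groupby(['type', 'status', 'name'])['value'].agg([
    ('one', np.mean),
    ('two', lambda value: 100 * ((value > 32).sum() / reading.mean())),
    ('test2', lambda value: 100 * ((value > 45).sum() / value.mean())),
])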
print(data)
#                   one  two  test2
# type status name                 
# 0    1      3     3.0    0    0.0
#             7     4.0    0    0.0
#             9     8.0    0    0.0
#      3      1     5.0    0    0.0
#             6     3.0    0    0.0
# ...


If this does not match your situation, then please provide runnable code that does.