Aggregating lambda functions in pandas and numpy
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/30718231/
Asked by user2524994
I have an aggregation statement below:
data = data.groupby(['type', 'status', 'name']).agg({
    'one': np.mean,
    'two': lambda value: 100 * ((value > 32).sum() / reading.mean()),
    'test2': lambda value: 100 * ((value > 45).sum() / value.mean()),
})
I keep getting KeyErrors. I have been able to make it work with one lambda function, but not with two.
Accepted answer by unutbu
You need to specify the column in data whose values are to be aggregated. For example,
data = data.groupby(['type', 'status', 'name'])['value'].agg(...)
instead of
data = data.groupby(['type', 'status', 'name']).agg(...)
If you don't mention the column (e.g. 'value'), then the keys in the dict passed to agg are taken to be column names. The KeyErrors are pandas' way of telling you that it can't find columns named one, two or test2 in the DataFrame data.
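To make the lookup behaviour concrete, here is a minimal sketch. The toy DataFrame df and its values are hypothetical and not part of the original question, and the exact exception type may differ in older pandas releases; recent versions raise a KeyError.

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
df = pd.DataFrame({
    'type': ['a', 'a', 'b'],
    'value': [10, 50, 40],
})

# No column is selected, so the dict key 'one' is looked up as a column name.
try:
    df.groupby('type').agg({'one': 'mean'})
    outcome = 'no error'
except KeyError:
    outcome = 'KeyError'

# When the key names a real column, the same call succeeds.
means = df.groupby('type').agg({'value': 'mean'})

print(outcome)
print(means)
```

The first call fails only because no column named one exists; the aggregation function itself is not the problem.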
Note: Passing a dict to groupby/agg in order to name the resulting columns has been deprecated. Going forward, you should pass a list of tuples instead. Each tuple is expected to be of the form ('new_column_name', callable).
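As a small sketch of the list-of-tuples form, the snippet below names two output columns on a hypothetical toy frame (df, avg and pct_over_45 are made-up names). It also shows the keyword form, known as named aggregation, which should give the same result on pandas 0.25 or later; that form is an addition here, not something the original answer uses.

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
df = pd.DataFrame({
    'name': ['x', 'x', 'y', 'y'],
    'value': [10, 50, 50, 60],
})

grouped = df.groupby('name')['value']

# List of (new_column_name, callable) tuples.
by_tuples = grouped.agg([
    ('avg', 'mean'),
    ('pct_over_45', lambda v: 100 * (v > 45).mean()),
])

# Keyword form ("named aggregation"), assuming pandas 0.25 or later.
by_keywords = grouped.agg(
    avg='mean',
    pct_over_45=lambda v: 100 * (v > 45).mean(),
)

print(by_tuples)
print(by_keywords)
```

In both forms the output column names come from the names you supply, so they do not need to match any column in the original frame.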
Here is a runnable example:
import numpy as np
import pandas as pd

# Build a random DataFrame with three grouping columns and one value column
N = 100
data = pd.DataFrame({
    'type': np.random.randint(10, size=N),
    'status': np.random.randint(10, size=N),
    'name': np.random.randint(10, size=N),
    'value': np.random.randint(10, size=N),
})
# Stand-in for the `reading` variable used in the question
reading = np.random.random(10,)

# Select the 'value' column, then pass (new_column_name, callable) tuples
data = data.groupby(['type', 'status', 'name'])['value'].agg(
    [('one', np.mean),
     ('two', lambda value: 100 * ((value > 32).sum() / reading.mean())),
     ('test2', lambda value: 100 * ((value > 45).sum() / value.mean()))])
print(data)
#                   one  two  test2
# type status name
# 0    1      3     3.0    0    0.0
#             7     4.0    0    0.0
#             9     8.0    0    0.0
#      3      1     5.0    0    0.0
#             6     3.0    0    0.0
# ...
If this does not match your situation, then please provide runnable code that does.