pandas custom aggregation function
Notice: this page reproduces a popular Stack Overflow question. It is provided under the CC BY-SA 4.0 license: you are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/56720571/
Asked by hmmmbob
I have a pandas DataFrame on which the following command works:
house.groupby(['place_name'])['index_nsa'].agg(['first','last'])
It gives me what I want. Now I want to make a custom aggregation value that gives me the percentage change between the first and the last value.
I got an error when doing math on the values, so I assumed that I had to convert them to numbers.
house.groupby(['place_name'])['index_nsa'].agg({"change in %":[(int('last')-int('first')/int('first')]})
Unfortunately, I only get a syntax error at the last bracket, and I cannot find the cause.
Does anyone see where I went wrong?
Accepted answer by cs95
You will need to define a callback and pass it to agg here. You can do that inline with a lambda function:
house.groupby(['place_name'])['index_nsa'].agg([
("change in %", lambda x: (x.iloc[-1] - x.iloc[0]) / x.iloc[0])])
Look closely at the .agg call: to allow renaming the output column, you must pass a list of tuples of the format [(new_name, agg_func), ...]. More info here.
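To make the tuple format concrete, here is a minimal self-contained sketch. The question does not show the real `house` DataFrame, so the sample rows below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real `house` DataFrame is not shown in the question.
house = pd.DataFrame({
    "place_name": ["A", "A", "A", "B", "B"],
    "index_nsa": [100.0, 110.0, 125.0, 200.0, 150.0],
})

# Each tuple is (output column name, aggregation function).
result = house.groupby(["place_name"])["index_nsa"].agg([
    ("change in %", lambda x: (x.iloc[-1] - x.iloc[0]) / x.iloc[0]),
])

print(result)
```

Note that the lambda returns a fraction (0.25 for a 25% rise), so multiply by 100 if the column should hold percentage points. The result also depends on row order within each group, so this assumes the rows are already sorted the way you want.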
If you want to avoid the lambda at the cost of some verbosity, you may use a named function:
def first_last_pct(ser):
    first, last = ser.iloc[0], ser.iloc[-1]
    return (last - first) / first

house.groupby(['place_name'])['index_nsa'].agg([("change in %", first_last_pct)])
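As a sketch of how this might combine with the original command, the list passed to agg can mix built-in aggregation names with (name, function) tuples, so the first value, last value and relative change appear side by side. The sample data here is hypothetical, and the mixed-list behaviour is worth confirming on your pandas version.

```python
import pandas as pd

# Hypothetical sample data, assumed to be already sorted within each group.
house = pd.DataFrame({
    "place_name": ["A", "A", "A", "B", "B"],
    "index_nsa": [100.0, 110.0, 125.0, 200.0, 150.0],
})

def first_last_pct(ser):
    """Relative change from the first to the last value of a group."""
    first, last = ser.iloc[0], ser.iloc[-1]
    return (last - first) / first

# Plain strings keep their own name as the column label;
# tuples supply a custom label for the custom function.
summary = house.groupby(["place_name"])["index_nsa"].agg([
    "first",
    "last",
    ("change in %", first_last_pct),
])

print(summary)
```

A named function also gives you a place for a docstring and makes the calculation easier to test on its own.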