pandas custom aggregation function

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/56720571/

Date: 2020-09-14 06:23:50  Source: igfitidea

Tags: python, pandas, aggregate, pandas-groupby

Asked by hmmmbob

I have a pandas DataFrame on which the following command works:

house.groupby(['place_name'])['index_nsa'].agg(['first','last'])
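
The question does not show its data, so here is a minimal sketch with a made-up `house` DataFrame (the place names and `index_nsa` values are hypothetical) that reproduces the first/last aggregation:

```python
import pandas as pd

# Hypothetical sample data; the real `house` DataFrame is not shown in the question
house = pd.DataFrame({
    "place_name": ["Austin", "Austin", "Austin", "Boston", "Boston"],
    "index_nsa": [100.0, 110.0, 120.0, 200.0, 150.0],
})

# One row per place, with the first and last index_nsa value of each group
first_last = house.groupby(["place_name"])["index_nsa"].agg(["first", "last"])
print(first_last)
```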

It gives me what I want. Now I want a custom aggregation that gives me the percentage change between the first and the last value.

I got an error when doing math on the values, so I assumed I had to convert them to numbers:

house.groupby(['place_name'])['index_nsa'].agg({"change in %":[(int('last')-int('first')/int('first')]})

Unfortunately, I get a syntax error at the last bracket, and I can't find the cause.
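
A short sketch of why the attempt fails (the variable names `broken`, `syntax_ok` and `int_ok` are only for illustration). The `(` opened before `int('last')` is never closed, and even with balanced parentheses, `'first'` and `'last'` are just names of aggregations, not the group's values:

```python
# The question's line, kept as a string so Python can try to compile it
broken = "house.groupby(['place_name'])['index_nsa'].agg({\"change in %\":[(int('last')-int('first')/int('first')]})"

# 1) The "(" before int('last') is still open when the parser reaches "]",
#    so compiling the line raises SyntaxError.
try:
    compile(broken, "<question>", "eval")
    syntax_ok = True
except SyntaxError:
    syntax_ok = False

# 2) The strings 'first' and 'last' are not numbers, so int() cannot parse them.
try:
    int("last")
    int_ok = True
except ValueError:
    int_ok = False

print(syntax_ok, int_ok)  # False False
```

Note also that `last - first / first` would compute `last - 1`, because `/` binds tighter than `-`; the relative change needs `(last - first) / first`.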

Can anyone see where I went wrong?

Accepted answer by cs95

You will need to define a callback and pass it to agg. You can do that inline with a lambda function:

house.groupby(['place_name'])['index_nsa'].agg([
    ("change in %", lambda x: (x.iloc[-1] - x.iloc[0]) / x.iloc[0])])
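
A runnable sketch of the lambda approach, using a hypothetical stand-in for `house` (the data is made up). The callback receives each group's `index_nsa` values as a Series, so `iloc[0]` and `iloc[-1]` pick the first and last rows of that group. The result is a fraction (0.2 means 20%); multiply by 100 if you want a percentage:

```python
import pandas as pd

# Hypothetical stand-in for the question's `house` DataFrame
house = pd.DataFrame({
    "place_name": ["Austin", "Austin", "Austin", "Boston", "Boston"],
    "index_nsa": [100.0, 110.0, 120.0, 200.0, 150.0],
})

# Each (name, func) tuple becomes one output column named `name`
result = house.groupby(["place_name"])["index_nsa"].agg([
    ("change in %", lambda x: (x.iloc[-1] - x.iloc[0]) / x.iloc[0])])
print(result)
```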

Look closely at the .agg call: to rename the output column, you must pass a list of tuples of the form [(new_name, agg_func), ...] rather than a dict. More info here.
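
In pandas 0.25 and later, keyword "named aggregation" is another way to name the output column; in pandas 1.0 and later, the dict-of-renames form used in the question raises a SpecificationError on a SeriesGroupBy. A minimal sketch comparing the two working forms, with hypothetical data and a hypothetical helper `rel_change`:

```python
import pandas as pd

# Hypothetical stand-in for the question's `house` DataFrame
house = pd.DataFrame({
    "place_name": ["Austin", "Austin", "Austin", "Boston", "Boston"],
    "index_nsa": [100.0, 110.0, 120.0, 200.0, 150.0],
})

def rel_change(x):
    """Hypothetical helper: relative change from first to last value of one group."""
    return (x.iloc[-1] - x.iloc[0]) / x.iloc[0]

grouped = house.groupby("place_name")["index_nsa"]

# List-of-tuples form: (output column name, function)
by_tuples = grouped.agg([("change in %", rel_change)])

# Named aggregation (pandas >= 0.25): the keyword is the output column name.
# "change in %" is not a valid Python identifier, so it is passed via **{...}.
by_keyword = grouped.agg(**{"change in %": rel_change})

print(by_keyword)
```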

If you want to avoid the lambda at the cost of some verbosity, you can use a named function:

def first_last_pct(ser):
    first, last = ser.iloc[0], ser.iloc[-1]
    return (last - first) / first

house.groupby(['place_name'])['index_nsa'].agg([("change in %", first_last_pct)])
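
A further option is to keep the question's original `agg(['first', 'last'])` and do the arithmetic on whole columns afterwards, which avoids calling a Python function once per group. This sketch uses hypothetical data and assumes rows are already in chronological order within each `place_name`; it scales both versions by 100 to get a percentage:

```python
import pandas as pd

# Hypothetical stand-in for `house`; rows assumed chronological within each place
house = pd.DataFrame({
    "place_name": ["Austin", "Austin", "Austin", "Boston", "Boston"],
    "index_nsa": [100.0, 110.0, 120.0, 200.0, 150.0],
})

def first_last_pct(ser):
    # Same helper as in the answer: relative change as a fraction
    first, last = ser.iloc[0], ser.iloc[-1]
    return (last - first) / first

# Callback version from the answer, scaled to percent
via_callback = house.groupby(["place_name"])["index_nsa"].agg(
    [("change in %", first_last_pct)]) * 100

# Vectorized version: built-in "first"/"last" reducers, then column arithmetic
fl = house.groupby(["place_name"])["index_nsa"].agg(["first", "last"])
fl["change in %"] = (fl["last"] - fl["first"]) / fl["first"] * 100

print(fl)
```

One difference to keep in mind: groupby `first`/`last` return the first and last non-null values, while `iloc[0]`/`iloc[-1]` take the positional values even if they are NaN, so the two versions can disagree when the data has missing values.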