Rename result columns from Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/44635626/

Time: 2020-08-20 00:17:07  Source: igfitidea

Tags: python, pandas, aggregate, rename

Asked by Victor Mayrink

I'm trying to do some aggregations on a pandas DataFrame. Here is some sample code:

import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})

Out[1]: 
      Amount      
         Sum Count
User              
user1   18.0     2
user2   20.5     3
user3   10.5     1

This generates the following warning:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)

How can I avoid this?

Answered by Ted Petrou

Use groupby apply and return a Series to rename columns

Use the groupby apply method to perform an aggregation that:

  • Renames the columns
  • Allows for spaces in the names
  • Allows you to order the returned columns in any way you choose
  • Allows for interactions between columns
  • Returns a single level index and NOT a MultiIndex

To do this:

  • Create a custom function that you pass to apply
  • This custom function is passed each group as a DataFrame
  • Return a Series from the function
  • The index of the Series will become the new column names

Create fake data

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                  'Score': [9, 1, 8, 7, 7, 6, 9]})

Create a custom function that returns a Series
The variable x inside of my_agg is a DataFrame

def my_agg(x):
    names = {
        'Amount mean': x['Amount'].mean(),
        'Amount std':  x['Amount'].std(),
        'Amount range': x['Amount'].max() - x['Amount'].min(),
        'Score Max':  x['Score'].max(),
        'Score Sum': x['Score'].sum(),
        'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
                                   'Score Sum', 'Score Max', 'Amount Score Sum'])

Pass this custom function to the groupby apply method

df.groupby('User').apply(my_agg)

The big downside is that this function will be much slower than agg, which uses cythonized implementations of the built-in aggregations.
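
One way to limit that cost is sketched below. It assumes the same statistics as my_agg are wanted: the cross-column product is precomputed as a helper column (AmountScore is a hypothetical name, not part of the original answer), and every statistic is then taken from a built-in groupby reduction instead of a Python function called once per group.

```python
import pandas as pd

df = pd.DataFrame({
    "User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
    "Score": [9, 1, 8, 7, 7, 6, 9],
})

# Precompute the column interaction once, outside of the groupby.
# "AmountScore" is a hypothetical helper name chosen for this sketch.
tmp = df.assign(AmountScore=df["Amount"] * df["Score"])
grouped = tmp.groupby("User")

# Build the result from built-in reductions; the dict keys become the
# column names, so spaces and custom ordering are still possible.
fast = pd.DataFrame({
    "Amount range": grouped["Amount"].max() - grouped["Amount"].min(),
    "Amount std": grouped["Amount"].std(),
    "Amount mean": grouped["Amount"].mean(),
    "Score Sum": grouped["Score"].sum(),
    "Score Max": grouped["Score"].max(),
    "Amount Score Sum": grouped["AmountScore"].sum(),
})
print(fast)
```

The result has a single-level column index, like the apply version, but it only covers statistics that can be expressed as per-column reductions plus arithmetic on the reduced values.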

Using a dictionary with the groupby agg method

Using a dictionary of dictionaries was deprecated because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on GitHub on how to improve this functionality in the future. Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.

df.groupby('User')['Amount'].agg(['sum', 'count'])

Output

        sum  count
User              
user1  18.0      2
user2  20.5      3
user3  10.5      1

It is still possible to use a dictionary to explicitly denote different aggregations for different columns, as shown here with another numeric column named Other.

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
              "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
              'Other': [1,2,3,4,5,6]})

df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})

Output

      Amount       Other          
         sum count   max       std
User                              
user1   18.0     2     6  3.535534
user2   20.5     3     5  1.527525
user3   10.5     1     4       NaN
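
The result above has a two-level column index. If flat column names are preferred, one option (a minimal sketch, not part of the original answer; the underscore separator is an arbitrary choice) is to join the two levels of each column label:

```python
import pandas as pd

df = pd.DataFrame({
    "User": ["user1", "user2", "user2", "user3", "user2", "user1"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
    "Other": [1, 2, 3, 4, 5, 6],
})

result = df.groupby("User").agg({"Amount": ["sum", "count"],
                                 "Other": ["max", "std"]})

# Each column label is a tuple such as ("Amount", "sum");
# join its parts into a single string.
result.columns = ["_".join(col) for col in result.columns]
print(result.columns.tolist())
```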

Answered by Jacob Stevenson

If you replace the internal dictionary with a list of tuples, it gets rid of the warning message:

import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})
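
With this form the new names end up in the second level of a column MultiIndex, under "Amount". As a small sketch (an addition, assuming the outer level is not needed), the outer level can be dropped with DataFrame.droplevel to get the flat Sum and Count columns:

```python
import pandas as pd

df = pd.DataFrame({
    "User": ["user1", "user2", "user2", "user3", "user2", "user1"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
})

# Each tuple is (new column name, aggregation).
result = df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})

# Drop the outer "Amount" level, keeping only the renamed inner level.
flat = result.droplevel(0, axis=1)
print(flat)
```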

Answered by Scott Boston

Update for Pandas 0.25+: aggregation relabeling

import pandas as pd

print(pd.__version__)
#0.25.0

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby("User")['Amount'].agg(Sum='sum', Count='count')

Output:

        Sum  Count
User              
user1  18.0      2
user2  20.5      3
user3  10.5      1
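
The same keyword syntax also works on the whole grouped DataFrame, where each keyword maps to a (column, aggregation) pair. The sketch below is an addition to the answer and assumes pandas 0.25 or later; the Score column and the output names are chosen only for illustration. Names that are not valid Python identifiers, such as ones containing spaces, can be passed by unpacking a dictionary.

```python
import pandas as pd

df = pd.DataFrame({
    "User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
    "Score": [9, 1, 8, 7, 7, 6, 9],
})

result = df.groupby("User").agg(
    # keyword = (source column, aggregation)
    Sum=("Amount", "sum"),
    Count=("Amount", "count"),
    # pd.NamedAgg is a namedtuple spelling of the same pair
    ScoreMax=pd.NamedAgg(column="Score", aggfunc="max"),
    # unpack a dict for output names that contain spaces
    **{"Amount mean": ("Amount", "mean")},
)
print(result)
```

The output has a single-level column index, so no flattening step is needed.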

Answered by JodeCharger100

This is what I did:

Create a fake dataset:

import pandas as pd
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                  'Score': [9, 1, 8, 7, 7, 6, 9]})
df

Output:

   Amount  Score   User
0    10.0      9  user1
1     5.0      1  user2
2     8.0      8  user2
3    10.5      7  user3
4     7.5      7  user2
5     8.0      6  user1
6     9.0      9  user3

I first made User the index, and then did a groupby on it:

ans = df.set_index('User').groupby(level=0)['Amount'].agg([('Sum','sum'),('Count','count')])
ans

Solution:

        Sum  Count
User              
user1  18.0      2
user2  20.5      3
user3  19.5      2

Answered by plankthom

Replace the inner dictionaries with a list of correctly named functions.

To rename the functions, I'm using this utility function:

def aliased_aggr(aggr, name):
    if isinstance(aggr,str):
        def f(data):
            return data.agg(aggr)
    else:
        def f(data):
            return aggr(data)
    f.__name__ = name
    return f

The group-by statement then becomes:

df.groupby(["User"]).agg({"Amount": [
    aliased_aggr("sum", "Sum"),
    aliased_aggr("count", "Count")
]})

If you have bigger, reusable aggregation specs, you can convert them with:

def convert_aggr_spec(aggr_spec):
    return {
        col : [ 
            aliased_aggr(aggr,alias) for alias, aggr in aggr_map.items() 
        ]  
        for col, aggr_map in aggr_spec.items() 
    }

So you can say:

df.groupby(["User"]).agg(convert_aggr_spec({"Amount": {"Sum": "sum", "Count": "count"}}))

See also https://github.com/pandas-dev/pandas/issues/18366#issuecomment-476597674
