Python: Rename result columns from a Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")

Question

Asked by Victor Mayrink

I'm trying to do some aggregations on a pandas DataFrame. Here is some sample code:

import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})

Out[1]: 
      Amount      
         Sum Count
User              
user1   18.0     2
user2   20.5     3
user3   10.5     1

This generates the following warning:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)

How can I avoid this?

Answer 1

Answered by Ted Petrou

Use groupby `apply` and return a Series to rename columns

Use the groupby `apply` method to perform an aggregation that:

Renames the columns
Allows for spaces in the names
Allows you to order the returned columns in any way you choose
Allows for interactions between columns
Returns a single-level index and NOT a MultiIndex

To do this:

Create a custom function that you pass to `apply`
This custom function is passed each group as a DataFrame
Return a Series from it
The index of the Series becomes the new column names

Create fake data:

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                  'Score': [9, 1, 8, 7, 7, 6, 9]})

Create a custom function that returns a Series.
The variable `x` inside `my_agg` is a DataFrame.

def my_agg(x):
    names = {
        'Amount mean': x['Amount'].mean(),
        'Amount std':  x['Amount'].std(),
        'Amount range': x['Amount'].max() - x['Amount'].min(),
        'Score Max':  x['Score'].max(),
        'Score Sum': x['Score'].sum(),
        'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
                                   'Score Sum', 'Score Max', 'Amount Score Sum'])

Pass this custom function to the groupby `apply` method:

df.groupby('User').apply(my_agg)

The big downside is that this approach will be much slower than `agg` for the cythonized aggregations.
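As a rough sketch of how the same statistics could be obtained while staying on the built-in aggregations, the cross-column product can be computed before grouping and the range derived afterwards. The helper column name "Amount x Score" and the variable names are assumptions made for illustration, and the resulting column labels differ slightly from those of `my_agg`.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                   "Score": [9, 1, 8, 7, 7, 6, 9]})

# Precompute the cross-column product so a built-in "sum" can aggregate it.
# "Amount x Score" is a hypothetical helper column name.
with_product = df.assign(**{"Amount x Score": df["Amount"] * df["Score"]})

fast = with_product.groupby("User").agg({
    "Amount": ["mean", "std", "max", "min"],
    "Score": ["max", "sum"],
    "Amount x Score": ["sum"],
})

# Flatten the two-level column index into single names such as "Amount mean".
fast.columns = [" ".join(col) for col in fast.columns]

# Derive the range from the aggregated max and min, then drop the helper columns.
fast["Amount range"] = fast["Amount max"] - fast["Amount min"]
fast = fast.drop(columns=["Amount max", "Amount min"])
```

The trade-off is that any interaction between columns has to be expressed as a column operation before or after the groupby, which is less flexible than a custom function.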

Using a dictionary with the groupby `agg` method

Using a dictionary of dictionaries was deprecated because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on GitHub about how to improve this functionality in the future. Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.

df.groupby('User')['Amount'].agg(['sum', 'count'])

Output:

       sum  count
User              
user1  18.0      2
user2  20.5      3
user3  10.5      1
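The list form keeps the function names as column labels. If the original `Sum` and `Count` labels are wanted, one possible follow-up is a plain `rename` on the result; this is a minimal sketch and the variable name `renamed` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

# Aggregate with a list of function names, then relabel the resulting columns.
renamed = (
    df.groupby("User")["Amount"]
      .agg(["sum", "count"])
      .rename(columns={"sum": "Sum", "count": "Count"})
)
```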

It is still possible to use a dictionary to explicitly denote different aggregations for different columns, as shown here with an additional numeric column named `Other`.

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
              "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
              'Other': [1,2,3,4,5,6]})

df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})

Output:

      Amount       Other          
         sum count   max       std
User                              
user1   18.0     2     6  3.535534
user2   20.5     3     5  1.527525
user3   10.5     1     4       NaN
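With a dictionary of lists the result has two-level columns, so the labels to change live on the inner level. One way to relabel them, sketched here with arbitrary capitalised names, is `rename` with `level=1`.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                   "Other": [1, 2, 3, 4, 5, 6]})

multi = df.groupby("User").agg({"Amount": ["sum", "count"], "Other": ["max", "std"]})

# Rename only the inner level of the column MultiIndex; the outer level
# ("Amount", "Other") is left untouched.
relabeled = multi.rename(
    columns={"sum": "Sum", "count": "Count", "max": "Max", "std": "Std"},
    level=1,
)
```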

Answer 2

Answered by Jacob Stevenson

If you replace the inner dictionary with a list of tuples, the warning message goes away:

import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})
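The result of the call above still has two-level columns, with `Amount` on top. If a flat index is preferred, the outer level could be dropped afterwards; the following is a small sketch of that, with `nested` and `flat` as illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

# Each tuple is (new column name, aggregation).
nested = df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})

# Drop the outer "Amount" level so only "Sum" and "Count" remain.
flat = nested.droplevel(0, axis=1)
```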

Answer 3

Answered by Scott Boston

Update for Pandas 0.25+: aggregation relabeling

import pandas as pd

print(pd.__version__)
#0.25.0

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby("User")['Amount'].agg(Sum='sum', Count='count')

Output:

        Sum  Count
User              
user1  18.0      2
user2  20.5      3
user3  10.5      1
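The same relabeling syntax is also available on the whole grouped DataFrame, where each keyword maps to a `(column, aggregation)` pair. The sketch below assumes pandas 0.25 or later and borrows the `Score` column from an earlier answer; a name containing a space can be passed by unpacking a dictionary.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                   "Score": [9, 1, 8, 7, 7, 6, 9]})

summary = df.groupby("User").agg(
    Sum=("Amount", "sum"),
    Count=("Amount", "count"),
    # Keyword arguments cannot contain spaces, so unpack a dict for such names.
    **{"Score Max": ("Score", "max")},
)
```

The result has single-level columns in the order the keywords were given.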

Answer 4

Answered by JodeCharger100

This is what I did:

Create a fake dataset:

import pandas as pd
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                  "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                  'Score': [9, 1, 8, 7, 7, 6, 9]})
df

Output:

   Amount  Score   User
0    10.0      9  user1
1     5.0      1  user2
2     8.0      8  user2
3    10.5      7  user3
4     7.5      7  user2
5     8.0      6  user1
6     9.0      9  user3

I first made User the index, and then did a groupby:

ans = df.set_index('User').groupby(level=0)['Amount'].agg([('Sum','sum'),('Count','count')])
ans

Result:

        Sum  Count
User              
user1  18.0      2
user2  20.5      3
user3  19.5      2

Answer 5

Answered by plankthom

Replace the inner dictionaries with a list of correctly named functions.

To rename a function, I'm using this utility function:

def aliased_aggr(aggr, name):
    if isinstance(aggr,str):
        def f(data):
            return data.agg(aggr)
    else:
        def f(data):
            return aggr(data)
    f.__name__ = name
    return f

The group-by statement then becomes:


df.groupby(["User"]).agg({"Amount": [
    aliased_aggr("sum", "Sum"),
    aliased_aggr("count", "Count")
]})

If you have bigger, reusable aggregation specs, you can convert them with:

def convert_aggr_spec(aggr_spec):
    return {
        col : [ 
            aliased_aggr(aggr,alias) for alias, aggr in aggr_map.items() 
        ]  
        for col, aggr_map in aggr_spec.items() 
    }

So you can write:

df.groupby(["User"]).agg(convert_aggr_spec({"Amount": {"Sum": "sum", "Count": "count"}}))

See also https://github.com/pandas-dev/pandas/issues/18366#issuecomment-476597674
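On pandas 0.25 or later, a similar conversion could target keyword-based aggregation relabeling instead of renamed functions. The helper `to_named_agg` below is hypothetical (it is not part of pandas or of this answer) and assumes the aliases are unique across all columns, since they become keyword arguments.

```python
import pandas as pd


def to_named_agg(aggr_spec):
    """Flatten {column: {alias: aggregation}} into {alias: (column, aggregation)}.

    Hypothetical helper; aliases must be unique across columns.
    """
    return {
        alias: (col, aggr)
        for col, aggr_map in aggr_spec.items()
        for alias, aggr in aggr_map.items()
    }


df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

spec = {"Amount": {"Sum": "sum", "Count": "count"}}

# Unpack the flattened spec as keyword arguments to agg.
result = df.groupby("User").agg(**to_named_agg(spec))
```

Unlike the function-renaming approach, this variant returns single-level columns named after the aliases.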

