Rename result columns from Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")
Original question: http://stackoverflow.com/questions/44635626/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Victor Mayrink
I'm trying to do some aggregations on a pandas DataFrame. Here is some sample code:
import pandas as pd
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})
df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})
Out[1]:
      Amount
         Sum Count
User
user1   18.0     2
user2   20.5     3
user3   10.5     1
This generates the following warning:
FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
How can I avoid this?
Answered by Ted Petrou
Use groupby apply and return a Series to rename columns
Use the groupby apply method to perform an aggregation that:
- Renames the columns
- Allows for spaces in the names
- Allows you to order the returned columns in any way you choose
- Allows for interactions between columns
- Returns a single level index and NOT a MultiIndex
To do this:
- Create a custom function that you pass to apply
- This custom function is passed each group as a DataFrame
- Return a Series
- The index of the Series will be the new columns
Create fake data:
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
'Score': [9, 1, 8, 7, 7, 6, 9]})
Create a custom function that returns a Series. The variable x inside of my_agg is a DataFrame.
def my_agg(x):
    # x is the sub-DataFrame for one group; the dict keys become column names
    names = {
        'Amount mean': x['Amount'].mean(),
        'Amount std': x['Amount'].std(),
        'Amount range': x['Amount'].max() - x['Amount'].min(),
        'Score Max': x['Score'].max(),
        'Score Sum': x['Score'].sum(),
        'Amount Score Sum': (x['Amount'] * x['Score']).sum()}
    # The index argument fixes the order of the returned columns
    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
                                   'Score Sum', 'Score Max', 'Amount Score Sum'])
Pass this custom function to the groupby apply method:
df.groupby('User').apply(my_agg)
The big downside is that this approach is much slower than agg for aggregations that have cythonized implementations.
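If speed matters, most of the columns above can probably be produced with built-in aggregations instead of apply. The following is only a sketch, not part of the original answer: it assumes pandas 0.25 or later (keyword-based named aggregation), and the helper column AmountScore and the intermediate columns "Amount max" and "Amount min" are names made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                   "Score": [9, 1, 8, 7, 7, 6, 9]})

# Precompute the cross-column product so the built-in "sum" can be applied to it.
# "AmountScore" is a hypothetical helper column name.
tmp = df.assign(AmountScore=df["Amount"] * df["Score"])

# Unpacking a dict with ** allows output names that contain spaces.
out = tmp.groupby("User").agg(**{
    "Amount mean": ("Amount", "mean"),
    "Amount std": ("Amount", "std"),
    "Amount max": ("Amount", "max"),
    "Amount min": ("Amount", "min"),
    "Score Max": ("Score", "max"),
    "Score Sum": ("Score", "sum"),
    "Amount Score Sum": ("AmountScore", "sum"),
})

# The range is derived after aggregation; pop() removes the two helper columns.
out["Amount range"] = out.pop("Amount max") - out.pop("Amount min")
print(out)
```

The trade-off is that interactions between columns have to be computed before or after the groupby rather than inside it.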
Using a dictionary with the groupby agg method
Using a dictionary of dictionaries was removed because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on GitHub about how to improve this functionality in the future.
Instead, you can directly select the aggregating column after the groupby call and pass a list of all the aggregating functions you wish to apply.
df.groupby('User')['Amount'].agg(['sum', 'count'])
Output
        sum  count
User
user1  18.0      2
user2  20.5      3
user3  10.5      1
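The list form names the columns after the functions ("sum", "count"). One way to get the capitalized names from the question is to chain a rename afterwards. This is a small sketch rather than something stated in the original answer:

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

# Aggregate with a plain list of functions, then relabel the resulting columns.
result = (df.groupby("User")["Amount"]
            .agg(["sum", "count"])
            .rename(columns={"sum": "Sum", "count": "Count"}))
print(result)
```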
It is still possible to use a dictionary to explicitly denote different aggregations for different columns, as shown here with another numeric column named Other.
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
'Other': [1,2,3,4,5,6]})
df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})
Output
      Amount       Other
         sum count   max       std
User
user1   18.0     2     6  3.535534
user2   20.5     3     5  1.527525
user3   10.5     1     4       NaN
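The result above has MultiIndex columns. If single-level column names are preferred, the two levels can be joined after the aggregation. A minimal sketch, assuming a space as the separator (the separator choice is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                   "Other": [1, 2, 3, 4, 5, 6]})

result = df.groupby("User").agg({"Amount": ["sum", "count"], "Other": ["max", "std"]})

# Each column label is a tuple such as ("Amount", "sum"); join it into one string.
result.columns = [" ".join(col) for col in result.columns]
print(result)
```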
Answered by Jacob Stevenson
If you replace the inner dictionary with a list of tuples, the warning message goes away:
import pandas as pd
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})
df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})
Answered by Scott Boston
Update for Pandas 0.25+: aggregation relabeling
import pandas as pd
print(pd.__version__)
#0.25.0
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})
df.groupby("User")['Amount'].agg(Sum='sum', Count='count')
Output:
        Sum  Count
User
user1  18.0      2
user2  20.5      3
user3  10.5      1
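The same relabeling also works on the whole grouped DataFrame when several source columns are involved. The sketch below assumes pandas 0.25 or later; the Other column and the output name OtherMax are added here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                   "Other": [1, 2, 3, 4, 5, 6]})

# Each keyword is an output column name; the value is (source column, function).
# pd.NamedAgg is the explicit form of the same (column, aggfunc) tuple.
result = df.groupby("User").agg(
    Sum=("Amount", "sum"),
    Count=("Amount", "count"),
    OtherMax=pd.NamedAgg(column="Other", aggfunc="max"),
)
print(result)
```

The result has a single-level column index, so no flattening step is needed.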
Answered by JodeCharger100
This is what I did.
Create a fake dataset:
import pandas as pd
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
'Score': [9, 1, 8, 7, 7, 6, 9]})
df
Output:
   Amount  Score   User
0    10.0      9  user1
1     5.0      1  user2
2     8.0      8  user2
3    10.5      7  user3
4     7.5      7  user2
5     8.0      6  user1
6     9.0      9  user3
I first made User the index, and then did a groupby on it:
ans = df.set_index('User').groupby(level=0)['Amount'].agg([('Sum','sum'),('Count','count')])
ans
Solution:
        Sum  Count
User
user1  18.0      2
user2  20.5      3
user3  19.5      2
Answered by plankthom
Replace the inner dictionaries with a list of correctly named functions.
To rename the functions, I'm using this utility function:
def aliased_aggr(aggr, name):
    if isinstance(aggr, str):
        def f(data):
            return data.agg(aggr)
    else:
        def f(data):
            return aggr(data)
    f.__name__ = name
    return f
The group-by statement then becomes:
df.groupby(["User"]).agg({"Amount": [
    aliased_aggr("sum", "Sum"),
    aliased_aggr("count", "Count")
]})
If you have bigger, reusable aggregation specs, you can convert them with:
def convert_aggr_spec(aggr_spec):
    return {
        col: [
            aliased_aggr(aggr, alias) for alias, aggr in aggr_map.items()
        ]
        for col, aggr_map in aggr_spec.items()
    }
So you can write:
df.groupby(["User"]).agg(convert_aggr_spec({"Amount": {"Sum": "sum", "Count": "count"}}))
See also https://github.com/pandas-dev/pandas/issues/18366#issuecomment-476597674