pandas reset index after performing groupby and retain selective columns
Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/52330016/
Asked by Alhpa Delta
I want to take a pandas DataFrame, count the unique elements of each column per group, and retain 2 of the columns. But after the groupby I get a multi-index DataFrame, which I am unable to (1) flatten or (2) reduce to only the relevant columns. Here is my code:
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 1],
    'Ticker': ['AA', 'BB', 'CC', 'DD', 'CC', 'BB'],
    'Amount': [10, 20, 30, 40, 50, 60],
    'Date_1': ['1/12/2018', '1/14/2018', '1/12/2018', '1/14/2018', '2/1/2018', '1/12/2018'],
    'Random_data': ['ax', '', 'nan', '', 'by', 'cz'],
    'Count': [23, 1, 4, 56, 34, 53]
})
df2 = df.groupby(['Ticker']).agg(['nunique'])
df2.reset_index()
print(df2)
df2 still comes out with two levels of index, and it has all the columns: Amount, Count, Date_1, ID, Random_data.
How do I reduce it to one level of index and retain only the ID and Random_data columns?
Answered by Chris A
Try this instead:
1) Select only the relevant columns (['ID', 'Random_data']).
2) Don't pass a list to .agg, just 'nunique'. The list is what is causing the multi-index behaviour.
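# Double brackets pass a list of columns; the bare ['ID', 'Random_data'] form fails on recent pandas versions.
df2 = df.groupby('Ticker')[['ID', 'Random_data']].agg('nunique')
df2.reset_index()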
  Ticker  ID  Random_data
0     AA   1            1
1     BB   2            2
2     CC   2            2
3     DD   1            1
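The effect of passing a list versus a plain string to .agg can be checked directly on the column index. The following is a minimal sketch, assuming a reduced copy of the question's data and a reasonably recent pandas; the names grouped, with_list and with_name are only illustrative.

```python
import pandas as pd

# Reduced copy of the question's data (only the columns needed here).
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 1],
    'Ticker': ['AA', 'BB', 'CC', 'DD', 'CC', 'BB'],
    'Random_data': ['ax', '', 'nan', '', 'by', 'cz'],
})

# Select the two columns of interest with a list (double brackets).
grouped = df.groupby('Ticker')[['ID', 'Random_data']]

# A list of functions adds a second column level holding the function names.
with_list = grouped.agg(['nunique'])

# A single function name keeps the columns flat.
with_name = grouped.agg('nunique')

print(with_list.columns.nlevels)  # 2
print(with_name.columns.nlevels)  # 1
print(list(with_name.columns))    # ['ID', 'Random_data']
```

Note that the empty string '' and the string 'nan' in Random_data are ordinary values here, so they are counted by nunique.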
Answered by jezrael
Use SeriesGroupBy.nunique and filter the columns with a list after groupby:
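# Double brackets pass a list of columns; the bare ['Date_1', 'Count', 'ID'] form fails on recent pandas versions.
df2 = df.groupby('Ticker')[['Date_1', 'Count', 'ID']].nunique().reset_index()
print(df2)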
  Ticker  Date_1  Count  ID
0     AA       1      1   1
1     BB       2      2   2
2     CC       2      2   2
3     DD       1      1   1
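If a multi-level result like the one in the question already exists, it can also be flattened after the fact. This is not part of either answer above, just a minimal sketch assuming pandas 0.24 or later (for DataFrame.droplevel) and a reduced copy of the question's data; the names multi and flat are illustrative. It also shows that reset_index returns a new DataFrame, so its result has to be assigned, which the code in the question does not do.

```python
import pandas as pd

# Reduced copy of the question's data (only the columns needed here).
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 1],
    'Ticker': ['AA', 'BB', 'CC', 'DD', 'CC', 'BB'],
    'Random_data': ['ax', '', 'nan', '', 'by', 'cz'],
})

# The multi-level result from the question, restricted to two columns.
multi = df.groupby('Ticker')[['ID', 'Random_data']].agg(['nunique'])

# Drop the inner 'nunique' level so only the original column names remain.
flat = multi.droplevel(1, axis=1)

# reset_index does not modify the frame in place; keep the returned copy.
flat = flat.reset_index()

print(list(flat.columns))  # ['Ticker', 'ID', 'Random_data']
```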