pandas: ungrouping data in a grouped-by pandas dataframe
Original URL: http://stackoverflow.com/questions/45807794/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Is there any way to ungroup data in a grouped-by pandas dataframe?
Asked by Omido
I have a dataset that, for simplicity, I need to group by and aggregate on one column so that I can remove some rows easily. Once I am done with the calculations, I need to reverse the group-by so that I can view the dataframe easily in Excel. If I do not reverse it, I would have to export the whole list to Excel, which is not easy to analyse. Any help is greatly appreciated.
Example:
Col1 Col2 Col3
123 11 Yes
123 22 Yes
256 33 Yes
256 33 No
337 00 No
337 44 No
After applying groupby and aggregate:
X=dataset.groupby('Col1').agg(lambda x:set(x)).reset_index()
I get
Col1 Col2 Col3
123 {11,22} {Yes}
256 {33} {Yes, No}
337 {00,44} {No}
I then remove all the rows that contain Yes using drop
X=X.reset_index(drop=True)
What I need to get before exporting to Excel is
Col1 Col2 Col3
337 00 No
337 44 No
Hope this is clear enough.
Thanks in advance.
Accepted answer by cs95
I don't believe converting to a set is a good idea. Here's an alternative: first sort in ascending order by Col3 (so that any Yes row comes after the No rows and overwrites them in the mapping), then create a mapping of Col2 : Yes/No and filter based on that.
In [1191]: df = df.sort_values('Col3', ascending=True)
In [1192]: mapping = dict(df[['Col2', 'Col3']].values)
In [1193]: df[df.Col2.replace(mapping) == 'No'] # or df.Col2.map(mapping)
Out[1193]:
Col1 Col2 Col3
4 337 0 No
5 337 44 No
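For readers who want to try this outside an IPython session, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question's example, with Col2 assumed to be integers (so "00" becomes 0). The names `ordered` and `result` are illustrative and not part of the original answer, and `zip` is used in place of `.values` only for readability.

```python
import pandas as pd

# Sample data from the question (Col2 assumed to be integer, so "00" -> 0).
df = pd.DataFrame({
    "Col1": [123, 123, 256, 256, 337, 337],
    "Col2": [11, 22, 33, 33, 0, 44],
    "Col3": ["Yes", "Yes", "Yes", "No", "No", "No"],
})

# Ascending sort puts every "No" row before every "Yes" row, so while the
# dict is built a later "Yes" overwrites an earlier "No" for the same Col2.
ordered = df.sort_values("Col3", ascending=True)
mapping = dict(zip(ordered["Col2"], ordered["Col3"]))

# Keep only the rows whose Col2 value never appeared together with "Yes".
result = df[df["Col2"].map(mapping) == "No"]
print(result)
```

Note that this filters on Col2 values rather than on Col1 groups. That happens to give the requested rows for this example, but the two are not equivalent in general.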
Answer by YOBEN_S
I agree with COLDSPEED. You do not need to convert to a set.
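df['Temp'] = df.Col3.eq('Yes')          # flag rows whose Col3 is 'Yes'
DF = df.groupby('Col1')['Temp'].sum()   # number of 'Yes' rows per Col1 group
# isin keeps every group with zero 'Yes' rows, not only the first such group
df[df.Col1.isin(DF.index[DF == 0])].drop('Temp', axis=1)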
Out[113]:
Col1 Col2 Col3
4 337 0 No
5 337 44 No
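The same per-group filter can also be written without a temporary column. The sketch below is not taken from either answer. It assumes the question's sample data (Col2 as integers) and uses a groupby `transform` so that the original rows are never collapsed and nothing has to be "ungrouped" afterwards. The names `has_yes` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question (Col2 assumed to be integer, so "00" -> 0).
df = pd.DataFrame({
    "Col1": [123, 123, 256, 256, 337, 337],
    "Col2": [11, 22, 33, 33, 0, 44],
    "Col3": ["Yes", "Yes", "Yes", "No", "No", "No"],
})

# For each row, True if its Col1 group contains at least one "Yes".
# transform broadcasts the per-group result back onto the original index.
has_yes = df["Col3"].eq("Yes").groupby(df["Col1"]).transform("any")

# Keep the rows of groups without any "Yes"; the frame keeps its row-level shape.
result = df[~has_yes]
print(result)
```

Because the result is an ordinary row-level dataframe, it can be passed straight to `result.to_excel(...)` for export.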