How to keep original index of a DataFrame after groupby 2 columns?
Original URL: http://stackoverflow.com/questions/49216357/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Hana
Is there any way I can retain the original index of my large DataFrame after I perform a groupby? I need this because I want to do an inner merge back to my original df (after the groupby) to regain the columns that were lost. The index value is the only unique key I can use for that merge. Does anyone know how I can achieve this?
My DataFrame is quite large. My groupby looks like this:
df.groupby(['col1', 'col2']).agg({'col3': 'count'}).reset_index()
This drops my original indexes from my original dataframe, which I want to keep.
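To make the problem concrete, here is a minimal sketch of the behaviour described above. The frame, its values, and the index labels 10 to 13 are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical frame with non-default index labels (assumed for illustration)
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b"],
        "col2": ["x", "x", "y", "z"],
        "col3": [1.0, 2.0, 3.0, 4.0],
    },
    index=[10, 11, 12, 13],
)

grouped = df.groupby(["col1", "col2"]).agg({"col3": "count"}).reset_index()

# One row per (col1, col2) group, renumbered from 0.
# The original labels 10..13 no longer appear anywhere in the result.
print(grouped)
print(list(grouped.index))  # [0, 1, 2]
```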
Accepted answer by Scott Boston
I think you are looking for transform in this situation:
df['count'] = df.groupby(['col1', 'col2'])['col3'].transform('count')
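As a rough sketch of why transform fits here: it returns one value per input row, aligned to the original index, so no merge back is needed. The sample frame and its index labels are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with non-default index labels (assumed for illustration)
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b"],
        "col2": ["x", "x", "y", "z"],
        "col3": [1.0, 2.0, 3.0, 4.0],
    },
    index=[10, 11, 12, 13],
)

# transform broadcasts each group's count back onto every row of that group
df["count"] = df.groupby(["col1", "col2"])["col3"].transform("count")

print(df)
print(list(df.index))  # [10, 11, 12, 13], unchanged
```

All original columns and index labels stay in place; the group-level count is simply added as a new column.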
Answer by jpp
You can elevate your index to a column via reset_index. Then aggregate your index to a tuple via agg, together with your count aggregation.
Below is a minimal example.
import numpy as np
import pandas as pd

# Random demo data with a non-unique integer index
df = pd.DataFrame(np.random.randint(0, 4, (50, 5)),
                  index=np.random.randint(0, 4, 50))

# Promote the index to a regular column named 'index'
df = df.reset_index()

# Count rows per group and collect each group's original index values
res = df.groupby([0, 1]).agg({2: 'count', 'index': lambda x: tuple(x)}).reset_index()

# Sample output (values vary between runs because the data is random):
#     0  1  2            index
# 0   0  0  4     (2, 0, 0, 2)
# 1   0  1  4     (0, 3, 1, 1)
# 2   0  2  1             (1,)
# 3   0  3  1             (3,)
# 4   1  0  4     (1, 2, 1, 3)
# 5   1  1  2           (1, 3)
# 6   1  2  4     (2, 1, 2, 2)
# 7   1  3  1             (2,)
# 8   2  0  5  (0, 3, 0, 2, 2)
# 9   2  1  2           (0, 2)
# 10  2  2  5  (1, 1, 3, 3, 2)
# 11  2  3  2           (0, 1)
# 12  3  0  4     (0, 3, 3, 3)
# 13  3  1  4     (1, 3, 0, 1)
# 14  3  2  3        (3, 2, 1)
# 15  3  3  4     (3, 3, 2, 1)
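One possible way to use the collected tuples is to look up a group's original rows with .loc. This is only a sketch: the frame below is hypothetical and, unlike the random example above, assumes the original index labels are unique.

```python
import pandas as pd

# Hypothetical frame with unique index labels (assumed for illustration)
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b"],
        "col2": ["x", "x", "y", "z"],
        "col3": [1.0, 2.0, 3.0, 4.0],
    },
    index=[10, 11, 12, 13],
)

# Same pattern as above: promote the index, then count and collect labels
res = (
    df.reset_index()
      .groupby(["col1", "col2"])
      .agg({"col3": "count", "index": lambda x: tuple(x)})
      .reset_index()
)

# Labels of the rows that fell into the first group ("a", "x")
labels = list(res.loc[0, "index"])

# Fetch those rows, with all their columns, from the original frame
rows = df.loc[labels]
print(rows)
```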
Answer by manoj
You should not use reset_index() if you want to keep your original indexes.
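For reference, a small sketch of what the aggregation returns when reset_index() is left out. The frame and its labels are made up for illustration. In this case the group keys stay in the result's index, so the result is labelled by (col1, col2) pairs.

```python
import pandas as pd

# Hypothetical frame with non-default index labels (assumed for illustration)
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b"],
        "col2": ["x", "x", "y", "z"],
        "col3": [1.0, 2.0, 3.0, 4.0],
    },
    index=[10, 11, 12, 13],
)

res = df.groupby(["col1", "col2"]).agg({"col3": "count"})

# The result's index is a MultiIndex built from the group keys
print(res)
print(list(res.index.names))  # ['col1', 'col2']
```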