Duplicate rows in pandas DF
Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me) on StackOverflow.
Original question: http://stackoverflow.com/questions/25619297/
Asked by Guforu
I have a DF in Pandas, which looks like:
Letters Numbers
A 1
A 3
A 2
A 1
B 1
B 2
B 3
C 2
C 2
I'm looking to count the number of similar rows and save the result in a third column. For example, the output I'm looking for:
Letters Numbers Events
A 1 2
A 2 1
A 3 1
B 1 1
B 2 1
B 3 1
C 2 2
An example of what I'm looking to do is here. The best idea I've come up with is to use value_counts(), but I think that only works on a single column. Another idea is to use duplicated(), but in any case I don't want to construct a for-loop. I'm pretty sure a Pythonic alternative to a for-loop exists.
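For reference, the example frame can be rebuilt with a minimal sketch like the following (only the Letters/Numbers data comes from the question; the rest is ordinary pandas setup):

import pandas as pd

# Reconstruct the question's example data.
df = pd.DataFrame({
    'Letters': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'Numbers': [1, 3, 2, 1, 1, 2, 3, 2, 2],
})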
Answered by joris
You can groupby these two columns and then calculate the sizes of the groups:
In [16]: df.groupby(['Letters', 'Numbers']).size()
Out[16]:
Letters  Numbers
A        1          2
         2          1
         3          1
B        1          1
         2          1
         3          1
C        2          2
dtype: int64
To get a DataFrame like in your example output, you can reset the index with reset_index.
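A minimal sketch of that final step, assuming the grouped result above (the column name 'Events' is taken from the desired output in the question, not from the original answer):

# Turn the MultiIndex Series of group sizes back into a flat DataFrame.
events = df.groupby(['Letters', 'Numbers']).size().reset_index(name='Events')

This yields the three-column Letters/Numbers/Events table shown in the question.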
Answered by EdChum
You can use a combination of groupby, transform and then drop_duplicates:
In [84]:
df['Events'] = df.groupby('Letters')['Numbers'].transform(pd.Series.value_counts)
df.drop_duplicates()
Out[84]:
  Letters  Numbers  Events
0       A        1       2
1       A        3       1
2       A        2       1
4       B        1       1
5       B        2       1
6       B        3       1
7       C        2       2
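For context, transform(pd.Series.value_counts) counts how often each Numbers value occurs within its Letters group and writes that count back onto every row, so drop_duplicates then keeps one row per unique (Letters, Numbers) pair. A roughly equivalent sketch that groups on both columns at once (my own variant, not part of the original answer) would be:

# Count each (Letters, Numbers) pair and broadcast the count back onto every row,
# then keep a single representative row per pair.
df['Events'] = df.groupby(['Letters', 'Numbers'])['Numbers'].transform('size')
result = df.drop_duplicates()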

