Original URL: http://stackoverflow.com/questions/36018851/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas Counting Unique Rows
Question by qwertylpc
I have a pandas data frame similar to:
ColA ColB
1 1
1 1
1 1
1 2
1 2
2 1
3 2
I want an output that works like Python's collections.Counter: I need to know how many times each row appears (with all of the columns being the same).
In this case the proper output would be:
ColA ColB Count
1 1 3
1 2 2
2 1 1
3 2 1
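Since the question frames the goal in terms of Counter, here is a minimal sketch of that literal approach, assuming the example data above: each row is turned into a plain tuple with itertuples(index=False, name=None) and the tuples are fed to collections.Counter. It works, but it loops over rows in Python, so the pandas-native answers are likely a better fit for large frames.

```python
from collections import Counter

import pandas as pd

# The example frame from the question
df = pd.DataFrame({"ColA": [1, 1, 1, 1, 1, 2, 3],
                   "ColB": [1, 1, 1, 2, 2, 1, 2]})

# Each row becomes a plain tuple such as (1, 1); Counter tallies identical tuples
row_counts = Counter(df.itertuples(index=False, name=None))

# Print one line per distinct row, ordered by the row values
for (col_a, col_b), count in sorted(row_counts.items()):
    print(col_a, col_b, count)
```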
I have tried something like this:
df.groupby(['ColA','ColB']).ColA.count()
but this gives me some ugly output that I am having trouble formatting.
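The attempt in the question is close: it returns a Series named ColA whose index is a (ColA, ColB) MultiIndex, which is why it prints awkwardly. A minimal sketch, assuming the same example data, that flattens it by giving the counts a new name in reset_index. A plain reset_index() would instead fail, because it tries to insert the ColA index level as a column next to the existing values column also named ColA.

```python
import pandas as pd

df = pd.DataFrame({"ColA": [1, 1, 1, 1, 1, 2, 3],
                   "ColB": [1, 1, 1, 2, 2, 1, 2]})

# The original attempt: a Series named "ColA" indexed by (ColA, ColB)
grouped = df.groupby(["ColA", "ColB"]).ColA.count()
print(grouped.index.names)  # the two group keys

# Naming the values "Count" avoids the clash with the ColA index level
# and turns the MultiIndex levels back into ordinary columns
result = grouped.reset_index(name="Count")
print(result)
```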
Answer by jezrael
You can use size with reset_index:
print(df.groupby(['ColA','ColB']).size().reset_index(name='Count'))
ColA ColB Count
0 1 1 3
1 1 2 2
2 2 1 1
3 3 2 1
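For reference, a self-contained version of this answer on the question's data, in Python 3 syntax. The second half is an alternative that is not part of the original answer: DataFrame.value_counts, available from pandas 1.1 onward, which counts identical rows across all columns but sorts by frequency by default, so it is re-sorted here by the key columns to line up with the groupby result.

```python
import pandas as pd

df = pd.DataFrame({"ColA": [1, 1, 1, 1, 1, 2, 3],
                   "ColB": [1, 1, 1, 2, 2, 1, 2]})

# size() counts the rows in each (ColA, ColB) group;
# reset_index(name=...) turns the result into a flat DataFrame
counts = df.groupby(["ColA", "ColB"]).size().reset_index(name="Count")
print(counts)

# Alternative (pandas >= 1.1): value_counts over all columns,
# re-sorted by the key columns because it sorts by count by default
counts_vc = (
    df.value_counts()
    .reset_index(name="Count")
    .sort_values(["ColA", "ColB"], ignore_index=True)
)
print(counts_vc)
```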
Answer by eddygeek
I only needed to count the unique rows and have used this alternative:
len(df[['ColA','ColB']].drop_duplicates())
For this task, on my data, it was twice as fast as len(df.groupby(['ColA','ColB'])), which builds on the more general solution above.
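A sketch comparing ways of getting only the number of distinct rows, on the question's data plus a hypothetical larger random frame (big, with arbitrarily chosen sizes) for a rough timing. Timings depend heavily on the data and the pandas version, so no particular speed-up is assumed here. One behavioral difference worth knowing: groupby drops groups whose keys contain NaN by default, while drop_duplicates keeps such rows, so the two counts can disagree on data with missing values.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"ColA": [1, 1, 1, 1, 1, 2, 3],
                   "ColB": [1, 1, 1, 2, 2, 1, 2]})
cols = ["ColA", "ColB"]

# Three ways to count distinct (ColA, ColB) combinations
n_drop = len(df[cols].drop_duplicates())   # the approach from this answer
n_len = len(df.groupby(cols))              # the approach it was compared with
n_groups = df.groupby(cols).ngroups        # same count, via GroupBy.ngroups
print(n_drop, n_len, n_groups)

# Hypothetical larger frame, only for a rough timing comparison
rng = np.random.default_rng(0)
big = pd.DataFrame({"ColA": rng.integers(0, 100, 200_000),
                    "ColB": rng.integers(0, 100, 200_000)})

t_drop = timeit.timeit(lambda: len(big[cols].drop_duplicates()), number=5)
t_len = timeit.timeit(lambda: len(big.groupby(cols)), number=5)
print(f"drop_duplicates: {t_drop:.3f}s, len(groupby): {t_len:.3f}s")
```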