Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/36018851/

Time: 2020-09-14 00:52:27  Source: igfitidea

Pandas Counting Unique Rows

python python-2.7 pandas counter

Asked by qwertylpc

I have a pandas data frame similar to:

ColA ColB
1    1
1    1
1    1
1    2
1    2
2    1
3    2

I want an output that works the same way as Counter: I need to know how many times each row appears (with all of the columns being the same).

In this case the proper output would be:

ColA ColB Count
1    1    3
1    2    2
2    1    1
3    2    1
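
For reference, the Counter behaviour being asked for can be sketched in plain Python by counting each row as a tuple. This is only an illustration of the target result, assuming the frame is named df and holds the sample data above; it is not part of the original question.

```python
from collections import Counter

import pandas as pd

# Sample frame reproducing the data shown in the question (assumed name: df).
df = pd.DataFrame({
    "ColA": [1, 1, 1, 1, 1, 2, 3],
    "ColB": [1, 1, 1, 2, 2, 1, 2],
})

# Each row becomes a plain tuple (hashable), so Counter can tally it.
row_counts = Counter(df.itertuples(index=False, name=None))
print(row_counts)
```

This gives a dict-like mapping from (ColA, ColB) tuples to counts rather than a data frame, which is why a pandas-native approach is more convenient here.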

I have tried something of the sort:

df.groupby(['ColA','ColB']).ColA.count()

but this gives me some ugly output that I am having trouble formatting.

Answered by jezrael

You can use size with reset_index:

print(df.groupby(['ColA','ColB']).size().reset_index(name='Count'))
   ColA  ColB  Count
0     1     1      3
1     1     2      2
2     2     1      1
3     3     2      1
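
A self-contained version of this approach, for anyone who wants to run it directly. The construction of df is an assumption that reproduces the sample data from the question; the groupby call is the one from this answer.

```python
import pandas as pd

# Sample frame reproducing the data from the question (assumed construction).
df = pd.DataFrame({
    "ColA": [1, 1, 1, 1, 1, 2, 3],
    "ColB": [1, 1, 1, 2, 2, 1, 2],
})

# size() counts the rows in each (ColA, ColB) group; reset_index() turns the
# group keys back into ordinary columns and names the count column "Count".
counts = df.groupby(["ColA", "ColB"]).size().reset_index(name="Count")
print(counts)
```

Unlike count(), size() does not depend on picking a column to count, so it includes rows regardless of missing values in the non-key columns.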

Answered by eddygeek

I only needed to count the number of unique rows, and used this alternative:

len(df[['ColA','ColB']].drop_duplicates())

For this task, on my data, it was twice as fast as len(df.groupby(['ColA','ColB'])), as used in the more general solution above.
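
A small sketch that checks the two ways of counting unique rows agree, assuming df holds the sample data from the question. It does not attempt to reproduce the timing claim, which will depend on your own data; timeit would be the usual way to measure that.

```python
import pandas as pd

# Sample frame reproducing the data from the question (assumed construction).
df = pd.DataFrame({
    "ColA": [1, 1, 1, 1, 1, 2, 3],
    "ColB": [1, 1, 1, 2, 2, 1, 2],
})

# Number of distinct (ColA, ColB) rows via drop_duplicates.
n_unique_dedup = len(df[["ColA", "ColB"]].drop_duplicates())

# Same number via groupby: len() of a GroupBy is its number of groups.
n_unique_groups = len(df.groupby(["ColA", "ColB"]))

print(n_unique_dedup, n_unique_groups)
```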