pandas: count unique rows

Question

Asked by qwertylpc

I have a pandas data frame similar to:

I want an output that does the same thing as Counter: I need to know how many times each row appears (with all of the columns being the same).

In this case the proper output would be:

ColA ColB Count
1    1    3
1    2    2
2    1    1
3    2    1
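Since the question describes the goal in terms of Counter, here is a minimal sketch of that plain-Python baseline for comparison. The frame is hypothetical: the question's own data is not shown, so these rows were chosen only so that their counts match the expected output above. Each row is turned into a tuple and identical tuples are counted.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data: chosen only so the row counts match the
# expected output in the question (the original frame is not shown).
df = pd.DataFrame({
    "ColA": [1, 1, 1, 1, 1, 2, 3],
    "ColB": [1, 1, 1, 2, 2, 1, 2],
})

# itertuples(index=False, name=None) yields each row as a plain tuple,
# so Counter counts rows whose values are identical in every column.
row_counts = Counter(df.itertuples(index=False, name=None))
print(row_counts)
```

This works, but it leaves the pandas world; the groupby-based answers below keep the result as a DataFrame.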

I have tried something of the sort:

df.groupby(['ColA','ColB']).ColA.count()

but this gives me some ugly output that I am having trouble formatting.

Answer 1

Answered by jezrael

You can use size with reset_index:

print(df.groupby(['ColA','ColB']).size().reset_index(name='Count'))
   ColA  ColB  Count
0     1     1      3
1     1     2      2
2     2     1      1
3     3     2      1
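On newer pandas versions (DataFrame.value_counts was added in pandas 1.1) the same table can be sketched without an explicit groupby. The sample frame is again hypothetical, built only to match the counts in the question. Note that value_counts sorts by frequency (most common rows first) by default, so the row order can differ from the groupby output above, and rows with tied counts have no guaranteed order.

```python
import pandas as pd

# Hypothetical sample data with the same row counts as in the question.
df = pd.DataFrame({
    "ColA": [1, 1, 1, 1, 1, 2, 3],
    "ColB": [1, 1, 1, 2, 2, 1, 2],
})

# value_counts on a DataFrame counts identical rows and returns a Series
# indexed by (ColA, ColB); reset_index flattens it into a table with a
# "Count" column.
counts = df.value_counts().reset_index(name="Count")
print(counts)
```

Pass sort=False to value_counts if the frequency sort is not wanted.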

Answer 2

Answered by eddygeek

I only needed to count the number of unique rows and have used this alternative:

len(df[['ColA','ColB']].drop_duplicates())

For this task, on my data, it was twice as fast as len(df.groupby(['ColA','ColB'])), which follows the more general solution above.
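As a quick sanity check, here is a minimal sketch showing that the drop_duplicates approach and the groupby-based counts agree on the number of distinct rows. The sample frame is hypothetical, and the timing claim above is the answerer's own measurement, which this sketch does not reproduce.

```python
import pandas as pd

# Hypothetical sample data with the same rows as implied by the question.
df = pd.DataFrame({
    "ColA": [1, 1, 1, 1, 1, 2, 3],
    "ColB": [1, 1, 1, 2, 2, 1, 2],
})
cols = ["ColA", "ColB"]

# Number of distinct (ColA, ColB) combinations, computed three ways.
n_dedup = len(df[cols].drop_duplicates())  # keep one copy of each distinct row
n_groups = df.groupby(cols).ngroups        # group count, no aggregation needed
n_len = len(df.groupby(cols))              # len() of a GroupBy is its group count
print(n_dedup, n_groups, n_len)
```

One caveat: groupby drops rows whose key values are NaN by default (dropna=True), while drop_duplicates keeps one copy of such rows, so the two counts can differ on data with missing values.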
