pandas: Get count of all unique rows in a pandas DataFrame

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/34255882/


Get count of all unique rows in pandas dataframe

python numpy pandas

Asked by Yashu Seth

I have a Pandas DataFrame -

>>> import numpy as np
>>> import pandas as pd
>>> data = pd.DataFrame(np.random.randint(low=0, high=2,size=(5,3)),
...                       columns=['A', 'B', 'C'])
>>> data
   A  B  C
0  0  1  0
1  1  0  1
2  1  0  1
3  0  1  1
4  1  1  0

Now I use this to get the count of rows for column A only:

>>> data.loc[:, 'A'].value_counts()   # .ix is removed in recent pandas; .loc is equivalent here
1    3
0    2
dtype: int64

What is the most efficient way to get the count of rows for columns A and B, i.e. something like the following output -

0    0    0
0    1    2
1    0    2
1    1    1

And then, finally, how can I convert it into a numpy array such as -

array([[0, 2],
       [2, 1]])

Please give a solution that is also consistent with

>>> data = pd.DataFrame(np.random.randint(low=0, high=2, size=(5,2)),
...                       columns=['A', 'B'])

Answered by Andy Hayden

You can use groupby size and then unstack:

In [11]: data.groupby(["A","B"]).size()
Out[11]:
A  B
0  1    2
1  0    2
   1    1
dtype: int64

In [12]: data.groupby(["A","B"]).size().unstack("B")
Out[12]:
B   0  1
A
0 NaN  2
1   2  1

In [13]: data.groupby(["A","B"]).size().unstack("B").fillna(0)
Out[13]:
B  0  1
A
0  0  2
1  2  1
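
Recent pandas versions let you fold the fillna step into unstack via its fill_value argument, and to_numpy() (or .values on older versions) then gives the array the question asks for. A minimal sketch, assuming the data frame from the question and a reasonably recent pandas:

# unstack accepts fill_value, so the separate fillna call is not needed
table = data.groupby(["A", "B"]).size().unstack("B", fill_value=0)

# to_numpy() yields the requested array, e.g. array([[0, 2], [2, 1]]) for the data above
arr = table.to_numpy()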

However, whenever you do a groupby followed by an unstack you should think: pivot_table:

In [21]: data.pivot_table(index="A", columns="B", aggfunc="count", fill_value=0)
Out[21]:
   C
B  0  1
A
0  0  2
1  2  1

This will be the most efficient solution as well as being the most direct.

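One caveat: with aggfunc="count", pivot_table counts the remaining value columns (C here), so for the two-column frame at the end of the question there is nothing left to count. A hedged sketch of two standard alternatives, aggfunc="size" and pd.crosstab, assuming the same imports as above:

data = pd.DataFrame(np.random.randint(low=0, high=2, size=(5, 2)),
                    columns=['A', 'B'])

# "size" counts rows per (A, B) group, so no extra value column is required
table = data.pivot_table(index="A", columns="B", aggfunc="size", fill_value=0)

# pd.crosstab(data['A'], data['B']) is an equivalent shortcut for this contingency count
arr = table.to_numpy()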

回答by Anton Protopopov

You could use groupby on the A and B columns and then call count on the result. But with that you'll get only the value combinations that actually appear in your original dataframe; in your case you won't have a 0 0 count. After that you can call the values method to get a numpy array:

In [52]: df
Out[52]: 
   A  B  C
0  0  1  0
1  1  0  1
2  1  0  1
3  0  1  1
4  1  1  0

In [56]: df.groupby(['A', 'B'], as_index=False).count()
Out[56]: 
   A  B  C
0  0  1  2
1  1  0  2
2  1  1  1

In [57]: df.groupby(['A', 'B'], as_index=False).count().C.values
Out[57]: array([2, 2, 1])

Then you could use the reshape method of the numpy array.

For a dataframe in which all value combinations are present:

In [71]: df
Out[71]: 
   A  B  C
0  1  0  1
1  1  1  1
2  1  0  1
3  1  1  0
4  0  1  1
5  0  0  1
6  1  1  1
7  0  0  1
8  0  1  0
9  1  1  0

In [73]: df.groupby(['A', 'B'], as_index=False).count()
Out[73]: 
   A  B  C
0  0  0  2
1  0  1  2
2  1  0  2
3  1  1  4


In [75]: df.groupby(['A', 'B'], as_index=False).count().C.values.reshape(2,2)
Out[75]: 
array([[2, 2],
       [2, 4]])
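
As noted above, combinations that never occur (0 0 in the five-row example) are simply absent from the groupby result, so a blind reshape can misalign or fail. A minimal sketch, using the df from this answer and assuming A and B are binary, that reindexes against every combination before reshaping:

counts = df.groupby(['A', 'B']).size()

# build the complete set of (A, B) combinations and fill the missing ones with 0
full_index = pd.MultiIndex.from_product([[0, 1], [0, 1]], names=['A', 'B'])
counts = counts.reindex(full_index, fill_value=0)

arr = counts.values.reshape(2, 2)  # rows correspond to A, columns to B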

Answered by Alexander

Assuming that all of your data is binary, you can just sum the columns. To be safe, you can then use count to get the total of all non-null values in each column (the difference between this count and the previous sum is the number of zeros).

>>> s = data[['A', 'B']].sum().values
>>> np.matrix([s, data[['A', 'B']].count().values - s])
matrix([[3, 3],
        [2, 2]])

If you are sure that there are no null values, you can save some computational time by just taking the number of rows from the first shape parameter.

如果您确定没有空值,则可以通过仅从第一个形状参数中获取行数来节省一些计算时间。

>>> np.matrix([s, data.shape[0] - s])
matrix([[3, 3],
        [2, 2]])
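
np.matrix is discouraged in current NumPy, so the same idea can be written with a plain ndarray. A minimal sketch, assuming the five-row data frame from the question and no null values:

ones = data[['A', 'B']].sum().values   # number of 1s in each column
zeros = data.shape[0] - ones           # number of 0s per column, assuming no nulls
result = np.array([ones, zeros])       # e.g. array([[3, 3], [2, 2]]) for the data above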