Python: How to get value counts for multiple columns at once in a Pandas DataFrame?

Question

Asked by Xin

Given a Pandas DataFrame that has multiple columns with categorical values (0 or 1), is it possible to conveniently get the value_counts for every column at the same time?

For example, suppose I generate a DataFrame as follows:

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))

I can get a DataFrame like this:

   a  b  c  d
0  0  1  1  0
1  1  1  1  1
2  1  1  1  0
3  0  1  0  0
4  0  0  0  1
5  0  1  1  0
6  0  1  1  1
7  1  0  1  0
8  1  0  1  1
9  0  1  1  0

How do I conveniently get the value counts for every column and obtain the following?

   a  b  c  d
0  6  3  2  6
1  4  7  8  4

My current solution is:

pieces = []
for col in df.columns:
    tmp_series = df[col].value_counts()
    tmp_series.name = col
    pieces.append(tmp_series)
df_value_counts = pd.concat(pieces, axis=1)

But there must be a simpler way, like stacking, pivoting, or groupby?

Answer 1

Accepted answer by EdChum

Just call apply and pass pd.Series.value_counts:

In [212]:
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
df.apply(pd.Series.value_counts)
Out[212]:
   a  b  c  d
0  4  6  4  3
1  6  4  6  7
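
As a quick check against the question's example, the sketch below rebuilds the seeded DataFrame and applies the same call. The `sort_index()` step and the variable name `counts` are additions for illustration only, used to make the row order predictable.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question with the same seed.
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))

# Count the values of every column at once; sort_index() orders the rows 0, 1.
counts = df.apply(pd.Series.value_counts).sort_index()
print(counts)
```

Each column of `counts` should match what `df[col].value_counts()` returns for that column on its own.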

Answer 2

Answer by Ted Petrou

There is actually a fairly interesting and advanced way of solving this problem with crosstab and melt:

df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
                   'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
                   'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})

df

       a       b       c
0  table    lamp  mirror
1  chair  candle  mirror
2  chair   chair  mirror
3   lamp    lamp  mirror
4    bed     bed  mirror

We can first melt the DataFrame:

df1 = df.melt(var_name='columns', value_name='index')
df1

   columns   index
0        a   table
1        a   chair
2        a   chair
3        a    lamp
4        a     bed
5        b    lamp
6        b  candle
7        b   chair
8        b    lamp
9        b     bed
10       c  mirror
11       c  mirror
12       c  mirror
13       c  mirror
14       c  mirror

Then use the crosstab function to count the values for each column. This keeps the counts as integers, which is not the case for the currently selected answer: with apply, a value that never occurs in a column becomes NaN, which forces that column to float.

pd.crosstab(index=df1['index'], columns=df1['columns'])

columns  a  b  c
index           
bed      1  1  0
candle   0  1  0
chair    2  1  0
lamp     1  2  0
mirror   0  0  5
table    1  0  0
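
The dtype difference can be seen with a minimal sketch on the same data. The names `via_apply`, `via_apply_int`, and `via_crosstab` are hypothetical, and the `fillna(0).astype(int)` step is one possible way to get integers back from the apply result, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
                   'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
                   'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})

# apply leaves NaN where a value never occurs in a column,
# so the counts are stored as floats.
via_apply = df.apply(pd.Series.value_counts)
print(via_apply.dtypes)

# Filling the gaps with 0 and casting back gives integer counts again.
via_apply_int = via_apply.fillna(0).astype(int).sort_index()

# crosstab on the melted frame yields integer counts directly.
df1 = df.melt(var_name='columns', value_name='index')
via_crosstab = pd.crosstab(index=df1['index'], columns=df1['columns'])
print(via_crosstab.dtypes)
```

After the cast, both results hold the same counts, so the choice is mostly about whether the extra `fillna`/`astype` step is acceptable.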

Or do it in one line, using ** to expand the column names into parameter names (this is advanced):

pd.crosstab(**df.melt(var_name='columns', value_name='index'))
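
To make the unpacking step less opaque, here is a small sketch comparing the explicit call with the `**` form. It relies on a DataFrame behaving like a mapping from column name to Series; the names `explicit` and `unpacked` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
                   'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
                   'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})

# The melted frame has exactly two columns, named 'columns' and 'index',
# which are also the names of two crosstab parameters.
df1 = df.melt(var_name='columns', value_name='index')

# Explicit keyword arguments.
explicit = pd.crosstab(index=df1['index'], columns=df1['columns'])

# ** passes each column as the keyword argument of the same name,
# i.e. index=df1['index'] and columns=df1['columns'].
unpacked = pd.crosstab(**df1)

print(explicit.equals(unpacked))
```

The trick only works because `var_name` and `value_name` were chosen to match crosstab's parameter names; any other column names would raise a TypeError for an unexpected keyword argument.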

Also, value_counts is now a top-level function, so you can simplify the currently selected answer to the following (note that newer pandas releases deprecate pd.value_counts in favor of the pd.Series.value_counts form):

df.apply(pd.value_counts)

Answer 3

Answer by Ajay Kumar

You can also try this code:

# heart is assumed to be an existing DataFrame
for i in heart.columns:
    x = heart[i].value_counts()
    print("Column name is:", i, "and its value counts are:", x)
    print()
