Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/32589829/

Date: 2020-08-19 11:51:26  Source: igfitidea

How to get value counts for multiple columns at once in Pandas DataFrame?

Tags: python, numpy, pandas

Asked by Xin

Given a Pandas DataFrame that has multiple columns with categorical values (0 or 1), is it possible to conveniently get the value_counts for every column at the same time?

For example, suppose I generate a DataFrame as follows:

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))

I can get a DataFrame like this:

   a  b  c  d
0  0  1  1  0
1  1  1  1  1
2  1  1  1  0
3  0  1  0  0
4  0  0  0  1
5  0  1  1  0
6  0  1  1  1
7  1  0  1  0
8  1  0  1  1
9  0  1  1  0

How can I conveniently get the value counts for every column, so that I obtain the following?

   a  b  c  d
0  6  3  2  6
1  4  7  8  4

My current solution is:

pieces = []
for col in df.columns:
    tmp_series = df[col].value_counts()
    tmp_series.name = col
    pieces.append(tmp_series)
df_value_counts = pd.concat(pieces, axis=1)

But surely there is a simpler way, for example with stacking, pivoting, or groupby?
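
For the groupby route mentioned in the question, one possible sketch (an alternative, not one of the posted answers) melts the frame into long form, counts each (value, column) pair with size(), and moves the column labels back out with unstack. The DataFrame is written out literally from the table above so the counts are deterministic; fill_value=0 keeps the result integer if some value never occurs in a column.

```python
import pandas as pd

# The same 10x4 table of 0/1 values shown in the question, written out literally
df = pd.DataFrame(
    [[0, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1],
     [0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 1, 0]],
    columns=list('abcd'),
)

# Long form: one row per cell, with columns 'variable' (a..d) and 'value' (0/1)
long_df = df.melt()

# Count every (value, column) pair, then pivot the column labels back to columns;
# fill_value=0 covers values that never appear in a column and keeps ints
counts = long_df.groupby(['value', 'variable']).size().unstack(fill_value=0)
print(counts)
```

The rows of counts are the values 0 and 1, the columns are a to d, and the numbers match the desired table; the axes carry the names 'value' and 'variable' from melt.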

Accepted answer by EdChum

Just call apply and pass pd.Series.value_counts:

In [212]:
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
df.apply(pd.Series.value_counts)
Out[212]:
   a  b  c  d
0  4  6  4  3
1  6  4  6  7
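
One caveat: apply aligns the per-column results on the union of all values, so a value that is absent from some column shows up as NaN and that column becomes float. A minimal sketch, assuming you want 0 in those cells instead, chains fillna(0) and astype(int); the small two-column frame below is made up purely for illustration.

```python
import pandas as pd

# Hypothetical data: value 2 appears in column 'x' but never in column 'y'
df = pd.DataFrame({'x': [0, 1, 2, 2], 'y': [0, 0, 1, 1]})

# Each column's value_counts is aligned on the union index {0, 1, 2};
# the missing (2, 'y') cell is NaN, so column 'y' comes back as float64
raw = df.apply(pd.Series.value_counts)

# Replace the gaps with 0, restore integer counts, and order rows by value
counts = raw.fillna(0).astype(int).sort_index()
print(counts)
```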

Answer by Ted Petrou

There is a fairly interesting and advanced way to solve this problem with crosstab and melt:

df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
                   'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
                   'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})

df

       a       b       c
0  table    lamp  mirror
1  chair  candle  mirror
2  chair   chair  mirror
3   lamp    lamp  mirror
4    bed     bed  mirror

We can first melt the DataFrame:

df1 = df.melt(var_name='columns', value_name='index')
df1

   columns   index
0        a   table
1        a   chair
2        a   chair
3        a    lamp
4        a     bed
5        b    lamp
6        b  candle
7        b   chair
8        b    lamp
9        b     bed
10       c  mirror
11       c  mirror
12       c  mirror
13       c  mirror
14       c  mirror

Then use the crosstab function to count the values in each column. This keeps the counts as integers, which is not the case for the currently accepted answer: there, values missing from a column become NaN, which turns the counts into floats:

pd.crosstab(index=df1['index'], columns=df1['columns'])

columns  a  b  c
index           
bed      1  1  0
candle   0  1  0
chair    2  1  0
lamp     1  2  0
mirror   0  0  5
table    1  0  0
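
To see the dtype difference concretely, here is a quick check on the same furniture DataFrame: the apply-based answer must fill unmatched values with NaN, which forces float64, while crosstab counts empty combinations as 0 and stays integer.

```python
import pandas as pd

df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
                   'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
                   'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})

# Accepted answer: per-column counts aligned on the union of all values;
# e.g. 'mirror' never occurs in column 'a', so that cell is NaN
via_apply = df.apply(pd.Series.value_counts)

# crosstab on the melted frame: empty (value, column) combinations are 0
df1 = df.melt(var_name='columns', value_name='index')
via_crosstab = pd.crosstab(index=df1['index'], columns=df1['columns'])

print(via_apply.dtypes)     # float64 columns, because of the NaN gaps
print(via_crosstab.dtypes)  # integer columns
```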

Or do it in one line by unpacking the melted DataFrame with **, so that its column names become crosstab's parameter names (this is advanced):

pd.crosstab(**df.melt(var_name='columns', value_name='index'))
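
Why the one-liner works: the melt call names its output columns 'columns' and 'index', which are exactly the keyword parameters of pd.crosstab. Python's ** unpacking accepts any mapping-like object that has keys() and item access, and a DataFrame provides both (column labels and column Series), so the call expands to index=..., columns=.... A quick check that the two forms agree:

```python
import pandas as pd

df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
                   'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
                   'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})

melted = df.melt(var_name='columns', value_name='index')

# Explicit keyword form
explicit = pd.crosstab(index=melted['index'], columns=melted['columns'])

# ** iterates melted.keys() ('columns', 'index') and passes melted[key] for each,
# so this is the same call as the explicit one
unpacked = pd.crosstab(**melted)

print(unpacked.equals(explicit))
```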

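Also, value_counts is available as a top-level function, so you can simplify the currently accepted answer to the following (note that pandas 2.1 later deprecated the top-level pd.value_counts, so prefer pd.Series.value_counts in new code):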

df.apply(pd.value_counts)

Answer by Ajay Kumar

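You can also loop over the columns and print each column's counts:

for i in heart.columns:  # heart is the answerer's DataFrame; substitute your own, e.g. df
    x = heart[i].value_counts()
    print("Column name is:", i, "and its value counts are:", x)
    print()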
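
This loop only prints each column's counts one after another. If you want them in the single table the question asks for, one sketch (using the question's 0/1 table, written out literally here) collects the same per-column value_counts in a dict and lets pd.concat turn the dict keys into column labels; it is essentially a compact form of the loop in the question.

```python
import pandas as pd

# The question's 10x4 table of 0/1 values, written out literally
df = pd.DataFrame(
    [[0, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1],
     [0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 1, 0]],
    columns=list('abcd'),
)

# One value_counts Series per column, keyed by column name
per_column = {col: df[col].value_counts() for col in df.columns}

# concat along axis=1 uses the dict keys as column labels and aligns on the values;
# fillna(0) guards against values missing from a column, sort_index orders the rows
table = pd.concat(per_column, axis=1).fillna(0).astype(int).sort_index()
print(table)
```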
