Original question: http://stackoverflow.com/questions/32589829/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to get value counts for multiple columns at once in Pandas DataFrame?
Question by Xin
Given a Pandas DataFrame that has multiple columns with categorical values (0 or 1), is it possible to conveniently get the value_counts for every column at the same time?
For example, suppose I generate a DataFrame as follows:
```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
```
I can get a DataFrame like this:
```
   a  b  c  d
0  0  1  1  0
1  1  1  1  1
2  1  1  1  0
3  0  1  0  0
4  0  0  0  1
5  0  1  1  0
6  0  1  1  1
7  1  0  1  0
8  1  0  1  1
9  0  1  1  0
```
How can I conveniently get the value counts for every column at once and obtain the following?
```
   a  b  c  d
0  6  3  2  6
1  4  7  8  4
```
My current solution is:
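```python
pieces = []
for col in df.columns:
    # Count the 0s and 1s in this column and label the result with the column name.
    tmp_series = df[col].value_counts()
    tmp_series.name = col
    pieces.append(tmp_series)

# Align the per-column counts side by side, one column per original column.
df_value_counts = pd.concat(pieces, axis=1)
```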
But surely there is a simpler way, such as stacking, pivoting, or groupby?
Accepted answer by EdChum
Just call `apply` and pass `pd.Series.value_counts`:
```
In [212]:
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))
df.apply(pd.Series.value_counts)
Out[212]:
   a  b  c  d
0  4  6  4  3
1  6  4  6  7
```
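One caveat, assuming the columns do not all contain the same set of values: `apply` aligns the per-column results on the union of all values and fills the gaps with NaN, which turns the affected columns' counts into floats. A minimal sketch of one way to get integer counts back is to fill the gaps with 0 and cast; the small DataFrame here is made up for illustration.

```python
import pandas as pd

# Hypothetical example: column 'y' never contains the value 2.
df = pd.DataFrame({"x": [0, 1, 2, 2], "y": [0, 0, 1, 1]})

# Count each column separately; the missing (value 2, column 'y') cell becomes NaN.
counts = df.apply(pd.Series.value_counts)

# Replace the gaps with 0 and cast the counts back to integers.
counts_int = counts.fillna(0).astype(int)
print(counts_int)
```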
Answer by Ted Petrou
There is actually a fairly interesting and advanced way to solve this problem with `crosstab` and `melt`:
```python
df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
                   'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
                   'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})
df
```

```
       a       b       c
0  table    lamp  mirror
1  chair  candle  mirror
2  chair   chair  mirror
3   lamp    lamp  mirror
4    bed     bed  mirror
```
We can first `melt` the DataFrame:
```python
df1 = df.melt(var_name='columns', value_name='index')
df1
```

```
   columns   index
0        a   table
1        a   chair
2        a   chair
3        a    lamp
4        a     bed
5        b    lamp
6        b  candle
7        b   chair
8        b    lamp
9        b     bed
10       c  mirror
11       c  mirror
12       c  mirror
13       c  mirror
14       c  mirror
```
Then use the `crosstab` function to count the values for each column. This keeps the counts as integers, which is not the case for the accepted answer: there, a value that is missing from a column gets NaN, which forces that column to a float dtype:
```python
pd.crosstab(index=df1['index'], columns=df1['columns'])
```

```
columns  a  b  c
index
bed      1  1  0
candle   0  1  0
chair    2  1  0
lamp     1  2  0
mirror   0  0  5
table    1  0  0
```
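To see the dtype difference described above, here is a minimal sketch that rebuilds the same example and computes both results side by side; only the variable names `via_crosstab` and `via_apply` are new.

```python
import pandas as pd

df = pd.DataFrame({'a': ['table', 'chair', 'chair', 'lamp', 'bed'],
                   'b': ['lamp', 'candle', 'chair', 'lamp', 'bed'],
                   'c': ['mirror', 'mirror', 'mirror', 'mirror', 'mirror']})

# melt + crosstab: every cell is a count (0 if absent), so the result stays integer.
df1 = df.melt(var_name='columns', value_name='index')
via_crosstab = pd.crosstab(index=df1['index'], columns=df1['columns'])

# apply + value_counts: values absent from a column become NaN, so each column is float.
via_apply = df.apply(pd.Series.value_counts)

print(via_crosstab.dtypes)
print(via_apply.dtypes)
```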
Or, in one line, unpack the melted DataFrame with `**` so that its column names, `index` and `columns`, become the keyword arguments of `crosstab` (this is advanced):
```python
pd.crosstab(**df.melt(var_name='columns', value_name='index'))
```
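The `**` trick relies on a DataFrame behaving like a mapping from column names to columns (it provides `keys()` and `__getitem__`), so the melted columns named `index` and `columns` land in the keyword arguments of the same names. A small sketch that checks this equivalence on a made-up DataFrame:

```python
import pandas as pd

# Hypothetical two-column example.
df = pd.DataFrame({'a': ['x', 'y', 'y'], 'b': ['y', 'y', 'z']})
melted = df.melt(var_name='columns', value_name='index')

# Explicit keyword arguments...
explicit = pd.crosstab(index=melted['index'], columns=melted['columns'])

# ...and the same call via mapping-style unpacking of the melted frame.
unpacked = pd.crosstab(**melted)

print(unpacked.equals(explicit))
```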
Also, `value_counts` is now a top-level function, `pd.value_counts`, so you can simplify the accepted answer to the following:
```python
df.apply(pd.value_counts)
```
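Top-level `pd.value_counts` was deprecated in pandas 2.1, so here is a minimal sketch of two alternatives on the question's 0/1 data that avoid it: calling the `Series.value_counts` method through a lambda, and the stack-and-groupby route the question asked about. The exact counts depend on the random data, so the sketch just prints both tables, which should agree.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 2, (10, 4)), columns=list('abcd'))

# 1) Call the Series method through a lambda instead of the top-level function.
via_apply = df.apply(lambda col: col.value_counts()).fillna(0).astype(int)

# 2) Stack into one long Series indexed by (row, column), group by the column
#    level, count values per group, then move the column labels back to columns.
via_stack = (df.stack()
               .groupby(level=1)
               .value_counts()
               .unstack(level=0, fill_value=0))

print(via_apply.sort_index())
print(via_stack.sort_index())
```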
Answer by Ajay Kumar
You can also loop over the columns and print each column's value counts (here `heart` is the answerer's DataFrame):
```python
for i in heart.columns:
    # Count the occurrences of each distinct value in column i.
    x = heart[i].value_counts()
    print("Column name is:", i, "and its value counts are:", x)
    print()
```
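If you want the per-column counts collected into one table instead of printed, a minimal sketch (using a small hypothetical DataFrame in place of `heart`) builds a dict of Series and lets `pd.concat` align them, using the dict keys as column labels:

```python
import pandas as pd

# Hypothetical stand-in for the answer's `heart` DataFrame.
heart = pd.DataFrame({'sex': [0, 1, 1, 1], 'target': [1, 0, 1, 1]})

# One value_counts Series per column, keyed by column name.
per_column = {col: heart[col].value_counts() for col in heart.columns}

# The dict keys become column labels; rows are aligned on the distinct values.
table = pd.concat(per_column, axis=1).fillna(0).astype(int)
print(table)
```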