Explanation about pandas value_counts function
Original question: http://stackoverflow.com/questions/21966065/
Notice: this page reproduces a Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license and attribute it to the original authors on Stack Overflow.
Asked by Quazi Farhan
Can someone please explain what the line
result = data.apply(pd.value_counts).fillna(0)
does here?
import pandas as pd 
from pandas import Series, DataFrame
data = DataFrame({'Qu1': [1, 3, 4, 3, 4],
                  'Qu2': [2, 3, 1, 2, 3],
                  'Qu3': [1, 5, 2, 4, 4]})
result = data.apply(pd.value_counts).fillna(0)  
In [26]: data
Out[26]:
   Qu1  Qu2  Qu3
0    1    2    1
1    3    3    5
2    4    1    2
3    3    2    4
4    4    3    4
In [27]: result
Out[27]:
   Qu1  Qu2  Qu3
1    1    1    1
2    0    2    1
3    2    2    0
4    2    0    2
5    0    0    1
Accepted answer by U2EF1
From the docs, it produces a histogram of non-null values. Looking just at column Qu1 of result, we can tell that there is one 1, zero 2's, two 3's, two 4's, and zero 5's in the original column data.Qu1.
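To make the "histogram" reading concrete, here is a minimal sketch that recounts Qu1 by hand with collections.Counter instead of pandas. The names manual and qu1_counts are made up for illustration; only the data comes from the question.

```python
from collections import Counter

import pandas as pd

data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
                     'Qu2': [2, 3, 1, 2, 3],
                     'Qu3': [1, 5, 2, 4, 4]})

# Count the occurrences of each value in Qu1 without value_counts.
manual = Counter(data['Qu1'].tolist())

# A Counter returns 0 for keys it never saw, which mirrors the fillna(0) step.
qu1_counts = {value: manual[value] for value in range(1, 6)}
print(qu1_counts)  # {1: 1, 2: 0, 3: 2, 4: 2, 5: 0}
```

These numbers match the Qu1 column of result row by row.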
Answer by Andy Hayden
I think the easiest way to understand what's going on is to break it down.
On each column, value_counts simply counts the number of occurrences of each value in the Series (e.g. 4 appears twice in the Qu1 column):
In [11]: pd.value_counts(data.Qu1)
Out[11]:
4    2
3    2
1    1
dtype: int64
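A side note, hedged: as far as I know the top-level pd.value_counts is deprecated in newer pandas releases (2.1 and later) in favour of the Series method. A sketch of the same count using the method form, with a made-up variable name by_value, could look like this:

```python
import pandas as pd

qu1 = pd.Series([1, 3, 4, 3, 4], name='Qu1')

# Method form of the same count; the result is ordered by count, descending.
counts = qu1.value_counts()

# Sorting by the index (the distinct values) gives a stable order
# that is easier to compare across columns.
by_value = counts.sort_index()
print(by_value.to_dict())  # {1: 1, 3: 2, 4: 2}
```

Values that never occur (2 and 5 here) are simply absent at this stage, which is why the alignment step is needed afterwards.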
When you do an apply, each column's counts are realigned with the results from the other columns. Since every value between 1 and 5 appears in at least one column, each result ends up aligned with range(1, 6):
In [12]: pd.value_counts(data.Qu1).reindex(range(1, 6))
Out[12]:
1     1
2   NaN
3     2
4     2
5   NaN
dtype: float64
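The dtype change from int64 to float64 in that output comes from the NaN entries, since a plain integer column cannot hold NaN. A small sketch to check this (aligned is just an illustrative name):

```python
import pandas as pd

counts = pd.Series([1, 3, 4, 3, 4]).value_counts()

# Reindexing onto labels that were never counted introduces NaN,
# which forces the counts to be upcast to float.
aligned = counts.reindex(range(1, 6))

print(aligned.dtype)         # float64
print(aligned.isna().sum())  # 2
```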
You want the values you didn't see to count as 0 rather than NaN, hence the fillna:
In [13]: pd.value_counts(data.Qu1).reindex(range(1, 6)).fillna(0)
Out[13]:
1    1
2    0
3    2
4    2
5    0
dtype: float64
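If you are doing the reindex yourself rather than relying on apply, one possible variation (not what the original line does) is to pass fill_value=0 to reindex. That skips the NaN step entirely, so the counts should stay integers:

```python
import pandas as pd

counts = pd.Series([1, 3, 4, 3, 4]).value_counts()

# fill_value=0 fills the missing labels directly, so no NaN is
# ever created and no upcast to float is needed.
filled = counts.reindex(range(1, 6), fill_value=0)

print(filled.tolist())  # [1, 0, 2, 2, 0]
```

With apply(...).fillna(0) you get the same numbers, only stored as floats.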
When you do the apply, it concatenates the result of doing this for each column:
In [14]: pd.concat((pd.value_counts(data[col]).reindex(range(1, 6)).fillna(0)
                       for col in data.columns),
                   axis=1, keys=data.columns)
Out[14]:
   Qu1  Qu2  Qu3
1    1    1    1
2    0    2    1
3    2    2    0
4    2    0    2
5    0    0    1
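To check that the breakdown really matches the one-liner from the question, the two routes can be compared directly. This is only a sketch: via_apply and via_concat are illustrative names, and the lambda with the Series method stands in for pd.value_counts.

```python
import pandas as pd

data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
                     'Qu2': [2, 3, 1, 2, 3],
                     'Qu3': [1, 5, 2, 4, 4]})

# Route 1: the one-liner, with the count written as a Series method.
# sort_index() is a precaution so the rows are in value order.
via_apply = data.apply(lambda col: col.value_counts()).fillna(0).sort_index()

# Route 2: count, align and fill each column by hand, then glue them together.
via_concat = pd.concat(
    [data[col].value_counts().reindex(range(1, 6)).fillna(0)
     for col in data.columns],
    axis=1, keys=data.columns)

# Same numbers in the same positions.
print(via_apply.values.tolist() == via_concat.values.tolist())  # True
```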

