pandas: explanation of the pandas value_counts function

Question

Asked by Quazi Farhan

Can someone please explain what the line

result = data.apply(pd.value_counts).fillna(0)

does here?

import pandas as pd 
from pandas import Series, DataFrame

data = DataFrame({'Qu1': [1, 3, 4, 3, 4],
                  'Qu2': [2, 3, 1, 2, 3],
                  'Qu3': [1, 5, 2, 4, 4]})

result = data.apply(pd.value_counts).fillna(0)  

In [26]: data
Out[26]:
   Qu1  Qu2  Qu3
0    1    2    1
1    3    3    5
2    4    1    2
3    3    2    4
4    4    3    4

In [27]: result
Out[27]:
   Qu1  Qu2  Qu3
1    1    1    1
2    0    2    1
3    2    2    0
4    2    0    2
5    0    0    1

Answer 1

Accepted answer by U2EF1

According to the docs, value_counts produces a histogram of non-null values. Looking just at column Qu1 of result, we can tell that there is one 1, zero 2s, two 3s, two 4s, and zero 5s in the original column data.Qu1.
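
As a quick check of that reading, the sketch below rebuilds result and pulls out the Qu1 column as a dictionary. It uses the method form Series.value_counts inside a lambda instead of the top-level pd.value_counts from the question, on the assumption that a recent pandas is installed where the top-level function is deprecated. The name qu1_counts is only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
                     'Qu2': [2, 3, 1, 2, 3],
                     'Qu3': [1, 5, 2, 4, 4]})

# Count the values of every column, then turn the gaps (NaN) into 0.
result = data.apply(lambda col: col.value_counts()).fillna(0)

# Map each value to the number of times it occurs in data.Qu1.
qu1_counts = result['Qu1'].to_dict()
print(qu1_counts)
```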

Answer 2

Answered by Andy Hayden

I think the easiest way to understand what's going on is to break it down.

On each column, value_counts simply counts the number of occurrences of each value in the Series (e.g. 4 appears twice in the Qu1 column):

In [11]: pd.value_counts(data.Qu1)
Out[11]:
4    2
3    2
1    1
dtype: int64
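
The same count can be sketched with the method form on a standalone Series. The Series below is rebuilt by hand from the Qu1 values in the question, and counts is an illustrative name. Two things are worth noticing: the result is ordered by count, largest first, and values that never occur (2 and 5 here) are simply missing from the index.

```python
import pandas as pd

# The Qu1 column from the question, rebuilt as a standalone Series.
qu1 = pd.Series([1, 3, 4, 3, 4], name='Qu1')

# The index holds the distinct values, the data holds how often each occurs.
counts = qu1.value_counts()
print(counts)

# Values that never occur in the column have no entry at all.
print(2 in counts.index)
```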

When you apply this to every column, each column's counts are realigned with the other results. Since every value from 1 to 5 appears somewhere in the data, each result is aligned with range(1, 6):

In [12]: pd.value_counts(data.Qu1).reindex(range(1, 6))
Out[12]:
1     1
2   NaN
3     2
4     2
5   NaN
dtype: float64
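
To see where range(1, 6) comes from, the sketch below collects every distinct value across the three columns and reindexes the Qu1 counts against that list. The names labels and aligned are illustrative, and collecting the labels by hand is only a way to mimic the alignment, not a claim about how pandas does it internally. The missing labels become NaN, which is why the dtype changes from int64 to float64.

```python
import pandas as pd

data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
                     'Qu2': [2, 3, 1, 2, 3],
                     'Qu3': [1, 5, 2, 4, 4]})

# Every distinct value that appears anywhere in the frame, in sorted order.
labels = sorted(set(data.to_numpy().ravel().tolist()))
print(labels)

# Align the Qu1 counts with that shared set of labels.
aligned = data['Qu1'].value_counts().reindex(labels)
print(aligned)
```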

You want to count the values you didn't see as 0 rather than NaN, hence the fillna:

In [13]: pd.value_counts(data.Qu1).reindex(range(1, 6)).fillna(0)
Out[13]:
1    1
2    0
3    2
4    2
5    0
dtype: float64
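
Note that the dtype is still float64 after fillna, because the NaN step already converted the counts to floats. For a single Series, one possible alternative is to pass fill_value to reindex so that the NaN never appears and the counts stay integers. This is a sketch of that variant, not what the original line does; filled is an illustrative name.

```python
import pandas as pd

qu1 = pd.Series([1, 3, 4, 3, 4], name='Qu1')

# Fill missing labels with 0 during the reindex itself, so no NaN is created.
filled = qu1.value_counts().reindex(range(1, 6), fill_value=0)
print(filled)
print(filled.dtype)
```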

When you do the apply, it concatenates the results of doing this for each column:

In [14]: pd.concat((pd.value_counts(data[col]).reindex(range(1, 6)).fillna(0)
                       for col in data.columns),
                   axis=1, keys=data.columns)
Out[14]:
   Qu1  Qu2  Qu3
1    1    1    1
2    0    2    1
3    2    2    0
4    2    0    2
5    0    0    1
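
As a sanity check, the step-by-step version can be compared with the one-liner. The sketch below assumes a recent pandas and therefore uses Series.value_counts in both variants; via_apply and via_concat are illustrative names. The comparison is done on the underlying values after sorting the index, so it does not depend on index dtype or naming details.

```python
import pandas as pd

data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
                     'Qu2': [2, 3, 1, 2, 3],
                     'Qu3': [1, 5, 2, 4, 4]})

# The one-liner from the question, written with the method form.
via_apply = data.apply(lambda col: col.value_counts()).fillna(0)

# The manual version: count, align to 1..5, fill, then glue the columns together.
via_concat = pd.concat(
    [data[col].value_counts().reindex(range(1, 6)).fillna(0)
     for col in data.columns],
    axis=1, keys=data.columns)

same = bool((via_apply.sort_index().to_numpy() == via_concat.to_numpy()).all())
print(same)
```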
