pandas: count occurrences of the items of a Series in each row of a DataFrame

Question

Asked by sriramn

I have a pandas.DataFrame that looks like this:

COL1    COL2    COL3
C1      None    None
C1      C2      None
C1      C1      None
C1      C2      C3

For each row in this DataFrame I would like to count the occurrences of each of C1, C2 and C3, and append the counts as new columns. For instance, the first row has 1 C1, 0 C2 and 0 C3. The final DataFrame should look like this:

COL1    COL2    COL3    C1  C2  C3
C1      None    None    1   0   0
C1      C2      None    1   1   0
C1      C1      None    2   0   0
C1      C2      C3      1   1   1

So far I have created a Series with C1, C2 and C3 as its values. One way to count is to loop over the rows and columns of the DataFrame, then over this Series, and increment a counter on each match. But is there an apply-based approach that achieves this more compactly?
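For reference, the loop described above might look roughly like the sketch below. It assumes the empty cells hold the literal string 'None' (which is what the outputs in the answers suggest); the names `labels`, `counts` and `looped` are illustrative and not from the original post.

```python
import pandas as pd

# Assumption: missing cells are the string 'None', not a real None/NaN.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": ["None", "C2", "C1", "C2"],
    "COL3": ["None", "None", "None", "C3"],
})

# The Series of items whose occurrences should be counted.
labels = pd.Series(["C1", "C2", "C3"])

# Explicit loop: for every row and every label, count the matching cells.
counts = {label: [] for label in labels}
for _, row in df.iterrows():
    for label in labels:
        counts[label].append(sum(cell == label for cell in row))

# Append one count column per label to a copy of the original frame.
looped = df.assign(**counts)
print(looped)
```

This is the baseline that the answers try to replace with something shorter and faster.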

Answer 1

Answered by Andy Hayden

You could apply value_counts:


In [11]: df.apply(pd.Series.value_counts, axis=1)
Out[11]: 
   C1  C2  C3  None
0   1 NaN NaN     2
1   1   1 NaN     1
2   2 NaN NaN     1
3   1   1   1   NaN

So you can fill the NaN values and append just the base values you want:

In [12]: df.apply(pd.Series.value_counts, axis=1)[['C1', 'C2', 'C3']].fillna(0)
Out[12]: 
   C1  C2  C3
0   1   0   0
1   1   1   0
2   2   0   0
3   1   1   1
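To actually attach these counts to the original frame, one option is `DataFrame.join`. The sketch below assumes the empty cells are the string 'None'; the names `wanted` and `result` are illustrative. It uses `reindex(columns=...)` instead of plain column selection so that a label which never occurs still gets a column of zeros rather than raising a KeyError.

```python
import pandas as pd

# Assumption: missing cells are the string 'None', as in the output above.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": ["None", "C2", "C1", "C2"],
    "COL3": ["None", "None", "None", "C3"],
})

wanted = ["C1", "C2", "C3"]

counts = (
    df.apply(pd.Series.value_counts, axis=1)  # one value_counts per row
      .reindex(columns=wanted)                # keep only the wanted labels
      .fillna(0)                              # label absent from a row -> 0
      .astype(int)                            # NaN forced floats; restore ints
)

# Append the count columns to the original DataFrame (aligned on the index).
result = df.join(counts)
print(result)
```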

Note: there's an open issue to have a value_counts method directly for a DataFrame (which I think should be introduced by pandas 0.15).


Answer 2

Answered by Zero

Andy's answer is spot on.


I'm adding this answer for the case where the list C1, C2, ..., Cn is huge and we only want to view a subset of its values.

dff = df.copy()
dff['C1']=(df == 'C1').T.sum()
dff['C2']=(df == 'C2').T.sum()
dff['C3']=(df == 'C3').T.sum()
dff
  COL1  COL2  COL3  C1  C2  C3
0   C1  None  None   1   0   0
1   C1    C2  None   1   1   0
2   C1    C1  None   2   0   0
3   C1    C2    C3   1   1   1
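When the subset has more than a handful of labels, the repeated assignments can be folded into a loop. This is a minimal sketch of the same comparison idea, assuming the empty cells are the string 'None'; the `subset` list is a hypothetical choice of labels. It uses `sum(axis=1)`, which gives the same row-wise totals as `.T.sum()`.

```python
import pandas as pd

# Assumption: missing cells are the string 'None'.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": ["None", "C2", "C1", "C2"],
    "COL3": ["None", "None", "None", "C3"],
})

# Hypothetical subset of the labels we care about.
subset = ["C1", "C3"]

dff = df.copy()
for label in subset:
    # Compare against the original df so the new count columns
    # are never included in later comparisons.
    dff[label] = (df == label).sum(axis=1)

print(dff)
```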

Answer 3

Answered by YOBEN_S

Using apply with a Series function on a whole DataFrame usually slows the whole process down (additional reading: Link). A vectorized alternative:

df.mask(df.eq('None')).stack().str.get_dummies().groupby(level=0).sum()
Out[165]: 
   C1  C2  C3
0   1   0   0
1   1   1   0
2   2   0   0
3   1   1   1
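The one-liner above chains several steps, so it may help to see them separately. The sketch below assumes the empty cells are the string 'None' and a pandas version in which the `level` argument of `sum` is no longer available, so the final step is written as `groupby(level=0).sum()`. The intermediate names are illustrative.

```python
import pandas as pd

# Assumption: missing cells are the string 'None'.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": ["None", "C2", "C1", "C2"],
    "COL3": ["None", "None", "None", "C3"],
})

# 1. Turn the 'None' strings into real missing values.
masked = df.mask(df.eq("None"))

# 2. Reshape to one long Series indexed by (row, column).
#    dropna() makes the result independent of whether stack() drops NaN.
stacked = masked.stack().dropna()

# 3. One indicator column per distinct label.
dummies = stacked.str.get_dummies()

# 4. Sum the indicators back up per original row (index level 0).
counts = dummies.groupby(level=0).sum()
print(counts)
```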

Or you can use Counter:

from collections import Counter

# Build one Counter per row, then drop the column that counts the 'None' strings
pd.DataFrame([Counter(x) for x in df.values]).drop(columns='None')
Out[170]: 
   C1   C2   C3
0   1  NaN  NaN
1   1  1.0  NaN
2   2  NaN  NaN
3   1  1.0  1.0
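The Counter output above still contains NaN and float columns. A small follow-up sketch that brings it to the integer table the question asks for, assuming the empty cells are the string 'None'; `wanted` and `counts` are illustrative names.

```python
from collections import Counter

import pandas as pd

# Assumption: missing cells are the string 'None'.
df = pd.DataFrame({
    "COL1": ["C1", "C1", "C1", "C1"],
    "COL2": ["None", "C2", "C1", "C2"],
    "COL3": ["None", "None", "None", "C3"],
})

wanted = ["C1", "C2", "C3"]

counts = (
    # One Counter (a dict subclass) per row; keep the original index.
    pd.DataFrame([Counter(row) for row in df.to_numpy()], index=df.index)
      .reindex(columns=wanted)  # drops 'None', keeps only wanted labels
      .fillna(0)                # label absent from a row -> 0
      .astype(int)
)
print(counts)
```

Selecting with `reindex` rather than `drop` also avoids an error when no row contains 'None' at all.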
