Pandas: count how many times a column value occurs in a dataset

Question

Asked by if __name__ is None

I am trying to sort data by the Name column, ordered by popularity (how often each name occurs).

Right now, I'm doing this:

df['Count'] = df.apply(lambda x: len(df[df['Name'] == x['Name']]), axis=1)
df[df['Count'] > 50][['Name', 'Description', 'Count']].drop_duplicates('Name').sort_values('Count', ascending=False).head(100)

However, this query is very slow; it takes hours to run.

What would be a more efficient way to do this?

Answer 1

Answered by if __name__ is None

The solution I have been looking for is:

df['Count'] = df.groupby('Name')['Name'].transform('count')

Big thanks to @Lynob for providing a link with an answer.
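
As a rough sketch of how this line fits into the query from the question, the example below runs the same filter, de-duplication and sort on a small made-up DataFrame. The sample rows and the `min_count` threshold are assumptions for illustration only; the question uses a threshold of 50 on data that is not shown.

```python
import pandas as pd

# Hypothetical toy data; the real DataFrame from the question is not shown.
df = pd.DataFrame({
    "Name": ["a", "b", "a", "c", "a", "b"],
    "Description": ["x1", "y1", "x2", "z1", "x3", "y2"],
})

# Count rows per name in one grouped pass instead of rescanning
# the whole frame for every row, as the apply/lambda version does.
df["Count"] = df.groupby("Name")["Name"].transform("count")

min_count = 1  # assumption: the question uses 50; lowered to fit the toy data

top = (
    df[df["Count"] > min_count][["Name", "Description", "Count"]]
    .drop_duplicates("Name")
    .sort_values("Count", ascending=False)
    .head(100)
)
print(top)
```

The apply version compares every row against the whole column, so its cost grows roughly with the square of the row count, while the grouped count avoids that repeated scan.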

Answer 2

Answered by Alex

You can use Series.value_counts, which returns the number of occurrences of each distinct value, sorted from most to least frequent.

import pandas as pd

df = pd.DataFrame([[0, 1], [1, 0], [1, 1]], columns=['a', 'b'])
# Count how often each distinct value appears in column 'b'
print(df['b'].value_counts())

Output:

1    2
0    1
Name: b, dtype: int64
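
One possible way to apply value_counts to the question is sketched below: rank the names by frequency, keep those above a threshold, and optionally map the counts back onto each row. The sample names and the threshold of 1 are assumptions chosen to keep the example small; the question uses 50.

```python
import pandas as pd

# Hypothetical sample names standing in for df['Name'] from the question.
names = pd.Series(["jim"] * 2 + ["jane"] * 3 + ["john"] * 1, name="Name")

# value_counts sorts by frequency in descending order by default.
counts = names.value_counts()

# Keep names that occur more than once (the question uses 50), top 100 at most.
popular = counts[counts > 1].head(100)

# Optional: attach each row's count by looking its name up in `counts`.
per_row = names.map(counts)

print(popular)
```

Because value_counts already returns one entry per distinct name, no drop_duplicates step is needed for the ranking itself.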

Answer 3

Answered by Merlin

Try this:

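import pandas as pd

a = ["jim"] * 5 + ["jane"] * 10 + ["john"] * 15
n = pd.Series(a)

# Count each name once, then keep the names that occur more than 5 times
counts = n.value_counts()
sorted(counts[counts > 5].index)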

['jane', 'john']
