Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/38487497/

Pandas find how many times a column value appears in dataset

python, pandas

Asked by if __name__ is None

I am trying to sort data by the Name column, by popularity.

Right now, I'm doing this:

df['Count'] = df.apply(lambda x: len(df[df['Name'] == x['Name']]), axis=1)
df[df['Count'] > 50][['Name', 'Description', 'Count']].drop_duplicates('Name').sort_values('Count', ascending=False).head(100)

However, this query is very slow; it takes hours to run.

What would be a more efficient way to do this?

Answered by if __name__ is None

The solution I have been looking for is:

df['Count'] = df.groupby('Name')['Name'].transform('count')

Big thanks to @Lynob for providing a link with an answer.
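
As a rough sketch of how this one-liner could slot into the pipeline from the question: the sample data below is hypothetical, and the threshold is lowered from 50 to 1 only because the sample is tiny. The column names follow the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Name": ["john", "jane", "john", "jim", "jane", "john"],
    "Description": ["a", "b", "c", "d", "e", "f"],
})

# Per-row count of how often each Name occurs, computed in one grouped pass
# instead of re-filtering the whole frame for every row.
df["Count"] = df.groupby("Name")["Name"].transform("count")

# Assumed threshold of 1 (the question uses 50) so the tiny sample keeps some rows.
result = (
    df[df["Count"] > 1][["Name", "Description", "Count"]]
    .drop_duplicates("Name")
    .sort_values("Count", ascending=False)
    .head(100)
)
print(result)
```

The original apply-based version filters the full DataFrame once per row, so its cost grows roughly with the square of the row count; the grouped transform avoids that.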

Answered by Alex

You can use Series.value_counts.

import pandas as pd
df = pd.DataFrame([[0, 1], [1, 0], [1, 1]], columns=['a', 'b'])
# Count how often each distinct value occurs in column 'b'
print(df['b'].value_counts())

Output:

1    2
0    1
Name: b, dtype: int64
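
If a per-row Count column is needed, as in the question, one possible approach is to map the value_counts result back onto the column. This is a minimal sketch on hypothetical data, not something stated in the answer itself:

```python
import pandas as pd

# Hypothetical sample data; the 'Name' column mirrors the question.
df = pd.DataFrame({"Name": ["john", "jane", "john", "jim", "jane", "john"]})

# value_counts returns a Series indexed by the distinct names.
counts = df["Name"].value_counts()

# Series.map looks up each row's name in that index to get its count.
df["Count"] = df["Name"].map(counts)
print(df)
```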

Answered by Merlin

Try this:

import pandas as pd

a = ["jim"] * 5 + ["jane"] * 10 + ["john"] * 15
n = pd.Series(a)

# Names that occur more than 5 times, sorted alphabetically
counts = n.value_counts()
sorted(counts[counts > 5].index)

['jane', 'john']
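
Note that sorted() orders the names alphabetically. Since the question asks for ordering by popularity, a possible variant is to skip sorted() and rely on value_counts, which returns counts in descending order by default. A small sketch using the same sample data:

```python
import pandas as pd

n = pd.Series(["jim"] * 5 + ["jane"] * 10 + ["john"] * 15)

# value_counts sorts by count, most frequent first.
counts = n.value_counts()

# Keep names occurring more than 5 times; the popularity order is preserved.
popular = counts[counts > 5]
print(popular.index.tolist())
```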