Original question: http://stackoverflow.com/questions/38487497/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas find how many times a column value appears in dataset
Asked by if __name__ is None
I am trying to sort data by the Name column, by popularity (how many times each name appears).
Right now, I'm doing this:
df['Count'] = df.apply(lambda x: len(df[df['Name'] == x['Name']]), axis=1)
df[df['Count'] > 50][['Name', 'Description', 'Count']].drop_duplicates('Name').sort_values('Count', ascending=False).head(100)
However, this query is very slow: it takes hours to run.
What would be a more efficient way to do this?
Answered by if __name__ is None
The solution I have been looking for is:
df['Count'] = df.groupby('Name')['Name'].transform('count')
Big thanks to @Lynob for providing a link with an answer.
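As a rough sketch of how that one-liner fits into the original query, the example below uses a small made-up DataFrame (the sample rows and the `min_count` threshold are assumptions for illustration; the question filters on counts above 50). The per-row `apply` scans the whole frame once for every row, while `groupby(...).transform('count')` computes each group's size once and broadcasts it back to the rows.

```python
import pandas as pd

# Hypothetical sample data; the dataset in the question is much larger.
df = pd.DataFrame({
    "Name": ["a", "b", "a", "c", "a", "b"],
    "Description": ["d1", "d2", "d3", "d4", "d5", "d6"],
})

# Count rows per name once per group, then broadcast back to every row.
df["Count"] = df.groupby("Name")["Name"].transform("count")

min_count = 1  # assumed threshold for this toy data (the question uses 50)
popular = (
    df[df["Count"] > min_count][["Name", "Description", "Count"]]
    .drop_duplicates("Name")                 # keep the first row per name
    .sort_values("Count", ascending=False)   # most frequent names first
    .head(100)
)
print(popular)
```

The rest of the pipeline (filter, drop_duplicates, sort_values, head) is unchanged from the question; only the way Count is computed differs.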
Answered by Alex
You can use Series.value_counts.
import pandas as pd
df = pd.DataFrame([[0, 1], [1, 0], [1, 1]], columns=['a', 'b'])
print(df['b'].value_counts())
Output:
1    2
0    1
Name: b, dtype: int64
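If the counts are needed on every row, as in the question's Count column, one possible approach is to look each value up in the value_counts result with Series.map. This is a minimal sketch built on the same toy frame; the `b_count` column name is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [1, 0], [1, 1]], columns=["a", "b"])

# value_counts returns a Series indexed by the distinct values,
# sorted by frequency in descending order.
counts = df["b"].value_counts()

# Map each row's value to its count (hypothetical column name).
df["b_count"] = df["b"].map(counts)
print(df)
```

Because value_counts already sorts by frequency, `counts` on its own is enough when only a popularity ranking of the distinct values is needed.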
Answered by Merlin
Try this:
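import pandas as pd
a = ["jim"]*5 + ["jane"]*10 + ["john"]*15
n = pd.Series(a)
counts = n.value_counts()  # compute the counts once and reuse them
# names that occur more than 5 times, in alphabetical order
sorted(counts[counts > 5].index)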
['jane', 'john']
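Note that sorted() orders the names alphabetically. Since the question asks for ordering by popularity, a variant that keeps the frequency order value_counts already provides might look like the sketch below (same sample data; the `by_popularity` name is made up for illustration).

```python
import pandas as pd

a = ["jim"] * 5 + ["jane"] * 10 + ["john"] * 15
n = pd.Series(a)

counts = n.value_counts()  # sorted by frequency, most common first

# Keep names seen more than 5 times, preserving the frequency order.
by_popularity = counts[counts > 5].index.tolist()
print(by_popularity)  # ['john', 'jane']
```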