Python Pandas: get the most frequent values in a column

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/48590268/

Date: 2020-08-19 18:46:30  Source: igfitidea

Pandas get the most frequent values of a column

Tags: python, pandas, dataframe

Asked by aleale

I have this dataframe:

0 name data
1 alex asd
2 helen sdd
3 alex dss
4 helen sdsd
5 john sdadd

I am trying to get the most frequent value or values (in this case there are several), so what I do is:

dataframe['name'].value_counts().idxmax()

But it returns only one value, alex, even though helen also appears twice.

Answered by YOBEN_S

By using mode:

df.name.mode()
Out[712]: 
0     alex
1    helen
dtype: object
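As a self-contained sketch of the above, assuming the question's data is rebuilt by hand (the variable name most_frequent is illustrative and not from the original answer):

```python
import pandas as pd

# Rebuild the dataframe from the question
df = pd.DataFrame({
    "name": ["alex", "helen", "alex", "helen", "john"],
    "data": ["asd", "sdd", "dss", "sdsd", "sdadd"],
})

# mode() returns every value tied for the highest count, in sorted order
most_frequent = df["name"].mode().tolist()
print(most_frequent)  # ['alex', 'helen']
```

Unlike value_counts().idxmax(), which returns a single label, mode() always returns a Series, so ties are preserved.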

Answered by Jared Wilber

To get the n most frequent values, just subset .value_counts() and grab the index:

# get top 10 most frequent names
n = 10
dataframe['name'].value_counts()[:n].index.tolist()
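A minimal runnable sketch of this approach, using made-up sample data (the names series and the variables top_n and top_with_ties are assumptions for illustration). Slicing cuts ties at the boundary, so the sketch also shows nlargest with keep="all" as one possible way to retain them:

```python
import pandas as pd

names = pd.Series(["alex", "helen", "alex", "helen", "john", "alex"], name="name")

n = 2
counts = names.value_counts()          # sorted by count, descending
top_n = counts.head(n).index.tolist()  # equivalent to counts[:n].index.tolist()
print(top_n)  # ['alex', 'helen']

# With tied counts, a plain slice keeps only the first n rows.
# nlargest(..., keep="all") keeps every value tied with the n-th largest count.
tied = pd.Series(["alex", "helen", "alex", "helen", "john"]).value_counts()
top_with_ties = sorted(tied.nlargest(1, keep="all").index)
print(top_with_ties)  # ['alex', 'helen']
```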

Answered by Lunar_one

You could try argmax like this:

dataframe['name'].value_counts().argmax()
Out[13]: 'alex'

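value_counts returns a pandas.core.series.Series of counts, and argmax can be used to get the key of the maximum value. Note that this output comes from an older pandas release, where argmax was an alias for idxmax. In pandas 1.0 and later, argmax returns the integer position of the maximum, and idxmax returns its label.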
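To make the difference between the two methods concrete, here is a small sketch with made-up data, assuming pandas 1.0 or later (the variable names label and position are illustrative):

```python
import pandas as pd

counts = pd.Series(["alex", "helen", "alex", "john"]).value_counts()

label = counts.idxmax()     # index label of the largest count
position = counts.argmax()  # integer position of the largest count
print(label, position)      # alex 0
```

Either way, only one entry is returned, so neither method reports ties.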

Answered by paul okoduwa

You can use this to get a full count of every value in a particular column. The result is sorted in descending order, so the most frequent value (the mode) comes first:

df['name'].value_counts()

Answered by Taie

df['name'].value_counts()[:5].sort_values(ascending=False)

value_counts returns a pandas.core.series.Series of counts, and sort_values(ascending=False) puts the highest counts first.

Answered by piRSquared

Not obvious, but fast:

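import numpy as np
import pandas as pd

f, u = pd.factorize(df.name.values)  # f: integer codes, u: unique values
counts = np.bincount(f)              # number of occurrences of each code
u[counts == counts.max()]            # every value tied for the maximum count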

array(['alex', 'helen'], dtype=object)
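The same idea as a self-contained sketch, with the question's names typed in by hand (the variable names codes, uniques and modes are illustrative). One caveat worth stating as an assumption about your data: by default factorize encodes missing values as -1, which np.bincount rejects, so a column with NaN would need them dropped first.

```python
import numpy as np
import pandas as pd

names = pd.Series(["alex", "helen", "alex", "helen", "john"])

codes, uniques = pd.factorize(names.values)  # codes: [0, 1, 0, 1, 2]
counts = np.bincount(codes)                  # counts: [2, 2, 1]
modes = uniques[counts == counts.max()]      # keep every value with the top count
print(list(modes))  # ['alex', 'helen']
```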

Answered by pault

Here's one way:

df['name'].value_counts()[df['name'].value_counts() == df['name'].value_counts().max()]

which prints:

helen    2
alex     2
Name: name, dtype: int64
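The one-liner above calls value_counts() three times. A possible variant, sketched here with hand-built data, computes the counts once and reuses them (the variable names counts and top are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"name": ["alex", "helen", "alex", "helen", "john"]})

counts = df["name"].value_counts()    # count each name once
top = counts[counts == counts.max()]  # keep only the rows with the highest count
print(sorted(top.index))  # ['alex', 'helen']
```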

Answered by Brian

You could use .apply and pd.value_counts to count the occurrences of all the names in the name column. Note that pd.value_counts is deprecated in recent pandas versions, and dataframe['name'].value_counts() gives the counts directly.

dataframe['name'].apply(pd.value_counts)

Answered by Naomi Fridman

To get the top 5:

dataframe['name'].value_counts()[0:5]

Answered by venergiac

My best solution to get the first is:

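# idxmax returns the label; argmax (used in the original answer) returns the integer position in pandas 1.0+
df['my_column'].value_counts().sort_values(ascending=False).idxmax()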