Python: How to get the count of the most frequent value in a column?

Question

Asked by Roman

I have a data frame and I would like to know how many times the most frequent value occurs in a given column.

I tried to do it in the following way:

items_counts = df['item'].value_counts()
max_item = items_counts.max()

As a result I get:

ValueError: cannot convert float NaN to integer

As far as I understand, the first line gives me a Series in which the values from the column are used as keys and the frequencies of those values are used as values. So I just need to find the largest value in the Series but, for some reason, it does not work. Does anybody know how this problem can be solved?

Answer 1

Accepted answer by beardc

It looks like you may have some nulls in the column. You can drop them with df = df.dropna(subset=['item']). Then df['item'].value_counts().max() should give you the max count, and df['item'].value_counts().idxmax() should give you the most frequent value.
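As a minimal sketch of the steps above: the column name `item` comes from the question, while the sample data is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question only tells us the column is called 'item'.
df = pd.DataFrame({"item": ["a", "b", "a", np.nan, "a", "b", np.nan]})

# Drop rows where 'item' is null, then count occurrences of each remaining value.
counts = df.dropna(subset=["item"])["item"].value_counts()

max_count = counts.max()         # how many times the most frequent value occurs
most_frequent = counts.idxmax()  # the most frequent value itself

print(most_frequent, max_count)  # a 3
```

Computing `value_counts()` once and reusing the result avoids counting the column twice.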

Answer 2

Answered by jonathanrocher

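You may also consider using scipy's mode function, which ignored NaN in the SciPy version this answer was written against (newer releases take a nan_policy parameter, and nan_policy='omit' is the setting that skips NaN). A solution using it could look like: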

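from scipy.stats import mode
from numpy import nan
from pandas import DataFrame  # import missing from the original snippet
df = DataFrame({"a": [1,2,2,4,2], "b": [nan, nan, nan, 3, 3]})
print(mode(df))  # print() call so the snippet also runs on Python 3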

The output would look like:

(array([[ 2.,  3.]]), array([[ 3.,  2.]]))

meaning that the most common values are 2 for the first column and 3 for the second, with frequencies 3 and 2 respectively.

Answer 3

Answered by Anton Protopopov

To follow up on @jonathanrocher's answer, you could use mode on a pandas DataFrame. It gives the most frequent value (or several values, if there is a tie) along the rows or columns:

import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [1,2,2,4,2], "b": [np.nan, np.nan, np.nan, 3, 3]})

In [2]: df.mode()
Out[2]: 
   a    b
0  2  3.0
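Since mode returns only the values and not their frequencies, the count has to be looked up separately. The following is a sketch under that assumption, reusing the DataFrame from this answer; the tie example at the end uses made-up data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2], "b": [np.nan, np.nan, np.nan, 3, 3]})

# Series.mode() returns every value tied for most frequent (NaN is skipped by default).
modes_b = df["b"].mode()

# mode() does not report the frequency, so look it up from value_counts().
count_b = df["b"].value_counts().loc[modes_b.iloc[0]]

# With a tie, mode() returns more than one value, sorted in ascending order.
tied = pd.Series([1, 1, 2, 2, 3]).mode()

print(modes_b.tolist(), count_b, tied.tolist())
```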

Answer 4

Answered by jpp

Just take the first row of your items_counts series:

top = items_counts.head(1)  # or items_counts.iloc[[0]]
value, count = top.index[0], top.iat[0]

This works because pd.Series.value_counts has sort=True by default, so the result is already ordered by counts, highest count first. Extracting a value from an index by location has O(1) complexity, while pd.Series.idxmax has O(n) complexity, where n is the number of categories.

Specifying sort=False is still possible, and in that case idxmax is recommended:

items_counts = df['item'].value_counts(sort=False)
top = items_counts.loc[[items_counts.idxmax()]]
value, count = top.index[0], top.iat[0]

Notice that in this case you don't need to call max and idxmax separately; just extract the index via idxmax and feed it to the loc label-based indexer.
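The sorted-by-default behaviour can be wrapped in a small helper. This is only a sketch: the function name `top_value_and_count` and the sample data are hypothetical, and the empty-column fallback is an assumption about what a caller might want.

```python
import pandas as pd

def top_value_and_count(series: pd.Series):
    """Return (value, count) for the most frequent non-null value in `series`."""
    counts = series.value_counts()  # sorted by count, highest first (sort=True default)
    if counts.empty:
        return None, 0  # all-null or empty column: nothing to report
    return counts.index[0], int(counts.iat[0])

items = pd.Series(["x", "y", "x", None, "x", "y"])  # made-up sample data
value, count = top_value_and_count(items)
print(value, count)  # x 3
```

Guarding against an empty result avoids an IndexError when the column contains only nulls.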

Answer 5

Answered by user9114146

Use this line of code to get the count of the most frequent value (use .index[0] instead of .values[0] to get the value itself):

df["item"].value_counts().nlargest(n=1).values[0]

Answer 6

Answered by Ambati Vaishnavi

NaN values are omitted when calculating frequencies, so please check the rest of your code. Alternatively, you can use the code below for the same functionality.

**>> Code:**
    # Importing required modules
    from collections import Counter
    import pandas as pd

    # Creating a dataframe
    df = pd.DataFrame({ 'A':["jan","jan","jan","mar","mar","feb","jan","dec",
                             "mar","jan","dec"]  }) 
    # Creating a counter object
    count = Counter(df['A'])
    # Calling a method of Counter object(count)
    count.most_common(3)

**>> Output:**

    [('jan', 5), ('mar', 3), ('dec', 2)]
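One caveat with this approach is that Counter does not skip missing entries the way value_counts does. A minimal sketch, assuming the missing entries should be ignored and using made-up data:

```python
from collections import Counter
import pandas as pd

s = pd.Series(["jan", None, "jan", "mar", None, None])  # made-up sample data

# Counter treats missing entries like any other element, so drop them first.
value, count = Counter(s.dropna()).most_common(1)[0]
print(value, count)  # jan 2
```

Without the `dropna()` call, the missing entries in this sample would be reported as the most common element.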
