pandas: check whether all values in a dataframe column are the same

Question

Asked by HelloToEarth

I want to do a quick and easy check of whether all values in the counts column of a dataframe are the same:

In:

import pandas as pd

d = {'names': ['Jim', 'Ted', 'Mal', 'Ted'], 'counts': [3, 4, 3, 3]}
df = pd.DataFrame(data=d)
df

Out:

  names  counts
0   Jim       3
1   Ted       4
2   Mal       3
3   Ted       3

I just want a simple condition: if all counts are the same value, then print('True').

Is there a fast way to do this?

Answer 1

Answered by yatu

An efficient way to do this is to compare the first value with the rest and then call all:

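def is_unique(s):
    # Note: despite the name, this checks that all values are equal
    a = s.to_numpy() # s.values (pandas<0.24)
    # a[0] == a broadcasts the first value against every element
    return (a[0] == a).all()

is_unique(df['counts'])
# False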
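The first-vs-rest comparison above has two edge cases: an empty Series raises an IndexError at a[0], and because NaN != NaN, a column that is entirely NaN is reported as not constant. Below is a minimal sketch of a more defensive variant, assuming you want an empty Series to count as constant and NaN to compare equal to NaN (both are choices made for this sketch, not pandas conventions). The name all_same is hypothetical.

```python
import numpy as np
import pandas as pd


def all_same(s: pd.Series) -> bool:
    """Hypothetical helper: True if every value in s equals the first one.

    Assumptions for this sketch: an empty Series counts as constant,
    and NaN is treated as equal to NaN.
    """
    if s.empty:
        return True
    first = s.iloc[0]
    if pd.isna(first):
        # The first value is missing, so every value must be missing too
        return bool(s.isna().all())
    # eq() compares element-wise; a NaN elsewhere compares as False
    return bool(s.eq(first).all())
```

With these rules, all_same(pd.Series([np.nan, np.nan])) returns True, while is_unique on the same Series returns False.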

For an entire dataframe

If you want to perform the same task on an entire dataframe, we can extend the above by passing axis=0 to all:

def unique_cols(df):
    a = df.to_numpy() # df.values (pandas<0.24)
    # Compare row 0 against every row, then reduce down each column
    return (a[0] == a).all(0)

For the shared example, we'd get:

unique_cols(df)
# array([False, False])
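Since unique_cols returns one boolean per column, it can be used directly as a mask on df.columns to list (or drop) the constant columns. A small sketch; the extra flag column is made up for illustration:

```python
import pandas as pd


def unique_cols(df):
    # Same function as above: row 0 compared against every row, per column
    a = df.to_numpy()
    return (a[0] == a).all(0)


df = pd.DataFrame({
    'names': ['Jim', 'Ted', 'Mal', 'Ted'],
    'counts': [3, 4, 3, 3],
    'flag': [1, 1, 1, 1],  # illustrative constant column
})

mask = unique_cols(df)
constant = df.columns[mask]   # names of the constant columns
varying = df.loc[:, ~mask]    # frame with the constant columns dropped
print(list(constant))  # ['flag']
```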

Here's a benchmark of the above methods compared with some other approaches, such as using nunique (for a pd.Series):

import numpy as np
import pandas as pd
import perfplot

s_num = pd.Series(np.random.randint(0, 1_000, 1_100_000))

perfplot.show(
    setup=lambda n: s_num.iloc[:int(n)],

    kernels=[
        lambda s: s.nunique() == 1,
        lambda s: is_unique(s)
    ],

    labels=['nunique', 'first_vs_rest'],
    n_range=[2**k for k in range(0, 20)],
    xlabel='N'
)
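Beyond speed, the two kernels in this benchmark can disagree when the data contains missing values: Series.nunique() drops NaN by default (dropna=True), so a column holding one distinct value plus NaN still gives nunique() == 1, whereas the element-wise comparison treats NaN as different from everything. A quick check:

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 3.0])

# nunique ignores NaN by default, so only one distinct value is counted
by_nunique = s.nunique() == 1
# counting NaN as a value gives two distinct values
by_nunique_with_nan = s.nunique(dropna=False) == 1
# first-vs-rest: NaN != 3.0, so the comparison fails
a = s.to_numpy()
by_first_vs_rest = bool((a[0] == a).all())
```

Which behaviour is right depends on whether a missing value should count as "different"; pick the check accordingly.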

And below are the timings for a pd.DataFrame. Let's also compare with a numba approach, which is especially useful here since we can short-circuit as soon as we see a value in a given column that differs from the first one (note: the numba approach will only work with numerical data):

import numpy as np
from numba import njit

@njit
def unique_cols_nb(a):
    n_cols = a.shape[1]
    out = np.zeros(n_cols, dtype=np.int32)
    for i in range(n_cols):
        init = a[0, i]
        for j in a[1:, i]:
            if j != init:
                break  # first mismatch: this column is not constant
        else:
            out[i] = 1  # no break: every value equals init
    return out
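If numba isn't available, the same short-circuit idea can be written in plain Python. It is much slower per element than the compiled version, so treat it as an illustration of the logic rather than a replacement. This sketch (the name unique_cols_py is hypothetical) returns a boolean array instead of int32 flags, and because it does not go through numba it also accepts object (non-numeric) arrays:

```python
import numpy as np


def unique_cols_py(a: np.ndarray) -> np.ndarray:
    """Hypothetical pure-Python counterpart of unique_cols_nb.

    For each column, stop scanning at the first value that differs
    from the value in row 0.
    """
    n_rows, n_cols = a.shape
    out = np.zeros(n_cols, dtype=bool)
    for i in range(n_cols):
        init = a[0, i]
        for j in range(1, n_rows):
            if a[j, i] != init:
                break  # short-circuit: this column is not constant
        else:
            out[i] = True  # loop finished without a mismatch
    return out


arr = np.array([[3, 0], [4, 0], [3, 0], [3, 0]])
print(unique_cols_py(arr))  # [False  True]
```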

If we compare the three methods:

df = pd.DataFrame(np.concatenate([np.random.randint(0, 1_000, (500_000, 200)), 
                                  np.zeros((500_000, 10))], axis=1))

perfplot.show(
    setup=lambda n: df.iloc[:int(n),:], 

    kernels=[
        lambda df: (df.nunique(0) == 1).values,
        lambda df: unique_cols_nb(df.values).astype(bool),
        lambda df: unique_cols(df) 
    ],

    labels=['nunique', 'unique_cols_nb', 'unique_cols'],
    n_range=[2**k for k in range(0, 20)],
    xlabel='N'
)

Answer 2

Answered by YOBEN_S

Update: using np.unique

import numpy as np

len(np.unique(df.counts)) == 1
False

Or

len(set(df.counts.tolist()))==1

Or

df.counts.eq(df.counts.iloc[0]).all()
False

Or

df.counts.std()==0
False
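Run on the question's frame, the four checks in this answer agree. One caveat for std()==0: pandas uses ddof=1 by default, so the standard deviation of a single-element Series is NaN and the check reports False even though a single value is trivially "all the same". A short sketch of both points:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'names': ['Jim', 'Ted', 'Mal', 'Ted'], 'counts': [3, 4, 3, 3]})

checks = {
    'np.unique': len(np.unique(df.counts)) == 1,
    'set': len(set(df.counts.tolist())) == 1,
    'eq': bool(df.counts.eq(df.counts.iloc[0]).all()),
    'std': bool(df.counts.std() == 0),
}
print(checks)  # every value is False for this data

# Caveat: with ddof=1 a single element gives NaN, and NaN == 0 is False
single = pd.Series([5])
print(single.std())             # nan
print(single.std(ddof=0) == 0)  # True
```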

Answer 3

Answered by Michel de Ruiter

I think nunique does much more work than necessary: iteration can stop at the first difference. This simple and generic solution uses itertools:

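import itertools

def all_equal(iterable):
    "Returns True if all elements are equal to each other"
    g = itertools.groupby(iterable)
    # At most one group means all equal: take the first group (or True
    # if the iterable is empty), then require that no second group exists
    return next(g, True) and not next(g, False)

all_equal(df.counts)
# False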
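Because all_equal only needs an iterable, it also works on plain Python data, and the short-circuit is real: groupby is lazy, so iteration stops as soon as a second group appears. The infinite iterator below is only there to demonstrate that; a function that consumed the whole input would never return on it:

```python
import itertools


def all_equal(iterable):
    "Returns True if all elements are equal to each other"
    g = itertools.groupby(iterable)
    return next(g, True) and not next(g, False)


print(all_equal("aaaa"))  # True
print(all_equal([]))      # True (vacuously)
# Stops at the 2 without touching the endless tail of 1s
print(all_equal(itertools.chain([1, 2], itertools.repeat(1))))  # False
```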

One can even use this to find all columns with constant contents in one go:

constant_columns = df.columns[df.apply(all_equal)]

A slightly more readable but less performant alternative:

df.counts.min() == df.counts.max()

Add skipna=False to both min() and max() if missing values should make the check fail.
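To see what skipna changes: min() and max() skip NaN by default, so a column with one repeated value plus missing entries passes the check. With skipna=False both return NaN, and since NaN never equals NaN, the check fails:

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 3.0])

# Default skipna=True: the NaN is ignored, min and max are both 3.0
print(s.min() == s.max())  # True
# skipna=False: both return NaN, and NaN == NaN is False
print(s.min(skipna=False) == s.max(skipna=False))  # False
```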
