Pandas: Get all columns that have constant value

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/50582168/

Time: 2020-09-14 05:37:18  Source: igfitidea

python, pandas

Asked by tbienias

I want to get the names of the columns whose value is the same in every row.

My data:


   A   B  C  D
0  1  hi  2  a
1  3  hi  2  b
2  4  hi  2  c

Desired output:


['B', 'C']

Code:


import pandas as pd

d = {'A': [1,3,4], 'B': ['hi','hi','hi'], 'C': [2,2,2], 'D': ['a','b','c']}
df = pd.DataFrame(data=d)

I've been playing around with df.columns and .any(), but can't figure out how to do this.

Answer by smci

Use the not-so-well-known pandas built-in nunique():

df.columns[df.nunique() <= 1]
Index(['B', 'C'], dtype='object')

Notes:


  • Use the dropna=False option if you want NaN values counted as a separate value
  • It's the cleanest code, but not the fastest
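To make the dropna note concrete, here is a minimal sketch. The frame below is hypothetical (columns 'E' and 'F' are not in the question's data) and is only meant to show how missing values change which columns pass the `<= 1` test.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'E' mixes one value with a missing entry,
# 'F' is entirely missing.
df = pd.DataFrame({
    'B': ['hi', 'hi', 'hi'],
    'D': ['a', 'b', 'c'],
    'E': [1.0, np.nan, 1.0],
    'F': [np.nan, np.nan, np.nan],
})

# Default (dropna=True): NaN is ignored, so 'E' has 1 distinct value and 'F' has 0.
default_cols = df.columns[df.nunique() <= 1].tolist()

# dropna=False: NaN counts as its own value, so 'E' now has 2 distinct values.
strict_cols = df.columns[df.nunique(dropna=False) <= 1].tolist()

print(default_cols)  # ['B', 'E', 'F']
print(strict_cols)   # ['B', 'F']
```

This is also why the answer compares with `<= 1` rather than `== 1`: with the default setting an all-NaN column has zero distinct values.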

Answer by jezrael

Solution 1:


c = [c for c in df.columns if len(set(df[c])) == 1]
print (c)

['B', 'C']

Solution 2:


c = df.columns[df.eq(df.iloc[0]).all()].tolist()
print (c)
['B', 'C']

Explanation for Solution 2:


First compare all rows to the first row with DataFrame.eq...


print (df.eq(df.iloc[0]))
       A     B     C      D
0   True  True  True   True
1  False  True  True  False
2  False  True  True  False

... then check that every value in each column is True with DataFrame.all...

print (df.eq(df.iloc[0]).all())
A    False
B     True
C     True
D    False
dtype: bool

... finally, keep the column names for which the result is True:

print (df.columns[df.eq(df.iloc[0]).all()])
Index(['B', 'C'], dtype='object')
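The three steps above can be wrapped into a small function. This is only a sketch: the name `constant_columns` and the empty-frame guard are assumptions added for illustration, not part of the original answer. It also shows one behaviour worth knowing about Solution 2: NaN never compares equal to itself, so an all-NaN column is not reported as constant.

```python
import numpy as np
import pandas as pd

def constant_columns(df):
    """Hypothetical helper: names of columns that equal their first row everywhere."""
    if df.empty:
        # df.iloc[0] would raise IndexError on a frame with no rows
        return []
    return df.columns[df.eq(df.iloc[0]).all()].tolist()

d = {'A': [1, 3, 4], 'B': ['hi', 'hi', 'hi'], 'C': [2, 2, 2], 'D': ['a', 'b', 'c']}
df = pd.DataFrame(data=d)
print(constant_columns(df))  # ['B', 'C']

# NaN != NaN, so the all-NaN column 'X' is not treated as constant here
df_nan = pd.DataFrame({'X': [np.nan, np.nan], 'Y': [5, 5]})
print(constant_columns(df_nan))  # ['Y']
```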

Timings:


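import numpy as np
import pandas as pd

# 1000 rows x 100 columns of random digits 0-9
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(1000,100)))

# overwrite 20 randomly picked columns (repeats possible) with the constant 100
df[np.random.randint(100, size=20)] = 100
print (df)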

# Solution 1 (second-fastest):
In [243]: %timeit ([c for c in df.columns if len(set(df[c])) == 1])
3.59 ms ± 43.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Solution 2 (fastest):
In [244]: %timeit df.columns[df.eq(df.iloc[0]).all()].tolist()
1.62 ms ± 13.3 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

#Mohamed Thasin ah solution
In [245]: %timeit ([col for col in df.columns if len(df[col].unique())==1])
6.8 ms ± 352 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#jpp solution
In [246]: %%timeit
     ...: vals = df.apply(set, axis=0)
     ...: res = vals[vals.map(len) == 1].index
     ...: 
5.59 ms ± 64.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#smci solution 1
In [275]: %timeit df.columns[ df.nunique()==1 ]
11 ms ± 105 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#smci solution 2
In [276]: %timeit [col for col in df.columns if not df[col].is_unique]
9.25 ms ± 80 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

#smci solution 3
In [277]: %timeit df.columns[ df.apply(lambda col: not col.is_unique) ]
11.1 ms ± 511 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
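The timings measure speed only, so it may be worth checking that the timed expressions select the same columns. The sketch below does that on the question's data plus one extra, hypothetical column 'E' (duplicates, but not constant). Under that assumption, the `is_unique`-based variants turn out to answer a different question: they flag any column that contains a repeated value.

```python
import pandas as pd

d = {'A': [1, 3, 4], 'B': ['hi', 'hi', 'hi'], 'C': [2, 2, 2], 'D': ['a', 'b', 'c'],
     'E': [7, 7, 8]}  # 'E' is an added, hypothetical column
df = pd.DataFrame(data=d)

by_set = [c for c in df.columns if len(set(df[c])) == 1]
by_eq = df.columns[df.eq(df.iloc[0]).all()].tolist()
by_unique = [c for c in df.columns if len(df[c].unique()) == 1]
by_nunique = df.columns[df.nunique() == 1].tolist()

# "not is_unique" means "has at least one duplicate", not "is constant"
by_not_is_unique = [c for c in df.columns if not df[c].is_unique]

print(by_set, by_eq, by_unique, by_nunique)  # each is ['B', 'C']
print(by_not_is_unique)                      # ['B', 'C', 'E']
```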

Answer by Mohamed Thasin ah

Try this:

print([col for col in df.columns if len(df[col].unique()) == 1])

Output:


['B', 'C']

Answer by jpp

You can use set and apply a filter on a series:

vals = df.apply(set, axis=0)
res = vals[vals.map(len) == 1].index

print(res)

Index(['B', 'C'], dtype='object')

Use res.tolist() if you need the result as a list.