Pandas Dataframe Find Rows Where all Columns Equal
Warning: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/22701799/
Asked by Lisa L
I have a dataframe that has characters in it - I want a boolean result by row that tells me if all columns for that row have the same value.
For example, I have
df =
     a    b    c    d
0  'C'  'C'  'C'  'C'
1  'C'  'C'  'A'  'A'
2  'A'  'A'  'A'  'A'
and I want the result to be
0  True
1  False
2  True
I've tried .all but it seems I can only check if all are equal to one letter. The only other way I can think of doing it is by doing a unique on each row and see if that equals 1? Thanks in advance.
Answered by Andy Hayden
I think the cleanest way is to check all columns against the first column using eq:
In [11]: df
Out[11]: 
   a  b  c  d
0  C  C  C  C
1  C  C  A  A
2  A  A  A  A
In [12]: df.iloc[:, 0]
Out[12]: 
0    C
1    C
2    A
Name: a, dtype: object
In [13]: df.eq(df.iloc[:, 0], axis=0)
Out[13]: 
      a     b      c      d
0  True  True   True   True
1  True  True  False  False
2  True  True   True   True
Now you can use all (if they are all equal to the first item, they are all equal):
In [14]: df.eq(df.iloc[:, 0], axis=0).all(1)
Out[14]: 
0     True
1    False
2     True
dtype: bool
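As a self-contained sketch of the two steps above, the following rebuilds the example frame and applies eq followed by all. The construction through pd.DataFrame and the variable names are assumptions, since the question only shows the printed frame.

```python
import pandas as pd

# Rebuild the frame from the question (the construction is an assumption;
# the question only shows the printed result).
df = pd.DataFrame({
    "a": ["C", "C", "A"],
    "b": ["C", "C", "A"],
    "c": ["C", "A", "A"],
    "d": ["C", "A", "A"],
})

# Compare every column with the first one, aligning on the row index.
equal_to_first = df.eq(df.iloc[:, 0], axis=0)

# A row is uniform when every comparison in it is True.
all_equal = equal_to_first.all(axis=1)
print(all_equal.tolist())
```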
Answered by jezrael
Compare the array with its first column and check whether every value in each row is True. The same solution in numpy, for better performance:
a = df.values
b = (a == a[:, [0]]).all(axis=1)
print (b)
[ True  True False]
And if you need a Series:
s = pd.Series(b, index=df.index)
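The indexing a[:, [0]] is what makes the comparison work: it keeps the first column as a 2-D array of shape (n, 1), which NumPy then broadcasts against every column of a. A minimal sketch, assuming the three-row numeric data used elsewhere in this answer (the name first_col is illustrative):

```python
import pandas as pd

df = pd.DataFrame([[10, 10, 10], [12, 12, 12], [10, 12, 10]],
                  columns=["Col1", "Col2", "Col3"])
a = df.values

# Indexing with a list keeps two dimensions: a column vector of shape (3, 1).
first_col = a[:, [0]]
print(first_col.shape)

# Broadcasting stretches the (3, 1) column across all three columns of `a`.
b = (a == first_col).all(axis=1)

# Wrap the result in a Series that shares the frame's index.
s = pd.Series(b, index=df.index)
print(s.tolist())
```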
Comparing solutions:
data = [[10,10,10],[12,12,12],[10,12,10]]
df = pd.DataFrame(data,columns=['Col1','Col2','Col3'])
#[30000 rows x 3 columns]
df = pd.concat([df] * 10000, ignore_index=True)
#jez - numpy array
In [14]: %%timeit
    ...: a = df.values
    ...: b = (a == a[:, [0]]).all(axis=1)
141 μs ± 3.23 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#jez - Series 
In [15]: %%timeit
    ...: a = df.values
    ...: b = (a == a[:, [0]]).all(axis=1)
    ...: pd.Series(b, index=df.index)
169 μs ± 2.02 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#Andy Hayden
In [16]: %%timeit
    ...: df.eq(df.iloc[:, 0], axis=0).all(axis=1)
2.22 ms ± 68.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#Wen1
In [17]: %%timeit
    ...: list(map(lambda x : len(set(x))==1,df.values))
56.8 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#K.-Michael Aye
In [18]: %%timeit
    ...: df.apply(lambda x: len(set(x)) == 1, axis=1)
686 ms ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Wen2    
In [19]: %%timeit
    ...: df.nunique(1).eq(1)
2.87 s ± 115 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
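Timings like these depend on the machine, the pandas version, and the data, so they are best read as relative. Before comparing speed it is worth confirming that the approaches agree. The sketch below does so on a smaller 3000-row version of the same frame; the dictionary keys and variable names are illustrative and not taken from the answers.

```python
import pandas as pd

data = [[10, 10, 10], [12, 12, 12], [10, 12, 10]]
df = pd.DataFrame(data, columns=["Col1", "Col2", "Col3"])
# 3000 rows here (an assumption to keep the check quick).
df = pd.concat([df] * 1000, ignore_index=True)

a = df.values
results = {
    "numpy": (a == a[:, [0]]).all(axis=1).tolist(),
    "eq_all": df.eq(df.iloc[:, 0], axis=0).all(axis=1).tolist(),
    "map_set": [bool(v) for v in map(lambda x: len(set(x)) == 1, df.values)],
    "apply_set": df.apply(lambda x: len(set(x)) == 1, axis=1).tolist(),
    "nunique": df.nunique(axis=1).eq(1).tolist(),
}

# Every approach should produce the same list of booleans.
reference = results["numpy"]
agree = all(r == reference for r in results.values())
print(agree)
```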
Answered by YOBEN_S
nunique: New in version 0.20.0. (Based on the timing benchmark from Jez, you can use this one if performance is not important.)
df.nunique(axis = 1).eq(1)
Out[308]: 
0     True
1    False
2     True
dtype: bool
Or you can use map with set:
list(map(lambda x : len(set(x))==1,df.values))
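One difference worth knowing when choosing between these approaches: nunique ignores missing values by default (dropna=True), whereas a comparison against the first column treats NaN as unequal to everything. The frame below is a made-up example to illustrate this, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the first row contains a missing value.
df = pd.DataFrame({
    "a": ["C", "C"],
    "b": [np.nan, "A"],
    "c": ["C", "C"],
})

# NaN is dropped before counting, so row 0 has a single unique value.
by_nunique = df.nunique(axis=1).eq(1)

# With dropna=False, NaN counts as a value of its own.
by_nunique_strict = df.nunique(axis=1, dropna=False).eq(1)

# NaN never compares equal, so row 0 is reported as not uniform.
by_eq = df.eq(df.iloc[:, 0], axis=0).all(axis=1)

print(by_nunique.tolist(), by_nunique_strict.tolist(), by_eq.tolist())
```

Which behaviour is right depends on whether a missing value should count as a mismatch in your data.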
Answered by K.-Michael Aye
df = pd.DataFrame.from_dict({'a':'C C A'.split(),
                        'b':'C C A'.split(),
                        'c':'C A A'.split(),
                        'd':'C A A'.split()})
df.apply(lambda x: len(set(x)) == 1, axis=1)
0     True
1    False
2     True
dtype: bool
Explanation: set(x) has only one element if all elements of the row are the same. The axis=1 option applies the given function to each row instead of each column.
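To make the effect of axis concrete, the sketch below applies the same counting function once per row (axis=1) and once per column (the default, axis=0). The helper name count_distinct is hypothetical.

```python
import pandas as pd

df = pd.DataFrame.from_dict({"a": "C C A".split(),
                             "b": "C C A".split(),
                             "c": "C A A".split(),
                             "d": "C A A".split()})

def count_distinct(values):
    # `values` is a Series: one row when axis=1, one column when axis=0.
    return len(set(values))

# One count per row, indexed by the row labels 0, 1, 2.
per_row = df.apply(count_distinct, axis=1)

# One count per column, indexed by the column names a, b, c, d.
per_column = df.apply(count_distinct)

# A row is uniform when it contains exactly one distinct value.
result = per_row == 1
print(per_row.tolist(), per_column.tolist(), result.tolist())
```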
Answered by Duke
You can use nunique(axis=1), so the result (added as a new column) can be obtained by:
df['unique'] = df.nunique(axis=1) == 1
The answer by @yo-and-ben-w uses eq(1), but I think == 1 is easier to read.
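One thing to watch when storing the result in the same frame: once the unique column exists, running the line again would include that boolean column in the count. A sketch that avoids this by fixing the list of columns to compare first (the name value_cols is an assumption for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": list("CCA"), "b": list("CCA"),
                   "c": list("CAA"), "d": list("CAA")})

# Columns to compare; hypothetical name, chosen before adding the result.
value_cols = ["a", "b", "c", "d"]

df["unique"] = df[value_cols].nunique(axis=1) == 1

# Re-running gives the same answer because "unique" is not in value_cols.
df["unique"] = df[value_cols].nunique(axis=1) == 1
print(df["unique"].tolist())
```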

