Original question: http://stackoverflow.com/questions/21231478/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Get rows that have the same value across its columns in pandas
Asked by kentwait
In pandas, given a DataFrame D:
+-----+--------+--------+--------+
| | 1 | 2 | 3 |
+-----+--------+--------+--------+
| 0 | apple | banana | banana |
| 1 | orange | orange | orange |
| 2 | banana | apple | orange |
| 3 | NaN | NaN | NaN |
| 4 | apple | apple | apple |
+-----+--------+--------+--------+
How do I return the rows that have the same contents across all columns, when there are three or more columns, so that the result is:
+-----+--------+--------+--------+
| | 1 | 2 | 3 |
+-----+--------+--------+--------+
| 1 | orange | orange | orange |
| 4 | apple | apple | apple |
+-----+--------+--------+--------+
Note that it skips rows when all values are NaN.
If there were only two columns, I would usually do D[D[1]==D[2]], but I don't know how to generalize this to DataFrames with more than two columns.
Accepted answer by lowtech
Similar to Andy Hayden's answer, but checking whether the row's min equals its max (in which case all elements in the row are the same):
df[df.apply(lambda x: min(x) == max(x), 1)]
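To make the min/max check concrete, here is a minimal self-contained sketch. The DataFrame is rebuilt from the question's table; the names `mask` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question's table (column labels 1..3)
df = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
    ],
    columns=[1, 2, 3],
)

# A row is uniform when its smallest and largest values coincide.
# For the all-NaN row this compares NaN with NaN, which is False,
# so that row is skipped.
mask = df.apply(lambda x: min(x) == max(x), axis=1)
result = df[mask]
print(result)
```

One caveat: a row that mixes NaN with strings would raise a TypeError here, because Python cannot order a float against a str, so this approach assumes each row is either fully populated or fully missing.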
Answer by DSM
My entry:
>>> df
0 1 2
0 apple banana banana
1 orange orange orange
2 banana apple orange
3 NaN NaN NaN
4 apple apple apple
[5 rows x 3 columns]
>>> df[df.apply(pd.Series.nunique, axis=1) == 1]
0 1 2
1 orange orange orange
4 apple apple apple
[2 rows x 3 columns]
This works because calling pd.Series.nunique on the rows gives:
>>> df.apply(pd.Series.nunique, axis=1)
0 2
1 1
2 3
3 0
4 1
dtype: int64
Note: this would, however, keep rows which look like [nan, nan, apple] or [nan, apple, apple]. Usually I want that, but that might be the wrong answer for your use case.
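The caveat can be seen in a small sketch. The three-row DataFrame below is made up for illustration, and the stricter variant (adding a `notna` check) is one possible way to reject partially missing rows, not something taken from the answer itself.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a uniform row, a partially missing row, an all-NaN row
df = pd.DataFrame(
    [
        ["orange", "orange", "orange"],
        [np.nan, "apple", "apple"],
        [np.nan, np.nan, np.nan],
    ]
)

# nunique ignores NaN by default, so the partially missing row counts as uniform
lenient = df[df.nunique(axis=1) == 1]

# Stricter variant: additionally require that no value in the row is missing
strict = df[df.notna().all(axis=1) & (df.nunique(axis=1) == 1)]

print(lenient)
print(strict)
```

Which of the two is right depends on whether a missing value should count as "unknown but possibly equal" or as a mismatch.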
Answer by Andy Hayden
I would check whether each row is equal to its first element:
In [11]: df.eq(df[1], axis='index') # Note: funky broadcasting with df == df[1]
Out[11]:
1 2 3
0 True False False
1 True True True
2 True False False
3 True True True
4 True True True
[5 rows x 3 columns]
If all in the row are True, then all elements in the row are the same:
In [12]: df.eq(df[1], axis='index').all(1)
Out[12]:
0 False
1 True
2 False
3 True
4 True
dtype: bool
Restrict the DataFrame to just those rows, and optionally call dropna:
In [13]: df[df.eq(df[1], axis='index').all(1)]
Out[13]:
1 2 3
1 orange orange orange
3 NaN NaN NaN
4 apple apple apple
[3 rows x 3 columns]
In [14]: df[df.eq(df[1], axis='index').all(1)].dropna()
Out[14]:
1 2 3
1 orange orange orange
4 apple apple apple
[2 rows x 3 columns]
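As a self-contained sketch of the same idea, the version below selects the first column by position with `iloc` rather than by the label 1, on the assumption that this makes it independent of how the columns are named. The final `dropna` is kept because whether an all-NaN row passes the equality test may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question's table (column labels 1..3)
df = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
    ],
    columns=[1, 2, 3],
)

# Compare every column against the first one, aligning on the row index
first = df.iloc[:, 0]
mask = df.eq(first, axis="index").all(axis=1)

# Keep the matching rows and drop any row that still contains NaN
result = df[mask].dropna()
print(result)
```

Because the comparison is vectorized across columns, this avoids a Python-level function call per row.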
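Answer by Tu Dang
Based on DSM's answer, you may want this method:
import pandas as pd
def filter_data(df):
    # dropna() returns a new DataFrame; with inplace=True it would return None
    df = df.dropna()
    # keep only the rows that contain exactly one distinct value
    df = df[df.apply(pd.Series.nunique, axis=1) == 1]
    return df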
Answer by Zero
In newer versions of pandas, you can use DataFrame.nunique directly:
In [815]: df[df.nunique(1).eq(1)]
Out[815]:
0 1 2
1 orange orange orange
4 apple apple apple
Details
In [816]: df
Out[816]:
0 1 2
0 apple banana banana
1 orange orange orange
2 banana apple orange
3 NaN NaN NaN
4 apple apple apple
In [817]: df.nunique(1)
Out[817]:
0 2
1 1
2 3
3 0
4 1
dtype: int64
In [818]: df.nunique(1).eq(1)
Out[818]:
0 False
1 True
2 False
3 False
4 True
dtype: bool
Answer by Woody Pride
You could use set to create a list of the index locations that conform to your rule, and then use that list to slice the data frame. For example:
import pandas as pd
import numpy as np
D = {0 : ['apple' , 'banana', 'banana'], 1 : ['orange', 'orange', 'orange'], 2: ['banana', 'apple', 'orange'], 3: [np.nan, np.nan, np.nan], 4 : ['apple', 'apple', 'apple']}
DF = pd.DataFrame(D).T
Equal = [row for row in DF.index if len(set(DF.iloc[row])) == 1]
DF.iloc[Equal]
Note that this excludes the missing-value row without you having to exclude missing values explicitly. This is due to the way missing values behave in a Series.
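If relying on how NaN behaves inside a set feels too implicit, a variant with an explicit missing-value check is sketched below. The `notna` guard and the names `equal` and `result` are additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Same data as in the answer, built row by row
DF = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
    ]
)

# Collect positions of rows that have no missing values and one distinct value
equal = [
    pos
    for pos in range(len(DF))
    if DF.iloc[pos].notna().all() and len(set(DF.iloc[pos])) == 1
]
result = DF.iloc[equal]
print(result)
```

Iterating over positions with `range(len(DF))` also keeps the `iloc` lookups valid when the index is not the default 0..n-1.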

