pandas: get rows that have the same value across all columns

Question

Asked by kentwait

In pandas, given a DataFrame D:

+-----+--------+--------+--------+   
|     |    1   |    2   |    3   |
+-----+--------+--------+--------+
|  0  | apple  | banana | banana |
|  1  | orange | orange | orange |
|  2  | banana | apple  | orange |
|  3  | NaN    | NaN    | NaN    |
|  4  | apple  | apple  | apple  |
+-----+--------+--------+--------+

How do I return the rows whose contents are the same across all columns, when there are three or more columns, so that the result is:

+-----+--------+--------+--------+   
|     |    1   |    2   |    3   |
+-----+--------+--------+--------+
|  1  | orange | orange | orange |
|  4  | apple  | apple  | apple  |
+-----+--------+--------+--------+

Note that it skips rows when all values are NaN.

If there were only two columns, I would usually do D[D[1]==D[2]], but I don't know how to generalize this to DataFrames with more than two columns.

Answer 1

Accepted answer by lowtech

Similar to Andy Hayden's answer, but checking whether the row minimum equals the row maximum (in which case all elements in the row are identical):

df[df.apply(lambda x: min(x) == max(x), 1)]
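For reference, here is a self-contained sketch of this check on the question's data. The construction of df is an assumption reconstructed from the table in the question, and mask and result are illustrative names. Because min and max are the Python built-ins here, a row that mixes strings and NaN would likely raise a TypeError in Python 3 rather than being filtered out.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the DataFrame shown in the question.
df = pd.DataFrame({
    1: ["apple", "orange", "banana", np.nan, "apple"],
    2: ["banana", "orange", "apple", np.nan, "apple"],
    3: ["banana", "orange", "orange", np.nan, "apple"],
})

# A row is kept when its smallest and largest values coincide.
# The all-NaN row is dropped because NaN == NaN evaluates to False.
mask = df.apply(lambda row: min(row) == max(row), axis=1)
result = df[mask]
print(result)
```

Only the rows labelled 1 and 4 remain in result.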

Answer 2

Answered by DSM

My entry:

>>> df
        0       1       2
0   apple  banana  banana
1  orange  orange  orange
2  banana   apple  orange
3     NaN     NaN     NaN
4   apple   apple   apple

[5 rows x 3 columns]
>>> df[df.apply(pd.Series.nunique, axis=1) == 1]
        0       1       2
1  orange  orange  orange
4   apple   apple   apple

[2 rows x 3 columns]

This works because calling pd.Series.nunique on the rows gives:

>>> df.apply(pd.Series.nunique, axis=1)
0    2
1    1
2    3
3    0
4    1
dtype: int64

Note: this would, however, keep rows which look like [nan, nan, apple] or [nan, apple, apple]. Usually I want that, but it might be the wrong answer for your use case.
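If rows containing any NaN should be excluded, one option is to combine the nunique test with a completeness check. This is a sketch rather than part of the original answer; the sample frame and the names one_value, complete, lenient, and strict are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data including partially missing rows.
df = pd.DataFrame([
    ["orange", "orange", "orange"],
    [np.nan, "apple", "apple"],
    [np.nan, np.nan, "apple"],
    [np.nan, np.nan, np.nan],
    ["apple", "banana", "banana"],
])

one_value = df.apply(pd.Series.nunique, axis=1) == 1  # NaN is not counted
complete = df.notna().all(axis=1)                      # no missing values in the row

lenient = df[one_value]            # keeps rows 0, 1 and 2
strict = df[one_value & complete]  # keeps only row 0
print(strict)
```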

Answer 3

Answered by Andy Hayden

I would check whether each element of a row is equal to the row's first element:

In [11]: df.eq(df[1], axis='index')  # Note: funky broadcasting with df == df[1]
Out[11]: 
      1      2      3
0  True  False  False
1  True   True   True
2  True  False  False
3  True   True   True
4  True   True   True

[5 rows x 3 columns]

If all values in a row are True, then all elements in that row are the same:

In [12]: df.eq(df[1], axis='index').all(1)
Out[12]: 
0    False
1     True
2    False
3     True
4     True
dtype: bool

Restrict to just those rows, and optionally call dropna:

In [13]: df[df.eq(df[1], axis='index').all(1)]
Out[13]: 
        1       2       3
1  orange  orange  orange
3     NaN     NaN     NaN
4   apple   apple   apple

[3 rows x 3 columns]

In [14]: df[df.eq(df[1], axis='index').all(1)].dropna()
Out[14]: 
        1       2       3
1  orange  orange  orange
4   apple   apple   apple

[2 rows x 3 columns]
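The same idea can be written without hard-coding the column label 1 by comparing against the first column by position. A minimal sketch, assuming the question's data: the construction of df is reconstructed from the table in the question, and first, all_same, and result are illustrative names. dropna() is kept at the end so that the result does not depend on how a given pandas version compares NaN with NaN.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the DataFrame shown in the question.
df = pd.DataFrame({
    1: ["apple", "orange", "banana", np.nan, "apple"],
    2: ["banana", "orange", "apple", np.nan, "apple"],
    3: ["banana", "orange", "orange", np.nan, "apple"],
})

first = df.iloc[:, 0]                              # first column, selected by position
all_same = df.eq(first, axis="index").all(axis=1)  # True where every cell equals the first
result = df[all_same].dropna()                     # drop any all-NaN row that slipped through
print(result)
```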

Answer 4

Answered by Tu Dang

Based on DSM's answer, you may want this function, which first drops rows containing NaN:

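import pandas as pd

def filter_data(df):
    # dropna() returns a new DataFrame; with inplace=True it would return None
    df = df.dropna()
    # keep only rows that contain exactly one unique value
    df = df[df.apply(pd.Series.nunique, axis=1) == 1]
    return df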

Answer 5

Answered by Zero

In newer versions of pandas, you can call nunique directly on the DataFrame:

In [815]: df[df.nunique(1).eq(1)]
Out[815]:
        0       1       2
1  orange  orange  orange
4   apple   apple   apple

Details

In [816]: df
Out[816]:
        0       1       2
0   apple  banana  banana
1  orange  orange  orange
2  banana   apple  orange
3     NaN     NaN     NaN
4   apple   apple   apple

In [817]: df.nunique(1)
Out[817]:
0    2
1    1
2    3
3    0
4    1
dtype: int64

In [818]: df.nunique(1).eq(1)
Out[818]:
0    False
1     True
2    False
3    False
4     True
dtype: bool

Answer 6

Answered by Woody Pride

You could use set to create a list of the index locations that conform to your rule, and then use that list to slice the data frame. For example:

import pandas as pd
import numpy as np

D = {
    0: ['apple', 'banana', 'banana'],
    1: ['orange', 'orange', 'orange'],
    2: ['banana', 'apple', 'orange'],
    3: [np.nan, np.nan, np.nan],
    4: ['apple', 'apple', 'apple'],
}
DF = pd.DataFrame(D).T

Equal = [row for row in DF.index if len(set(DF.iloc[row])) == 1]

DF.iloc[Equal]

Note that this excludes the missing-value row without your having to exclude missing values explicitly. This is because NaN is not equal to itself, so the NaN values in a row are generally not collapsed into a single set element.
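A variant of the set-based approach iterates over rows with iterrows and selects by label, so it also works when the index is not 0..n-1, and it checks for missing values explicitly instead of relying on how NaN behaves inside a set. This is a sketch under assumptions: the index labels and the names equal_labels and result are made up for illustration.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame(
    [
        ["apple", "banana", "banana"],
        ["orange", "orange", "orange"],
        ["banana", "apple", "orange"],
        [np.nan, np.nan, np.nan],
        ["apple", "apple", "apple"],
    ],
    index=["a", "b", "c", "d", "e"],  # hypothetical non-default labels
)

# Collect the labels of rows that have no missing values
# and whose values collapse to a single set element.
equal_labels = [
    label
    for label, row in DF.iterrows()
    if row.notna().all() and len(set(row)) == 1
]

result = DF.loc[equal_labels]
print(result)
```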
