Which columns in a Pandas DataFrame are binary?

Question

Asked by na899

I have a pandas dataframe with a large number of columns and I need to find which columns are binary (with values 0 or 1 only) without looking at the data. Which function should be used?

Answer 1

Answered by Alexander

To my knowledge, there is no direct function to test for this. Rather, you need to build something based on how the data was encoded (e.g. 1/0, T/F, True/False, etc.). In addition, if your column has a missing value, the entire column will be encoded as a float instead of an int.
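
A minimal sketch of the dtype point, assuming only that pandas is installed. The column names `with_gap` and `complete` are made up for illustration.

```python
import pandas as pd

# Hypothetical columns: one with a missing value, one without.
df = pd.DataFrame({'with_gap': [1, 0, 1, None],
                   'complete': [1, 0, 1, 0]})

# The missing value forces the whole 'with_gap' column to a float dtype,
# while 'complete' keeps an integer dtype.
print(df.dtypes)

# After dropping the missing value only 0.0 and 1.0 remain,
# and those still compare equal to the integers 0 and 1.
print(df['with_gap'].dropna().isin([0, 1]).all())
```

So a 0/1 check should compare values rather than rely on the column being stored as integers.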

In the example below, I test whether every unique non-null value in a column is either 1 or 0. It returns a list of all such columns.

import numpy as np
import pandas as pd

df = pd.DataFrame({'bool': [1, 0, 1, None],
                   'floats': [1.2, 3.1, 4.4, 5.5],
                   'ints': [1, 2, 3, 4],
                   'str': ['a', 'b', 'c', 'd']})

# Keep the columns whose unique non-null values are all 0 or 1.
# 2019-09-10 EDIT (per Hardik Gupta): the earlier version called .unique()
# on a one-column DataFrame (df[[col]]), which has no such method.
bool_cols = [col for col in df
             if np.isin(df[col].dropna().unique(), [0, 1]).all()]

>>> bool_cols
['bool']

>>> df[bool_cols]
   bool
0   1.0
1   0.0
2   1.0
3   NaN

Answer 2

Answered by lucas

def is_binary(series, allow_na=False):
    if allow_na:
        # Drop missing values on a copy so the caller's data is not modified.
        series = series.dropna()
    return sorted(series.unique()) == [0, 1]

This is the most efficient solution I found, and it is quicker than the answers above. On large data sets, the difference in timing becomes relevant. Note that it returns True only when both 0 and 1 occur in the column.
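
One way to check the timing on your own data is a small `timeit` harness. The sketch below is only an assumption about how such a comparison could be set up: the helper names `is_binary_sorted` and `is_binary_isin` and the synthetic series are invented here, and the timings will depend on your machine and data.

```python
import timeit

import numpy as np
import pandas as pd


def is_binary_sorted(series):
    # Variant of the function above: both 0 and 1 must be present.
    return sorted(series.dropna().unique()) == [0, 1]


def is_binary_isin(series):
    # Variant of the np.isin check from Answer 1: every value must be 0 or 1.
    return bool(np.isin(series.dropna().unique(), [0, 1]).all())


# Synthetic 0/1 data, made up for this comparison.
rng = np.random.default_rng(0)
series = pd.Series(rng.integers(0, 2, size=1_000_000))

for func in (is_binary_sorted, is_binary_isin):
    seconds = timeit.timeit(lambda: func(series), number=10)
    print(f"{func.__name__}: {seconds:.4f} s for 10 calls")
```

The two checks are not interchangeable: a column holding only 1s passes the `np.isin` variant but fails the sorted comparison.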

Answer 3

Answered by Aiden

To expand on the answer just above, using value_counts().index instead of unique() should do the trick:

bool_cols = [col for col in df if 
               df[col].dropna().value_counts().index.isin([0,1]).all()]

Answer 4

Answered by sedeh

Improving upon @Aiden's answer to avoid returning columns that are empty (contain only missing values):

[col for col in df
 if len(df[col].value_counts()) > 0 and df[col].value_counts().index.isin([0, 1]).all()]
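
To see why the length check matters, here is a small sketch with a made-up frame whose `empty` column holds only missing values. The helper names `only_zero_one` and `non_empty_zero_one` are hypothetical and simply wrap the two checks.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'empty' contains nothing but missing values.
df = pd.DataFrame({'flag': [1, 0, 1, np.nan],
                   'empty': [np.nan, np.nan, np.nan, np.nan]})


def only_zero_one(series):
    # value_counts() skips NaN by default, so an all-NaN column yields
    # an empty index, and .all() on an empty result is True.
    values = series.value_counts().index
    return bool(values.isin([0, 1]).all())


def non_empty_zero_one(series):
    # Same check, but the column must hold at least one non-missing value.
    values = series.value_counts().index
    return len(values) > 0 and bool(values.isin([0, 1]).all())


print([col for col in df if only_zero_one(df[col])])       # ['flag', 'empty']
print([col for col in df if non_empty_zero_one(df[col])])  # ['flag']
```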

Answer 5

Answered by Hardik Gupta

Based on Alexander's answer, tested with Python 3.6.6:

[col for col in df if np.isin(df[col].unique(), [0, 1]).all()]
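
This version does not call `dropna()`, so a column that contains a missing value is not reported. The sketch below illustrates the difference on a made-up frame; the variable names `without_dropna` and `with_dropna` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'bool' has a missing value, 'clean' does not.
df = pd.DataFrame({'bool': [1, 0, 1, None],
                   'clean': [0, 1, 1, 0]})

# NaN is not in [0, 1], so 'bool' fails the check when NaN is kept.
without_dropna = [col for col in df
                  if np.isin(df[col].unique(), [0, 1]).all()]

# Dropping missing values first lets 'bool' pass as well.
with_dropna = [col for col in df
               if np.isin(df[col].dropna().unique(), [0, 1]).all()]

print(without_dropna)  # ['clean']
print(with_dropna)     # ['bool', 'clean']
```

Which behavior is right depends on whether a column with gaps should still count as binary.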
