Which columns are binary in a Pandas DataFrame?
Original question: http://stackoverflow.com/questions/32982034/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Asked by na899
I have a pandas DataFrame with a large number of columns, and I need to find which columns are binary (containing only the values 0 or 1) without inspecting the data by hand. Which function should I use?
Answered by Alexander
To my knowledge, there is no built-in function that tests for this. Instead, you need to write your own check based on how the data was encoded (e.g. 1/0, T/F, True/False). In addition, if an integer column has a missing value, pandas stores the entire column as float64 instead of int64 (with the default NumPy-backed dtypes), so 1 and 0 become 1.0 and 0.0.
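A check "based on how the data was encoded" could look like the following minimal sketch. `ENCODINGS` and `binary_columns` are hypothetical names for illustration, not part of pandas, and the list of accepted encodings is an assumption you would adapt to your data. The sketch ignores missing values, skips all-missing columns, and counts a column holding only one of the two values (e.g. all zeros) as binary.

```python
import pandas as pd

# Hypothetical list of accepted encodings; edit it to match how your data is coded.
# True == 1 and False == 0 in Python, so boolean columns also match {0, 1}.
ENCODINGS = [{0, 1}, {"T", "F"}, {"Y", "N"}]


def binary_columns(df, encodings=ENCODINGS):
    """Return the columns whose non-null values all fall within one encoding."""
    result = []
    for col in df.columns:
        # Ignore missing values; NaN would otherwise never match an encoding.
        values = set(df[col].dropna().unique())
        # An all-missing column has no values and is not treated as binary.
        if values and any(values <= enc for enc in encodings):
            result.append(col)
    return result


df = pd.DataFrame({
    "num": [1, 0, 1, None],  # stored as float64 because of the missing value
    "tf": ["T", "F", "F", "T"],
    "flag": [True, False, True, True],
    "ints": [1, 2, 3, 4],
})
print(binary_columns(df))  # ['num', 'tf', 'flag']
```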
In the example below, I test whether all unique non-null values are either 1 or 0. It returns a list of all such columns.
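import numpy as np
import pandas as pd

df = pd.DataFrame({'bool': [1, 0, 1, None],
                   'floats': [1.2, 3.1, 4.4, 5.5],
                   'ints': [1, 2, 3, 4],
                   'str': ['a', 'b', 'c', 'd']})

# Keep the columns whose distinct non-null values are all 0 or 1.
# 2019-09-10 EDIT (per Hardik Gupta): the original version called
# df[[col]].dropna().unique(), which fails because a DataFrame has no
# .unique() method; np.isin on the column's unique values works instead.
bool_cols = [col for col in df
             if np.isin(df[col].dropna().unique(), [0, 1]).all()]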
>>> bool_cols
['bool']
>>> df[bool_cols]
   bool
0   1.0
1   0.0
2   1.0
3   NaN
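The same test can also be written without a Python-level loop over columns by using `DataFrame.isin`, `isna` and `notna`. This vectorized variant is offered as an alternative sketch and is not taken from the answers on this page. It treats missing cells as acceptable but, unlike the comprehension above, excludes columns that are entirely missing. Whether it is faster depends on your data, so measure before relying on it.

```python
import pandas as pd

df = pd.DataFrame({'bool': [1, 0, 1, None],
                   'floats': [1.2, 3.1, 4.4, 5.5],
                   'ints': [1, 2, 3, 4],
                   'str': ['a', 'b', 'c', 'd']})

# Per cell: True where the value is 0 or 1, or where it is missing.
ok_cells = df.isin([0, 1]) | df.isna()
# Per column: every cell is acceptable and at least one value is present.
mask = ok_cells.all() & df.notna().any()
bool_cols = mask[mask].index.tolist()
print(bool_cols)  # ['bool']
```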
Answered by lucas
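def is_binary(series, allow_na=False):
    # dropna() returns a copy, so the caller's Series is not modified.
    if allow_na:
        series = series.dropna()
    # True only if the column holds exactly the values 0 and 1 (both present).
    # Comparing sets avoids the TypeError sorted() raises on mixed types.
    return set(series.unique()) == {0, 1}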
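To apply `is_binary` to a whole DataFrame, you can call it once per column. The sketch below uses a copy of lucas's function, written with a non-mutating `dropna()` and a set comparison. The sample frame is made up for illustration. It shows the effect of `allow_na` and that a column containing only zeros is not reported, because both values must be present.

```python
import pandas as pd


def is_binary(series, allow_na=False):
    # Copy of lucas's check; dropna() returns a new Series instead of mutating.
    if allow_na:
        series = series.dropna()
    return set(series.unique()) == {0, 1}


df = pd.DataFrame({
    "flag": [0, 1, 1, 0],
    "with_na": [1, 0, None, 1],
    "all_zero": [0, 0, 0, 0],
    "count": [1, 2, 3, 4],
})

strict = [col for col in df if is_binary(df[col])]
lenient = [col for col in df if is_binary(df[col], allow_na=True)]
print(strict)   # ['flag']
print(lenient)  # ['flag', 'with_na']
```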
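This is_binary function was the fastest solution I found, quicker than the answers above; on large data sets the difference in timing becomes significant. Note that it requires both 0 and 1 to be present, so a column of all zeros is not reported as binary.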
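Relative speed depends on the number of rows and columns, the dtypes and the pandas version, so it is worth checking a speed claim like this on your own data. Below is a minimal `timeit` harness under stated assumptions. The frame shape, the random data and the function names `via_isin` and `via_sorted` are hypothetical. It reports timings only and makes no claim about which approach will win.

```python
import timeit
from functools import partial

import numpy as np
import pandas as pd

# Hypothetical test frame: 100 random 0/1 columns and 100 random float columns.
rng = np.random.default_rng(0)
n_rows = 5_000
data = {f"b{i}": rng.integers(0, 2, n_rows) for i in range(100)}
data.update({f"x{i}": rng.normal(size=n_rows) for i in range(100)})
df = pd.DataFrame(data)


def via_isin(frame):
    # Alexander's approach: all non-null distinct values lie in {0, 1}.
    return [c for c in frame if np.isin(frame[c].dropna().unique(), [0, 1]).all()]


def via_sorted(frame):
    # lucas's approach: the sorted distinct values are exactly [0, 1].
    return [c for c in frame if sorted(frame[c].dropna().unique()) == [0, 1]]


for fn in (via_isin, via_sorted):
    seconds = timeit.timeit(partial(fn, df), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```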
Answered by Aiden
To expand on the answer just above, using value_counts().index instead of unique() should do the trick:
bool_cols = [col for col in df
             if df[col].dropna().value_counts().index.isin([0, 1]).all()]
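One edge case of this comprehension is a column that contains only missing values. `value_counts()` excludes NaN by default, so the explicit `.dropna()` makes no difference to the result. For an all-NaN column the resulting index is empty, and `.all()` over an empty array is `True`, so such a column is reported as binary. The sample frame below is made up to show this.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "flag": [1, 0, np.nan, 1],
    "empty": [np.nan, np.nan, np.nan, np.nan],
    "label": ["a", "b", "a", "b"],
})

# For "empty", value_counts() is empty, and .all() over an empty index is True,
# so the all-NaN column passes the test along with "flag".
bool_cols = [col for col in df
             if df[col].dropna().value_counts().index.isin([0, 1]).all()]
print(bool_cols)  # ['flag', 'empty']
```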
Answered by sedeh
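Improving on @Aiden's answer so that columns with no non-null values are not returned (for such a column value_counts() is empty, and .all() over an empty index is True):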
[col for col in df
 if len(df[col].value_counts()) > 0
 and df[col].value_counts().index.isin([0, 1]).all()]
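The one-liner above computes `value_counts()` twice per column. A minimal sketch of the same rule written as a small function calls it once and relies on `and` to skip the index check for empty columns. `binary_cols` is a hypothetical helper name, and the sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd


def binary_cols(df):
    """Columns whose non-null values are all 0 or 1, skipping all-NaN columns."""
    cols = []
    for col in df:
        counts = df[col].value_counts()  # NaN is excluded by default
        # Non-empty check first; 'and' short-circuits for all-NaN columns.
        if len(counts) > 0 and counts.index.isin([0, 1]).all():
            cols.append(col)
    return cols


df = pd.DataFrame({
    "flag": [1, 0, np.nan, 1],
    "empty": [np.nan, np.nan, np.nan, np.nan],
    "label": ["a", "b", "a", "b"],
})
print(binary_cols(df))  # ['flag']
```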
Answered by Hardik Gupta
Building on Alexander's answer (tested with Python 3.6.6):
[col for col in df if np.isin(df[col].unique(), [0, 1]).all()]
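This version omits `dropna()`, which changes the result when a column has missing values. NaN then appears among the unique values and is never in `[0, 1]`, so such a column is rejected. The comparison below uses a sample frame made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'bool': [1, 0, 1, None],
                   'flag': [0, 1, 1, 0],
                   'ints': [1, 2, 3, 4]})

# Without dropna(): NaN is among the unique values, so 'bool' is rejected.
no_dropna = [col for col in df if np.isin(df[col].unique(), [0, 1]).all()]

# With dropna(): missing values are removed before the check.
with_dropna = [col for col in df
               if np.isin(df[col].dropna().unique(), [0, 1]).all()]

print(no_dropna)    # ['flag']
print(with_dropna)  # ['bool', 'flag']
```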

