Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/18748171/

pandas data frame transform INT64 columns to boolean

Tags: python, numpy, boolean, pandas

Asked by user1893148

Some column in dataframe df, df.column, is stored as datatype int64.

The values are all 1s or 0s.

Is there a way to replace these values with boolean values?

Answered by unutbu

df['column_name'] = df['column_name'].astype('bool')

For example:

import pandas as pd
import numpy as np
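# np.random.random_integers is deprecated; randint's upper bound is exclusive,
# so randint(0, 2) draws 0s and 1s
df = pd.DataFrame(np.random.randint(0, 2, size=5),
                  columns=['foo'])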
print(df)
#    foo
# 0    0
# 1    1
# 2    0
# 3    1
# 4    1

df['foo'] = df['foo'].astype('bool')
print(df)

yields

     foo
0  False
1   True
2  False
3   True
4   True
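
One point worth checking before converting: astype(bool) maps 0 to False and every other number to True, so a column that is not strictly 0/1 would be converted silently. The sketch below is only an illustration; the frame and the column names 'flag' and 'count' are made up for the example and are not from the original question.

```python
import pandas as pd

# Hypothetical sample data: 'flag' holds only 0/1, 'count' does not
df = pd.DataFrame({'flag': [0, 1, 1], 'count': [0, 2, 5]})

# astype(bool) maps 0 to False and every other number to True
as_bool = df['count'].astype(bool)

# Check that a column really contains only 0s and 1s before converting
flag_is_binary = df['flag'].isin([0, 1]).all()
count_is_binary = df['count'].isin([0, 1]).all()

# Convert only the column that passed the check
if flag_is_binary:
    df['flag'] = df['flag'].astype(bool)
```

Here 'count' would have become False, True, True, which loses the original values, so it is left unconverted.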


Given a list of column_names, you could convert multiple columns to bool dtype using:

df[column_names] = df[column_names].astype(bool)

If you don't have a list of column names, but wish to convert, say, all numeric columns, then you could use

column_names = df.select_dtypes(include=[np.number]).columns
df[column_names] = df[column_names].astype(bool)
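
As a small illustration of what select_dtypes picks up, the sketch below uses a made-up frame (the columns 'a', 'b' and 'name' are assumptions for the example). Only the numeric columns are selected and converted; the string column is left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two int64 flag columns and one string column
df = pd.DataFrame({'a': [0, 1], 'b': [1, 1], 'name': ['x', 'y']})

# Select every numeric column, then convert them all at once
column_names = df.select_dtypes(include=[np.number]).columns
df[column_names] = df[column_names].astype(bool)

# Names of the columns that were converted
converted = list(column_names)
print(converted)
print(df.dtypes)
```

Note that this converts every numeric column, including ones such as counts or IDs, so it is only appropriate when all numeric columns are really 0/1 flags.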

Answered by mel el

Reference: Stack Overflow unutbu (Jan 9 at 13:25), BrenBarn (Sep 18 2017)

I had numerical columns like age and ID which I did not want to convert to Boolean. So after identifying the numerical columns as unutbu showed, I kept only the columns whose maximum is exactly 1 and left the rest unconverted.

# Numeric columns, as in unutbu's answer
column_names = df.select_dtypes(include=[np.number]).columns

# Maximum of each numeric column. reset_index turns the result into a frame
# with one row per column, numbered 0, 1, 2, ... in the order of column_names.
m = df[column_names].max().reset_index(name='max')

# Keep only the rows whose maximum is exactly 1
# (the filter BrenBarn showed in another post).
n = m.loc[m['max'] == 1, 'max']

# n.index holds the positions of those columns within column_names,
# so indexing column_names with it recovers their names.
p = column_names[n.index]

# Convert only the columns in p. Using column_names directly instead of p
# would turn all numerical columns into Booleans.
df[p] = df[p].astype(bool)
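
A possible variation, not part of the original answers: a maximum of 1 does not guarantee that a column holds only 0s and 1s (a float column such as 0.5, 1.0, 0.0 also has a maximum of 1). The sketch below checks the actual values with isin instead. The frame and its column names are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'age' and 'score' are numeric but are not 0/1 flags
df = pd.DataFrame({
    'age': [23, 41, 35],
    'member': [1, 0, 1],
    'active': [0, 0, 1],
    'score': [0.5, 1.0, 0.0],  # maximum is 1, but values are not binary
})

# Restrict the check to numeric columns
numeric = df.select_dtypes(include=[np.number])

# True for each column whose values are all either 0 or 1
is_binary = numeric.isin([0, 1]).all()

# Names of the columns that passed the check
binary_cols = is_binary[is_binary].index

# Convert only those columns
df[binary_cols] = df[binary_cols].astype(bool)
```

With this check, 'age' and 'score' keep their numeric dtypes and only 'member' and 'active' become Boolean.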