How to convert a DataFrame of booleans to a DataFrame of 1 and np.NaN in pandas?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/14178913/

Time: 2020-09-13 20:34:12  Source: igfitidea

pandas

Question by d l

I have a dataframe filled with True and False values, and I'd like to get a dataframe from it with True replaced by 1 and False replaced by np.NaN. I've tried using dataframe.replace, but it gave a dataframe filled entirely with True. Is there a way to do this without using for loops and if statements?

For example, this is the dataframe I have, with T for True and F for False (not the strings 'T' and 'F'; sorry, I could not figure out how to format a nicely spaced table in the wiki):

2008-01-02 16:00:00 T T F
2008-01-03 16:00:00 T T T
2008-01-04 16:00:00 T T F
2008-01-07 16:00:00 T T T
2008-01-08 16:00:00 T T F

This is what I would like to change it to:

2008-01-02 16:00:00 1 1 np.NaN
2008-01-03 16:00:00 1 1 1
2008-01-04 16:00:00 1 1 np.NaN
2008-01-07 16:00:00 1 1 1
2008-01-08 16:00:00 1 1 np.NaN

These are the lines I tried for replacing True and False; they gave me a dataframe filled entirely with True values:

df.replace(to_replace=True, value=1, inplace=True, method=None)
df.replace(to_replace=False, value=np.nan, inplace=True, method=None)

When tried separately, the first line alone does not change anything; the second line converts all the values to True.

Answer by Zelazny7

applymap() can be used to apply a function to every element of a dataframe:

In [1]: df = DataFrame([[True, True, False],[False, False, True]]).T

In [2]: df
Out[2]:
       0      1
0   True  False
1   True  False
2  False   True

In [3]: df.applymap(lambda x: 1 if x else np.nan)
Out[3]:
    0   1
0   1 NaN
1   1 NaN
2 NaN   1
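
In pandas 2.1 and later, applymap is deprecated and the same element-wise operation is called DataFrame.map. A minimal sketch that uses DataFrame.map when it exists and falls back to applymap on older versions (the helper name true_to_one is just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[True, True, False], [False, False, True]]).T

def true_to_one(x):
    # 1 for True, NaN for False
    return 1 if x else np.nan

# DataFrame.map is the pandas >= 2.1 name for applymap; fall back on older versions.
if hasattr(pd.DataFrame, "map"):
    result = df.map(true_to_one)
else:
    result = df.applymap(true_to_one)

print(result)
```

Because NaN is a float, the resulting columns hold 1.0 rather than the integer 1.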

You can also use a dict:

In [4]: d = {True:1, False:np.nan}

In [5]: df.applymap(lambda x: d[x])
Out[5]:
    0   1
0   1 NaN
1   1 NaN
2 NaN   1
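
Both versions above call a Python function once per cell. If the frame holds only booleans, a vectorized sketch (an alternative technique, not part of this answer) is to cast to float, which gives 1.0 for True and 0.0 for False, and then call where with the original frame as the condition so that the False cells become NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[True, True, False], [False, False, True]]).T

# True -> 1.0 and False -> 0.0, then .where(df) keeps the cells where df is True
# and replaces the cells where df is False with NaN.
result = df.astype(float).where(df)
print(result)
```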

Addressing DSM's comment below: I misread the OP and assumed the datetimes were the index. If they are not the index, this worked for me:

In [6]: df.applymap(lambda x: d.get(x,x))
Out[6]:
    0   1                    2
0   1 NaN  2012-01-01 00:00:00
1 NaN   1  2012-01-01 00:00:00
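
When the frame also contains non-boolean columns, such as the datetime column here, one option is to convert only the boolean columns, picked out with select_dtypes. A sketch on a small hypothetical frame (the column names a, b and when are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two boolean columns plus a datetime column that is not the index.
df = pd.DataFrame({
    "a": [True, False],
    "b": [False, True],
    "when": pd.to_datetime(["2012-01-01", "2012-01-01"]),
})

# Convert only the boolean columns; the datetime column is left untouched.
bool_cols = df.select_dtypes(include="bool").columns
df[bool_cols] = df[bool_cols].astype(float).where(df[bool_cols])
print(df)
```

Assigning with df[bool_cols] = ... replaces those columns, so they become float64 while "when" keeps its datetime dtype.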

Answer by Jeff

Try this. where works because the first call, by default, replaces every entry that fails the condition (anything that is not == 'T') with NaN; the second call then replaces the entries that fail its condition (the ones that were 'T') with 1.

In [48]: df = pd.DataFrame([ 'T', 'T', 'T', 'F', 'F' ], columns=['value'],index=pd.date_range('20010101',periods=5))

In [49]: df
Out[49]: 
           value
2001-01-01     T
2001-01-02     T
2001-01-03     T
2001-01-04     F
2001-01-05     F

In [50]: df.where(df=='T').where(df!='T',1)
Out[50]: 
           value
2001-01-01     1
2001-01-02     1
2001-01-03     1
2001-01-04   NaN
2001-01-05   NaN
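
This answer works on the strings 'T' and 'F'. Assuming the data contain exactly those two strings, another sketch is to build the boolean mask once and reuse it, which yields a float64 column of 1.0 and NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(["T", "T", "T", "F", "F"], columns=["value"],
                  index=pd.date_range("20010101", periods=5))

# Boolean mask of the 'T' entries; cast it to float (1.0 / 0.0),
# then keep only the cells where the mask is True, leaving NaN elsewhere.
is_t = df.eq("T")
result = is_t.astype(float).where(is_t)
print(result)
```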