Python: Convert Pandas series containing string to boolean
Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/17702272/
Question by working4coins
I have a DataFrame named df:
   Order Number       Status
1          1668  Undelivered
2         19771  Undelivered
3     100032108  Undelivered
4          2229    Delivered
5         00056  Undelivered
I would like to convert the Status column to boolean (True when Status is Delivered and False when Status is Undelivered), but if Status is neither 'Undelivered' nor 'Delivered', it should be considered as NaN (Not a Number) or something like that.
I would like to use a dict
d = {
'Delivered': True,
'Undelivered': False
}
so that I could easily add other strings to be treated as either True or False.
Accepted answer by joris
You can just use map:
In [7]: df = pd.DataFrame({'Status':['Delivered', 'Delivered', 'Undelivered',
'SomethingElse']})
In [8]: df
Out[8]:
Status
0 Delivered
1 Delivered
2 Undelivered
3 SomethingElse
In [9]: d = {'Delivered': True, 'Undelivered': False}
In [10]: df['Status'].map(d)
Out[10]:
0 True
1 True
2 False
3 NaN
Name: Status, dtype: object
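The result above has dtype object because the unmatched value becomes NaN. If a real boolean column that can still hold missing values is wanted, one option (an addition, not part of the original answer) is pandas' nullable "boolean" dtype, available since pandas 1.0. A minimal sketch reusing the same example data:

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Delivered', 'Delivered', 'Undelivered', 'SomethingElse']})
d = {'Delivered': True, 'Undelivered': False}

# map() turns strings missing from d into NaN, so the result has dtype object.
# astype('boolean') converts it to pandas' nullable boolean dtype,
# where the missing entry is stored as pd.NA instead of NaN.
status = df['Status'].map(d).astype('boolean')
print(status)
```

The dict stays the single place to register new strings, exactly as the question asks; only the final dtype changes.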
Answer by Dan Allan
Answer by Kappa Leonis
An example of using the replace method to replace values only in the specified column C2 and get the result as a DataFrame.
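import pandas as pd

df = pd.DataFrame({'C1': ['X', 'Y', 'X', 'Y'], 'C2': ['Y', 'Y', 'X', 'X']})
print(df)
#   C1 C2
# 0  X  Y
# 1  Y  Y
# 2  X  X
# 3  Y  X

# Nested dict {column: {old: new}}: only values in column C2 are replaced
print(df.replace({'C2': {'X': True, 'Y': False}}))
#   C1     C2
# 0  X  False
# 1  Y  False
# 2  X   True
# 3  Y   True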
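The nesting is what restricts the replacement to C2. For contrast, a small sketch (not from the original answer) showing that a flat {old: new} dict passed to DataFrame.replace is applied to every column, so C1 gets converted too:

```python
import pandas as pd

df = pd.DataFrame({'C1': ['X', 'Y', 'X', 'Y'], 'C2': ['Y', 'Y', 'X', 'X']})

# Flat dict {old: new}: applied to every column of the DataFrame.
everywhere = df.replace({'X': True, 'Y': False})

# Nested dict {column: {old: new}}: applied only to column C2; C1 keeps its strings.
only_c2 = df.replace({'C2': {'X': True, 'Y': False}})

print(everywhere)
print(only_c2)
```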
Answer by Yaakov Bressler
Expanding on the previous answers:
Map method explained:
- Pandas will look up each row's value in the corresponding d dictionary, replacing any found keys with the values from d.
- Values without keys in d will be set to NaN. This can be corrected with the fillna() method.
- Does not work on multiple columns at once, since map is called on a single pd.Series here.
- Documentation: pd.Series.map
d = {'Delivered': True, 'Undelivered': False}
df["Status"].map(d)
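A minimal sketch of the fillna() correction mentioned above. Treating unknown statuses as False is an assumption made only for this example; any other default could be passed instead:

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Delivered', 'Undelivered', 'SomethingElse']})
d = {'Delivered': True, 'Undelivered': False}

# Strings missing from d become NaN after map(); fillna() then supplies a default.
# Assumption for this example: unknown statuses count as False.
status = df['Status'].map(d).fillna(False)
print(status)
```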
Replace method explained:
- Pandas will look up each row's value in the corresponding d dictionary and attempt to replace any found keys with the values from d.
- Values without keys in d will be retained.
- Works with single and multiple columns (pd.Series or pd.DataFrame objects).
- Documentation: pd.DataFrame.replace
d = {'Delivered': True, 'Undelivered': False}
df["Status"].replace(d)
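To make the difference between the two methods concrete, here is a small side-by-side sketch on a Series that contains an unmapped string ('SomethingElse', borrowed from the accepted answer's example):

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Delivered', 'Undelivered', 'SomethingElse']})
d = {'Delivered': True, 'Undelivered': False}

mapped = df['Status'].map(d)        # unknown 'SomethingElse' becomes NaN
replaced = df['Status'].replace(d)  # unknown 'SomethingElse' is kept as-is

print(mapped.tolist())
print(replaced.tolist())
```

Which behavior is preferable depends on whether unknown strings should surface as missing values (map) or pass through untouched (replace).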
Overall, the replace method is more robust (values not found in the dictionary are left unchanged instead of becoming NaN) and allows finer control over how data is mapped and how missing or NaN values are handled.