Python: Convert a Pandas Series Containing Strings to Boolean

Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/17702272/

Date: 2020-08-19 08:57:02  Source: igfitidea

Convert Pandas series containing string to boolean

Tags: python, pandas, boolean, type-conversion, series

Asked by working4coins

I have a DataFrame named df:

  Order Number       Status
1         1668  Undelivered
2        19771  Undelivered
3    100032108  Undelivered
4         2229    Delivered
5        00056  Undelivered

I would like to convert the Status column to boolean (True when Status is Delivered and False when Status is Undelivered), but if Status is neither 'Undelivered' nor 'Delivered' it should be treated as NaN (not a number) or something like that.

I would like to use a dict:

d = {
  'Delivered': True,
  'Undelivered': False
}

so I could easily add other strings that should be considered either True or False.

Accepted answer by joris

You can just use map:

In [7]: df = pd.DataFrame({'Status':['Delivered', 'Delivered', 'Undelivered',
                                     'SomethingElse']})

In [8]: df
Out[8]:
          Status
0      Delivered
1      Delivered
2    Undelivered
3  SomethingElse

In [9]: d = {'Delivered': True, 'Undelivered': False}

In [10]: df['Status'].map(d)
Out[10]:
0     True
1     True
2    False
3      NaN
Name: Status, dtype: object
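
Since the question asks for a dict that is easy to extend, here is a minimal sketch of the same map call with a larger dict. The extra labels 'Received' and 'Returned', and the unknown label 'Lost', are hypothetical and appear only for illustration.

```python
import pandas as pd

# Hypothetical extra labels ('Received', 'Returned') added for illustration only.
d = {
    "Delivered": True,
    "Received": True,
    "Undelivered": False,
    "Returned": False,
}

status = pd.Series(["Delivered", "Returned", "Received", "Lost"])

# Each value is looked up in d; 'Lost' has no key, so it becomes NaN.
as_bool = status.map(d)
print(as_bool)
```

Adding another recognized string only requires one more entry in d; the map call itself does not change.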

Answer by Dan Allan

You've got everything you need. You'll be happy to discover replace:

df.replace(d)
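
As a rough, self-contained sketch of what that call does, the example below rebuilds a small frame like the one in the accepted answer. The explicit dtype=object is an assumption made here so the column can hold both booleans and leftover strings regardless of pandas version.

```python
import pandas as pd

# dtype=object lets the column hold a mix of bool and str values.
df = pd.DataFrame(
    {"Status": ["Delivered", "Undelivered", "SomethingElse"]}, dtype=object
)
d = {"Delivered": True, "Undelivered": False}

# replace swaps values that appear as keys in d and leaves everything else alone.
replaced = df.replace(d)
print(replaced)
```

Note that df.replace(d) is applied to every column of the frame, so any other column containing 'Delivered' or 'Undelivered' would be changed as well.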

Answer by Kappa Leonis

An example of the replace method that replaces values only in the specified column C2 and returns the result as a DataFrame.

import pandas as pd
df = pd.DataFrame({'C1':['X', 'Y', 'X', 'Y'], 'C2':['Y', 'Y', 'X', 'X']})

  C1 C2
0  X  Y
1  Y  Y
2  X  X
3  Y  X

df.replace({'C2': {'X': True, 'Y': False}})

  C1     C2
0  X  False
1  Y  False
2  X   True
3  Y   True

Answer by Yaakov Bressler

Expanding on the previous answers:

Map method explained:

  • Pandas will look up each row's value in the dictionary d, replacing any found keys with the values from d.
  • Values without keys in d will be set to NaN. This can be corrected with fillna().
  • Does not work on multiple columns, since map is used here as a pd.Series method and operates on one Series at a time.
  • Documentation: pd.Series.map

d = {'Delivered': True, 'Undelivered': False}
df["Status"].map(d)
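
The NaN handling mentioned above can be sketched as follows. This is one possible approach, not the answer author's code: it assumes pandas 1.0 or later for the nullable "boolean" dtype, and treating unknown labels as False is an assumption chosen only for illustration.

```python
import pandas as pd

d = {"Delivered": True, "Undelivered": False}
status = pd.Series(["Delivered", "Undelivered", "SomethingElse"])

# map leaves NaN where a label has no key in d (object dtype).
mapped = status.map(d)

# Option 1: keep the gap as a missing value using the nullable boolean dtype.
nullable = mapped.astype("boolean")

# Option 2: fill the gap with a default (False here is an assumption)
# and convert to a plain bool column.
filled = nullable.fillna(False).astype(bool)

print(nullable)
print(filled)
```

Which option fits depends on whether an unrecognized status should stay visibly missing or fall back to a default.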

Replace method explained:

  • Pandas will look up each row's value in the dictionary d, and attempt to replace any found keys with the values from d.
  • Values without keys in d will be retained.
  • Works with single and multiple columns (pd.Series or pd.DataFrame objects).
  • Documentation: pd.DataFrame.replace

d = {'Delivered': True, 'Undelivered': False}
df["Status"].replace(d)
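
Because replace keeps values that have no key in d, the result can end up mixing booleans and strings. The sketch below shows one way to spot the untouched rows; the isin check is an addition for illustration and is not part of the original answer.

```python
import pandas as pd

d = {"Delivered": True, "Undelivered": False}
# dtype=object lets the Series hold a mix of bool and str values.
status = pd.Series(["Delivered", "Undelivered", "SomethingElse"], dtype=object)

replaced = status.replace(d)

# Rows whose original label has no key in d were left untouched by replace.
unmapped = ~status.isin(list(d))
print(replaced)
print(status[unmapped])
```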


Overall, the replace method is more robust and allows finer control over how data is mapped and how missing or NaN values are handled.