Original question: http://stackoverflow.com/questions/31303728/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas: cannot filter based on string equality
Asked by vpk
Using pandas 0.16.2 on Python 2.7, OS X.
I read a DataFrame from a CSV file like this:
import pandas as pd
data = pd.read_csv("my_csv_file.csv", sep='\t', skiprows=0, header=0)
The output of data.dtypes is:
name object
weight float64
ethnicity object
dtype: object
I was expecting string types for name and ethnicity, but I found explanations here on SO of why they are "object" in newer pandas versions.
Now, I want to select rows based on ethnicity, for example:
data[data['ethnicity']=='Asian']
Out[3]:
Empty DataFrame
Columns: [name, weight, ethnicity]
Index: []
I get the same result with data[data.ethnicity=='Asian'] or data[data['ethnicity']=="Asian"].
But when I try the following:
data[data['ethnicity'].str.contains('Asian')].head(3)
I get the results I want.
However, I do not want to use "contains"; I would like to check for direct equality.
Please note that data[data['ethnicity'].str=='Asian'] raises an error.
Am I doing something wrong? How do I do this correctly?
Answered by unutbu
There is probably leading or trailing whitespace in your strings. For example,
data = pd.DataFrame({'ethnicity':[' Asian', ' Asian']})
data.loc[data['ethnicity'].str.contains('Asian'), 'ethnicity'].tolist()
# [' Asian', ' Asian']
print(data[data['ethnicity'].str.contains('Asian')])
yields
ethnicity
0 Asian
1 Asian
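To see why the equality test comes back empty while contains succeeds, here is a minimal sketch that reuses the same made-up two-row frame. It compares the two boolean masks and prints the raw values through repr(), which makes the otherwise invisible padding show up. The variable names eq_mask, contains_mask and raw_values are only illustrative.

```python
import pandas as pd

# Same illustrative data as above: each value has one leading space.
data = pd.DataFrame({'ethnicity': [' Asian', ' Asian']})

# Exact equality compares the whole string, so ' Asian' does not match 'Asian'.
eq_mask = data['ethnicity'] == 'Asian'

# str.contains only looks for the substring, so the padding does not matter.
contains_mask = data['ethnicity'].str.contains('Asian')

# repr() keeps the quotes, which exposes leading/trailing whitespace.
raw_values = [repr(v) for v in data['ethnicity'].unique()]

print(eq_mask.sum(), contains_mask.sum(), raw_values)
```

If the repr output shows quotes that do not sit right next to the letters, whitespace is the likely culprit.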
To strip the leading or trailing whitespace off the strings, you could use
data['ethnicity'] = data['ethnicity'].str.strip()
after which,
data.loc[data['ethnicity'] == 'Asian']
yields
ethnicity
0 Asian
1 Asian
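Applied to a file like the one in the question, the same idea might look like the sketch below. The file contents are invented here (an in-memory string stands in for my_csv_file.csv, and the rows for Ann and Bob are hypothetical), and the list of text columns is an assumption based on the dtypes shown in the question.

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for my_csv_file.csv;
# the ethnicity values carry stray leading or trailing spaces.
raw = "name\tweight\tethnicity\nAnn\t55.0\t Asian\nBob\t70.5\tOther \n"
data = pd.read_csv(io.StringIO(raw), sep='\t')

# Strip whitespace once, right after loading, for each text column.
# The column names are assumed from the question's dtypes output.
for col in ['name', 'ethnicity']:
    data[col] = data[col].str.strip()

# Plain equality now matches the cleaned values.
asian = data[data['ethnicity'] == 'Asian']
print(asian)
```

Cleaning the columns once after loading means every later equality filter works without repeating the strip.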
Answered by Daniel Martin
You might try this:
data[data['ethnicity'].str.strip()=='Asian']
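A small sketch of how this one-liner behaves, using invented data with padded values and one missing entry. It strips only while building the mask, so the stored column keeps its original padding, and the missing value simply does not match.

```python
import pandas as pd

# Illustrative data: padded values, a missing entry, and a non-match.
data = pd.DataFrame({'ethnicity': [' Asian', 'Asian ', None, 'Other']})

# strip() is applied only to a temporary Series used for the comparison;
# the column in `data` is left untouched.
mask = data['ethnicity'].str.strip() == 'Asian'
result = data[mask]

# The selected rows still show the original, unstripped strings.
print(result['ethnicity'].tolist())
```

This is handy for a one-off filter. If the column is filtered repeatedly, assigning the stripped values back as in the previous answer avoids redoing the work.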

