Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/31303728/

Time: 2020-09-13 23:35:40  Source: igfitidea

Pandas: cannot filter based on string equality

Tags: python, string, pandas, filtering, selection

Asked by vpk

Using pandas 0.16.2 on Python 2.7, OS X.

I read a data-frame from a csv file like this:

import pandas as pd

data = pd.read_csv("my_csv_file.csv", sep='\t', skiprows=0, header=0)

The output of data.dtypes is:

name       object
weight     float64
ethnicity  object
dtype: object

I was expecting string types for name, and ethnicity. But I found reasons here on SO on why they're "object" in newer pandas versions.

Now, I want to select rows based on ethnicity, for example:

data[data['ethnicity']=='Asian']
Out[3]: 
Empty DataFrame
Columns: [name, weight, ethnicity]
Index: []

I get the same result with data[data.ethnicity=='Asian'] or data[data['ethnicity']=="Asian"].

But when I try the following:

data[data['ethnicity'].str.contains('Asian')].head(3)

I get the results I want.

However, I do not want to use "contains"; I would like to check for direct equality.

Please note that data[data['ethnicity'].str=='Asian'] raises an error.

Am I doing something wrong? How do I do this correctly?

Answered by unutbu

There is probably whitespace in your strings, for example,

import pandas as pd

data = pd.DataFrame({'ethnicity': [' Asian', '  Asian']})
data.loc[data['ethnicity'].str.contains('Asian'), 'ethnicity'].tolist()
# [' Asian', '  Asian']
print(data[data['ethnicity'].str.contains('Asian')])

yields

  ethnicity
0     Asian
1     Asian
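
The printed frame hides the padding, so it can help to look at the raw values directly. The following is a minimal sketch of one way to check for stray whitespace; the sample values are hypothetical and only mimic what might be in the CSV file.

```python
import pandas as pd

# Hypothetical sample data: the padding mimics what may be in the CSV.
data = pd.DataFrame({'ethnicity': [' Asian', '  Asian', 'Asian']})

# repr() puts quotes around each value, so leading/trailing spaces become visible.
raw_values = [repr(v) for v in data['ethnicity'].unique()]
print(raw_values)

# Flag rows whose value changes when stripped, i.e. rows that carry padding.
has_padding = data['ethnicity'] != data['ethnicity'].str.strip()
print(has_padding.sum(), "of", len(data), "rows have extra whitespace")
```

If any row is flagged, a direct == comparison against the unpadded string will not match that row.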

To strip the leading or trailing whitespace off the strings, you could use

data['ethnicity'] = data['ethnicity'].str.strip()

after which,

data.loc[data['ethnicity'] == 'Asian']

yields

  ethnicity
0     Asian
1     Asian
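
If the whitespace comes from the file itself, the stripping could also be done while reading it. This is a sketch rather than part of the answer above: it assumes a tab-separated layout like the one in the question, and the inline csv_text and its rows are made up for illustration. It uses the converters parameter of pd.read_csv, which applies a function to each raw value of the named column.

```python
import io
import pandas as pd

# Hypothetical stand-in for the tab-separated file in the question.
csv_text = (
    "name\tweight\tethnicity\n"
    "Ann\t60.5\t Asian\n"
    "Bob\t72.0\tAsian \n"
    "Cy\t80.1\tOther\n"
)

# converters maps a column name to a function applied to each raw string value.
data = pd.read_csv(io.StringIO(csv_text), sep='\t',
                   converters={'ethnicity': str.strip})

# With the padding removed at read time, direct equality matches both rows.
asian = data[data['ethnicity'] == 'Asian']
print(asian)
```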

Answered by Daniel Martin

You might try this:

data[data['ethnicity'].str.strip()=='Asian']
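
A short sketch of how this variant behaves, using hypothetical sample data: the comparison runs on a stripped copy, so the stored values keep their original padding. It also suggests why comparing .str directly to 'Asian' fails in the question: .str on its own is an accessor object, not a Series of strings, and only the result of a method such as strip() can be compared element by element.

```python
import pandas as pd

# Hypothetical sample data with padded values.
data = pd.DataFrame({'ethnicity': [' Asian', 'Asian ', 'Other']})

# .str alone is only an accessor; calling strip() on it returns a new Series.
stripped = data['ethnicity'].str.strip()

# The boolean mask is built from the stripped copy; `data` itself is not modified.
asian = data[stripped == 'Asian']
print(asian)
```

Assigning the stripped Series back to the column, as in the previous answer, is the alternative when the cleaned values should be kept for later steps.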