Pandas: cannot filter based on string equality

Question

Asked by vpk

Using pandas 0.16.2 on Python 2.7, OS X.

I read a DataFrame from a CSV file like this:

import pandas as pd

data = pd.read_csv("my_csv_file.csv",sep='\t', skiprows=(0), header=(0))

The output of data.dtypes is:

name       object
weight     float64
ethnicity  object
dtype: object

I was expecting string types for name and ethnicity, but I found explanations here on SO of why they are "object" in newer pandas versions.

Now, I want to select rows based on ethnicity, for example:

data[data['ethnicity']=='Asian']
Out[3]: 
Empty DataFrame
Columns: [name, weight, ethnicity]
Index: []

I get the same result with data[data.ethnicity=='Asian'] or data[data['ethnicity']=="Asian"].

But when I try the following:

data[data['ethnicity'].str.contains('Asian')].head(3)

I get the results I want.

However, I do not want to use "contains"; I would like to check for direct equality.

Please note that data[data['ethnicity'].str=='Asian'] raises an error.

Am I doing something wrong? How do I do this correctly?

Answer 1

Answered by unutbu

There is probably whitespace in your strings, for example,

data = pd.DataFrame({'ethnicity':[' Asian', '  Asian']})
data.loc[data['ethnicity'].str.contains('Asian'), 'ethnicity'].tolist()
# [' Asian', '  Asian']
print(data[data['ethnicity'].str.contains('Asian')])

yields

  ethnicity
0     Asian
1     Asian

To strip the leading or trailing whitespace off the strings, you could use

data['ethnicity'] = data['ethnicity'].str.strip()

after which,

data.loc[data['ethnicity'] == 'Asian']

yields

  ethnicity
0     Asian
1     Asian
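As a rough end-to-end illustration of the diagnosis above, the sketch below reproduces the empty result, makes the stray whitespace visible, and then applies the strip. The sample rows and the variable names (exact_before, raw_values, exact_after) are made up for illustration and are not taken from the asker's file.

```python
import pandas as pd

# Hypothetical sample data: the ethnicity values carry stray whitespace,
# as they might after parsing a delimited text file.
data = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'weight': [60.5, 72.0, 55.3],
    'ethnicity': [' Asian', 'Asian ', 'Hispanic'],
})

# Exact comparison matches nothing, because no value is exactly 'Asian'.
exact_before = data[data['ethnicity'] == 'Asian']

# tolist() shows the values with quotes, so the extra spaces become visible.
raw_values = data['ethnicity'].tolist()

# Strip leading and trailing whitespace, then compare again.
data['ethnicity'] = data['ethnicity'].str.strip()
exact_after = data[data['ethnicity'] == 'Asian']

print(len(exact_before), raw_values, len(exact_after))
```

Printing the column directly tends to hide the problem, since the DataFrame display does not quote its values; tolist() or repr() makes the whitespace easier to spot.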

Answer 2

Answered by Daniel Martin

You might try this:

data[data['ethnicity'].str.strip()=='Asian']
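One thing that may be worth noting about this one-liner is that it strips only for the purpose of the comparison and leaves the stored column untouched. A minimal sketch, assuming made-up sample values and the hypothetical names mask and selected:

```python
import pandas as pd

# Hypothetical sample data with stray whitespace around some values.
data = pd.DataFrame({'ethnicity': [' Asian', 'Asian ', 'Hispanic']})

# Build a boolean mask from a stripped copy of the column.
mask = data['ethnicity'].str.strip() == 'Asian'

# Select the matching rows; the column itself is not modified.
selected = data[mask]

print(mask.tolist())
print(selected['ethnicity'].tolist())
```

The selected rows still contain the original, unstripped strings, so a later exact comparison on the same column would need the strip again unless the cleaned values are assigned back.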
