Python pandas.Series.isin, case-insensitive

Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/45680267/


Tags: python, pandas, series

Asked by haoping

I want to filter a DataFrame down to the rows whose value in one column appears in a list.

df[df['column'].isin(mylist)]

But I found that it is case-sensitive. Is there a way to use ".isin()" case-insensitively?
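
To illustrate the behavior described in the question, here is a minimal sketch. The DataFrame contents and the list are made-up values chosen only for this demonstration.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"column": ["Apple", "banana", "CHERRY"]})
mylist = ["apple", "banana"]

# isin() compares values exactly, so "Apple" does not match "apple"
mask = df["column"].isin(mylist)
print(mask.tolist())  # [False, True, False]
```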

Answered by Vaishali

One way is to convert both the Series and the list to the same case (lower or upper) before comparing:

df[df['column'].str.lower().isin([x.lower() for x in mylist])]

The advantage here is that nothing is written back to the original df or the list: the lowercased values are temporaries that exist only for the comparison.

Consider this dummy df:

    Color   Val
0   Green   1
1   Green   1
2   Red     2
3   Red     2
4   Blue    3
5   Blue    3

For the list l:

l = ['green', 'BLUE']

You can use isin() on the lowercased column:

df[df['Color'].str.lower().isin([x.lower() for x in l])]

You get:

    Color   Val
0   Green   1
1   Green   1
4   Blue    3
5   Blue    3
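
The example above can be reproduced end to end with the sketch below. The DataFrame is rebuilt by hand from the table shown; the variable names `wanted` and `result` are introduced here for illustration only.

```python
import pandas as pd

# Rebuild the dummy df from the table above
df = pd.DataFrame({
    "Color": ["Green", "Green", "Red", "Red", "Blue", "Blue"],
    "Val": [1, 1, 2, 2, 3, 3],
})
l = ["green", "BLUE"]

# Lowercase both sides only for the comparison
wanted = [x.lower() for x in l]
result = df[df["Color"].str.lower().isin(wanted)]
print(result)

# The original column and list keep their original casing
print(df["Color"].tolist())
print(l)
```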

Answered by Uri Goren

I prefer to use the more general .apply:

myset = {s.lower() for s in mylist}
df[df['column'].apply(lambda v: v.lower() in myset)]

A lookup in a set is faster than a lookup in a list.
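
A self-contained sketch of the .apply approach follows. One thing to watch for is that `v.lower()` raises an error on non-string values such as missing entries, so this version adds a type check. The helper name `in_set_ignoring_case` and the sample data are hypothetical, not part of the answer above.

```python
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({"column": ["Apple", "banana", None, "CHERRY"]})
mylist = ["APPLE", "Cherry"]

# Build the lowercase set once, outside the per-row function
myset = {s.lower() for s in mylist}

def in_set_ignoring_case(v):
    # Non-string values (e.g. None or NaN) never match
    return isinstance(v, str) and v.lower() in myset

mask = df["column"].apply(in_set_ignoring_case)
filtered = df[mask]
print(filtered)
```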

Answered by Cory Madden

Use the Series' str accessor to get the lowercase version of the column, then call isin() on the result:

In [23]: df = pd.DataFrame([['A', 'B', 'C'], ['D', 'E', 6]], columns=['A', 'B', 'C'])

In [24]: df
Out[24]: 
   A  B  C
0  A  B  C
1  D  E  6

In [25]: df.A
Out[25]: 
0    A
1    D
Name: A, dtype: object

In [26]: df.A.str.lower().isin(['a', 'b', 'c'])
Out[26]: 
0     True
1    False
Name: A, dtype: bool
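
The same frame also has a mixed-type column, C, which holds a string and an integer. The sketch below shows what the str accessor does with it: the non-string entry becomes NaN and therefore never matches. The `astype(str)` line is an optional variation added here, not something the answer above proposes.

```python
import pandas as pd

df = pd.DataFrame([['A', 'B', 'C'], ['D', 'E', 6]], columns=['A', 'B', 'C'])

# 'c' for the string entry, NaN for the integer 6
lowered = df['C'].str.lower()
mask = lowered.isin(['a', 'b', 'c'])
print(mask.tolist())  # [True, False]

# Optional variation: cast to str first so 6 becomes the text '6'
as_text = df['C'].astype(str).str.lower()
print(as_text.tolist())  # ['c', '6']
```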

Answered by Dayantat

I would put my list into a CSV and load it as a dataframe. Afterwards I would run the command:

df_done = df[df["Server Name"].str.lower().isin(df_compare["Computer Name"].str.lower())]

This avoids using a for loop and can handle large amounts of data easily.

df = 5000 rows
df_compare = 1000 rows
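
A small runnable sketch of this Series-against-Series comparison is shown below. The CSV text and server names are invented for illustration, and `io.StringIO` stands in for a real file on disk. Only the column names "Server Name" and "Computer Name" come from the command above.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for the file that holds the list
csv_text = "Computer Name\nWEB01\ndb02\n"
df_compare = pd.read_csv(io.StringIO(csv_text))

# Hypothetical frame to filter
df = pd.DataFrame({"Server Name": ["web01", "DB02", "Cache03"]})

# isin() accepts a Series and matches on its values, not its index
df_done = df[df["Server Name"].str.lower().isin(df_compare["Computer Name"].str.lower())]
print(df_done["Server Name"].tolist())  # ['web01', 'DB02']
```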