python pandas.Series.isin with case insensitive
Source: http://stackoverflow.com/questions/45680267/
Warning: this content is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Asked by haoping
I want to filter a DataFrame down to the rows whose value in one column appears in a list.
df[df['column'].isin(mylist)]
But I found that it is case sensitive. Is there a way to use ".isin()" case-insensitively?
Answer by Vaishali
One way is to compare the lowercased (or uppercased) Series against the list converted to the same case:
df[df['column'].str.lower().isin([x.lower() for x in mylist])]
The advantage here is that neither the original df nor the list is modified: the lowercased versions are temporary and exist only for the comparison.
Consider this dummy df:
   Color  Val
0  Green    1
1  Green    1
2    Red    2
3    Red    2
4   Blue    3
5   Blue    3
For the list l:
l = ['green', 'BLUE']
You can use isin():
df[df['Color'].str.lower().isin([x.lower() for x in l])]
You get:
   Color  Val
0  Green    1
1  Green    1
4   Blue    3
5   Blue    3
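The steps above can be put together as a self-contained script. This is only a minimal sketch: the helper name `isin_ci` is hypothetical (it is not part of pandas), and it assumes the column holds strings.

```python
import pandas as pd


def isin_ci(series, values):
    """Hypothetical helper: case-insensitive variant of Series.isin for string data."""
    # Lowercase the lookup values once; a set gives fast membership tests.
    lowered = {v.lower() for v in values}
    # .str.lower() returns a new Series, so the original column is left untouched.
    return series.str.lower().isin(lowered)


df = pd.DataFrame({
    "Color": ["Green", "Green", "Red", "Red", "Blue", "Blue"],
    "Val": [1, 1, 2, 2, 3, 3],
})
l = ["green", "BLUE"]

result = df[isin_ci(df["Color"], l)]
print(result)
```

The filtered frame keeps the original capitalization of `Color`, since only the temporary lowercased copy is used for the comparison.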
Answer by Uri Goren
I prefer to use the general .apply:
myset = set([s.lower() for s in mylist])
df[df['column'].apply(lambda v: v.lower() in myset)]
A lookup in a set is faster than a lookup in a list (average constant time versus a linear scan).
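A runnable sketch of the `.apply` approach follows. The sample data is made up for illustration, and the `isinstance` guard is an addition not present in the answer: without it, a missing or non-string value would raise `AttributeError` on `.lower()`.

```python
import pandas as pd

# Made-up sample data; the last row is deliberately a missing value.
df = pd.DataFrame({"column": ["Apple", "BANANA", "cherry", None]})
mylist = ["apple", "Banana"]

# Build the lowercased set once, outside the lambda.
myset = {s.lower() for s in mylist}

# Non-string values (such as None) evaluate to False instead of raising.
mask = df["column"].apply(lambda v: isinstance(v, str) and v.lower() in myset)
filtered = df[mask]
print(filtered)
```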
Answer by Cory Madden
Use the column's .str accessor to treat the values as strings and get the lowercase version:
In [23]: df = pd.DataFrame([['A', 'B', 'C'], ['D', 'E', 6]], columns=['A', 'B', 'C'])
In [24]: df
Out[24]:
   A  B  C
0  A  B  C
1  D  E  6
In [25]: df.A
Out[25]:
0    A
1    D
Name: A, dtype: object
In [26]: df.A.str.lower().isin(['a', 'b', 'c'])
Out[26]:
0     True
1    False
Name: A, dtype: bool
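The same frame also has a mixed-type column, `C`, which shows what happens to non-string values. The sketch below is an illustration that goes beyond the answer: `.str.lower()` does not convert the integer 6 to a string; it yields NaN for that row, and `isin` then reports False.

```python
import pandas as pd

df = pd.DataFrame([["A", "B", "C"], ["D", "E", 6]], columns=["A", "B", "C"])

# Column C mixes a string and an integer; the integer becomes NaN here.
lowered = df["C"].str.lower()
mask = lowered.isin(["a", "b", "c"])

print(lowered)
print(mask)
```

If numeric values should be compared as text, one option is to call `.astype(str)` before `.str.lower()`, with the caveat that missing values then turn into literal strings.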
Answer by Dayantat
I would put my list into a CSV and load it as a dataframe. Afterwards I would run the command:
df_done = df[df["Server Name"].str.lower().isin(df_compare["Computer Name"].str.lower())]
This avoids an explicit for loop and can handle large amounts of data easily.
df = 5000 rows
df_compare = 1000 rows
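A small end-to-end sketch of this approach is shown below. The server names and the in-memory CSV text are invented stand-ins for the real files; only the column names "Server Name" and "Computer Name" come from the command above.

```python
import io

import pandas as pd

# Invented sample data standing in for the real 5000-row table.
df = pd.DataFrame({"Server Name": ["WEB01", "db02", "Cache03"]})

# Invented CSV content; in practice this would be a file path passed to read_csv.
csv_text = "Computer Name\nweb01\nDB02\nmail04\n"
df_compare = pd.read_csv(io.StringIO(csv_text))

# Lowercase both sides so the membership test ignores case.
df_done = df[df["Server Name"].str.lower().isin(df_compare["Computer Name"].str.lower())]
print(df_done)
```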