Pandas str.contains for exact matches of partial strings

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/33193792/

Date: 2020-09-14 00:03:57  Source: igfitidea

Tags: python, regex, pandas, contains

Asked by endangeredoxen

I have a DataFrame (I'll call it test) with a column containing file paths, and I want to filter the data using a partial path.

                              full_path
0    C:\data\Data Files\BER\figure1.png
1    C:\data\Data Files\BER\figure2.png
2    C:\data\Previous\Error\summary.png
3        C:\data\Data Files\Valx2.png
4        C:\data\Data Files\Valx2.png
5         C:\data\Microscopy\defect.png

The partial path to find is:

ex = 'C:\data\Microscopy'

I've tried str.contains, but:

test.full_path.str.contains(ex)

0    False
1    False
2    False
3    False
4    False
5    False

I would have expected a value of True for index 5. At first I thought the path strings might not actually match because of differences in how the escape character is handled, but:

ex in test.full_path.iloc[5]

equals True. After some digging, I'm thinking the argument to str.contains is supposed to be a regular expression, so maybe the "\"s in the partial path are messing things up?
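The suspicion about backslashes can be checked without pandas. The following is a minimal sketch using only the standard re module; the shortened fragment C:\data is a hypothetical example chosen so that the only escape involved is \d, which a regular expression reads as "any digit" rather than a backslash followed by the letter d. (On recent Python versions an unknown escape such as \M in a pattern is, as far as I know, rejected with an error instead of being silently treated as a literal, so the full path is deliberately not used as a pattern here.)

```python
import re

path = r'C:\data\Microscopy\defect.png'
fragment = r'C:\data'  # hypothetical shortened fragment, for illustration only

# Plain substring test: backslashes are ordinary characters.
literal_hit = fragment in path

# Regex search: "\d" means "one digit", so the pattern asks for "C:"
# followed by a digit and then "ata". The real path does not contain that.
regex_hit = re.search(fragment, path) is not None

# The same pattern happily matches a string that has a digit in that spot.
digit_hit = re.search(fragment, 'C:5ata') is not None

# re.escape() turns the fragment into a pattern that matches it literally.
escaped_hit = re.search(re.escape(fragment), path) is not None

print(literal_hit, regex_hit, digit_hit, escaped_hit)
```

So the substring test and the regex search disagree for exactly the reason guessed at above: the backslash changes the meaning of the character that follows it.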

I also tried:

test.full_path.apply(lambda x: ex in x)

but this gives NameError: name 'ex' is not defined. These DataFrames can have a lot of rows in them, so I'm also concerned that the apply function might not be very efficient.

Any suggestions on how to search a DataFrame column for exact partial string matches?

Thanks!

Accepted answer by DSM

You can pass regex=False so that str.contains treats its argument as a literal substring rather than a regular expression:

>>> df.full_path.str.contains(ex)
0    False
1    False
2    False
3    False
4    False
5    False
Name: full_path, dtype: bool
>>> df.full_path.str.contains(ex, regex=False)
0    False
1    False
2    False
3    False
4    False
5     True
Name: full_path, dtype: bool
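For anyone who wants to reproduce this end to end, here is a self-contained sketch. The three-row DataFrame is a hypothetical stand-in for the one in the question, and the paths are written as raw strings so the backslashes are unambiguous. It also shows re.escape as an alternative that keeps regex mode on, which is my own addition and not part of the answer above.

```python
import re

import pandas as pd

# Hypothetical stand-in for the question's DataFrame (only three rows).
test = pd.DataFrame({
    'full_path': [
        r'C:\data\Data Files\BER\figure1.png',
        r'C:\data\Previous\Error\summary.png',
        r'C:\data\Microscopy\defect.png',
    ]
})
ex = r'C:\data\Microscopy'

# Literal substring match: the backslashes in ex are not interpreted.
literal_mask = test['full_path'].str.contains(ex, regex=False)

# Alternative: stay in regex mode, but escape the special characters first.
escaped_mask = test['full_path'].str.contains(re.escape(ex))

# Use the boolean mask to keep only the matching rows.
matches = test[literal_mask]
print(matches)
```

Escaping is mainly useful when the literal fragment has to be combined with real regex syntax, for example anchoring it to the start of the string; for a plain substring search, regex=False is the simpler choice.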

(Aside: your lambda x: ex in x should have worked. The NameError is a sign that you hadn't defined ex for some reason.)
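To illustrate the aside, here is a small sketch (the two-row Series is hypothetical) showing that the apply version gives the same answer as the vectorized call once ex is bound before the lambda runs. The name is looked up when the lambda is called, not when it is written, so it only needs to exist by that point.

```python
import pandas as pd

# Hypothetical two-row column, for illustration only.
paths = pd.Series([
    r'C:\data\Previous\Error\summary.png',
    r'C:\data\Microscopy\defect.png',
], name='full_path')

# ex must be defined before the lambda is called, otherwise Python
# raises NameError when it tries to look the name up.
ex = r'C:\data\Microscopy'

# Row-by-row Python substring test.
apply_mask = paths.apply(lambda x: ex in x)

# Vectorized literal match from the accepted answer.
vectorized_mask = paths.str.contains(ex, regex=False)

print(apply_mask.tolist(), vectorized_mask.tolist())
```

Both produce the same mask. On the efficiency concern from the question, I have not measured the two approaches here, so no timing claim is made.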