Pandas str.contains for exact matches of partial strings
Original question: http://stackoverflow.com/questions/33193792/
Notice: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Asked by endangeredoxen
I have a DataFrame (I'll call it test) with a column containing file paths and I want to filter the data using a partial path.
full_path
0 C:\data\Data Files\BER\figure1.png
1 C:\data\Data Files\BER\figure2.png
2 C:\data\Previous\Error\summary.png
3 C:\data\Data Files\Valx2.png
4 C:\data\Data Files\Valx2.png
5 C:\data\Microscopy\defect.png
The partial path to find is:
ex = 'C:\data\Microscopy'
I've tried str.contains, but:
test.full_path.str.contains(ex)
0 False
1 False
2 False
3 False
4 False
5 False
I would have expected a value of True for index 5. At first I thought the problem might be that the path strings don't actually match because of differences in the escape character, but:
ex in test.full_path.iloc[5]
equals True. After some digging, I think the argument to str.contains is supposed to be a regular expression, so maybe the "\"s in the partial path are messing things up?
I also tried:
test.full_path.apply(lambda x: ex in x)
but this gives NameError: name 'ex' is not defined. These DataFrames can have a lot of rows in them, so I'm also concerned that the apply function might not be very efficient.
Any suggestions on how to search a DataFrame column for exact partial string matches?
Thanks!
Accepted answer by DSM
You can pass regex=False to avoid confusion in the interpretation of the argument to str.contains:
>>> df.full_path.str.contains(ex)
0 False
1 False
2 False
3 False
4 False
5 False
Name: full_path, dtype: bool
>>> df.full_path.str.contains(ex, regex=False)
0 False
1 False
2 False
3 False
4 False
5 True
Name: full_path, dtype: bool
(Aside: your lambda x: ex in x should have worked. The NameError is a sign that you hadn't defined ex for some reason.)
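For anyone who wants to reproduce this, the following is a self-contained sketch that rebuilds the sample data from the question and applies the regex=False suggestion. The re.escape variant is my own addition, not part of the answer; it is one way to keep regex mode while still matching the path literally. The names literal_mask and escaped_mask are illustrative.

```python
import re

import pandas as pd

# Sample data rebuilt from the question; raw strings keep the
# backslashes literal.
test = pd.DataFrame({
    "full_path": [
        r"C:\data\Data Files\BER\figure1.png",
        r"C:\data\Data Files\BER\figure2.png",
        r"C:\data\Previous\Error\summary.png",
        r"C:\data\Data Files\Valx2.png",
        r"C:\data\Data Files\Valx2.png",
        r"C:\data\Microscopy\defect.png",
    ]
})
ex = r"C:\data\Microscopy"

# Option 1 (from the answer): treat the pattern as a plain substring.
literal_mask = test["full_path"].str.contains(ex, regex=False)

# Option 2 (an assumption on my part): stay in regex mode, but escape
# the pattern so every character is matched literally.
escaped_mask = test["full_path"].str.contains(re.escape(ex))

# Keep only the rows whose path contains the partial path.
print(test[literal_mask])
```

Both masks should select only the row at index 5. Either one avoids a Python-level apply, which addresses the efficiency concern raised in the question.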