Pandas str.contains for exact matches of partial strings

Question

Asked by endangeredoxen

I have a DataFrame (I'll call it test) with a column containing file paths and I want to filter the data using a partial path.

                              full_path
0    C:\data\Data Files\BER\figure1.png
1    C:\data\Data Files\BER\figure2.png
2    C:\data\Previous\Error\summary.png
3        C:\data\Data Files\Valx2.png
4        C:\data\Data Files\Valx2.png
5         C:\data\Microscopy\defect.png

The partial path to find is:

ex = 'C:\data\Microscopy'

I've tried str.contains, but:

test.full_path.str.contains(ex)

0    False
1    False
2    False
3    False
4    False
5    False

I would have expected a value of True for index 5. At first I thought the problem might be that the path strings didn't actually match because of differences in the escape characters, but:

ex in test.full_path.iloc[5]

equals True. After some digging, I think the argument to str.contains is supposed to be a regular expression, so maybe the "\"s in the partial path are messing things up?

I also tried:

test.full_path.apply(lambda x: ex in x)

but this gives NameError: name 'ex' is not defined. These DataFrames can have a lot of rows, so I'm also concerned that apply might not be very efficient.

Any suggestions on how to search a DataFrame column for exact partial string matches?

Thanks!

Answer 1

Accepted answer by DSM

You can pass regex=False to avoid confusion in how the argument to str.contains is interpreted:

>>> df.full_path.str.contains(ex)
0    False
1    False
2    False
3    False
4    False
5    False
Name: full_path, dtype: bool
>>> df.full_path.str.contains(ex, regex=False)
0    False
1    False
2    False
3    False
4    False
5     True
Name: full_path, dtype: bool
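
For context on why the default call misses: in a regular expression, \d means "any digit", so the pattern looks for "C:" followed by a digit instead of a literal backslash and "d". On newer Python versions the unescaped \M may even raise an error rather than quietly returning False. Below is a minimal sketch, assuming a cut-down three-row version of the frame from the question, that shows regex=False next to an alternative that keeps regex mode but escapes the pattern with re.escape first. The variable names literal and escaped are just for illustration.

```python
import re

import pandas as pd

# Cut-down copy of the frame from the question; raw strings keep the
# backslashes literal.
test = pd.DataFrame({
    "full_path": [
        r"C:\data\Data Files\BER\figure1.png",
        r"C:\data\Previous\Error\summary.png",
        r"C:\data\Microscopy\defect.png",
    ]
})
ex = r"C:\data\Microscopy"

# As a regex, "\d" is the digit class, so r"C:\data" does not match itself
# but does match "C:" + one digit + "ata".
print(re.search(r"C:\data", r"C:\data") is None)  # True
print(re.search(r"C:\data", "C:5ata") is not None)  # True

# Option 1: treat the pattern as a plain substring.
literal = test["full_path"].str.contains(ex, regex=False)

# Option 2: stay in regex mode, but escape the metacharacters first.
escaped = test["full_path"].str.contains(re.escape(ex))

print(literal.tolist())  # [False, False, True]
print(escaped.tolist())  # [False, False, True]
```

regex=False is the simpler choice when the whole search term is literal. re.escape is mainly useful if the literal path has to be combined with real regex syntax, such as an anchor or an alternation.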

(Aside: your lambda x: ex in x should have worked. The NameError is a sign that you hadn't defined ex for some reason.)
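
To illustrate the aside, here is a small sketch, assuming a two-row frame and with ex defined before the lambda runs, showing that the apply version and str.contains with regex=False give the same answer. The names via_apply and via_contains are only for illustration.

```python
import pandas as pd

test = pd.DataFrame({
    "full_path": [
        r"C:\data\Data Files\Valx2.png",
        r"C:\data\Microscopy\defect.png",
    ]
})
ex = r"C:\data\Microscopy"  # must exist before the lambda is called

# Plain Python substring test, applied element by element.
via_apply = test["full_path"].apply(lambda x: ex in x)

# Same test through the string accessor, with regex matching turned off.
via_contains = test["full_path"].str.contains(ex, regex=False)

print(via_apply.tolist())     # [False, True]
print(via_contains.tolist())  # [False, True]
```

Both perform a literal substring check, so either works. The str.contains form is usually the more idiomatic one and lets you control missing values through its na parameter.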
