Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/22291565/

Time: 2020-09-09 00:13:14  Source: igfitidea

Pandas text matching like SQL's LIKE?

Tags: pandas, string-matching, sql-like

Asked by naught101

Is there a way to do something similar to SQL's LIKE syntax on a pandas text DataFrame column, such that it returns a list of indices, or a list of booleans that can be used for indexing the dataframe? For example, I would like to be able to match all rows where the column starts with 'prefix_', similar to WHERE <col> LIKE prefix_% in SQL.

Answered by Andy Hayden

You can use the Series method str.startswith (which takes a literal string, not a regex):

In [11]: s = pd.Series(['aa', 'ab', 'ca', np.nan])

In [12]: s.str.startswith('a', na=False)
Out[12]: 
0     True
1     True
2    False
3    False
dtype: bool

You can also do the same with str.contains (using a regex):

In [13]: s.str.contains('^a', na=False)
Out[13]: 
0     True
1     True
2    False
3    False
dtype: bool

So you can do df[col].str.startswith...
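As a minimal sketch of that last step, the boolean Series can be used directly as a row mask on a DataFrame. The DataFrame `df` and its column names `name` and `value` are made up for illustration; only `str.startswith` and its `na` parameter come from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "name": ["prefix_a", "prefix_b", "other", np.nan],
    "value": [1, 2, 3, 4],
})

# Literal prefix test; na=False maps the missing value to False
# so the result is a clean boolean mask.
mask = df["name"].str.startswith("prefix_", na=False)

# Keep only the rows whose name starts with "prefix_".
matched = df[mask]
print(matched)
```

Here `matched` holds the first two rows. The same mask also works with `df.loc[mask, "value"]` if only some columns are needed.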

See also the SQL comparison section of the docs.

Note (as pointed out by the OP): by default, NaNs propagate into the result, which causes an indexing error if you use it as a boolean mask. The na=False flag used above says that NaN should map to False.

In [14]: s.str.startswith('a')  # can't use as boolean mask
Out[14]:
0     True
1     True
2    False
3      NaN
dtype: object

Answered by sushmit

You can use:

s.str.contains('a', case = False)
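A short sketch of how this behaves, on a made-up Series: `case=False` makes the match case-insensitive, and because `str.contains` treats the pattern as a regex by default, passing `regex=False` is one way to match characters such as `.` literally. The sample values are assumptions chosen to show the difference.

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
s = pd.Series(["Apple", "banana", "a.b", "axb", None])

# Case-insensitive substring match anywhere in the string.
ci = s.str.contains("a", case=False, na=False)

# Default: the pattern is a regex, so "." matches any character.
as_regex = s.str.contains("a.b", na=False)

# regex=False: the pattern is a literal substring.
literal = s.str.contains("a.b", regex=False, na=False)

print(ci.tolist())        # [True, True, True, True, False]
print(as_regex.tolist())  # [False, False, True, True, False]
print(literal.tolist())   # [False, False, True, False, False]
```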

Answered by H Raihan

  1. To find all the values in the series that start with the pattern "s":

SQL - WHERE column_name LIKE 's%'
Python - column_name.str.startswith('s')

  2. To find all the values in the series that end with the pattern "s":

SQL - WHERE column_name LIKE '%s'
Python - column_name.str.endswith('s')

  3. To find all the values in the series that contain the pattern "s":

SQL - WHERE column_name LIKE '%s%'
Python - column_name.str.contains('s')
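For LIKE patterns that do not fit one of these three cases, one possible approach is to translate the pattern into a regular expression. The sketch below is not a pandas feature: the helper names `like_to_regex` and `sql_like` are hypothetical, and it assumes the common LIKE rules where `%` matches any run of characters and `_` matches exactly one character, with no escape-character support.

```python
import re

import pandas as pd


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regex (hypothetical helper)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches any run of characters
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), flags=re.DOTALL)


def sql_like(series, pattern):
    """Return a boolean mask of values that fully match the LIKE pattern."""
    rx = like_to_regex(pattern)
    # Non-string values (including missing ones) map to False.
    return series.map(
        lambda v: isinstance(v, str) and rx.fullmatch(v) is not None
    ).astype(bool)


s = pd.Series(["prefix_a", "prefixXb", "other", None])
print(sql_like(s, "prefix_%").tolist())  # [True, True, False, False]
```

Note that "prefixXb" matches as well, because under these rules the underscore in 'prefix_%' is itself a single-character wildcard; `str.startswith('prefix_')` is the stricter, literal test.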

For more options, check: https://pandas.pydata.org/pandas-docs/stable/reference/series.html
