Python: Using LIKE in pandas.query()

Question

Asked by Pradeep M

I have been using pandas for more than 3 months, and I have a fair idea about accessing and querying DataFrames, etc.

I have a requirement to query a DataFrame using the LIKE keyword (as in SQL) inside DataFrame.query().

For example, I am trying to execute df.query("column_name LIKE 'abc%'"), but it fails.

I know an alternative approach, which is to use str.contains("abc%"), but this doesn't meet our requirement.

We want to use LIKE inside query(). How can I do so?

Answer 1

Answer by khammel

Not using query(), but this will give you what you're looking for:

df[df.col_name.str.startswith('abc')]


df
Out[93]: 
  col_name
0     this
1     that
2     abcd

df[df.col_name.str.startswith('abc')]
Out[94]: 
  col_name
2     abcd
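A self-contained version of the session above, as a minimal sketch; the three values are taken from the printed output, and how the DataFrame was originally built is an assumption:

```python
import pandas as pd

# Same toy data as in the session above.
df = pd.DataFrame({"col_name": ["this", "that", "abcd"]})

# str.startswith returns a boolean Series; using it as an indexer keeps
# only the rows where it is True, mirroring SQL's col_name LIKE 'abc%'.
mask = df.col_name.str.startswith("abc")
result = df[mask]
print(result)
```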

query() uses the pandas eval() and is limited in what you can use within it. If you want to use pure SQL, you could consider pandasql, where the following statement would work for you:

from pandasql import sqldf
sqldf("select col_name from df where col_name like 'abc%';", locals())
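pandasql is a separate package, so this sketch swaps in the standard-library sqlite3 module (pandasql's default backend is also an in-memory SQLite database) to show a real LIKE running against a DataFrame. The table name df and the sample values are assumptions. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, so "ABCx" matches too:

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"col_name": ["this", "that", "abcd", "ABCx"]})

conn = sqlite3.connect(":memory:")
try:
    # Copy the DataFrame into an in-memory SQLite table named "df".
    df.to_sql("df", conn, index=False)
    # Plain SQL LIKE: '%' matches any run of characters.
    result = pd.read_sql_query(
        "SELECT col_name FROM df WHERE col_name LIKE 'abc%'", conn
    )
finally:
    conn.close()

print(result)
```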

Or, alternatively, if your problem with the pandas str methods was that your column wasn't entirely of string type, you could do the following:

df[df.col_name.str.startswith('abc').fillna(False)]
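A minimal sketch of the mixed-type case, using a hypothetical column that holds strings, an integer and a missing value. It also shows the na=False keyword of str.startswith, which substitutes False for the non-string entries in one step:

```python
import pandas as pd

# Hypothetical column mixing strings with a number and a missing value.
df = pd.DataFrame({"col_name": ["this", 1, "abcd", None]})

# Non-string entries come back as missing values from .str methods;
# fillna(False) replaces them, and astype(bool) makes the mask dtype explicit.
mask_fillna = df.col_name.str.startswith("abc").fillna(False).astype(bool)

# The na= keyword performs the same substitution inside str.startswith.
mask_na = df.col_name.str.startswith("abc", na=False)

print(df[mask_fillna])
print(df[mask_na])
```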

Answer 2

Answer by Terrance DeJesus

Super late to this post, but for anyone who comes across it: you can use boolean indexing by basing your search criterion on the string method str.contains.

Example:

dataframe[dataframe.summary.str.contains('Windows Failed Login', case=False)]

In the code above, the expression inside the brackets refers to the summary column of the DataFrame and uses the .str.contains method to search for 'Windows Failed Login' within every value of that Series. Case sensitivity is controlled by the case parameter (True or False). This returns a boolean index, which is then used to select the rows you're looking for. If you run into errors caused by NaN values, you can also add .fillna(False) inside the brackets.
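A runnable sketch of the same idea on hypothetical log data. It passes na=False so that missing summaries count as non-matches. Note that str.contains treats the pattern as a regular expression by default, so regex=False is the safer choice when the search text contains characters such as . or (:

```python
import pandas as pd

# Hypothetical log summaries; None stands in for a missing value.
dataframe = pd.DataFrame(
    {
        "summary": [
            "Windows Failed Login for user alice",
            "Successful login",
            None,
            "WINDOWS FAILED LOGIN from 10.0.0.5",
        ]
    }
)

# case=False ignores letter case; na=False treats missing summaries as
# non-matches (the same effect as .fillna(False) on the mask).
mask = dataframe.summary.str.contains("Windows Failed Login", case=False, na=False)
matches = dataframe[mask]
print(matches)
```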

Hope this helps!

Answer 3

Answer by volodymyr

If you have to use df.query(), the correct syntax is:

df.query('column_name.str.contains("abc")', engine='python')

You can easily combine this with other conditions:

df.query('column_a.str.contains("abc") or column_b.str.contains("xyz") and column_c>100', engine='python')
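A sketch with hypothetical data for both queries above. One point worth checking in the combined form: the query string follows Python's operator precedence, so and binds tighter than or, and the expression reads as A or (B and C). Parentheses are needed if you mean (A or B) and C:

```python
import pandas as pd

# Hypothetical data using the column names from the answer.
df = pd.DataFrame(
    {
        "column_a": ["abc1", "zzz", "abc2", "qqq"],
        "column_b": ["xyz", "xyz", "nope", "nope"],
        "column_c": [50, 150, 10, 200],
    }
)

# Method calls such as .str.contains inside the query string use the
# 'python' engine.
single = df.query('column_a.str.contains("abc")', engine="python")

# Parsed as A or (B and C), because 'and' binds tighter than 'or'.
implicit = df.query(
    'column_a.str.contains("abc") or column_b.str.contains("xyz") and column_c > 100',
    engine="python",
)

# Explicit grouping: (A or B) and C.
explicit = df.query(
    '(column_a.str.contains("abc") or column_b.str.contains("xyz")) and column_c > 100',
    engine="python",
)

print(single.index.tolist())
print(implicit.index.tolist())
print(explicit.index.tolist())
```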

It is not a full equivalent of SQL LIKE, but it can be useful nevertheless.
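For something closer to LIKE, one option is to translate the LIKE pattern into a regular expression and require the whole value to match with str.fullmatch. The helper names like_to_regex and like below are hypothetical, not part of pandas, and the sketch ignores LIKE's ESCAPE clause. It also shows why str.contains("abc%") from the question falls short: % has no special meaning in a regex, so that call looks for the literal text "abc%" anywhere in the value. Matching here is case-sensitive; databases differ on whether LIKE is, and str.fullmatch also accepts case=False:

```python
import re

import pandas as pd


def like_to_regex(pattern: str) -> str:
    """Hypothetical helper: translate a SQL LIKE pattern into a regex.

    '%' becomes '.*' (any run of characters), '_' becomes '.' (exactly one
    character), and every other character is escaped to match literally.
    The ESCAPE clause of LIKE is not supported.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # (?s) lets '.' also match newline characters.
    return "(?s)" + "".join(parts)


def like(series: pd.Series, pattern: str) -> pd.Series:
    """Hypothetical helper: True where the whole value matches the pattern."""
    # fullmatch anchors at both ends, as LIKE does; missing values -> False.
    return series.str.fullmatch(like_to_regex(pattern), na=False)


df = pd.DataFrame(
    {"column_name": ["abc", "abcd", "xabc", "zbc1", "ab", None, "a.c"]}
)

print(df[like(df["column_name"], "abc%")])

# The same regex can be used inside query() through an @-prefixed local.
rx = like_to_regex("abc%")
via_query = df.query("column_name.str.fullmatch(@rx, na=False)", engine="python")
print(via_query)
```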

Answer 4

Answer by P.Panayotov

@volodymyr is right, but what he forgets is that you need to set engine='python' for the expression to work.

Example: >>> pd_df.query('column_name.str.contains("abc")', engine='python')

Here is more information on the default engine ('numexpr') and the 'python' engine. Also, keep in mind that the 'python' engine is slower on large data.

Answer 5

Answer by vsnishad

I know this is a pretty old post but I'm just going to leave this here for those who are looking for answers.

df.query('column_name == "value"')

This worked for me when I needed to query the dataframe for matching string.
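A quick sketch with hypothetical values showing that == is an exact match, the equivalent of SQL's = rather than LIKE, so it does not pick up values that merely start with or contain the string:

```python
import pandas as pd

# Hypothetical values: one exact match and two near misses.
df = pd.DataFrame({"column_name": ["value", "value2", "my value"]})

# == keeps only rows whose value is exactly "value".
exact = df.query('column_name == "value"')
print(exact)
```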

Answer 6

Answer by Shovalt

A trick I just came up with for "starts with":

df.query('"abc" <= column_name <= "abc~"')

Explanation: pandas accepts "greater than" and "less than" comparisons for strings in a query, so anything starting with "abc" will be greater than or equal to "abc" in lexicographic order. The tilde (~) is the largest printable character in the ASCII table, so anything starting with "abc" and continuing with printable ASCII characters will be less than or equal to "abc~", except strings that themselves start with "abc~" and continue beyond it.

A few things to take into consideration:

This is of course case sensitive: in the ASCII table, all upper-case characters come before all lower-case characters.
This won't fully work for Unicode strings (non-ASCII characters such as "é" sort after "~"), but the general principle should be the same.
I couldn't come up with parallel tricks for "contains" or "ends with".
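A sketch comparing the range trick with an explicit str.startswith check, on hypothetical values picked to hit the caveats above. engine="python" is passed only as a precaution so that the string comparisons run in plain Python whichever engine is the default:

```python
import pandas as pd

# Hypothetical values chosen to probe the edge cases listed above.
df = pd.DataFrame(
    {"column_name": ["abc", "abcd", "abz", "Abc", "abcé", "abc~x"]}
)

# The range trick: lexicographic bounds instead of a prefix test.
by_range = df.query('"abc" <= column_name <= "abc~"', engine="python")

# Reference result from an explicit prefix check.
by_prefix = df[df["column_name"].str.startswith("abc")]

print(by_range["column_name"].tolist())
print(by_prefix["column_name"].tolist())
```

Both drop "Abc" (case sensitivity); the range version also misses "abcé" (é sorts after ~) and "abc~x" (it is longer than the upper bound "abc~").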
