Python: Using LIKE inside pandas.query()
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/31391275/
Using LIKE inside pandas.query()
Asked by Pradeep M
I have been using pandas for more than 3 months and I have a fair idea of how to access and query dataframes.
I have a requirement to query the dataframe using a LIKE keyword (similar to LIKE in SQL) inside pandas.query().
That is, I am trying to execute pandas.query("column_name LIKE 'abc%'"), but it fails.
I know an alternative approach, which is to use str.contains("abc%"), but this doesn't meet our requirement.
We want to execute LIKE inside pandas.query(). How can I do so?
Answered by khammel
Not using query(), but this will give you what you're looking for:
df[df.col_name.str.startswith('abc')]

df
Out[93]:
  col_name
0     this
1     that
2     abcd

df[df.col_name.str.startswith('abc')]
Out[94]:
  col_name
2     abcd
query() uses pandas eval() and is limited in what you can use within it. If you want to use pure SQL, you could consider pandasql, where the following statement would work for you:
sqldf("select col_name from df where col_name like 'abc%';", locals())
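If installing pandasql is not an option, the same SQL statement can be tried against an in-memory SQLite database. This is a minimal sketch that swaps in the standard-library sqlite3 module in place of pandasql; the sample data mirrors the frame shown above, and the table name df is just an illustrative choice. Note that SQLite's LIKE is case-insensitive for ASCII letters by default.

```python
import sqlite3

import pandas as pd

# Sample frame matching the one shown earlier in this answer.
df = pd.DataFrame({"col_name": ["this", "that", "abcd"]})

# Copy the frame into an in-memory SQLite table named "df" (illustrative name).
conn = sqlite3.connect(":memory:")
df.to_sql("df", conn, index=False)

# Run the same LIKE statement and read the result back as a DataFrame.
result = pd.read_sql_query(
    "select col_name from df where col_name like 'abc%';", conn
)
conn.close()

print(result)
```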
Alternatively, if your problem with the pandas str methods was that your column wasn't entirely of string type, you could do the following:
df[df.col_name.str.startswith('abc').fillna(False)]
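As a sketch of the mixed-type case, the string methods also accept an na argument that fills in the result for values that are missing or not strings, which has a similar effect to calling .fillna(False) afterwards. The sample data below is an assumption made up for illustration.

```python
import pandas as pd

# Hypothetical column mixing strings, an integer, and a missing value.
df = pd.DataFrame({"col_name": ["this", "that", "abcd", 123, None]})

# na=False makes non-string and missing entries evaluate to False
# instead of NaN, so the mask can be used directly for indexing.
mask = df.col_name.str.startswith("abc", na=False)
result = df[mask]

print(result)
```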
Answered by Terrance DeJesus
Super late to this post, but for anyone who comes across it: you can use boolean indexing by basing your search criteria on the string method str.contains.
Example:
dataframe[dataframe.summary.str.contains('Windows Failed Login', case=False)]
In the code above, the snippet inside the brackets refers to the summary column of the dataframe and uses the .str.contains method to search for 'Windows Failed Login' within every value of that Series. Case sensitivity can be set to True or False. This returns a boolean index, which is then used to return the dataframe you're looking for. You can also use .fillna() inside the brackets if you run into any NaN errors.
Hope this helps!
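The str.contains approach described above can be sketched as follows. The summary values are invented for illustration; na=False is used here in place of a separate .fillna() call, and regex=False is an optional choice that treats the search text as a literal substring, which is closer to SQL's LIKE '%...%'.

```python
import pandas as pd

# Hypothetical log summaries, including one missing value.
dataframe = pd.DataFrame({
    "summary": [
        "Windows failed login for admin",
        "Linux login ok",
        None,
        "WINDOWS FAILED LOGIN (3 attempts)",
    ]
})

# Case-insensitive literal substring search; missing values count as no match.
mask = dataframe.summary.str.contains(
    "Windows Failed Login", case=False, na=False, regex=False
)
hits = dataframe[mask]

print(hits)
```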
Answered by volodymyr
If you have to use df.query(), the correct syntax is:
df.query('column_name.str.contains("abc")', engine='python')
You can easily combine this with other conditions:
df.query('column_a.str.contains("abc") or column_b.str.contains("xyz") and column_c>100', engine='python')
It is not a full equivalent of SQL LIKE, but it can be useful nevertheless.
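Here is a small runnable sketch of the query() calls above, under the assumption of a made-up frame with the same column names. It also uses str.startswith as a rough stand-in for LIKE 'abc%', and adds explicit parentheses to the combined condition, since and binds more tightly than or.

```python
import pandas as pd

# Hypothetical data using the column names from the answer.
df = pd.DataFrame({
    "column_a": ["abc1", "zzz", "zzz", "nope", "xabc"],
    "column_b": ["q", "xyz", "xyz", "q", "q"],
    "column_c": [10, 150, 50, 200, 0],
})

# Roughly LIKE '%abc%'
contains_abc = df.query('column_a.str.contains("abc")', engine="python")

# Roughly LIKE 'abc%'
starts_abc = df.query('column_a.str.startswith("abc")', engine="python")

# Same logic as the combined example, with the grouping written out.
combined = df.query(
    'column_a.str.contains("abc") or '
    '(column_b.str.contains("xyz") and column_c > 100)',
    engine="python",
)

print(contains_abc, starts_abc, combined, sep="\n\n")
```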
Answered by P.Panayotov
@volodymyr is right, but the thing he forgets is that you need to set engine='python' for the expression to work.
Example:
>>> pd_df.query('column_name.str.contains("abc")', engine='python')
Here is more information on the default engine ('numexpr') and the 'python' engine. Also, keep in mind that the 'python' engine is slower on large data.
Answered by vsnishad
I know this is a pretty old post but I'm just going to leave this here for those who are looking for answers.
df.query('column_name == "value"')
This worked for me when I needed to query the dataframe for an exact string match.
Answered by Shovalt
A trick I just came up with for "starts with":
df.query('"abc" <= column_name <= "abc~"')
Explanation: pandas accepts "greater than" and "less than" comparisons for strings in a query, so anything starting with "abc" will be greater than or equal to "abc" in lexicographic order. The tilde (~) is the largest printable character in the ASCII table, so an ASCII string starting with "abc" will be less than or equal to "abc~", unless it continues past a tilde in the fourth position (for example "abc~z").
A few things to take into consideration:
- This is of course case sensitive. All lowercase characters come after all uppercase characters in the ASCII table.
- This won't work fully for Unicode strings, but the general principle should be the same.
- I couldn't come up with parallel tricks for "contains" or "ends with".
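The range trick and its caveats can be checked with a short sketch. The sample values are invented for illustration, and engine="python" is passed only as a precaution (the original trick does not specify an engine). The comparison against str.startswith shows the edge case where a value continues past a tilde.

```python
import pandas as pd

# Hypothetical values chosen to exercise the edge cases.
df = pd.DataFrame({
    "column_name": ["this", "that", "abcd", "abc", "ABCD", "abd", "abc~z"]
})

# Range comparison standing in for "starts with 'abc'".
range_hits = df.query('"abc" <= column_name <= "abc~"', engine="python")

# Reference result from the string method, for comparison.
prefix_hits = df[df.column_name.str.startswith("abc")]

print(range_hits)
print(prefix_hits)
```

In this sample, "ABCD" is excluded by both approaches because the match is case sensitive, while "abc~z" is found by str.startswith but missed by the range comparison because it sorts after "abc~".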