Pandas query function not working with spaces in column names
Original question: http://stackoverflow.com/questions/50697536/
Source: StackOverFlow, licensed under CC BY-SA 4.0. You are free to use and share it, but you must attribute it to the original authors.
Asked by Bhushan Pant
I have a DataFrame with spaces in its column names. I am trying to use the query method to get results. It works fine for the 'c' column, but I get an error for 'a b':
import pandas as pd
a = pd.DataFrame(columns=["a b", "c"])
a["a b"] = [1,2,3,4]
a["c"] = [5,6,7,8]
a.query('a b==5')
This raises the following error:
a b ==5
^
SyntaxError: invalid syntax
I don't want to replace the spaces with other characters such as '_'.
There is one hack using pandasql that puts the column name inside brackets, for example: [a b]
Accepted answer by Jarno
From pandas 0.25 onward you can escape column names with backticks, so you can do:
a.query('`a b` == 5')
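A minimal, self-contained sketch of the backtick syntax, assuming pandas 0.25 or later is installed. The data matches the question; the comparison value 3 is chosen here only because the question's 'a b' column contains no 5, so the original query would return an empty frame.

```python
import pandas as pd

# Same data as in the question.
a = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# Backticks let query() refer to a column whose name contains a space.
result = a.query("`a b` == 3")
print(result)
```

The result keeps the original index label of the matching row, as with any other row filter.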
Answer by jpp
Pandas 0.25+
As described in the release notes:
DataFrame.query() and DataFrame.eval() now supports quoting column names with backticks to refer to names with spaces (GH6508)
So you can use:
a.query('`a b`==5')
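The quoted note also covers DataFrame.eval(). A short sketch of that case, again assuming pandas 0.25 or later and reusing the question's data (the variable name total is only illustrative):

```python
import pandas as pd

a = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# eval() accepts the same backtick quoting as query().
total = a.eval("`a b` + c")
print(total.tolist())
```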
Pandas pre-0.25
You cannot use pd.DataFrame.query if you have whitespace in your column name. Consider what would happen if you had columns named a, b and a b; there would be ambiguity as to what you require.
Instead, you can use pd.DataFrame.loc:
df = df.loc[df['a b'] == 5]
Since you are only filtering rows, you can omit the .loc accessor altogether:
df = df[df['a b'] == 5]
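To illustrate that the two forms are interchangeable for row filtering, here is a small sketch using the question's data. The names mask, with_loc, without_loc and c_values are illustrative, and the value 3 is used so that the filter matches a row.

```python
import pandas as pd

df = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# Boolean mask: True where the 'a b' column equals 3.
mask = df["a b"] == 3

with_loc = df.loc[mask]   # explicit .loc accessor
without_loc = df[mask]    # plain boolean indexing

# Both forms select the same rows.
same = with_loc.equals(without_loc)
print(same)

# .loc remains useful when you also want to pick columns in the same step.
c_values = df.loc[mask, "c"]
print(c_values.tolist())
```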
Answer by jezrael
It is not possible yet (before pandas 0.25). Check GitHub issue #6508:
Note that in reality .query is just a nice-to-have interface, in fact it has very specific guarantees, meaning it's meant to parse like a query language, and not a fully general interface.
The reason is that query needs the string to be a valid Python expression, so column names must be valid Python identifiers.
The solution is boolean indexing:
df = df[df['a b'] == 5]
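The identifier requirement mentioned above can be checked directly. The sketch below uses Python's str.isidentifier() and the keyword module to list the columns that cannot be written as bare names in a query string; is_plain_name is a hypothetical helper written for this illustration, not part of pandas.

```python
import keyword

import pandas as pd

a = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})


def is_plain_name(name):
    """Hypothetical helper: True if name is a valid, non-keyword Python identifier."""
    return isinstance(name, str) and name.isidentifier() and not keyword.iskeyword(name)


# Columns that fail the check cannot appear as bare names in query().
needs_workaround = [col for col in a.columns if not is_plain_name(col)]
print(needs_workaround)
```

Columns that fail the check are the ones that need boolean indexing, or backtick quoting on pandas 0.25 and later.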
Answer by Simeon Ikudabo
Instead of using the query method, I would build a boolean condition and use it to look up the values where the condition is True. For example:
import pandas as pd
a = pd.DataFrame(columns=["a b", "c"])
a["a b"] = [1,2,3,5]
a["c"] = [5,6,7,8]
# a.query('a b==5')  # removed: query cannot look up columns with spaces in the name
condition = a['a b'] == 5
print(a['a b'][condition])
output:
3 5
We see that the condition evaluates to True at index 3. Indexing with the condition gives you the matching index and value rather than a Series of Boolean values.
Answer by DTT
I am afraid that the query method does not accept column names containing spaces. In any case, you can filter the DataFrame this way:
import pandas as pd
a = pd.DataFrame({'a b':[1,2,3,4], 'c':[5,6,7,8]})
a[a['a b']==1]