Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/50697536/

Time: 2020-09-14 05:39:07  Source: igfitidea

Pandas query function not working with spaces in column names

python, sql, pandas, dataframe

Asked by Bhushan Pant

I have a dataframe with spaces in its column names. I am trying to use the query method to get results. It works fine with the 'c' column, but I get an error for 'a b':

import pandas as pd
a = pd.DataFrame(columns=["a b", "c"])
a["a b"] = [1,2,3,4]
a["c"] = [5,6,7,8]
a.query('a b==5')

For this I am getting this error:

a b ==5
  ^
SyntaxError: invalid syntax

I don't want to replace the spaces with other characters such as '_'.

There is one hack using pandasql: put the column name inside brackets, for example [a b].

Accepted answer by Jarno

From pandas 0.25 onward you can escape column names with backticks, so you can do:

a.query('`a b` == 5') 
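
As a minimal sketch of the backtick syntax, assuming pandas 0.25 or later is installed, the following rebuilds the frame from the question and filters on the column whose name contains a space. The value 3 is chosen here only so that a row matches.

```python
import pandas as pd

# Same data as in the question; assumes pandas >= 0.25 for backtick support.
a = pd.DataFrame({"a b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# Backticks let query() refer to a column whose name contains a space.
result = a.query("`a b` == 3")
print(result)
```

With the question's data, the original filter `a b` == 5 runs without error but returns an empty frame, because no value in that column equals 5.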

Answer by jpp

Pandas 0.25+

As described here:

DataFrame.query() and DataFrame.eval() now supports quoting column names with backticks to refer to names with spaces (GH6508)

So you can use:

a.query('`a b`==5')

Pandas pre-0.25

You cannot use pd.DataFrame.query if you have whitespace in your column name. Consider what would happen if you had columns named a, b and a b; there would be ambiguity as to which one you mean.

Instead, you can use pd.DataFrame.loc:

df = df.loc[df['a b'] == 5]

Since you are only filtering rows, you can omit the .loc accessor altogether:

df = df[df['a b'] == 5]
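
To illustrate that the two forms select the same rows, here is a small sketch. The frame `df` and its values are assumptions made up for this example, not part of the original answer.

```python
import pandas as pd

# Hypothetical data for illustration; one row has 'a b' equal to 5.
df = pd.DataFrame({"a b": [1, 2, 3, 5], "c": [5, 6, 7, 8]})

# Build the boolean mask once, then apply it both ways.
mask = df["a b"] == 5
with_loc = df.loc[mask]
without_loc = df[mask]

# Both selections contain the same single row.
print(with_loc.equals(without_loc))
```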

Answer by jezrael

This is not possible yet. See GitHub issue #6508:

Note that in reality .query is just a nice-to-have interface; in fact it has very specific guarantees, meaning it's meant to parse like a query language, and not a fully general interface.

The reason is that query needs the string to be a valid Python expression, so column names must be valid Python identifiers.
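
One rough way to see which column names are affected is Python's built-in str.isidentifier(). This is only a sketch of the idea, not how pandas itself validates names, and it does not catch reserved keywords such as "class", which pass this check.

```python
# Column names from the question.
columns = ["a b", "c"]

# str.isidentifier() reports whether a string could be a Python identifier.
valid = {name: name.isidentifier() for name in columns}
for name, ok in valid.items():
    print(repr(name), ok)
```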

Solution is boolean indexing:

df = df[df['a b'] == 5]

Answer by Simeon Ikudabo

Instead of using the pandas query function, in this case I would build a boolean condition and use it to select the values where the condition is True. For example:

import pandas as pd
a = pd.DataFrame(columns=["a b", "c"])
a["a b"] = [1,2,3,5]
a["c"] = [5,6,7,8]
# a.query('a b==5') is removed because, written this way, it cannot look up columns with spaces in the name.

condition = a['a b'] == 5
print(a['a b'][condition])
output:

    3    5

We see that the condition evaluates to True at index 3, so only that entry is returned (the specific index and value, rather than a Series of Boolean values).
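
If only the matching index labels are needed, one possible sketch is to filter the frame with the condition and read its index. The name `matching_index` is introduced here for illustration only.

```python
import pandas as pd

# Same data as in this answer.
a = pd.DataFrame({"a b": [1, 2, 3, 5], "c": [5, 6, 7, 8]})

# Boolean Series: True where the column equals 5.
condition = a["a b"] == 5

# Keep only the matching rows and collect their index labels.
matching_index = a[condition].index.tolist()
print(matching_index)
```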

Answer by DTT

I am afraid that the query method does not accept column names containing spaces. In any case, you can query the dataframe this way:

import pandas as pd
a = pd.DataFrame({'a b':[1,2,3,4], 'c':[5,6,7,8]})
a[a['a b']==1]