Pandas: query string where column name contains special characters

Question

Asked by Joe

I am working with a data frame that has a structure something like the following:

In[75]: df.head(2)
Out[75]: 
  statusdata             participant_id association  latency response  \
0   complete  CLIENT-TEST-1476362617727       seeya      715  dislike   
1   complete  CLIENT-TEST-1476362617727      welome      800     like   

   stimuli elementdata statusmetadata demo$gender  demo$question2  \
0  Sample B    semi_imp       complete        male              23   
1  Sample C    semi_imp       complete      female              23

I want to be able to run a query string against the column demo$gender.

I.e.,

df.query("demo$gender=='male'")

But this has a problem with the $ sign. If I replace the $ sign with another delimiter (like -), the problem persists. Can I fix up my query string to avoid this problem? I would prefer not to rename the columns, as they correspond tightly with other parts of my application.

I really want to stick with a query string as it is supplied by another component of our tech stack and creating a parser would be a heavy lift for what seems like a simple problem.

Thanks in advance.

Answer 1

Answered by Joe

For those interested, here is a simple procedure I used to accomplish the task:

# Identify column names that are not valid Python identifiers
invalid_column_names = [x for x in df.columns if not x.isidentifier()]

# Replace each invalid name in the query string and keep track of the mapping
# NOTE: This method fails if the frame has columns called REPL_0 etc.
# `query` is the query string supplied by the caller
replacements = {}
for i, cn in enumerate(invalid_column_names):
    r = 'REPL_' + str(i)
    query = query.replace(cn, r)
    replacements[cn] = r

# Reverse mapping, used to restore the original names afterwards
inv_replacements = {v: k for k, v in replacements.items()}

df = df.rename(columns=replacements)  # Rename the columns
df = df.query(query)  # Carry out the query

df = df.rename(columns=inv_replacements)  # Restore the original column names

This amounts to identifying the invalid column names, rewriting the query to use placeholder names, and renaming the columns to match. We then run the query and translate the column names back.
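For anyone who wants to reuse this, the same steps can be wrapped in a function. The following is only a sketch under a few assumptions: the function name query_with_invalid_names and the TMPCOL_ prefix are made up for illustration, and the sample frame is invented. It replaces longer names first so that a name containing another name is not partially rewritten, and it raises if a placeholder clashes with an existing column.

```python
import pandas as pd


def query_with_invalid_names(df, query):
    """Run df.query(query) when some column names are not valid identifiers.

    Hypothetical helper: the name and the TMPCOL_ prefix are illustrative.
    """
    invalid = [
        c for c in df.columns if isinstance(c, str) and not c.isidentifier()
    ]
    # Replace longer names first so that a name containing another name
    # (e.g. "demo$gender2" vs "demo$gender") is not partially rewritten.
    invalid.sort(key=len, reverse=True)

    replacements = {}
    for i, name in enumerate(invalid):
        placeholder = f"TMPCOL_{i}"
        if placeholder in df.columns:
            raise ValueError(f"placeholder {placeholder!r} clashes with a column")
        query = query.replace(name, placeholder)
        replacements[name] = placeholder

    result = df.rename(columns=replacements).query(query)
    # Restore the original column names on the filtered frame.
    return result.rename(columns={v: k for k, v in replacements.items()})


# Invented sample data, loosely modelled on the frame in the question.
df = pd.DataFrame(
    {
        "demo$gender": ["male", "female", "male"],
        "demo$question2": [23, 23, 31],
        "latency": [715, 800, 650],
    }
)

out = query_with_invalid_names(df, "demo$gender == 'male' and latency > 700")
print(out)
```

Note that this still relies on plain string replacement, so a column name that also appears inside a string literal in the query would be rewritten there as well.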

Credit to @chrisb for their answer, which pointed me in the right direction.

Answer 2

Answered by chrisb

The current implementation of query requires the string to be a valid Python expression, so column names must be valid Python identifiers. Your two options are renaming the column or using a plain boolean filter, like this:

df[df['demo$gender'] == 'male']
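A self-contained version of that filter, on an invented two-column frame, might look like the sketch below. It also tries backtick quoting inside query(), which, as far as I know, works for names like demo$gender in pandas 1.0 and later; treat that part as an assumption about your installed version, since it did not exist when this answer was written.

```python
import pandas as pd

# Invented sample data with a column name that is not a valid identifier.
df = pd.DataFrame(
    {
        "demo$gender": ["male", "female", "male"],
        "latency": [715, 800, 650],
    }
)

# Plain boolean mask: works for any column name.
by_mask = df[df["demo$gender"] == "male"]

# Backtick quoting inside query(); assumes pandas 1.0 or newer.
by_query = df.query("`demo$gender` == 'male'")

# Both approaches should select the same rows.
print(by_mask.equals(by_query))
```

If the query string comes from another component, the backtick form only helps when that component can be made to emit the backticks.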
