Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/35638377/

Time: 2020-09-14 00:46:01  Source: igfitidea

Pandas - filter and regex search the index of DataFrame

Tags: python, regex, pandas

Asked by Shatnerz

I have a DataFrame whose columns are a MultiIndex and whose index is a list of names, i.e. index=['Andrew', 'Bob', 'Calvin',...].

I would like to create a function that returns all rows of the DataFrame whose name is 'Bob', or that start with the letter 'A', or that start with a lowercase letter. How can this be done?

I looked into df.filter() with the regex argument, but it fails and I get:

df.filter(regex='a')
TypeError: expected string or buffer

or:

df.filter(regex=('a',1))
TypeError: first argument must be string or compiled pattern

I've tried other things, such as passing re.compile('a'), to no avail.

Answered by Ezer K

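Maybe try a different approach, using a list comprehension with .loc (the original answer used .ix, which was deprecated in pandas 0.20 and removed in pandas 1.0):

import pandas as pd

df = pd.DataFrame(range(4), index=['Andrew', 'Bob', 'Calvin', 'yosef'])

# rows whose label is exactly 'Bob'
df.loc[[x for x in df.index if x == 'Bob']]

# rows whose label starts with 'A'
df.loc[[x for x in df.index if x[0] == 'A']]

# rows whose label is entirely lowercase
# (use x[0].islower() to test only the first letter)
df.loc[[x for x in df.index if x.islower()]]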
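
The same three selections can also be written as boolean masks over the index instead of list comprehensions. This is only a sketch, not part of the original answer: it assumes every index label is a string, and the column name "value" and the variable names are made up for illustration.

```python
import pandas as pd

# Illustrative frame; the "value" column name is an assumption.
df = pd.DataFrame({"value": range(4)}, index=["Andrew", "Bob", "Calvin", "yosef"])

# Label is exactly 'Bob'.
exact = df.loc[df.index == "Bob"]

# Label starts with the letter 'A'.
starts_with_a = df.loc[df.index.str.startswith("A")]

# Label starts with a lowercase ASCII letter (str.match anchors at the start).
starts_lower = df.loc[df.index.str.match(r"[a-z]")]

print(exact.index.tolist())
print(starts_with_a.index.tolist())
print(starts_lower.index.tolist())
```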

Answered by Shatnerz

So it looks like part of my problem with filter was that I was using an outdated version of pandas. After updating, I no longer get the TypeError. After some experimenting, it looks like filter can fit my needs. Here is what I found.

Simply calling df.filter(regex='string') returns the columns whose names match the regex. This appears to be equivalent to df.filter(regex='string', axis=1).
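
A small check of that column behaviour, as a sketch under assumptions: the column names alpha, beta and gamma are invented for illustration, and the frame uses flat columns rather than the MultiIndex columns from the question.

```python
import pandas as pd

# Hypothetical frame with flat (non-MultiIndex) column names.
df = pd.DataFrame(
    {"alpha": [1, 2], "beta": [3, 4], "gamma": [5, 6]},
    index=["Andrew", "Bob"],
)

# No axis given: for a DataFrame, filter works on the column labels.
default_axis = df.filter(regex=r"^[ab]")

# The same call with the column axis spelled out.
explicit_axis = df.filter(regex=r"^[ab]", axis=1)

print(default_axis.columns.tolist())
print(default_axis.equals(explicit_axis))
```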

To search the index, I simply need to call df.filter(regex='string', axis=0).
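
Applied to the three cases from the question, this might look like the sketch below. The frame and the column name "value" are assumptions for illustration. Because filter matches with re.search, the patterns are anchored with ^ (and $ for the exact match) so they do not match in the middle of a name.

```python
import pandas as pd

# Illustrative frame; the "value" column name is an assumption.
df = pd.DataFrame({"value": range(4)}, index=["Andrew", "Bob", "Calvin", "yosef"])

# Rows whose index label is exactly 'Bob'.
named_bob = df.filter(regex=r"^Bob$", axis=0)

# Rows whose index label starts with 'A'.
starts_with_a = df.filter(regex=r"^A", axis=0)

# Rows whose index label starts with a lowercase ASCII letter.
starts_lower = df.filter(regex=r"^[a-z]", axis=0)

print(named_bob.index.tolist())
print(starts_with_a.index.tolist())
print(starts_lower.index.tolist())
```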
