Pandas - filter and regex search the index of DataFrame
Original URL: http://stackoverflow.com/questions/35638377/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Shatnerz
I have a DataFrame in which the columns are a MultiIndex and the index is a list of names, i.e. index=['Andrew', 'Bob', 'Calvin',...].
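A minimal sketch of the setup described above, useful for experimenting with the answers below. The column labels ("score", "rank", 1, 2) and the lowercase name "alice" are hypothetical; the question only says that the columns are a MultiIndex and that the index holds names.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level column labels; the question does not give the real ones.
columns = pd.MultiIndex.from_product([["score", "rank"], [1, 2]])

# Row index is a list of names; "alice" is a made-up lowercase entry.
names = ["Andrew", "Bob", "Calvin", "alice"]
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=names, columns=columns)

print(df)
```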
I would like to create a function that returns all rows of the DataFrame whose name is 'Bob', or perhaps whose name starts with the letter 'A' or with a lowercase letter. How can this be done?
I looked into df.filter() with the regex argument, but it fails and I get:
df.filter(regex='a')
TypeError: expected string or buffer
or:
df.filter(regex=('a',1))
TypeError: first argument must be string or compiled pattern
I've tried other things, such as passing re.compile('a'), to no avail.
Answer by Ezer K
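Maybe try a different approach using a list comprehension and label-based indexing with .loc (the older .ix indexer, removed in pandas 1.0, behaves the same way here):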
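import pandas as pd
# One value per row; the index holds the names to search.
df = pd.DataFrame(list(range(4)), index=['Andrew', 'Bob', 'Calvin', 'yosef'])
# .loc selects rows by label (.ix was removed in pandas 1.0)
df.loc[[x for x in df.index if x == 'Bob']]           # exactly 'Bob'
df.loc[[x for x in df.index if x.startswith('A')]]    # starts with 'A'
df.loc[[x for x in df.index if x[:1].islower()]]      # first character is lowercase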
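As a vectorized alternative to the list comprehensions, the same three selections can be sketched with boolean masks built from the index's .str accessor (Index.str.startswith and Index.str.match). This assumes the index holds strings. str.match anchors at the start of each label, so r"[a-z]" only tests the first character, and only for ASCII lowercase letters.

```python
import pandas as pd

df = pd.DataFrame(list(range(4)), index=["Andrew", "Bob", "Calvin", "yosef"])

# Boolean masks computed over the whole index at once
is_bob = df.index == "Bob"
starts_with_a = df.index.str.startswith("A")
starts_lower = df.index.str.match(r"[a-z]")  # match() is anchored at the start

bob_rows = df.loc[is_bob]
a_rows = df.loc[starts_with_a]
lower_rows = df.loc[starts_lower]
```

Masks can also be combined, e.g. df.loc[is_bob | starts_with_a] for "Bob or starts with A".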
Answer by Shatnerz
So it looks like part of my problem with filter was that I was using an outdated version of pandas. After updating, I no longer get the TypeError. After some experimenting, it looks like filter fits my needs. Here is what I found out.
Simply calling df.filter(regex='string') returns the columns whose labels match the regex. This is the same as df.filter(regex='string', axis=1), since for a DataFrame filter works on the column labels by default.
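A small sketch of that default, using hypothetical column names. It also illustrates that the pattern is applied with re.search, so it can match anywhere inside a label, not only at its start.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["Andrew", "bob"],
    columns=["alpha", "beta", "gamma"],
)

# No axis given: DataFrame.filter matches against the column labels ...
cols_default = df.filter(regex="ph")  # 'ph' is found in the middle of 'alpha'
# ... which is the same as asking for axis=1 explicitly.
cols_axis1 = df.filter(regex="ph", axis=1)

print(cols_default.columns.tolist())  # ['alpha']
```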
To search the index, I simply need to use df.filter(regex='string', axis=0).
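Putting this together, the function asked for in the question might be sketched as follows. The helper name rows_matching is hypothetical. Because filter searches anywhere in each label, the patterns are anchored with ^ (and $ for an exact name); [a-z] only covers ASCII lowercase letters.

```python
import pandas as pd

df = pd.DataFrame(list(range(4)), index=["Andrew", "Bob", "Calvin", "yosef"])


def rows_matching(frame, pattern):
    """Hypothetical helper: return rows whose index label matches `pattern`.

    DataFrame.filter applies the regex with re.search, so use ^ and $
    to anchor the match to the start or the whole label.
    """
    return frame.filter(regex=pattern, axis=0)


bob = rows_matching(df, r"^Bob$")          # exactly 'Bob'
a_names = rows_matching(df, r"^A")         # starts with 'A'
lowercase = rows_matching(df, r"^[a-z]")   # starts with a lowercase letter
either = rows_matching(df, r"^Bob$|^A")    # 'Bob' or starts with 'A'
```

Row order follows the original index, so either contains Andrew before Bob.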