pandas: select rows by partial string match in index

Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/16617394/

Time: 2020-09-13 20:50:11  Source: igfitidea

Select rows by partial string match in index

python pandas

Asked by ronszon

Having a series like this:

import pandas as pd
ds = pd.Series({'wikipedia':10,'wikimedia':22,'wikitravel':33,'google':40})

google        40
wikimedia     22
wikipedia     10
wikitravel    33
dtype: int64

I would like to select the rows where 'wiki' is a part of the index label (a partial string label).

For the moment I tried

ds[ds.index.map(lambda x: 'wiki' in x)]

wikimedia     22
wikipedia     10
wikitravel    33
Name: site, dtype: int64

and it does the job, but somehow the index cries for 'contains' just like what the columns have...

Any better way to do that?

Answered by Andy Hayden

A somewhat cheeky way could be to use loc:

In [11]: ds.loc['wiki': 'wikj']
Out[11]:
wikimedia     22
wikipedia     10
wikitravel    33
dtype: int64

This is essentially equivalent to ds[ds.index.map(lambda s: s.startswith('wiki'))].
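
One caveat, offered as a hedged note rather than part of the original answer: slicing between labels that are not themselves in the index appears to rely on the index being sorted, and recent pandas versions keep a dict's insertion order instead of sorting it. The sketch below therefore sorts the index first (an assumption about your data) and checks that the slice agrees with the startswith selection.

```python
import pandas as pd

ds = pd.Series({'wikipedia': 10, 'wikimedia': 22, 'wikitravel': 33, 'google': 40})

# Assumption: the index is not already sorted, so sort it before slicing
# between labels ('wiki', 'wikj') that do not exist in the index.
ds_sorted = ds.sort_index()

# Everything that sorts between 'wiki' and 'wikj' starts with 'wiki'.
sliced = ds_sorted.loc['wiki':'wikj']

# The explicit prefix test, for comparison.
by_prefix = ds_sorted[ds_sorted.index.str.startswith('wiki')]

print(sliced)
print(sliced.equals(by_prefix))
```

The slice trick only covers prefix matches; it cannot express "contains".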

To do contains, as @DSM suggests, it's probably nicer to write as:

ds[['wiki' in s for s in ds.index]]

Answered by Chris

Another solution uses filter:

>>> ds.filter(like='wiki', axis=0)
wikimedia     22
wikipedia     10
wikitravel    33
dtype: int64
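
As a small sketch of how filter can be varied (the 'hits' column name and the chosen patterns are illustrative assumptions, not from the answer): like= matches a substring anywhere in the label, regex= takes a regular expression, and axis=0 applies the match to row labels on a DataFrame as well.

```python
import pandas as pd

ds = pd.Series({'wikipedia': 10, 'wikimedia': 22, 'wikitravel': 33, 'google': 40})

# like= keeps labels that contain the substring anywhere.
contains_ik = ds.filter(like='ik', axis=0)

# regex= keeps labels where the pattern is found; '^goo' anchors it at the start.
starts_goo = ds.filter(regex='^goo', axis=0)

# The same call works on a DataFrame; axis=0 filters rows instead of columns.
df = ds.to_frame('hits')  # 'hits' is a made-up column name
df_wiki = df.filter(like='wiki', axis=0)

print(contains_ik)
print(starts_goo)
print(df_wiki)
```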

Answered by cs95

How do I select rows by partial string matching on the index?

Updated For: 2019

We now have "vectorized" string methods for these operations (actually, they've been around for a while). All the solutions are applicable as-is with DataFrames.

Setup

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})
s

foo       x
foobar    y
baz       z
dtype: object

df = s.to_frame('abc')
df

       abc
foo      x
foobar   y
baz      z

The same solution will apply to both s and df!



Searching for Prefix: str.startswith

str dtype (more accurately, object dtype) pd.Index objects now come with str methods themselves, so you could more idiomatically specify this with Series.str.startswith:

# For the series, 
s.index.str.startswith('foo')         
# Similarly, for the DataFrame,
df.index.str.startswith('foo')

# array([ True,  True, False])

To select with this result, you can use boolean indexing,

s[s.index.str.startswith('foo')]

foo       x
foobar    y
dtype: object

df[df.index.str.startswith('foo')]

       abc
foo      x
foobar   y
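
Because the result is a plain boolean mask, masks can also be combined before indexing. A minimal sketch, assuming you want labels that start with 'foo' but do not end with 'bar' (the second condition is an illustrative addition, not part of the original answer):

```python
import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})

starts_foo = s.index.str.startswith('foo')
ends_bar = s.index.str.endswith('bar')

# & combines masks element-wise, ~ negates a mask.
foo_not_bar = s[starts_foo & ~ends_bar]
not_foo = s[~starts_foo]

print(foo_not_bar)
print(not_foo)
```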


Search Anywhere: str.contains

Use Series.str.contains to perform a substring or regex based search anywhere in the string:

s.index.str.contains('foo')
# Similarly,
df.index.str.contains('foo')

# array([ True,  True, False])

If you're simply matching on substrings, you can safely disable regex based search to improve performance: s.index.str.contains('foo', regex=False)
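
Besides speed, regex=False also changes what matches when the search string contains regex metacharacters. A small sketch with made-up labels (s2 and its index are assumptions for illustration):

```python
import pandas as pd

# Hypothetical labels chosen to show the difference.
s2 = pd.Series([1, 2, 3], index=['a.b', 'axb', 'ab'])

# Default regex=True: '.' matches any single character.
regex_mask = s2.index.str.contains('a.b')

# regex=False: 'a.b' is searched for as a literal substring.
literal_mask = s2.index.str.contains('a.b', regex=False)

print(list(regex_mask))    # [True, True, False]
print(list(literal_mask))  # [True, False, False]
```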

For regex, you can use

s.index.str.contains('ba')
# Similarly,
df.index.str.contains('ba')

# array([False,  True,  True])
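
The pattern above has no regex-specific syntax, so here is a hedged sketch of patterns that actually depend on regex matching (the patterns themselves are illustrative choices, not from the original answer):

```python
import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})

# Labels ending in 'bar' or 'baz'.
ends_bar_or_baz = s.index.str.contains(r'ba[rz]$')

# Labels starting with 'ba'.
starts_ba = s.index.str.contains(r'^ba')

# case=False makes the match case-insensitive.
any_case_foo = s.index.str.contains('FOO', case=False)

print(list(ends_bar_or_baz))  # [False, True, True]
print(list(starts_ba))        # [False, False, True]
print(list(any_case_foo))     # [True, True, False]
```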


Micro-Optimizing with List Comprehensions

From a performance perspective, list comprehensions often turn out to be faster than the .str methods. The first option can be rewritten as:

[x.startswith('foo') for x in s.index]
# [True, True, False]

s[[x.startswith('foo') for x in s.index]]

foo       x
foobar    y
dtype: object

With regex, you can pre-compile a pattern and call re.search. For more info, see my extensive writeup, "For loops with pandas - When should I care?".
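
A minimal sketch of the pre-compiled variant, assuming the pattern 'ba[rz]' purely for illustration:

```python
import re

import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})

# Compile once, then reuse the pattern for every label.
pattern = re.compile(r'ba[rz]')

# pattern.search returns a match object or None; bool() turns that into a mask entry.
mask = [bool(pattern.search(label)) for label in s.index]
selected = s[mask]

print(mask)  # [False, True, True]
print(selected)
```

Whether this beats s.index.str.contains depends on the size of the index, so it is worth timing on your own data.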
