pandas: select rows by partial string matching on the index

Question

Asked by ronszon

Having a series like this:

from pandas import Series

ds = Series({'wikipedia':10,'wikimedia':22,'wikitravel':33,'google':40})

google        40
wikimedia     22
wikipedia     10
wikitravel    33
dtype: int64

I would like to select the rows where 'wiki' is a part of the index label (a partial string label).

So far I have tried:

ds[ds.index.map(lambda x: 'wiki' in x)]

wikimedia     22
wikipedia     10
wikitravel    33
Name: site, dtype: int64

This does the job, but the index seems to be crying out for a 'contains' method like the one columns have.

Is there a better way to do this?

Answer 1

Answered by Andy Hayden

A somewhat cheeky way could be to use loc:

In [11]: ds.loc['wiki': 'wikj']
Out[11]:
wikimedia     22
wikipedia     10
wikitravel    33
dtype: int64

On a sorted index, this is essentially equivalent to ds[ds.index.map(lambda s: s.startswith('wiki'))]. The slice relies on lexicographic ordering of the labels, so if the index is not sorted, call ds.sort_index() first.

For a contains-style match, as @DSM suggests, it is probably nicer to write:

ds[['wiki' in s for s in ds.index]]
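
The difference between the two approaches can be sketched as follows. This is a minimal illustration, assuming a recent pandas version in which a Series built from a dict keeps insertion order (so the index below starts out unsorted); the variable names are only for illustration.

```python
import pandas as pd

# Recent pandas versions keep dict insertion order, so this index is unsorted.
ds = pd.Series({'wikipedia': 10, 'wikimedia': 22, 'wikitravel': 33, 'google': 40})

# Slicing with bounds that are not actual labels needs a sorted (monotonic) index,
# so sort first; on the unsorted index the slice would typically raise a KeyError.
ds_sorted = ds.sort_index()
by_slice = ds_sorted.loc['wiki':'wikj']

# The containment test does not depend on index order.
by_contains = ds[['wiki' in label for label in ds.index]]

print(by_slice)
print(by_contains.sort_index().equals(by_slice))
```

Note that the slice only covers the prefix case, while the list comprehension also matches 'wiki' in the middle of a label.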

Answer 2

Answered by Chris

Another solution is to use filter:

>>> ds.filter(like='wiki', axis=0)
wikimedia     22
wikipedia     10
wikitravel    33
dtype: int64
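
As a rough sketch of how filter can be varied, the example below uses the like and regex parameters and applies the same call to a DataFrame. The column name 'hits' is a made-up name for illustration only.

```python
import pandas as pd

ds = pd.Series({'wikipedia': 10, 'wikimedia': 22, 'wikitravel': 33, 'google': 40})

# like= keeps labels that contain the given substring.
contains_wiki = ds.filter(like='wiki', axis=0)

# regex= keeps labels where the pattern is found; '^wiki' anchors it at the start.
starts_with_wiki = ds.filter(regex='^wiki', axis=0)

# On a DataFrame, axis=0 filters the row index instead of the columns.
df = ds.to_frame('hits')  # 'hits' is a hypothetical column name
media_rows = df.filter(like='media', axis=0)

print(contains_wiki)
print(media_rows)
```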

Answer 3

Answered by cs95

How do I select rows by partial string matching on the index?

Updated For: 2019

We now have "vectorized" string methods for these operations (actually, they have been around for a while). All of the solutions are applicable as-is to DataFrames.

Setup

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})
s

foo       x
foobar    y
baz       z
dtype: object

df = s.to_frame('abc')
df

       abc
foo      x
foobar   y
baz      z

The same solution applies to both s and df!

Searching for Prefix: `str.startswith`

pd.Index objects of str dtype (more accurately, object dtype) now come with str methods of their own, so you can express this more idiomatically through the .str accessor, mirroring Series.str.startswith:

# For the Series:
s.index.str.startswith('foo')
# Similarly, for the DataFrame:
df.index.str.startswith('foo')

# array([ True,  True, False])

To select rows with this mask, use boolean indexing:

s[s.index.str.startswith('foo')]

foo       x
foobar    y
dtype: object

df[df.index.str.startswith('foo')]

       abc
foo      x
foobar   y
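
Because each .str call returns a boolean mask, several conditions can be combined before indexing. The following is a small sketch built on the setup above; the extra endswith condition is an illustrative assumption, not part of the original answer.

```python
import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})
df = s.to_frame('abc')

# Boolean masks combine element-wise with & (and), | (or) and ~ (not).
mask = df.index.str.startswith('foo') & ~df.index.str.endswith('bar')

# .loc accepts the mask and makes the row selection explicit.
result = df.loc[mask]
print(result)
```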

Search Anywhere: `str.contains`

Use Series.str.contains to perform a substring or regex-based search anywhere in the string:

s.index.str.contains('foo')
# Similarly,
df.index.str.contains('foo')

# array([ True,  True, False])

If you're simply matching on substrings, you can safely disable regex based search to improve performance: s.index.str.contains('foo', regex=False)
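
Disabling regex also matters for correctness when the search term contains regex metacharacters. A minimal sketch, using made-up labels chosen only to show the difference:

```python
import pandas as pd

# Hypothetical labels: only the first one contains the literal text 'a.b'.
idx = pd.Index(['a.b', 'axb', 'c'])

# With the default regex=True, '.' matches any character.
as_regex = idx.str.contains('a.b').tolist()

# With regex=False, the pattern is treated as a literal substring.
as_literal = idx.str.contains('a.b', regex=False).tolist()

print(as_regex)
print(as_literal)
```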

For a regex-based search, you can use:

s.index.str.contains('ba')
# Similarly,
df.index.str.contains('ba')

# array([False,  True,  True])
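
The pattern 'ba' above is a plain substring; the sketch below shows what the regex mode adds, using anchors, alternation and the case parameter. The specific patterns are illustrative assumptions.

```python
import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})

# '^ba' matches labels starting with 'ba'; 'bar$' matches labels ending in 'bar'.
anchored = s.index.str.contains(r'^ba|bar$').tolist()

# case=False makes the match case-insensitive.
ignore_case = s.index.str.contains('FOO', case=False).tolist()

print(s[anchored])
print(ignore_case)
```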

Micro-Optimizing with List Comprehensions

From a performance perspective, list comprehensions are often faster than the vectorized string methods. The first option can be rewritten as:

[x.startswith('foo') for x in s.index]
# [True, True, False]

s[[x.startswith('foo') for x in s.index]]

foo       x
foobar    y
dtype: object

With regex, you can pre-compile a pattern and call re.search on each label. For more information, see my extensive writeup, "For loops with pandas - When should I care?".
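
A minimal sketch of the pre-compiled variant, assuming the same s as in the setup; the pattern '^foo' is only an example:

```python
import re

import pandas as pd

s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})

# Compile the pattern once, outside the comprehension.
pattern = re.compile(r'^foo')

# pattern.search returns a match object or None; bool() turns that into a mask entry.
mask = [bool(pattern.search(label)) for label in s.index]
selected = s[mask]

print(selected)
```

Whether this beats the .str methods depends on the data size and the pattern, so it is worth timing on your own data.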
