pandas: select rows by partial string match in the index
Notice: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/16617394/
Select rows by partial string match in index
Asked by ronszon
Given a Series like this:
from pandas import Series
ds = Series({'wikipedia':10,'wikimedia':22,'wikitravel':33,'google':40})
google 40
wikimedia 22
wikipedia 10
wikitravel 33
dtype: int64
I would like to select the rows where 'wiki' is a part of the index label (a partial string label).
So far I have tried:
ds[ds.index.map(lambda x: 'wiki' in x)]
wikimedia 22
wikipedia 10
wikitravel 33
Name: site, dtype: int64
This does the job, but the index seems to cry out for a 'contains' method, just like the one columns have...
Any better way to do that?
Answered by Andy Hayden
A somewhat cheeky way could be to use loc:
In [11]: ds.loc['wiki': 'wikj']
Out[11]:
wikimedia 22
wikipedia 10
wikitravel 33
dtype: int64
This is essentially equivalent to ds[ds.index.map(lambda s: s.startswith('wiki'))].
To do a contains-style match, as @DSM suggests, it's probably nicer to write:
ds[['wiki' in s for s in ds.index]]
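One caveat worth sketching for the loc slice: as far as I know, slicing with labels that are not in the index only works when the index is sorted, and recent pandas versions keep a dict's insertion order instead of sorting it. The snippet below is a minimal sketch under that assumption; the names ds_sorted, by_slice and by_prefix are made up for illustration.

```python
import pandas as pd

ds = pd.Series({'wikipedia': 10, 'wikimedia': 22, 'wikitravel': 33, 'google': 40})

# Label slicing with bounds that are not in the index needs a sorted index.
ds_sorted = ds.sort_index()

# Slice from 'wiki' up to and including 'wikj' (label slices include both ends).
by_slice = ds_sorted.loc['wiki':'wikj']

# The same rows, selected with an explicit prefix test on the index.
by_prefix = ds_sorted[ds_sorted.index.str.startswith('wiki')]

print(by_slice)
print(by_slice.equals(by_prefix))
```

Because the upper bound is inclusive, a label that is exactly 'wikj' would also be selected, which is why the slice is only "essentially" a prefix match.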
Answered by Chris
Answered by cs95
How do I select rows by partial string matching on the index?
Updated for: 2019
We now have "vectorized" string methods for these operations (actually, they've been around for a while). All the solutions are applicable as-is with DataFrames.
Setup
import pandas as pd
s = pd.Series({'foo': 'x', 'foobar': 'y', 'baz': 'z'})
s
foo x
foobar y
baz z
dtype: object
df = s.to_frame('abc')
df
abc
foo x
foobar y
baz z
The same solutions apply to both s and df!
Searching for Prefix: str.startswith
str dtype (more accurately, object dtype) pd.Index objects now come with str methods themselves, so you can specify this more idiomatically with Series.str.startswith, which is also available on the index:
# For the series,
s.index.str.startswith('foo')
# Similarly, for the DataFrame,
df.index.str.startswith('foo')
# array([ True, True, False])
To select rows with this result, use boolean indexing:
s[s.index.str.startswith('foo')]
foo x
foobar y
dtype: object
df[df.index.str.startswith('foo')]
abc
foo x
foobar y
Search Anywhere: str.contains
Use Series.str.contains to perform a substring or regex-based search anywhere in the string:
s.index.str.contains('foo')
# Similarly,
df.index.str.contains('foo')
# array([ True, True, False])
If you're simply matching on substrings, you can safely disable regex based search to improve performance: s.index.str.contains('foo', regex=False)
For regex-based matching, you can use:
s.index.str.contains('ba')
# Similarly,
df.index.str.contains('ba')
# array([False, True, True])
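The difference between regex and literal matching only shows up when the search string contains regex metacharacters. Here is a minimal sketch, assuming a made-up Series s2 whose labels were chosen so that '.' behaves differently in the two modes:

```python
import pandas as pd

# Hypothetical labels, chosen so that '.' behaves differently as a regex.
s2 = pd.Series([1, 2, 3], index=['a.b', 'ab', 'a-b'])

# Default regex mode: '.' matches any single character.
regex_mask = s2.index.str.contains('.')

# Literal mode: '.' matches only an actual dot.
literal_mask = s2.index.str.contains('.', regex=False)

print(list(regex_mask))
print(list(literal_mask))
print(s2[literal_mask])
```

So regex=False is not just a performance tweak: for plain substring searches it also avoids accidental matches from characters such as '.', '*' or '('.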
Micro-Optimizing with List Comprehensions
In terms of performance, list comprehensions happen to be faster. The first option can be rewritten as:
[x.startswith('foo') for x in s.index]
# [True, True, False]
s[[x.startswith('foo') for x in s.index]]
foo x
foobar y
dtype: object
With regex, you can pre-compile a pattern and call re.search. For more info, see my extensive writeup at "For loops with pandas - When should I care?".

