Python pandas DataFrame filter regex
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors on StackOverflow.
Original question: http://stackoverflow.com/questions/37080612/
pandas DataFrame filter regex
Asked by piRSquared
I don't understand pandas DataFrame.filter.
Setup
import pandas as pd

df = pd.DataFrame(
    [
        ['Hello', 'World'],
        ['Just', 'Wanted'],
        ['To', 'Say'],
        ['I\'m', 'Tired']
    ]
)
Problem
df.filter([0], regex=r'(Hel|Just)', axis=0)
I'd expect the [0] to specify the first column as the one to look at, and axis=0 to specify that rows are filtered. What I get is this:
       0      1
0  Hello  World
I was expecting:
       0       1
0  Hello   World
1   Just  Wanted
Question
- What would have gotten me what I expected?
Answered by unutbu
Per the docs: "Arguments are mutually exclusive, but this is not checked for."
So it appears that the first optional argument, items=[0], trumps the third optional argument, regex=r'(Hel|Just)'. The regex is silently ignored.
In [194]: df.filter([0], regex=r'(Hel|Just)', axis=0)
Out[194]:
       0      1
0  Hello  World
is equivalent to
In [201]: df.filter([0], axis=0)
Out[201]:
       0      1
0  Hello  World
which merely selects the row(s) whose index labels are in [0] along the 0-axis. It never looks at the cell values.
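To make the label-versus-value distinction concrete, here is a minimal sketch rather than a definitive recipe. It assumes the same frame as in the question. The set_index step is my own workaround, not part of the answer above. As far as I know, newer pandas releases reject the items/regex combination with a TypeError instead of silently ignoring regex, so the code allows for either outcome.

```python
import pandas as pd

df = pd.DataFrame(
    [["Hello", "World"], ["Just", "Wanted"], ["To", "Say"], ["I'm", "Tired"]]
)

# items alone selects rows whose index label appears in the list.
only_items = df.filter(items=[0], axis=0)

# Combining items with regex: older pandas silently ignored regex,
# while newer releases are expected to raise TypeError (assumption).
try:
    df.filter(items=[0], regex=r"(Hel|Just)", axis=0)
    combined = "regex ignored"
except TypeError:
    combined = "TypeError raised"

# regex is matched against labels, not cell values, so promote column 0
# to the index first (drop=False keeps it as a regular column too).
by_label = df.set_index(0, drop=False).filter(regex=r"Hel|Just", axis=0)
print(by_label)
```

The last call returns the two expected rows, but the index now holds the strings from column 0 instead of the original integers.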
To get the desired result, you could use str.contains to create a boolean mask, and then use df.loc to select the rows where the mask is True:
In [210]: df.loc[df.iloc[:,0].str.contains(r'(Hel|Just)')]
Out[210]:
       0       1
0  Hello   World
1   Just  Wanted
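One caveat with the mask approach, as far as I know: when the pattern contains a capturing group such as (Hel|Just), pandas emits a UserWarning suggesting str.extract. The sketch below is a variation on the answer, not the answer's own code. It uses a non-capturing group (?:...) to avoid that warning. The na=False argument is an added assumption so that missing values, if the column had any, count as non-matches.

```python
import pandas as pd

df = pd.DataFrame(
    [["Hello", "World"], ["Just", "Wanted"], ["To", "Say"], ["I'm", "Tired"]]
)

# Build a boolean mask from the VALUES of the first column.
# (?:...) is a non-capturing group; na=False treats missing values as False.
mask = df.iloc[:, 0].str.contains(r"(?:Hel|Just)", na=False)

# Keep only the rows where the mask is True.
result = df.loc[mask]
print(result)
```

Because the mask is an ordinary boolean Series, it can be combined with other conditions using & and | before being passed to df.loc.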
Answered by Max
This should work:
df[df[0].str.contains('(Hel|Just)', regex=True)]
Answered by Ramin Melikov
Here is a method-chaining approach:
df.loc[lambda x: x['column_name'].str.contains(regex_pattern, regex=True)]
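A runnable version of the chained form might look like the sketch below. The column names greeting and word and the pattern value are hypothetical, chosen only for illustration, since the question's frame uses the integer labels 0 and 1. The reset_index call is an optional extra step showing why chaining is convenient.

```python
import pandas as pd

# Hypothetical column names; the original frame uses integer labels 0 and 1.
df = pd.DataFrame(
    {
        "greeting": ["Hello", "Just", "To", "I'm"],
        "word": ["World", "Wanted", "Say", "Tired"],
    }
)
regex_pattern = r"Hel|Just"  # illustrative pattern

result = (
    df
    # The lambda receives the frame at this point in the chain.
    .loc[lambda x: x["greeting"].str.contains(regex_pattern, regex=True)]
    # Optional: renumber the surviving rows from 0.
    .reset_index(drop=True)
)
print(result)
```

Passing a callable to .loc means the filter always sees the current intermediate frame, which matters when earlier steps in the chain add or rename columns.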