Select columns based on column names containing a specific string in pandas
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/43643506/
Asked by Eric B
I created a dataframe using the following:
df = pd.DataFrame(np.random.rand(10, 3), columns=['alp1', 'alp2', 'bet1'])
I'd like to get a dataframe containing every column from df that has alp in its name. This is only a simplified version of my problem, so my real dataframe will have more columns.
Answered by MaxU
Alternative methods:
In [13]: df.loc[:, df.columns.str.startswith('alp')]
Out[13]:
alp1 alp2
0 0.357564 0.108907
1 0.341087 0.198098
2 0.416215 0.644166
3 0.814056 0.121044
4 0.382681 0.110829
5 0.130343 0.219829
6 0.110049 0.681618
7 0.949599 0.089632
8 0.047945 0.855116
9 0.561441 0.291182
In [14]: df.loc[:, df.columns.str.contains('alp')]
Out[14]:
alp1 alp2
0 0.357564 0.108907
1 0.341087 0.198098
2 0.416215 0.644166
3 0.814056 0.121044
4 0.382681 0.110829
5 0.130343 0.219829
6 0.110049 0.681618
7 0.949599 0.089632
8 0.047945 0.855116
9 0.561441 0.291182
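With the question's columns the two calls above return the same result, because every name containing alp also starts with it. A minimal sketch of where they differ, assuming a hypothetical extra column x_alp that is not in the original dataframe:

```python
import numpy as np
import pandas as pd

# 'x_alp' is a hypothetical column added only to make the two masks differ
df = pd.DataFrame(np.random.rand(4, 4),
                  columns=['alp1', 'alp2', 'bet1', 'x_alp'])

# Prefix match: True only for names that begin with 'alp'
starts = df.columns.str.startswith('alp')

# Substring match anywhere in the name; regex=False treats 'alp' literally
contains = df.columns.str.contains('alp', regex=False)

starts_cols = df.loc[:, starts].columns.tolist()
contains_cols = df.loc[:, contains].columns.tolist()

print(starts_cols)    # ['alp1', 'alp2']
print(contains_cols)  # ['alp1', 'alp2', 'x_alp']
```

Note that str.contains interprets its pattern as a regular expression by default, so passing regex=False is the safer choice when the search string may contain characters such as . or +.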
Answered by piRSquared
Option 1: full numpy + pd.DataFrame
m = np.char.find(df.columns.values.astype(str), 'alp') >= 0
pd.DataFrame(df.values[:, m], df.index, df.columns[m])
alp1 alp2
0 0.819189 0.356867
1 0.900406 0.968947
2 0.201382 0.658768
3 0.700727 0.946509
4 0.176423 0.290426
5 0.132773 0.378251
6 0.749374 0.983251
7 0.768689 0.415869
8 0.292140 0.457596
9 0.214937 0.976780
Option 2: numpy + loc
m = np.char.find(df.columns.values.astype(str), 'alp') >= 0
df.loc[:, m]
alp1 alp2
0 0.819189 0.356867
1 0.900406 0.968947
2 0.201382 0.658768
3 0.700727 0.946509
4 0.176423 0.290426
5 0.132773 0.378251
6 0.749374 0.983251
7 0.768689 0.415869
8 0.292140 0.457596
9 0.214937 0.976780
Timing: the numpy approach is faster.
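No measurements survive in this copy of the answer, so the timing claim is best checked locally. A rough sketch using the standard timeit module; the helper names with_numpy and with_str_contains are made up for illustration, and the numbers will vary with machine, pandas version, and dataframe size:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 3), columns=['alp1', 'alp2', 'bet1'])

def with_numpy():
    # np.char.find gives the substring position, or -1 when it is absent
    m = np.char.find(df.columns.values.astype(str), 'alp') >= 0
    return df.loc[:, m]

def with_str_contains():
    return df.loc[:, df.columns.str.contains('alp')]

# Sanity check: both approaches should select the same columns
same = with_numpy().equals(with_str_contains())

# Total seconds for 1000 calls of each approach
t_numpy = timeit.timeit(with_numpy, number=1000)
t_pandas = timeit.timeit(with_str_contains, number=1000)

print(f"same result: {same}")
print(f"numpy mask: {t_numpy:.4f}s, str.contains: {t_pandas:.4f}s")
```

On a dataframe this small the difference is mostly call overhead, so it is worth repeating the measurement with a column count closer to the real data.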
Answered by CONvid19
You have several options; here are a couple:
1 - filter with like:
df.filter(like='alp')
2 - filter with regex:
df.filter(regex='alp')
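For the question's columns both calls give the same result. A small sketch of how like and regex can diverge, assuming hypothetical extra columns x_alp and a.p3 that are not part of the original dataframe:

```python
import numpy as np
import pandas as pd

# 'x_alp' and 'a.p3' are hypothetical names added for illustration
df = pd.DataFrame(np.random.rand(3, 5),
                  columns=['alp1', 'alp2', 'bet1', 'x_alp', 'a.p3'])

# like: plain substring test on each column name
like_cols = df.filter(like='alp').columns.tolist()

# regex: searched anywhere in the name unless anchored
anchored_cols = df.filter(regex='^alp').columns.tolist()

# '.' is literal for like, but matches any character for regex
like_dot = df.filter(like='a.p').columns.tolist()
regex_dot = df.filter(regex='a.p').columns.tolist()

print(like_cols)      # ['alp1', 'alp2', 'x_alp']
print(anchored_cols)  # ['alp1', 'alp2']
print(like_dot)       # ['a.p3']
print(regex_dot)      # ['alp1', 'alp2', 'x_alp', 'a.p3']
```

So like is the simpler choice for a literal substring, while regex is useful when the match has to be anchored or more selective.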
Answered by Harvey
In case @Pedro's answer doesn't work, here is the official way of doing it for pandas 0.25:
Sample dataframe:
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
... index=['mouse', 'rabbit'],
... columns=['one', 'two', 'three'])
        one  two  three
mouse     1    2      3
rabbit    4    5      6
Select columns by name
df.filter(items=['one', 'three'])
one three
mouse 1 3
rabbit 4 6
Select columns by regular expression
df.filter(regex='e$', axis=1)  # names ending with 'e'; to match names that merely contain 'e', drop the trailing '$'
one three
mouse 1 3
rabbit 4 6
Select rows whose index label contains 'bbi'
df.filter(like='bbi', axis=0)
one two three
rabbit 4 5 6
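The difference between "ending with" and "containing" does not show up with 'e' on this sample, since both patterns select one and three. A short sketch on the same dataframe using 'o' instead (my choice of letter, not from the answer), plus a check of what I understand to be filter's rule that items, like, and regex cannot be combined:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
                  index=['mouse', 'rabbit'],
                  columns=['one', 'two', 'three'])

# Anchored pattern: only names that end with 'o'
ends_with_o = df.filter(regex='o$', axis=1).columns.tolist()

# Unanchored pattern: any name that contains 'o'
contains_o = df.filter(regex='o', axis=1).columns.tolist()

# axis=0 applies the same matching to the row labels
rows = df.filter(like='bbi', axis=0).index.tolist()

# Passing more than one of items/like/regex is rejected
try:
    df.filter(like='o', regex='o$')
    combined_rejected = False
except TypeError:
    combined_rejected = True

print(ends_with_o)        # ['two']
print(contains_o)         # ['one', 'two']
print(rows)               # ['rabbit']
print(combined_rejected)  # True
```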