Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/43643506/

Date: 2020-09-14 03:29:23  Source: igfitidea

select columns based on columns names containing a specific string in pandas

Tags: python, pandas

Question by Eric B

I created a dataframe using the following:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 3), columns=['alp1', 'alp2', 'bet1'])

I'd like to get a dataframe containing every column from df that has alp in its name. This is only a simplified version of my problem; my real dataframe has many more columns.

Answer by MaxU

Alternative methods:

In [13]: df.loc[:, df.columns.str.startswith('alp')]
Out[13]:
       alp1      alp2
0  0.357564  0.108907
1  0.341087  0.198098
2  0.416215  0.644166
3  0.814056  0.121044
4  0.382681  0.110829
5  0.130343  0.219829
6  0.110049  0.681618
7  0.949599  0.089632
8  0.047945  0.855116
9  0.561441  0.291182

In [14]: df.loc[:, df.columns.str.contains('alp')]
Out[14]:
       alp1      alp2
0  0.357564  0.108907
1  0.341087  0.198098
2  0.416215  0.644166
3  0.814056  0.121044
4  0.382681  0.110829
5  0.130343  0.219829
6  0.110049  0.681618
7  0.949599  0.089632
8  0.047945  0.855116
9  0.561441  0.291182
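The two calls above are not equivalent: str.startswith matches only a prefix, while str.contains matches anywhere in the name and, by default, treats the pattern as a regular expression. A minimal sketch, using a hypothetical extra column named xalp that is not in the question, shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'xalp' contains 'alp' but does not start with it.
df = pd.DataFrame(np.random.rand(3, 4), columns=["alp1", "alp2", "bet1", "xalp"])

prefix = df.loc[:, df.columns.str.startswith("alp")]  # prefix match only
anywhere = df.loc[:, df.columns.str.contains("alp")]  # match anywhere in the name
# case=False ignores case; regex=False treats the pattern as a literal string.
nocase = df.loc[:, df.columns.str.contains("ALP", case=False, regex=False)]

print(list(prefix.columns))    # ['alp1', 'alp2']
print(list(anywhere.columns))  # ['alp1', 'alp2', 'xalp']
print(list(nocase.columns))    # ['alp1', 'alp2', 'xalp']
```

Use startswith when the substring must be at the front of the name, and contains when it may appear anywhere.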

Answer by piRSquared

Option 1
Full NumPy + pd.DataFrame

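# np.char.find returns the position of 'alp' in each name, or -1 if absent.
# np.char is the public name of np.core.defchararray (np.core is deprecated in NumPy 2.0).
m = np.char.find(df.columns.values.astype(str), 'alp') >= 0
pd.DataFrame(df.values[:, m], df.index, df.columns[m])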

       alp1      alp2
0  0.819189  0.356867
1  0.900406  0.968947
2  0.201382  0.658768
3  0.700727  0.946509
4  0.176423  0.290426
5  0.132773  0.378251
6  0.749374  0.983251
7  0.768689  0.415869
8  0.292140  0.457596
9  0.214937  0.976780

Option 2
NumPy + loc

# Same boolean mask as Option 1, built with the public np.char alias.
m = np.char.find(df.columns.values.astype(str), 'alp') >= 0
df.loc[:, m]

       alp1      alp2
0  0.819189  0.356867
1  0.900406  0.968947
2  0.201382  0.658768
3  0.700727  0.946509
4  0.176423  0.290426
5  0.132773  0.378251
6  0.749374  0.983251
7  0.768689  0.415869
8  0.292140  0.457596
9  0.214937  0.976780
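One practical difference between the two options: df.values collapses the whole frame into a single NumPy array, so when the columns have mixed dtypes it becomes an object array, and Option 1 rebuilds the result from that array. Option 2 applies the same mask through .loc and keeps each column's own dtype. The sketch below assumes a hypothetical mixed-dtype frame that is not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame (not from the question): int, float, str.
df = pd.DataFrame({"alp1": [1, 2], "alp2": [0.5, 1.5], "bet1": ["x", "y"]})

# np.char.find gives the position of 'alp' in each name, or -1 if absent.
m = np.char.find(df.columns.values.astype(str), "alp") >= 0

# Option 1: df.values is one 2-D array; with a str column its dtype is object.
opt1 = pd.DataFrame(df.values[:, m], df.index, df.columns[m])

# Option 2: the same boolean mask in .loc keeps every column's own dtype.
opt2 = df.loc[:, m]

print(df.values.dtype)  # object
print(opt2.dtypes)
```

For an all-float frame like the one in the question, both options give the same result.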


Timing
In the answerer's benchmark, the NumPy versions were faster.
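The original benchmark chart is not reproduced here. A minimal timeit sketch lets you compare the approaches on your own pandas/NumPy versions. It assumes a hypothetical 100 x 1000 frame of random floats, and because results depend on machine and library versions, no timings are shown:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical wide frame: 1000 columns alternating 'bet...' and 'alp...' names.
cols = [f"alp{i}" if i % 2 else f"bet{i}" for i in range(1000)]
df = pd.DataFrame(np.random.rand(100, len(cols)), columns=cols)

candidates = {
    "str.contains + loc": lambda: df.loc[:, df.columns.str.contains("alp")],
    "np.char.find + loc": lambda: df.loc[
        :, np.char.find(df.columns.values.astype(str), "alp") >= 0
    ],
    "filter(like=...)": lambda: df.filter(like="alp"),
}

for name, fn in candidates.items():
    secs = timeit.timeit(fn, number=100)
    print(f"{name:20s} {secs:.4f} s for 100 runs")
```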

Answer by CONvid19

You have several options; here are a couple:

1 - filter with like:

df.filter(like='alp')

2 - filter with regex:

df.filter(regex='alp')
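For a plain word like 'alp' the two calls return the same columns, but they differ once the pattern contains regex metacharacters: like= is a literal substring test, while regex= applies re.search to each label. A short sketch with hypothetical column names chosen to show the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical column names chosen to show where like= and regex= differ.
df = pd.DataFrame(np.zeros((2, 4)), columns=["a.b", "axb", "ALP1", "alp2"])

# like= is a plain substring test, so '.' is just a dot.
literal = df.filter(like="a.b")
# regex= uses re.search, so '.' matches any single character.
pattern = df.filter(regex="a.b")
# An inline (?i) flag makes the regex case-insensitive.
nocase = df.filter(regex="(?i)alp")

print(list(literal.columns))  # ['a.b']
print(list(pattern.columns))  # ['a.b', 'axb']
print(list(nocase.columns))   # ['ALP1', 'alp2']
```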

Answer by Harvey

In case @Pedro's answer doesn't work, here is the official way of doing it in pandas 0.25:

Sample dataframe:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6

Select columns by name

>>> df.filter(items=['one', 'three'])
        one  three
mouse     1      3
rabbit    4      6
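Unlike plain df[[...]] selection, filter(items=...) keeps only the labels that exist and silently skips the rest. A small sketch on the same sample frame, where 'four' is a hypothetical label that is not a column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]),
                  index=["mouse", "rabbit"],
                  columns=["one", "two", "three"])

# 'four' is a hypothetical label that does not exist in df; filter ignores it.
kept = df.filter(items=["one", "four"])
print(list(kept.columns))  # ['one']

# Plain column selection with the same list raises KeyError instead.
try:
    df[["one", "four"]]
except KeyError as err:
    print("KeyError:", err)
```

This tolerance is convenient for optional columns, but it can also hide a typo in a column name.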

Select columns by regular expression

>>> df.filter(regex='e$', axis=1)  # names ending in 'e'; drop the '$' to match names that merely contain 'e'
        one  three
mouse     1      3
rabbit    4      6
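Because regex= uses re.search, anchors decide whether the pattern must sit at the start (^), at the end ($), or anywhere in the name. A short sketch on the same sample frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]),
                  index=["mouse", "rabbit"],
                  columns=["one", "two", "three"])

ends_with_o = df.filter(regex="o$", axis=1)    # names ending in 'o'
contains_o = df.filter(regex="o", axis=1)      # names containing 'o' anywhere
starts_with_t = df.filter(regex="^t", axis=1)  # names starting with 't'

print(list(ends_with_o.columns))    # ['two']
print(list(contains_o.columns))     # ['one', 'two']
print(list(starts_with_t.columns))  # ['two', 'three']
```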

Select rows whose index label contains 'bbi'

>>> df.filter(like='bbi', axis=0)
        one  two  three
rabbit    4    5      6
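With axis=0, filter matches against the index labels ('mouse', 'rabbit'), not against cell values. Since each call returns a DataFrame, a row filter and a column filter can be chained to select a sub-block, as in this sketch on the same sample frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]),
                  index=["mouse", "rabbit"],
                  columns=["one", "two", "three"])

# axis=0 filters index labels; 'bbi' is matched against 'mouse'/'rabbit'.
rows = df.filter(like="bbi", axis=0)

# Chaining a row filter with a column filter selects a sub-block.
block = df.filter(like="bbi", axis=0).filter(regex="e$", axis=1)
print(block.to_numpy().tolist())  # [[4, 6]]
```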