Original URL: http://stackoverflow.com/questions/15072005/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
keep/slice specific columns in pandas
Question by bdiamante
I know about these column slice methods:
df2 = df[["col1", "col2", "col3"]] and df2 = df.ix[:, 0:2]. (.ix has since been removed from pandas. For a frame with string column labels, the positional equivalent of the second is df.iloc[:, 0:2].)
But I'm wondering whether there is a way to take columns from the front, middle and end of a dataframe in a single slice, without listing each one explicitly.
For example, take a dataframe df with columns col1, col2, col3, col4, col5 and col6.
Is there a way to do something like this?
df2 = df.ix[:, [0:2, "col5"]]
I'm in the situation where I have hundreds of columns and routinely need to slice specific ones for different requests. I've checked through the documentation and haven't seen something like this. Have I overlooked something?
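Because .ix is gone from current pandas, here is a minimal sketch of the question's two methods written with .loc (labels) and .iloc (positions). It also shows one way to get the col1, col2, col5 selection the question asks for: Index.get_loc converts the name into a position. The six-column frame and its values are made up for illustration.

```python
import pandas as pd

# Six-column frame matching the example in the question (values are arbitrary).
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]],
                  columns=["col1", "col2", "col3", "col4", "col5", "col6"])

by_name = df[["col1", "col2", "col3"]]   # explicit list of labels
by_label = df.loc[:, "col1":"col3"]      # label slice: both ends included
by_position = df.iloc[:, 0:2]            # positional slice: end excluded

# Mix a positional range with one named column: turn the name into its
# position with Index.get_loc, then select everything positionally.
positions = list(range(0, 2)) + [df.columns.get_loc("col5")]
mixed = df.iloc[:, positions]
print(list(mixed.columns))  # ['col1', 'col2', 'col5']
```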
Accepted answer by DSM
IIUC, the simplest way I can think of would be something like this:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.randn(5, 10))
>>> df[list(df.columns[:2]) + [7]]
0 1 7
0 0.210139 0.533249 1.780426
1 0.382136 0.083999 -0.392809
2 -0.237868 0.493646 -1.208330
3 1.242077 -0.781558 2.369851
4 1.910740 -0.643370 0.982876
where the list call isn't optional, because otherwise the Index object will try to vector-add itself to the 7.
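If you already know the positions of the columns you want, the same selection can be written purely positionally. This is a minimal sketch. NumPy's np.r_ turns a mix of slices and integers into one integer array, and .iloc accepts that array. Inside np.r_[...], a bare string is read as a formatting directive rather than a value, so a column name such as "col5" cannot be mixed in directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 10))

# np.r_[0:2, 7] builds array([0, 1, 7]); .iloc takes it as a list of
# column positions, so front and later columns come out in one selection.
picked = df.iloc[:, np.r_[0:2, 7]]
print(list(picked.columns))  # [0, 1, 7]
```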
It would be possible to special-case something like numpy's r_ so that
df[col_[:2, "col5", 3:6]]
would work, although I don't know if it would be worth the trouble.
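The col_ object DSM mentions does not exist in pandas or NumPy. Below is a hypothetical sketch of how it could work. col_ only records what is written between the brackets. A hypothetical helper, select, then resolves slices and integers as positions and strings as column labels, and hands the resulting positions to .iloc. pandas will not accept df[col_[...]] directly, so the call is written as select(df, col_[...]). The sketch assumes unique column names, because get_loc returns a slice or boolean mask for duplicated labels.

```python
import pandas as pd


class _ColIndexer:
    """Hypothetical np.r_-style helper: col_[...] just records the items
    written between the brackets so select() can resolve them later."""

    def __getitem__(self, key):
        # col_[a, b, c] receives a tuple; col_[a] receives a single item.
        return key if isinstance(key, tuple) else (key,)


col_ = _ColIndexer()


def select(df, parts):
    """Hypothetical helper: return the columns of df described by col_[...].

    Slices and integers are positions (as with .iloc); strings are column
    labels. Assumes column names are unique.
    """
    positions = []
    for part in parts:
        if isinstance(part, slice):
            # Expand the slice against the number of columns, end excluded.
            positions.extend(range(len(df.columns))[part])
        elif isinstance(part, str):
            # Translate a column name into its integer position.
            positions.append(df.columns.get_loc(part))
        else:
            # Anything else is taken as an integer position.
            positions.append(part)
    return df.iloc[:, positions]


df = pd.DataFrame([[1, 2, 3, 4, 5, 6]],
                  columns=["col1", "col2", "col3", "col4", "col5", "col6"])

print(list(select(df, col_[:2, "col5"]).columns))     # ['col1', 'col2', 'col5']
print(list(select(df, col_[:1, "col3", 5]).columns))  # ['col1', 'col3', 'col6']
```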
Answer by beardc
Not sure exactly what you're asking. If you want the first and last 5 rows of a specific column, you can do something like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.random.randint(0, 3, 1000),
                   'col2': np.random.rand(1000),
                   'col5': np.random.rand(1000)})
In [36]: df['col5']
Out[36]:
0 0.566218
1 0.305987
2 0.852257
3 0.932764
4 0.185677
...
996 0.268700
997 0.036250
998 0.470009
999 0.361089
Name: col5, Length: 1000
In [38]: df['col5'][(df.index < 5) | (df.index >= (len(df) - 5))]
Out[38]:
0 0.566218
1 0.305987
2 0.852257
3 0.932764
4 0.185677
995 0.830839
996 0.268700
997 0.036250
998 0.470009
999 0.361089
Name: col5
Or, more generally, you could write a function:
In [41]: def head_and_tail(df, n=5):
    ...:     return df[(df.index < n) | (df.index >= (len(df) - n))]
In [44]: head_and_tail(df, 6)
Out[44]:
col1 col2 col5
0 0 0.489944 0.566218
1 1 0.639213 0.305987
2 1 0.000690 0.852257
3 2 0.620568 0.932764
4 0 0.310816 0.185677
5 0 0.930496 0.678504
994 2 0.842181 0.636472
995 0 0.899453 0.830839
996 0 0.418264 0.268700
997 0 0.228304 0.036250
998 2 0.031277 0.470009
999 1 0.542502 0.361089
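Comparing df.index against row counts only works while the index is the default 0..len-1 range. A minimal alternative sketch, not from the original answer, concatenates DataFrame.head and DataFrame.tail, which count rows by position whatever the index holds. If the frame has fewer than 2n rows, some rows will appear twice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.random.randint(0, 3, 1000),
                   'col2': np.random.rand(1000),
                   'col5': np.random.rand(1000)})


def head_and_tail(df, n=5):
    """First n and last n rows, selected by position rather than by index value."""
    return pd.concat([df.head(n), df.tail(n)])


result = head_and_tail(df, 5)
print(list(result.index))  # [0, 1, 2, 3, 4, 995, 996, 997, 998, 999]
```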
Answer by K.-Michael Aye
If your column names contain information you can filter on, you could use df.filter(regex='^name'). The regex argument is a regular expression, not a shell-style wildcard. I use this to select among my 189 data channels, named a1_01 through b3_21, and it works fine.
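Here is a minimal sketch of DataFrame.filter on made-up channel names in the a1_01 ... b3_21 style. With regex=, a column is kept if the pattern is found anywhere in its name, so anchor the pattern with ^ to match a prefix. With like=, a column is kept if its name contains a plain substring.

```python
import pandas as pd

# Hypothetical channel names in the style described in the answer.
cols = ["a1_01", "a1_02", "a2_01", "b1_01", "b3_21"]
df = pd.DataFrame([[0, 1, 2, 3, 4]], columns=cols)

# regex= is searched anywhere in each name, so ^ anchors it to the start.
a1 = df.filter(regex=r"^a1_")
print(list(a1.columns))  # ['a1_01', 'a1_02']

# like= keeps columns whose name contains the given substring.
b = df.filter(like="b")
print(list(b.columns))  # ['b1_01', 'b3_21']
```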

