Selecting multiple dataframe columns by position in pandas
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/48545076/
Asked by aiden rosenblatt
I have a (large) dataframe. How can I select specific columns by position? e.g. columns 1..3, 5, 6
Rather than just drop column4, I am trying to do it in this way because there are a ton of rows in my dataset and I want to select by position:
df=df[df.columns[0:2,4:5]]
but that gives IndexError: too many indices for array
DF input
Col1  Col2     Col3    Col4    Col5    Col6
1     apple    tomato  pear    banana  banana
1     apple    grape   nan     banana  banana
1     apple    nan     banana  banana  banana
1     apple    tomato  banana  banana  banana
1     apple    tomato  banana  banana  banana
1     apple    tomato  banana  banana  banana
1     avacado  tomato  banana  banana  banana
1     toast    tomato  banana  banana  banana
1     grape    tomato  egg     banana  banana
DF output - desired
Col1  Col2     Col3    Col5    Col6
1     apple    tomato  banana  banana
1     apple    grape   banana  banana
1     apple    nan     banana  banana
1     apple    tomato  banana  banana
1     apple    tomato  banana  banana
1     apple    tomato  banana  banana
1     avacado  tomato  banana  banana
1     toast    tomato  banana  banana
1     grape    tomato  banana  banana
Answered by YOBEN_S
Answered by jpp
You can select columns 0, 1, 4 in this way:
df.iloc[:, [0, 1, 4]]
You can read more about this in Indexing and Selecting Data.
.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers, which allow out-of-bounds indexing (this conforms with python/numpy slice semantics). Allowed inputs are:
- An integer, e.g. 5
- A list or array of integers, e.g. [4, 3, 0]
- A slice object with ints, e.g. 1:7
- A boolean array
- A callable function with one argument (the calling Series, DataFrame or Panel) that returns valid output for indexing (one of the above)
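To make the allowed inputs a bit more concrete, here is a rough sketch that tries each of them on the column axis. The six-column frame and all variable names are made up for illustration (they are not from the original answer), and positions are zero-based throughout.

```python
import numpy as np
import pandas as pd

# Hypothetical six-column frame, loosely modelled on the question's data.
df = pd.DataFrame(
    np.arange(12).reshape(2, 6),
    columns=["Col1", "Col2", "Col3", "Col4", "Col5", "Col6"],
)

# A list of integer positions (zero-based): everything except Col4.
by_list = df.iloc[:, [0, 1, 2, 4, 5]]

# A slice may run past the end of the axis without raising.
by_slice = df.iloc[:, 4:100]

# A boolean array with one entry per column.
mask = np.array([True, True, True, False, True, True])
by_mask = df.iloc[:, mask]

# A callable receives the frame and returns any valid indexer.
by_callable = df.iloc[:, lambda d: [0, d.shape[1] - 1]]

# A list entry that is out of bounds raises IndexError.
try:
    df.iloc[:, [0, 6]]
except IndexError as exc:
    out_of_bounds = type(exc).__name__
```

Note that only the slice tolerates positions beyond the last column; the list form fails as soon as one entry is out of range.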
Answered by YanSym
Use the pandas iloc indexer. Positions are zero-based, so columns 1..3, 5, 6 from the question are positions 0, 1, 2, 4 and 5:
df_filtered = df.iloc[:, [0, 1, 2, 4, 5]]
Answered by Tai
The error the OP faces comes from df.columns[0:2,4:5], where too many indices are passed to a one-dimensional Index. If I understand correctly, you can collect all the column names you need and select with them.
from itertools import chain
# Mirrors the slices in the question; use 0:3 and 4:6 to get the desired output.
cols_to_select = list(chain(df.columns[0:2], df.columns[4:5]))
df_filtered = df[cols_to_select]
If cols_to_select can contain duplicate column names, select with iloc as jp_data_analysis suggested, or with np.r_ as Wen suggested.
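For reference, a small sketch of what the np.r_ route might look like, and why duplicate names matter. The one-row frame with a repeated "Col2" label is a made-up example, not data from the question, and the variable names are my own.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which the label "Col2" appears twice.
df = pd.DataFrame(
    [[1, "apple", "tomato", "pear", "banana", "banana"]],
    columns=["Col1", "Col2", "Col2", "Col4", "Col5", "Col6"],
)

# np.r_ concatenates the slices 0:3 and 4:6 into array([0, 1, 2, 4, 5]).
positions = np.r_[0:3, 4:6]

# Positional selection keeps exactly one column per requested position.
by_position = df.iloc[:, positions]

# Label selection returns every column matching each requested label,
# so the repeated "Col2" makes the result wider than intended.
by_label = df[list(df.columns[positions])]
```

With unique column names both selections should agree, so the difference only shows up when labels repeat.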
Answered by student
You can also use range with np.concatenate from numpy, which combines the two ranges into a single array of column positions:
import numpy as np
df = df[df.columns[np.concatenate([range(0,3),range(4,6)])]]
df
Output:
   Col1  Col2     Col3    Col5    Col6
0  1     apple    tomato  banana  banana
1  1     apple    grape   banana  banana
2  1     apple    NaN     banana  banana
3  1     apple    tomato  banana  banana
4  1     apple    tomato  banana  banana
5  1     apple    tomato  banana  banana
6  1     avacado  tomato  banana  banana
7  1     toast    tomato  banana  banana
8  1     grape    tomato  banana  banana
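As a quick sanity check, the following sketch rebuilds the first three rows of the question's input by hand and compares the np.concatenate selection with a plain iloc selection and with dropping the fourth column by position. The variable names are mine, and the drop variant assumes the column labels are unique.

```python
import numpy as np
import pandas as pd

# First three rows of the question's input, retyped by hand for illustration.
df = pd.DataFrame(
    {
        "Col1": [1, 1, 1],
        "Col2": ["apple", "apple", "apple"],
        "Col3": ["tomato", "grape", np.nan],
        "Col4": ["pear", np.nan, "banana"],
        "Col5": ["banana", "banana", "banana"],
        "Col6": ["banana", "banana", "banana"],
    }
)

expected = ["Col1", "Col2", "Col3", "Col5", "Col6"]

# Selection through the column labels found at positions 0-2 and 4-5.
via_concatenate = df[df.columns[np.concatenate([range(0, 3), range(4, 6)])]]

# The same positions passed straight to iloc.
via_iloc = df.iloc[:, [0, 1, 2, 4, 5]]

# Dropping the fourth column by position; relies on unique column labels.
via_drop = df.drop(columns=df.columns[3])
```

All three should give the same five columns here, so the choice is mostly a matter of taste unless column names repeat.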