Original question: http://stackoverflow.com/questions/43464015/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Pandas .loc or .iloc to select columns from a dataset
Asked by Naveen Balasubramanian
I have been trying to select a particular set of columns from a dataset for all the rows. I tried something like below.
train_features = train_df.loc[,[0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]]
I want to keep all rows but only the columns at the listed positions. Is there a better way to approach this?
Sample data:
age job marital education default housing loan equities contact duration campaign pdays previous poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m nr.employed y
56 housemaid married basic.4y 1 1 1 1 0 261 1 999 0 2 1.1 93.994 -36.4 3.299552287 5191 1
37 services married high.school 1 0 1 1 0 226 1 999 0 2 1.1 93.994 -36.4 0.743751247 5191 1
56 services married high.school 1 1 0 1 0 307 1 999 0 2 1.1 93.994 -36.4 1.28265179 5191 1
I'm trying to exclude the job, marital, education and y columns from my dataset. The y column is the target variable.
Answered by jezrael
If you need to select by position, use iloc:
train_features = train_df.iloc[:, [0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]]
print (train_features)
age default housing loan equities contact duration campaign pdays \
0 56 1 1 1 1 0 261 1 999
1 37 1 0 1 1 0 226 1 999
2 56 1 1 0 1 0 307 1 999
previous poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m \
0 0 2 1.1 93.994 -36.4 3.299552
1 0 2 1.1 93.994 -36.4 0.743751
2 0 2 1.1 93.994 -36.4 1.282652
nr.employed
0 5191
1 5191
2 5191
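The position list in the iloc call above is mostly one contiguous run, so it does not have to be typed out by hand. As a minimal sketch (the small train_df built here is a hypothetical stand-in, not the asker's data), NumPy's np.r_ helper can join a single position with a slice:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df: 3 rows, 20 integer-labelled columns.
train_df = pd.DataFrame(np.arange(60).reshape(3, 20))

# np.r_ concatenates scalars and slices into one integer array:
# position 0, then positions 4 through 18 (the stop value 19 is excluded).
positions = np.r_[0, 4:19]

# iloc accepts the integer array just like the hand-written list.
train_features = train_df.iloc[:, positions]
print(train_features.shape)
```

This selects the same 16 column positions as the explicit list, and the slice makes the intended range easier to check at a glance.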
Another solution is to drop the unnecessary columns:
cols= ['job','marital','education','y']
train_features = train_df.drop(cols, axis=1)
print (train_features)
age default housing loan equities contact duration campaign pdays \
0 56 1 1 1 1 0 261 1 999
1 37 1 0 1 1 0 226 1 999
2 56 1 1 0 1 0 307 1 999
previous poutcome emp.var.rate cons.price.idx cons.conf.idx euribor3m \
0 0 2 1.1 93.994 -36.4 3.299552
1 0 2 1.1 93.994 -36.4 0.743751
2 0 2 1.1 93.994 -36.4 1.282652
nr.employed
0 5191
1 5191
2 5191
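Since y is the target variable, it may be convenient to pull it out into its own object at the same time as dropping it from the features. The following is only a sketch on a hypothetical miniature of the sample data (most columns omitted, train_target is an illustrative name); it also uses the columns= keyword, which is equivalent to passing axis=1:

```python
import pandas as pd

# Hypothetical miniature of the asker's data; only a few columns are included.
train_df = pd.DataFrame({
    "age": [56, 37, 56],
    "job": ["housemaid", "services", "services"],
    "marital": ["married", "married", "married"],
    "education": ["basic.4y", "high.school", "high.school"],
    "housing": [1, 0, 1],
    "y": [1, 1, 1],
})

cols = ["job", "marital", "education", "y"]

# drop returns a new frame; train_df itself is left unchanged.
train_features = train_df.drop(columns=cols)

# Keep the target as a separate Series (illustrative name).
train_target = train_df["y"]

print(list(train_features.columns))
```

Dropping by name does not depend on column order, so it keeps working if columns are later added or rearranged.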
Answered by piRSquared
You can access the column values via the underlying NumPy array.
Consider the dataframe df:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(10, size=(5, 20)))
df
You can slice the underlying array:
slc = [0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
df.values[:, slc]
array([[1, 3, 9, 8, 3, 2, 1, 6, 6, 0, 3, 9, 8, 5, 9, 9],
[8, 0, 2, 3, 7, 8, 9, 2, 7, 2, 1, 3, 2, 5, 4, 9],
[1, 1, 9, 3, 5, 8, 8, 8, 8, 4, 8, 0, 5, 4, 9, 0],
[6, 3, 1, 8, 0, 3, 7, 9, 9, 0, 9, 7, 6, 1, 4, 8],
[3, 2, 3, 3, 9, 8, 3, 8, 3, 4, 1, 6, 4, 1, 6, 4]])
Or you can reconstruct a new dataframe from this slice:
pd.DataFrame(df.values[:, slc], df.index, df.columns[slc])
This is not as clean and intuitive as
df.iloc[:, slc]
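One caveat with going through .values is worth illustrating: the all-integer df above converts cleanly, but a frame that mixes strings and numbers, as the question's data does, is converted to a single object-dtype array. A small sketch on a hypothetical mixed frame (the names mixed and rebuilt are illustrative):

```python
import pandas as pd

# Hypothetical mixed-type frame, loosely modelled on the question's sample data.
mixed = pd.DataFrame({
    "age": [56, 37],
    "job": ["housemaid", "services"],
    "duration": [261, 226],
})

# Mixed column types force .values to fall back to one common dtype.
print(mixed.values.dtype)  # object

# Rebuilding from the array slice therefore loses the integer dtypes,
rebuilt = pd.DataFrame(mixed.values[:, [0, 2]], mixed.index, mixed.columns[[0, 2]])
print(rebuilt.dtypes.tolist())

# while iloc keeps each selected column's own dtype.
print(mixed.iloc[:, [0, 2]].dtypes.tolist())
```

If the rebuilt frame is needed anyway, the columns would have to be converted back to numeric types afterwards, which is one more reason to prefer iloc here.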
You could also use slc to slice the df.columns object and pass the result to df.loc:
df.loc[:, df.columns[slc]]
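To see that the positional and label-based routes agree, here is a short check. It is a sketch that assumes string column names ("c0" to "c19", made up for illustration) so that positions and labels are visibly different things:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(5, 20)))
# Hypothetical string labels, so labels no longer coincide with positions.
df.columns = [f"c{i}" for i in range(20)]

slc = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

# Positional selection.
by_position = df.iloc[:, slc]

# Translate positions to labels first, then select by label.
by_label = df.loc[:, df.columns[slc]]

print(by_position.equals(by_label))
```

Both calls return the same 16 columns; df.columns[slc] is simply a way to turn positions into the labels that loc expects.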