pandas: selecting columns from a dataset with .loc or .iloc

Question

Asked by Naveen Balasubramanian

I have been trying to select a particular set of columns from a dataset, for all rows. I tried something like the following:

train_features = train_df.loc[,[0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]]

I want all rows included, but only the columns at the listed positions. Is there a better way to approach this?

Sample data:

age  job        marital   education    default   housing   loan   equities   contact     duration   campaign   pdays   previous   poutcome   emp.var.rate   cons.price.idx   cons.conf.idx   euribor3m     nr.employed   y
56   housemaid  married   basic.4y     1         1         1      1          0           261        1          999     0          2          1.1            93.994           -36.4           3.299552287   5191          1
37   services   married   high.school  1         0         1      1          0           226        1          999     0          2          1.1            93.994           -36.4           0.743751247   5191          1
56   services   married   high.school  1         1         0      1          0           307        1          999     0          2          1.1            93.994           -36.4           1.28265179    5191          1

I'm trying to exclude the job, marital, education and y columns from my dataset. The y column is the target variable.

Answer 1

Answered by jezrael

If you need to select by position, use iloc:

train_features = train_df.iloc[:, [0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]]
print(train_features)
   age  default  housing  loan  equities  contact  duration  campaign  pdays  \
0   56        1        1     1         1        0       261         1    999   
1   37        1        0     1         1        0       226         1    999   
2   56        1        1     0         1        0       307         1    999   

   previous  poutcome  emp.var.rate  cons.price.idx  cons.conf.idx  euribor3m  \
0         0         2           1.1          93.994          -36.4   3.299552   
1         0         2           1.1          93.994          -36.4   0.743751   
2         0         2           1.1          93.994          -36.4   1.282652   

   nr.employed  
0         5191  
1         5191  
2         5191
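
Since the positions in the question are 0 followed by the contiguous run 4 to 18, the list does not have to be typed out by hand. A minimal sketch, assuming NumPy is available and using a made-up 20-column frame as a stand-in for train_df:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df: 3 rows, 20 columns of dummy integers.
train_df = pd.DataFrame(np.arange(60).reshape(3, 20))

# np.r_ concatenates scalars and slices into one index array:
# position 0, then positions 4 through 18 (the slice end 19 is exclusive).
positions = np.r_[0, 4:19]

# Select all rows and only the columns at those positions.
train_features = train_df.iloc[:, positions]
print(train_features.shape)
```

This is only a shorthand for building the position list; the iloc call itself is the same as in the answer.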

Another solution is to drop the unnecessary columns:

cols = ['job', 'marital', 'education', 'y']
train_features = train_df.drop(cols, axis=1)
print(train_features)
   age  default  housing  loan  equities  contact  duration  campaign  pdays  \
0   56        1        1     1         1        0       261         1    999   
1   37        1        0     1         1        0       226         1    999   
2   56        1        1     0         1        0       307         1    999   

   previous  poutcome  emp.var.rate  cons.price.idx  cons.conf.idx  euribor3m  \
0         0         2           1.1          93.994          -36.4   3.299552   
1         0         2           1.1          93.994          -36.4   0.743751   
2         0         2           1.1          93.994          -36.4   1.282652   

   nr.employed  
0         5191  
1         5191  
2         5191
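
Because y is the target variable, it is often convenient to keep it in a separate object while dropping it from the features. A small sketch of that idea, assuming a cut-down frame with only some of the question's columns (the name train_target is hypothetical):

```python
import pandas as pd

# Hypothetical miniature of train_df with a subset of the question's columns.
train_df = pd.DataFrame({
    "age": [56, 37, 56],
    "job": ["housemaid", "services", "services"],
    "marital": ["married", "married", "married"],
    "education": ["basic.4y", "high.school", "high.school"],
    "default": [1, 1, 1],
    "housing": [1, 0, 1],
    "y": [1, 1, 1],
})

cols = ["job", "marital", "education", "y"]

# drop(columns=cols) is equivalent to drop(cols, axis=1).
train_features = train_df.drop(columns=cols)

# drop returns a new frame, so train_df still holds y for use as the target.
train_target = train_df["y"]
```

Dropping by name keeps working if the column order changes, whereas position-based selection with iloc would then pick different columns.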

Answer 2

Answered by piRSquared

You can access the column values via the underlying NumPy array.

Consider the dataframe df:

import numpy as np
import pandas as pd

# 5 rows, 20 columns of random integers in [0, 10)
df = pd.DataFrame(np.random.randint(10, size=(5, 20)))
df

You can slice the underlying array:

slc = [0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
df.values[:, slc]

array([[1, 3, 9, 8, 3, 2, 1, 6, 6, 0, 3, 9, 8, 5, 9, 9],
       [8, 0, 2, 3, 7, 8, 9, 2, 7, 2, 1, 3, 2, 5, 4, 9],
       [1, 1, 9, 3, 5, 8, 8, 8, 8, 4, 8, 0, 5, 4, 9, 0],
       [6, 3, 1, 8, 0, 3, 7, 9, 9, 0, 9, 7, 6, 1, 4, 8],
       [3, 2, 3, 3, 9, 8, 3, 8, 3, 4, 1, 6, 4, 1, 6, 4]])

Or you can reconstruct a new dataframe from this slice:

pd.DataFrame(df.values[:, slc], df.index, df.columns[slc])

This is not as clean and intuitive as

df.iloc[:, slc]
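
One further difference is worth noting when the frame mixes numeric and string columns, as the question's data does. The random-integer example does not show it, so here is a small sketch with a made-up mixed-type frame:

```python
import pandas as pd

# Hypothetical mixed-type frame: two integer columns and one string column.
df = pd.DataFrame({
    "age": [56, 37],
    "job": ["housemaid", "services"],
    "loan": [1, 1],
})
slc = [0, 2]

# With mixed dtypes, .values is a single object-dtype array,
# so the rebuilt frame's columns are object dtype as well.
rebuilt = pd.DataFrame(df.values[:, slc], df.index, df.columns[slc])

# iloc keeps each selected column's original dtype.
selected = df.iloc[:, slc]

print(rebuilt.dtypes)
print(selected.dtypes)
```

For a frame like the one in the question, iloc therefore avoids an extra dtype conversion step after the selection.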

You could also use slc to slice the df.columns object and pass that to df.loc:

df.loc[:, df.columns[slc]]
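
As a quick sanity check, the three approaches in this answer can be compared directly. This sketch assumes the same all-integer random frame as above; the names by_position, by_label and rebuilt are only for illustration:

```python
import numpy as np
import pandas as pd

# Same construction as the example frame: 5 rows, 20 integer columns.
df = pd.DataFrame(np.random.randint(10, size=(5, 20)))
slc = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

by_position = df.iloc[:, slc]                # select by integer position
by_label = df.loc[:, df.columns[slc]]        # select by the labels at those positions
rebuilt = pd.DataFrame(df.values[:, slc], df.index, df.columns[slc])

# DataFrame.equals compares shape, labels, dtypes and values.
same = by_position.equals(by_label) and by_position.equals(rebuilt)
print(same)
```

With a single-dtype frame like this one, all three give the same result; iloc is simply the shortest way to write it.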
