Python AttributeError: 'numpy.ndarray' object has no attribute 'columns'
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/35980747/
AttributeError: 'numpy.ndarray' object has no attribute 'columns'
Asked by Husterwgm
I'm trying to write a function that removes features that are highly correlated with each other. However, I get the error "AttributeError: 'numpy.ndarray' object has no attribute 'columns'".
I just want to use pandas to read the number of columns. What should I do next?
import pandas as pd
import numpy as np

def remove_features_identical(DataFrame, data_source):
    n = len(DataFrame.columns)
    print('dealing with %d features of %s data......... \n' % (n, data_source))
    remove_ind = []
    R = np.corrcoef(DataFrame.T)
    for i in range(n-1):
        for j in range(i+1, n):
            if R[i,j] == 1:
                remove_ind.append(j)
    DataFrame.drop(remove_ind, axis=1, inplace=True)
    print('deleting %d columns with correlation factor >0.99' % len(remove_ind))
    return DataFrame

if __name__ == "__main__":
    # load data and initialize y and x from train set and test set
    df_train = pd.read_csv('train.csv')
    df_test = pd.read_csv('test.csv')
    y_train = df_train['TARGET'].values
    X_train = df_train.drop(['ID','TARGET'], axis=1).values
    y_test = []
    X_test = df_test.drop(['ID'], axis=1).values
    # delete identical features in raw data
    X_train = remove_features_identical(X_train, 'train set')
    X_test = remove_features_identical(X_test, 'test set')
Answered by hpaulj
Check the Pandas documentation, but I think
X_train =df_train.drop(['ID','TARGET'], axis=1).values
.values returns a numpy array, not a Pandas dataframe. An array does not have a columns attribute.
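A quick check makes the difference visible. The sketch below is only an illustration: the three-row frame and its column names (ID, TARGET, var1, var2) are made up to mirror the question, since the real train.csv is not shown. It compares the result of drop(...) with and without .values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv; the real file is not shown in the question.
df_train = pd.DataFrame({
    "ID": [1, 2, 3],
    "TARGET": [0, 1, 0],
    "var1": [0.5, 1.5, 2.5],
    "var2": [3.0, 1.0, 2.0],
})

# With .values the result is a plain numpy array: no column labels survive.
as_array = df_train.drop(["ID", "TARGET"], axis=1).values

# Without .values the result is still a DataFrame and keeps .columns.
as_frame = df_train.drop(["ID", "TARGET"], axis=1)

print(type(as_array).__name__, hasattr(as_array, "columns"))
print(type(as_frame).__name__, list(as_frame.columns))

# An array still exposes its column count through .shape.
n_columns = as_array.shape[1]
```

So if only the number of columns is needed, shape[1] works on the array; if the function needs .columns or .drop, it should receive the dataframe itself.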
remove_features_identical - if you pass this an array, make sure you are only using array, not dataframe, features. Otherwise, make sure you pass it a dataframe. And don't use variable names like DataFrame.
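One possible way to follow that advice is sketched below; it is not the asker's or the answerer's code. It assumes the function is always given a dataframe. The parameter name frame, the use of np.isclose in place of the exact == 1 comparison, and the demo data are all choices made for this illustration.

```python
import numpy as np
import pandas as pd


def remove_features_identical(frame, data_source):
    """Return a copy of `frame` without columns perfectly correlated with an earlier column."""
    n = len(frame.columns)
    print("dealing with %d features of %s data" % (n, data_source))

    # rowvar=False tells corrcoef that each column (not each row) is a variable.
    R = np.corrcoef(frame.values, rowvar=False)

    remove_ind = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Assumption: np.isclose replaces the question's exact `== 1` test
            # so that floating-point rounding does not hide a perfect correlation.
            if np.isclose(R[i, j], 1.0):
                remove_ind.add(j)

    # drop() works on labels, so translate column positions into column labels.
    to_drop = [frame.columns[j] for j in sorted(remove_ind)]
    print("deleting %d columns with correlation factor of 1" % len(to_drop))
    return frame.drop(columns=to_drop)


# Made-up demo data: "b" is exactly twice "a", "c" is unrelated.
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, 0.0, 2.0, 1.0],
})
cleaned = remove_features_identical(demo, "demo set")
print(list(cleaned.columns))
```

Returning a new frame instead of dropping in place leaves the caller's data untouched. In the question's main block this would mean passing df_train.drop(['ID','TARGET'], axis=1) without .values, and calling .values only afterwards if an array is needed.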