Original question: http://stackoverflow.com/questions/29489712/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to convert a pandas data frame into a NumPy array
Asked by jax
I have a simple data set with a class label, stored as "mydata.csv":
GA_ID PN_ID PC_ID MBP_ID GR_ID AP_ID class
0.033 6.652 6.681 0.194 0.874 3.177 0
0.034 9.039 6.224 0.194 1.137 3.177 0
0.035 10.936 10.304 1.015 0.911 4.9 1
0.022 10.11 9.603 1.374 0.848 4.566 1
I used the code below to load this data into a NumPy array so that I can use it for predictions and machine learning modeling. Because of the header row, it raises "ValueError: could not convert string to float: ". When I remove the header from the file, this method works well for me:
import numpy as np
#from sklearn import metrics
#from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
raw_data = open("/home/me/Desktop/scklearn/data.csv")
dataset = np.loadtxt(raw_data, delimiter=",")
X = dataset[:,0:5]
y = dataset[:,6]
I also tried to skip the header, but the same error occurs:
dataset = np.loadtxt(raw_data, delimiter=",")[1:]
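The slice `[1:]` is applied only after `np.loadtxt` has already tried to parse every line, header included, which is why the error is unchanged. A minimal sketch of skipping the header at read time with the `skiprows` parameter follows. The in-memory `csv_text` is an assumed stand-in for the real file, built from the sample rows above with comma separators.

```python
import io

import numpy as np

# Hypothetical in-memory stand-in for the CSV file from the question.
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

# skiprows=1 drops the header line before any float conversion happens.
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

X = dataset[:, :-1]  # every column except the last: the six features
y = dataset[:, -1]   # the last column: the class label (stored as floats)

print(X.shape)
print(y)
```

Note that `dataset[:, 0:5]` selects only the first five columns, so `AP_ID` would be left out of the features; `[:, :-1]` keeps all six.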
Then I moved to pandas and was able to import the data this way:
raw_data = pandas.read_csv("/home/me/Desktop/scklearn/data.csv")
But here I got stuck again: when I tried to convert this into a NumPy array, it showed the same error as before.
Is there any method available in pandas that can save the headers as a list:
header_list = ('GA_ID','PN_ID','PC_ID' ,'MBP_ID' ,'GR_ID' , 'AP_ID','class')
and convert the last column into class labels and the remaining part (1:4, 0:5) into a NumPy array for model building?
I have written some code to get the column list:
import pandas
clm_list = []
raw_data = pandas.read_csv("/home/me/Desktop/scklearn/data.csv")
clms = raw_data.columns  # columns is an attribute, not a method
for clm in clms:
    clm_list.append(clm)
print(clm_list)  # produces the column list
Answered by jax
After a lot of reading, I finally achieved what I wanted and successfully used the data with scikit-learn. The code to convert CSV data into a scikit-learn compatible form is given below. Thanks.
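import pandas as pd
r = pd.read_csv("/home/zebrafish/Desktop/ex.csv")
print(r.values)  # the whole table as a NumPy array
# Collect the column names into a plain list
clm_list = []
for column in r.columns:
    clm_list.append(column)
X = r[clm_list[0:len(clm_list)-1]].values  # every column except the last (features)
y = r[clm_list[len(clm_list)-1]].values  # the last column (class labels)
print(clm_list)
print(X)
print(y)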
The output of this code is exactly what I want:
['GA_ID', 'PN_ID', 'PC_ID', 'MBP_ID', 'GR_ID', 'AP_ID', 'class']
[[ 0.033 6.652 6.681 0.194 0.874 3.177]
[ 0.034 9.039 6.224 0.194 1.137 3.177]
[ 0.035 10.936 10.304 1.015 0.911 4.9 ]
[ 0.022 10.11 9.603 1.374 0.848 4.566]]
[0 0 1 1]
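The same split can probably be written more compactly with positional indexing. The sketch below assumes a pandas version that provides `DataFrame.to_numpy()` (0.24 or later; on older versions `.values` plays the same role). The in-memory `csv_text` is an assumed stand-in for the real file, built from the sample rows in the question.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the CSV file from the question.
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

# read_csv treats the first line as the header by default.
df = pd.read_csv(io.StringIO(csv_text))

header_list = df.columns.tolist()  # column names as a plain Python list
X = df.iloc[:, :-1].to_numpy()     # all rows, every column except the last
y = df.iloc[:, -1].to_numpy()      # all rows, last column only (class labels)

print(header_list)
print(X.shape)
print(y)
```

Selecting by position with `iloc` avoids building the column list by hand, at the cost of assuming the class label is always the last column.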

