How to convert a Pandas dataframe to a numpy array

Question

Asked by jax

I have a simple data set with a class label, stored as "mydata.csv":

GA_ID   PN_ID   PC_ID   MBP_ID  GR_ID   AP_ID   class
0.033   6.652   6.681   0.194   0.874   3.177     0
0.034   9.039   6.224   0.194   1.137   3.177     0
0.035   10.936  10.304  1.015   0.911   4.9       1
0.022   10.11   9.603   1.374   0.848   4.566     1

I used the code below to convert this data into a numpy array so that I can use the data set for predictions and machine learning modeling, but because of the header row it raises "ValueError: could not convert string to float: ". When I remove the header from the file, this method works well for me:

import numpy as np
#from sklearn import metrics
#from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

raw_data = open("/home/me/Desktop/scklearn/data.csv")
dataset = np.loadtxt(raw_data, delimiter=",")
X = dataset[:, 0:6]  # the six feature columns (0:5 would drop AP_ID)
y = dataset[:, 6]    # the class label column

I also tried to skip the header, but the error still occurs:

dataset = np.loadtxt(raw_data, delimiter=",")[1:]
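
The slice above cannot help, because np.loadtxt parses every line of the file, header included, before `[1:]` is ever applied. A minimal sketch of one way around this is the `skiprows` parameter of np.loadtxt. The in-memory `csv_text` below is a hypothetical stand-in for the file and assumes it is comma-separated, as the `delimiter=","` call implies:

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file: comma-separated, with a header row
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

# skiprows=1 makes loadtxt ignore the header line before it parses any numbers
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

X = dataset[:, :-1]  # every column except the last: the six features
y = dataset[:, -1]   # last column: the class label (loaded as float)

print(X.shape)  # (4, 6)
print(y)
```

Note that loadtxt returns a single float array, so the class labels come back as floats and the column names are not kept.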

Then I moved to pandas and was able to import the data this way:

raw_data = pandas.read_csv("/home/me/Desktop/scklearn/data.csv")

But here I got stuck again: when I tried to convert this into a numpy array, it showed the same error as before.

Is there any method available in pandas that can save the headers as a list:

header_list = ('GA_ID','PN_ID','PC_ID' ,'MBP_ID' ,'GR_ID' , 'AP_ID','class')

and use the last column as the class label and the remaining part (1:4, 0:5) as a numpy array for model building?

I have written some code to get the column list:

import pandas

raw_data = pandas.read_csv("/home/me/Desktop/scklearn/data.csv")
# columns is an attribute, not a method, so it must not be called
clm_list = list(raw_data.columns)
print(clm_list)  # prints the column list

Answer 1

Answered by jax

After reading a lot, I finally achieved what I wanted and successfully used the data with scikit-learn. The code to convert the CSV data into a scikit-learn compatible form is given below. Thanks.

import pandas as pd

r = pd.read_csv("/home/zebrafish/Desktop/ex.csv")
print(r.values)

# Collect the column names as a plain list
clm_list = list(r.columns)

# All columns except the last are features; the last column is the class label
X = r[clm_list[:-1]].values
y = r[clm_list[-1]].values

print(clm_list)
print(X)
print(y)

The output of this code is exactly what I want:

['GA_ID', 'PN_ID', 'PC_ID', 'MBP_ID', 'GR_ID', 'AP_ID', 'class']

[[  0.033   6.652   6.681   0.194   0.874   3.177]
 [  0.034   9.039   6.224   0.194   1.137   3.177]
 [  0.035  10.936  10.304   1.015   0.911   4.9  ]
 [  0.022  10.11    9.603   1.374   0.848   4.566]]

[0 0 1 1]
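
The same split can also be written with positional indexing. The following is only a sketch, assuming a pandas version that provides `DataFrame.to_numpy()` (0.24 or later); the in-memory `csv_text` and the name `df` are hypothetical stand-ins for the file and data frame used above:

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: comma-separated, with a header row
csv_text = """GA_ID,PN_ID,PC_ID,MBP_ID,GR_ID,AP_ID,class
0.033,6.652,6.681,0.194,0.874,3.177,0
0.034,9.039,6.224,0.194,1.137,3.177,0
0.035,10.936,10.304,1.015,0.911,4.9,1
0.022,10.11,9.603,1.374,0.848,4.566,1
"""

# read_csv uses the first line as column names by default
df = pd.read_csv(io.StringIO(csv_text))

header_list = df.columns.tolist()  # column names as a plain Python list
X = df.iloc[:, :-1].to_numpy()     # all rows, every column except the last
y = df.iloc[:, -1].to_numpy()      # all rows, last column only (class label)

print(header_list)
print(X.shape)  # (4, 6)
print(y)
```

Using `iloc` selects columns by position, so the code does not depend on the column names; on older pandas versions `.values` plays the same role as `.to_numpy()`.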
