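Disclaimer: This page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you use it, you must also comply with CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow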
Original URL: http://stackoverflow.com/questions/47321709/
How to split train and test datasets into X_Train, y_train and X_Test, y_Test?
Question by Gaurav Singh
So I successfully split my dataset into train and test sets in a 70:30 ratio. I used this:
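import numpy as np
# df_glass: the asker's DataFrame, loaded elsewhere
# Random boolean mask: each row is True with probability 0.7,
# so the split is roughly (not exactly) 70:30
msk = np.random.rand(len(df_glass)) <= 0.7
train = df_glass[msk]
test = df_glass[~msk]
print(train)
print(test)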
Now how do I split train and test into X_train, y_train, X_test and y_test, such that X denotes the features of the dataset and y denotes the response?
I need to do supervised learning and apply ML models to X_Train and y_Train.
My dataset looks like this: (image: Database_snippet)
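Continuing from the mask-based split in the question, one way to get the four pieces with plain pandas is to take the response column as y and drop it from each frame to get X. This is a minimal sketch, not taken from the thread: df_glass is replaced by a small made-up DataFrame, and the response column name 'Type' is a hypothetical placeholder for whatever the real target column is called.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df_glass; all column names here are hypothetical
df_glass = pd.DataFrame({
    "RI": [1.52, 1.51, 1.53, 1.52, 1.51, 1.52, 1.53, 1.51, 1.52, 1.52],
    "Na": [13.6, 13.9, 13.5, 13.2, 13.3, 12.8, 13.3, 13.2, 13.0, 12.9],
    "Type": [1, 1, 2, 2, 1, 3, 2, 1, 3, 2],
})

# Same masking idea as in the question, seeded so the split is reproducible
rng = np.random.default_rng(0)
msk = rng.random(len(df_glass)) <= 0.7
train = df_glass[msk]
test = df_glass[~msk]

target = "Type"  # hypothetical name of the response column
X_train = train.drop(columns=[target])  # every column except the response
y_train = train[target]                 # the response as a Series
X_test = test.drop(columns=[target])
y_test = test[target]

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

Because X and y are cut from the same rows in the same order, X_train and y_train line up row for row.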
Answer by Vivek Kalyanarangan
Scikit-learn has a convenience function, train_test_split, for splitting pandas DataFrames. This will do the split:
from sklearn.model_selection import train_test_split
# list_of_X_cols: list of feature column names; 'y': name of the response column
X_train, X_test, y_train, y_test = train_test_split(df[list_of_X_cols], df['y'], test_size=0.33, random_state=42)
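To make the placeholders concrete, here is a minimal sketch on a toy DataFrame; f1, f2 and y are hypothetical column names, with list_of_X_cols holding the feature names and 'y' the response. Passing pandas objects in gives pandas objects back, so each row keeps its original index.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data; 'f1', 'f2' and 'y' are hypothetical column names
df = pd.DataFrame({
    "f1": range(9),
    "f2": range(10, 19),
    "y": [0, 1, 0, 1, 0, 1, 0, 1, 0],
})
list_of_X_cols = ["f1", "f2"]

X_train, X_test, y_train, y_test = train_test_split(
    df[list_of_X_cols], df["y"], test_size=0.33, random_state=42
)

# DataFrame in -> DataFrame out, Series in -> Series out
print(type(X_train).__name__, type(y_train).__name__)  # DataFrame Series
# 9 rows with test_size=0.33: ceil(0.33 * 9) = 3 test rows, 6 train rows
print(len(X_train), len(X_test))  # 6 3
```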
Answer by Ariful Shuvo
I guess you may find this useful for understanding:
import pandas as pd
from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation, removed in scikit-learn 0.20
from sklearn.linear_model import LinearRegression
# importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
x = dataset.iloc[:, :-1].values  # all columns except the last: features (2-D)
y = dataset.iloc[:, -1].values   # last column: response (1-D)
# splitting the dataset into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                    test_size=1/3, random_state=0)
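The slicing is what separates features from response: iloc[:, :-1] keeps every column but the last as a 2-D array, which is the shape scikit-learn estimators expect for X even with a single feature, while iloc[:, -1] picks the last column as a 1-D array. A minimal sketch, assuming a small in-memory stand-in for Salary_Data.csv whose column names and values are made up for illustration:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up stand-in for Salary_Data.csv: feature column(s) first, response last
dataset = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    "Salary": [40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 70.0, 75.0, 80.0],
})

x = dataset.iloc[:, :-1].values  # every column except the last -> shape (9, 1)
y = dataset.iloc[:, -1].values   # only the last column -> shape (9,)
print(x.shape, y.shape)  # (9, 1) (9,)

# Selecting a single column by position yields a 1-D array,
# which scikit-learn estimators reject as X
x_1d = dataset.iloc[:, 0].values
rejected = False
try:
    LinearRegression().fit(x_1d, y)
except ValueError:
    rejected = True
print("1-D X rejected:", rejected)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=1/3, random_state=0
)
model = LinearRegression().fit(x_train, y_train)
print(model.coef_)  # fitted slope; the toy data was built with slope 5
```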