"TypeError: Singleton array cannot be considered a valid collection" using sklearn train_test_split

Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/53800369/

Date: 2020-09-14 06:13:06  Source: igfitidea

python pandas numpy machine-learning scikit-learn

Asked by John Samuel

TypeError: Singleton array array(0.2) cannot be considered a valid collection.

from sklearn.model_selection import train_test_split

X = df.iloc[:, [1, 7]].values
y = df.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, 0.2)  # raises the TypeError above

I get this error when calling train_test_split. I can train my model on the full X and y values, but I would like to split my DataFrame first and then train and test on the separate parts.

Any help is appreciated.

Answered by cs95

A not-so-commonly known fact is that train_test_split can split any number of arrays, not just two (typically the features X and the labels y). Each array passed in is split into a train part and a test part. See the scikit-learn docs and the source code for more info.

For example,

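import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
y = df1.pop('C')  # first extra array to split
z = df1.pop('D')  # second extra array to split
X = df1           # remaining columns A and B

# Three input arrays produce six outputs: a train and a test part for each
splits = train_test_split(X, y, z, test_size=0.2)
len(splits)
# 6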
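To make the order of those six outputs concrete, here is a minimal sketch that unpacks them by name. The data is a stand-in similar to the example above (generated with `np.random.default_rng`, which is an assumption and not part of the original answer), and `random_state=0` is added only to make the split repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 5 rows, 4 integer columns
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.integers(0, 10, size=(5, 4)), columns=list("ABCD"))
y = df1.pop("C")
z = df1.pop("D")
X = df1

# Outputs come back in input order, as (train, test) pairs per input array
X_train, X_test, y_train, y_test, z_train, z_test = train_test_split(
    X, y, z, test_size=0.2, random_state=0
)

# Every array is split on the same rows, so the pandas indices line up
same_rows = list(X_test.index) == list(y_test.index) == list(z_test.index)
print(len(X_train), len(X_test), same_rows)
# 4 1 True
```

With 5 rows and test_size=0.2, one row goes to the test part and four to the train part.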

In other words, the only way to specify the test size is through the keyword argument test_size. All positional arguments are assumed to be collections that are to be split. In your case, the call is

train_test_split(X, y, 0.2)

The function tries to split 0.2 as if it were a third array, but a float is not a collection, so the error is raised. The solution, as mentioned, is to pass the value through the keyword argument:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
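
For a quick check of both behaviours side by side, the sketch below reproduces the failing call and then the working one. The arrays are hypothetical stand-ins for the question's `df.iloc[...]` values, and `random_state=0` is an addition for repeatability. The exact wording of the error message may differ between scikit-learn versions, so the example only records that a TypeError was raised.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the question's X and y: 10 samples, 2 features
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Positional 0.2 is treated as a third array to split, which fails
try:
    train_test_split(X, y, 0.2)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Keyword test_size is interpreted as the test fraction
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
# (8, 2) (2, 2)
```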