Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/49829023/

Date: 2020-09-14 05:28:10  Source: igfitidea

train_test_split with multiple features

Tags: python, python-3.x, pandas, dataframe, scikit-learn

Question by Ekkasit Smithipanon

I'm currently trying to train a decision tree classifier on a data set, but I couldn't get train_test_split to work.

In the code below, CS is the target output and EN, SN, JT, FT, PW, YR, LO and LA are the input features.

All variables that went through OHL are sparse matrices, whereas the others are arrays taken straight from the dataframe.

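```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def OHL(x, column):
    """Label-encode x[column], then one-hot encode it (returns a scipy sparse matrix)."""
    le = LabelEncoder()
    enc = OneHotEncoder()
    Labeled = le.fit_transform(x[column].astype(str))
    return enc.fit_transform(Labeled.reshape(-1, 1))

###------------------------------------------------------------------------

df = pd.read_csv('h1b_kaggle.csv')
# Drop the index column and WORKSITE; axis must be a keyword in pandas >= 2.0
df = df.drop(['Unnamed: 0', 'WORKSITE'], axis=1)

###------------------------------------------------------------------------

CS = OHL(df, 'CASE_STATUS')           # target output (sparse)
EN = OHL(df, 'EMPLOYER_NAME')         # sparse one-hot features
SN = OHL(df, 'SOC_NAME')
JT = OHL(df, 'JOB_TITLE')
FT = OHL(df, 'FULL_TIME_POSITION')
PW = np.array(df['PREVAILING_WAGE'])  # dense numeric feature
YR = OHL(df, 'YEAR')
LO = np.array(df['lon'])              # dense numeric features
LA = np.array(df['lat'])
```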
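To see the mix of types described above concretely, here is a small sketch that runs the question's OHL helper on a made-up four-row DataFrame. The rows are hypothetical stand-ins, not data from h1b_kaggle.csv; the point is only that OHL yields a scipy sparse matrix (OneHotEncoder's default output) while np.array on a column yields a plain 1-D array.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def OHL(x, column):
    # Same helper as in the question: label-encode, then one-hot encode
    le = LabelEncoder()
    enc = OneHotEncoder()
    labeled = le.fit_transform(x[column].astype(str))
    return enc.fit_transform(labeled.reshape(-1, 1))

# Hypothetical toy rows (not taken from the real data set)
df = pd.DataFrame({
    "CASE_STATUS": ["CERTIFIED", "DENIED", "CERTIFIED", "WITHDRAWN"],
    "PREVAILING_WAGE": [60000.0, 52000.0, 75000.0, 48000.0],
})

CS = OHL(df, "CASE_STATUS")           # sparse: one column per distinct status
PW = np.array(df["PREVAILING_WAGE"])  # dense 1-D NumPy array

print(sparse.issparse(CS), CS.shape)  # True (4, 3)
print(sparse.issparse(PW), PW.shape)  # False (4,)
```

Both kinds of object have the same number of rows, which is what train_test_split checks when it is given several inputs at once.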

Answer by Ami Tavory

If you look at sklearn.model_selection.train_test_split, you can see that it takes an *arrays argument, i.e. any number of arrays with the same number of rows. To split the first three of your arrays, therefore, you could use

```python
from sklearn.model_selection import train_test_split

CS_tr, CS_te, EN_tr, EN_te, SN_tr, SN_te = train_test_split(CS, EN, SN)
```

(Of course, you can pass more arrays than that.)
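A sketch of the *arrays behaviour, using small hypothetical stand-ins for the question's arrays: one sparse matrix (EN) and two dense columns (PW, LO). The values are chosen so alignment is easy to check: row i of EN one-hot encodes i, PW holds i and LO holds 2*i. Each input comes back as a train/test pair in the order it was passed, and the same shuffled row indices are applied to every array.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split

n = 8
# Hypothetical stand-ins for the question's arrays
EN = sparse.identity(n, format="csr")  # sparse; row i has its 1 in column i
PW = np.arange(n, dtype=float)         # dense; row i holds i
LO = 2.0 * PW                          # dense; row i holds 2*i

# One (train, test) pair per input, in the order the inputs were given
EN_tr, EN_te, PW_tr, PW_te, LO_tr, LO_te = train_test_split(
    EN, PW, LO, test_size=0.25, random_state=42
)

print(EN_tr.shape, PW_tr.shape, LO_tr.shape)  # (6, 8) (6,) (6,)
print(EN_te.shape, PW_te.shape, LO_te.shape)  # (2, 8) (2,) (2,)

# Rows stay aligned: the column of each EN row's 1 matches the PW value
rows_from_EN = np.asarray(EN_tr.argmax(axis=1)).ravel()
print(np.array_equal(rows_from_EN, PW_tr), np.array_equal(LO_tr, 2 * PW_tr))  # True True
```

Passing random_state makes the split reproducible; without it, each call shuffles differently, but the inputs within a single call are still split consistently.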

Note that current versions of sklearn return sparse arrays when given sparse arrays.
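An alternative to splitting each array separately, sketched under a few assumptions: stack all features into one sparse matrix with scipy.sparse.hstack, keep the target as 1-D integer labels (DecisionTreeClassifier expects class labels for y rather than a sparse one-hot matrix), and split just X and y. The toy columns are hypothetical, and OneHotEncoder is applied to the strings directly (supported since scikit-learn 0.20) instead of going through LabelEncoder as OHL does. The dense wage column is converted to sparse before stacking so that hstack receives uniform inputs.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy columns standing in for CASE_STATUS, EMPLOYER_NAME, PREVAILING_WAGE
status = np.array(["CERTIFIED", "DENIED", "CERTIFIED", "WITHDRAWN",
                   "CERTIFIED", "DENIED", "CERTIFIED", "CERTIFIED"])
employer = np.array(["A", "B", "A", "C", "B", "B", "C", "A"]).reshape(-1, 1)
wage = np.array([60e3, 52e3, 75e3, 48e3, 66e3, 51e3, 70e3, 62e3])

EN = OneHotEncoder().fit_transform(employer)  # sparse, shape (8, 3)
PW = sparse.csr_matrix(wage.reshape(-1, 1))   # dense column -> sparse, shape (8, 1)
X = sparse.hstack([EN, PW], format="csr")     # one sparse feature matrix, shape (8, 4)
y = LabelEncoder().fit_transform(status)      # 1-D integer class labels

# Sparse input stays sparse after the split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
print(sparse.issparse(X_tr), X_tr.shape, X_te.shape)  # True (6, 4) (2, 4)

# Decision trees accept sparse X directly
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(clf.predict(X_te).shape)  # (2,)
```

With this layout only two objects go through train_test_split, so there is no long list of *_tr/*_te names to keep in sync.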

请注意,当前版本sklearn在给定稀疏数组时返回稀疏数组。