Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/42314542/

Time: 2020-08-19 21:32:19  Source: igfitidea

python - how to append numpy array to a pandas dataframe

python pandas numpy machine-learning scikit-learn

Asked by DBE7

I have trained a Logistic Regression classifier to predict whether a review is positive or negative. Now I want to append the predicted probabilities returned by the predict_proba function to my Pandas data frame containing the reviews. I tried doing something like:

test_data['prediction'] = sentiment_model.predict_proba(test_matrix)

Obviously, that doesn't work, since predict_proba returns a 2D numpy array (one column per class). So, what is the most efficient way of doing this? I created test_matrix with scikit-learn's CountVectorizer:

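from sklearn.feature_extraction.text import CountVectorizer

# Fit the vocabulary on the training reviews only, then reuse it for the test reviews
vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))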

Sample data looks like:

| Review                                     | Prediction         |                      
| ------------------------------------------ | ------------------ |
| "Toy was great! Our six-year old loved it!"|   0.986            |

Answered by Karthik Arumugham

Assign the predictions to a variable, then extract each column from it and assign it to a column of the pandas dataframe. If x is the 2D numpy array with the predictions,

x = sentiment_model.predict_proba(test_matrix)

then you can do,

test_data['prediction0'] = x[:,0]
test_data['prediction1'] = x[:,1]
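
As a rough end-to-end sketch of the same idea, the snippet below names the new columns after the model's classes_ attribute, which gives the column order of the predict_proba output, instead of hard-coding 0 and 1. The toy reviews, the sentiment column, and its 1/-1 labels are hypothetical stand-ins for the asker's data, which is not shown in the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for the asker's train_data / test_data.
train_data = pd.DataFrame({
    "review_clean": ["great toy loved it", "terrible broke quickly",
                     "loved it great", "terrible waste"],
    "sentiment": [1, -1, 1, -1],  # assumed labels: 1 = positive, -1 = negative
})
test_data = pd.DataFrame({"review_clean": ["great toy", "terrible toy"]})

vectorizer = CountVectorizer(token_pattern=r"\b\w+\b")
train_matrix = vectorizer.fit_transform(train_data["review_clean"])
test_matrix = vectorizer.transform(test_data["review_clean"])

sentiment_model = LogisticRegression()
sentiment_model.fit(train_matrix, train_data["sentiment"])

# predict_proba returns an array of shape (n_samples, n_classes);
# its columns follow the order of sentiment_model.classes_.
proba = sentiment_model.predict_proba(test_matrix)

# One column per class, reusing test_data's index so the rows line up
# even when that index is not a plain 0..n-1 range.
proba_df = pd.DataFrame(
    proba,
    columns=[f"prediction_{c}" for c in sentiment_model.classes_],
    index=test_data.index,
)
test_data = test_data.join(proba_df)

# If only the positive-class probability is wanted (as in the sample table),
# look up its column position instead of assuming it is column 1.
positive_col = list(sentiment_model.classes_).index(1)
test_data["prediction"] = proba[:, positive_col]
```

Looking up the column through classes_ avoids silently storing the negative-class probability when the labels sort in an unexpected order.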