python - how to append numpy array to a pandas dataframe
Source: http://stackoverflow.com/questions/42314542/
This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Asked by DBE7
I have trained a Logistic Regression classifier to predict whether a review is positive or negative. Now I want to append the predicted probabilities returned by the predict_proba function to my Pandas data frame containing the reviews. I tried something like:
test_data['prediction'] = sentiment_model.predict_proba(test_matrix)
Obviously, that doesn't work, since predict_proba returns a 2D NumPy array (one column per class) rather than a single column of values. So, what is the most efficient way of doing this? I created test_matrix with scikit-learn's CountVectorizer:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))
Sample data looks like:
| Review | Prediction |
| ------------------------------------------ | ------------------ |
| "Toy was great! Our six-year old loved it!"| 0.986 |
Answered by Karthik Arumugham
Assign the predictions to a variable, then extract each column from that variable and assign it to its own column of the pandas DataFrame. If x is the 2D NumPy array of predictions,
x = sentiment_model.predict_proba(test_matrix)
then you can do:
test_data['prediction0'] = x[:,0]
test_data['prediction1'] = x[:,1]
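
As a rough, self-contained sketch of the same idea, the snippet below uses a hard-coded array in place of sentiment_model.predict_proba(test_matrix), so the reviews, probabilities, and the -1/1 class labels are all made up for illustration. It assumes the columns of x follow the order of the classifier's classes_ attribute, and shows two variants: keeping only the positive-class probability (as in the sample table), and adding one column per class in a single step.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the question's objects (not the asker's real data).
test_data = pd.DataFrame(
    {"review_clean": ["toy was great", "broke after a day", "works fine"]},
    index=[10, 11, 12],  # deliberately not 0..n-1
)

# Stand-in for sentiment_model.predict_proba(test_matrix):
# shape (n_rows, n_classes), one column per class.
x = np.array([[0.014, 0.986], [0.910, 0.090], [0.400, 0.600]])

# Assumed stand-in for list(sentiment_model.classes_); the labels are hypothetical.
class_labels = [-1, 1]

# Variant 1: keep only the probability of the positive class in one column.
positive_col = class_labels.index(1)
test_data["prediction"] = x[:, positive_col]

# Variant 2: add one column per class at once. Passing index=test_data.index
# keeps rows aligned even when the frame's index is not 0..n-1.
proba_df = pd.DataFrame(
    x,
    columns=[f"proba_{label}" for label in class_labels],
    index=test_data.index,
)
result = test_data.join(proba_df)
print(result)
```

Assigning a 1D slice such as x[:, 1] works positionally, so it is safe as long as x has one row per row of test_data in the same order. Building a second DataFrame with the same index is mainly useful when there are more than two classes or when the column names should come from the model.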