Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/39662398/
scikit learn output metrics.classification_report into CSV/tab-delimited format
Asked by Seun AJAO
I'm doing multiclass text classification in scikit-learn. A Multinomial Naive Bayes classifier is being trained on a dataset with hundreds of labels. Here's an extract from the scikit-learn script that fits the MNB model:
from __future__ import print_function
# Read file.csv into a pandas DataFrame
import pandas as pd
path = 'data/file.csv'
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
merged = pd.read_csv(path, on_bad_lines='skip', low_memory=False)
# define X and y using the original DataFrame
X = merged.text
y = merged.grid
# split X and y into training and testing sets
# (sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
# import and instantiate CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()
# create document-term matrices using CountVectorizer
X_train_dtm = vect.fit_transform(X_train)
X_test_dtm = vect.transform(X_test)
# import and instantiate MultinomialNB
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
# fit a Multinomial Naive Bayes model
nb.fit(X_train_dtm, y_train)
# make class predictions
y_pred_class = nb.predict(X_test_dtm)
# generate classification report
from sklearn import metrics
print(metrics.classification_report(y_test, y_pred_class))
A simplified version of the metrics.classification_report output on the command line looks like this:
            precision    recall  f1-score   support
         12      0.84      0.48      0.61      2843
         13      0.00      0.00      0.00        69
         15      1.00      0.19      0.32       232
         16      0.75      0.02      0.05       965
         33      1.00      0.04      0.07       155
          4      0.59      0.34      0.43      5600
         41      0.63      0.49      0.55      6218
         42      0.00      0.00      0.00       102
         49      0.00      0.00      0.00        11
          5      0.90      0.06      0.12      2010
         50      0.00      0.00      0.00         5
         51      0.96      0.07      0.13      1267
         58      1.00      0.01      0.02       180
         59      0.37      0.80      0.51      8127
          7      0.91      0.05      0.10       579
          8      0.50      0.56      0.53      7555
  avg/total      0.59      0.48      0.45     35919
I was wondering if there is any way to get the report output into a standard CSV file with regular column headers.
When I send the command-line output to a CSV file, or try to copy/paste the screen output into a spreadsheet (OpenOffice Calc or Excel), it lumps the results into a single column.
Help appreciated. Thanks!
Answer by janus235
As of scikit-learn v0.20, the easiest way to convert a classification report to a pandas DataFrame is to have the report returned as a dict:
report = classification_report(y_test, y_pred, output_dict=True)
and then construct a DataFrame and transpose it:
df = pandas.DataFrame(report).transpose()
From here on, you are free to use the standard pandas methods to generate your desired output format (CSV, HTML, LaTeX, ...).
See also the documentation at https://scikit-learn.org/0.20/modules/generated/sklearn.metrics.classification_report.html
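For anyone who wants to try the dict-based route end to end, here is a minimal sketch. The toy labels and the "class" index label are my own assumptions for illustration, not part of the original answer, and output_dict needs scikit-learn 0.20 or later. Calling to_csv without a path returns the text instead of writing a file, and sep='\t' gives the tab-delimited variant.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels, assumed purely for illustration
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 3, 3, 1]

# Ask for a dict instead of the formatted text report
report = classification_report(y_true, y_pred, output_dict=True)

# One row per class or summary entry, one column per metric
df = pd.DataFrame(report).transpose()

# With no path given, to_csv returns the file content as a string
csv_text = df.to_csv(index_label="class")
tsv_text = df.to_csv(sep="\t", index_label="class")

print(tsv_text)
```

Passing a file name as the first argument of to_csv writes the same content to disk, which is what a spreadsheet program would then open with proper columns.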
Answer by kindHymanet
If you want the individual scores, this should do the job just fine.
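import pandas as pd
from sklearn.metrics import classification_report

def classification_report_csv(report):
    # Parse the per-class rows of the plain-text report and write them to CSV
    report_data = []
    lines = report.split('\n')
    # lines[0] is the header and lines[1] is blank; the class rows follow
    for line in lines[2:]:
        # split() without arguments handles the runs of spaces in the report
        # (class names that contain spaces are not supported)
        row_data = line.split()
        if not row_data:
            break  # a blank line ends the per-class block
        row = {}
        row['class'] = row_data[0]
        row['precision'] = float(row_data[1])
        row['recall'] = float(row_data[2])
        row['f1_score'] = float(row_data[3])
        row['support'] = float(row_data[4])
        report_data.append(row)
    dataframe = pd.DataFrame(report_data)
    dataframe.to_csv('classification_report.csv', index=False)

report = classification_report(y_true, y_pred)
classification_report_csv(report)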
Answer by PankajKabra
We can get the actual values from the precision_recall_fscore_support function and then put them into a DataFrame. The code below gives the same result, but now as a pandas DataFrame :).
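import pandas as pd
from sklearn import metrics

# true and pred are the true and predicted labels; nb is the fitted classifier
clf_rep = metrics.precision_recall_fscore_support(true, pred)
out_dict = {
    "precision": clf_rep[0].round(2),
    "recall": clf_rep[1].round(2),
    "f1-score": clf_rep[2].round(2),
    "support": clf_rep[3],
}
out_df = pd.DataFrame(out_dict, index=nb.classes_)
# Summary row: mean of each score column, sum of the support column
avg_tot = (out_df.apply(lambda x: round(x.mean(), 2) if x.name != "support" else round(x.sum(), 2)).to_frame().T)
avg_tot.index = ["avg/total"]
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
out_df = pd.concat([out_df, avg_tot])
print(out_df)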
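One thing worth checking with this approach: x.mean() is a plain average over classes, while the "avg / total" row printed by classification_report in the versions discussed here is weighted by support, so the two can differ when classes are imbalanced. The sketch below compares both on toy labels. The labels are assumptions for illustration, and unique_labels is used only as a stand-in for nb.classes_ when no fitted classifier is at hand.

```python
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support
from sklearn.utils.multiclass import unique_labels

# Toy labels (assumed); class 1 has twice the support of class 2
y_true = [1, 1, 1, 1, 2, 2]
y_pred = [1, 1, 1, 2, 2, 1]

# Sorted labels that occur in the data
labels = unique_labels(y_true, y_pred)

precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels)
out_df = pd.DataFrame(
    {"precision": precision, "recall": recall, "f1-score": f1, "support": support},
    index=labels)

score_cols = ["precision", "recall", "f1-score"]

# Plain mean over classes, as x.mean() computes
unweighted = out_df[score_cols].mean()

# Support-weighted mean over classes
weighted = (out_df[score_cols].mul(out_df["support"], axis=0).sum()
            / out_df["support"].sum())

# Cross-check against scikit-learn's own weighted average
w_precision, w_recall, w_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")

print(out_df)
print("unweighted:", unweighted.round(3).to_dict())
print("weighted:  ", weighted.round(3).to_dict())
```

If the summary row should match the text report, the weighted version (or average='weighted' directly) is probably the one to use.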
Answer by Kam Sen
While the previous answers probably all work, I found them a bit verbose. The following stores the individual class results as well as the summary line in a single DataFrame. It is not very robust to changes in the report format, but it did the trick for me.
# init snippet and fake data
from io import StringIO
import re
import pandas as pd
from sklearn import metrics

true_label = [1,1,2,2,3,3]
pred_label = [1,2,2,3,3,1]

def report_to_df(report):
    # Collapse runs of spaces so the report parses as space-separated text
    # (written for the report layout whose only summary row is "avg / total")
    report = re.sub(r" +", " ", report).replace("avg / total", "avg/total").replace("\n ", "\n")
    report_df = pd.read_csv(StringIO("Classes" + report), sep=' ', index_col=0)
    return report_df

# txt report to df
report = metrics.classification_report(true_label, pred_label)
report_df = report_to_df(report)

# store, print, copy...
print(report_df)
Which gives the desired output:
Classes    precision  recall  f1-score  support
1                0.5     0.5       0.5        2
2                0.5     0.5       0.5        2
3                0.5     0.5       0.5        2
avg/total        0.5     0.5       0.5        6
Answer by Samuel Nde
Just import pandas as pd and make sure that you set the output_dict parameter, which is False by default, to True when computing the classification_report. This returns the classification report as a dictionary, which you can then pass to the pandas DataFrame constructor. You may want to transpose the resulting DataFrame to fit the output format that you want. The resulting DataFrame can then be written to a CSV file as you wish.
clsf_report = pd.DataFrame(classification_report(y_true = your_y_true, y_pred = your_y_preds5, output_dict=True)).transpose()
clsf_report.to_csv('Your Classification Report Name.csv', index= True)
I hope this helps.
Answer by Yash Nag
It's obviously a better idea to just output the classification report as a dict:
sklearn.metrics.classification_report(y_true, y_pred, output_dict=True)
But here's a function I made to convert the results for all classes (classes only) to a pandas DataFrame.
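import pandas as pd

def report_to_df(report):
    report = [x.split(' ') for x in report.split('\n')]
    header = ['Class Name'] + [x for x in report[0] if x != '']
    values = []
    # Skip the header and the last five lines, assuming the scikit-learn 0.20+
    # layout (blank line, three summary rows, trailing empty string)
    for row in report[1:-5]:
        row = [value for value in row if value != '']
        if row != []:
            values.append(row)
    df = pd.DataFrame(data=values, columns=header)
    return df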
Hope this works fine for you.
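As a possible alternative to slicing the text report, the classes-only table can also be built from the dict output, which avoids depending on how many summary lines the text report has. This is only a sketch under assumptions: the toy labels are made up, unique_labels is used to recover the class labels, and it relies on the report dict keying each class by its label converted to a string.

```python
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.utils.multiclass import unique_labels

# Toy labels, assumed purely for illustration
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 3, 3, 1]

report = classification_report(y_true, y_pred, output_dict=True)

# Per-class entries are keyed by the label converted to a string
class_keys = [str(label) for label in unique_labels(y_true, y_pred)]

# Keep only the per-class entries, then put classes on the rows
class_df = pd.DataFrame({key: report[key] for key in class_keys}).transpose()
class_df.index.name = "Class Name"

print(class_df)
```

The summary entries (averages and, in newer versions, accuracy) are simply never selected, so nothing has to be trimmed by position.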
Answer by Raul
As mentioned in one of the other answers here, precision_recall_fscore_support is analogous to classification_report.
It then suffices to use the Python library pandas to format the data in a columnar layout, similar to what classification_report does. Here is an example:
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_fscore_support

np.random.seed(0)
y_true = np.array([0]*400 + [1]*600)
y_pred = np.random.randint(2, size=1000)

def pandas_classification_report(y_true, y_pred):
    metrics_summary = precision_recall_fscore_support(
        y_true=y_true,
        y_pred=y_pred)
    avg = list(precision_recall_fscore_support(
        y_true=y_true,
        y_pred=y_pred,
        average='weighted'))
    metrics_sum_index = ['precision', 'recall', 'f1-score', 'support']
    class_report_df = pd.DataFrame(
        list(metrics_summary),
        index=metrics_sum_index)
    support = class_report_df.loc['support']
    total = support.sum()
    avg[-1] = total
    class_report_df['avg / total'] = avg
    return class_report_df.T
With classification_report you'll get something like:
print(classification_report(y_true=y_true, y_pred=y_pred, digits=6))
Output:
            precision    recall  f1-score   support
          0  0.379032  0.470000  0.419643       400
          1  0.579365  0.486667  0.528986       600
avg / total  0.499232  0.480000  0.485248      1000
Then with our custom function pandas_classification_report:
df_class_report = pandas_classification_report(y_true=y_true, y_pred=y_pred)
print(df_class_report)
Output:
            precision    recall  f1-score   support
          0  0.379032  0.470000  0.419643     400.0
          1  0.579365  0.486667  0.528986     600.0
avg / total  0.499232  0.480000  0.485248    1000.0
Then just save it to CSV format (other separators can be chosen with the sep argument, e.g. sep=';'):
df_class_report.to_csv('my_csv_file.csv', sep=',')
I open my_csv_file.csv with LibreOffice Calc (although you could use any tabular/spreadsheet editor, such as Excel).
Answer by elz
I also found some of the answers a bit verbose. Here is my three-line solution, using precision_recall_fscore_support as others have suggested.
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

report = pd.DataFrame(list(precision_recall_fscore_support(y_true, y_pred)),
                      index=['Precision', 'Recall', 'F1-score', 'Support']).T
# Now add the 'Avg/Total' row (compare y_true with y_pred, not y_test)
report.loc['Avg/Total', :] = precision_recall_fscore_support(y_true, y_pred,
                                                             average='weighted')
report.loc['Avg/Total', 'Support'] = report['Support'].sum()
Answer by Surya
Along with example input and output, here's another function, metrics_report_to_df(). Implementing it with precision_recall_fscore_support from sklearn.metrics should do:
# Generates classification metrics using precision_recall_fscore_support:
from sklearn import metrics
import pandas as pd
import numpy as np

# Simulating true and predicted labels as test dataset:
np.random.seed(10)
y_true = np.array([0]*300 + [1]*700)
y_pred = np.random.randint(2, size=1000)

# Here's the custom function returning classification report dataframe:
def metrics_report_to_df(ytrue, ypred):
    precision, recall, fscore, support = metrics.precision_recall_fscore_support(ytrue, ypred)
    classification_report = pd.concat(map(pd.DataFrame, [precision, recall, fscore, support]), axis=1)
    classification_report.columns = ["precision", "recall", "f1-score", "support"]
    # Add row with "avg/total"
    classification_report.loc['avg/Total', :] = metrics.precision_recall_fscore_support(ytrue, ypred, average='weighted')
    classification_report.loc['avg/Total', 'support'] = classification_report['support'].sum()
    return classification_report

# Provide input as true_label and predicted label (from classifier)
classification_report = metrics_report_to_df(y_true, y_pred)

# Here's the output (metrics report transformed to dataframe)
In [1047]: classification_report
Out[1047]:
            precision    recall  f1-score   support
          0  0.300578  0.520000  0.380952     300.0
          1  0.700624  0.481429  0.570703     700.0
  avg/Total  0.580610  0.493000  0.513778    1000.0
Answer by Karel Macek
Another option is to calculate the underlying data and compose the report on your own. You get all the statistics you need from
precision_recall_fscore_support