Python scikit-learn: output metrics.classification_report in CSV/tab-delimited format

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39662398/

Date: 2020-08-19 22:33:21  Source: igfitidea

scikit learn output metrics.classification_report into CSV/tab-delimited format

Tags: python, text, machine-learning, scikit-learn, classification

Asked by Seun AJAO

I'm doing multiclass text classification in scikit-learn. The dataset, which has hundreds of labels, is used to train a Multinomial Naive Bayes classifier. Here's an extract from the scikit-learn script that fits the MNB model:

from __future__ import print_function

# read file.csv into a pandas DataFrame

import pandas as pd
path = 'data/file.csv'
merged = pd.read_csv(path, on_bad_lines='skip', low_memory=False)  # pandas >= 1.3; older versions used error_bad_lines=False

# define X and y using the original DataFrame
X = merged.text
y = merged.grid

# split X and y into training and testing sets;
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# import and instantiate CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()

# create document-term matrices using CountVectorizer
X_train_dtm = vect.fit_transform(X_train)
X_test_dtm = vect.transform(X_test)

# import and instantiate MultinomialNB
from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()

# fit a Multinomial Naive Bayes model
nb.fit(X_train_dtm, y_train)

# make class predictions
y_pred_class = nb.predict(X_test_dtm)

# generate classification report
from sklearn import metrics
print(metrics.classification_report(y_test, y_pred_class))

A simplified version of the metrics.classification_report output on the command line looks like this:

            precision    recall  f1-score   support
         12      0.84      0.48      0.61      2843
         13      0.00      0.00      0.00        69
         15      1.00      0.19      0.32       232
         16      0.75      0.02      0.05       965
         33      1.00      0.04      0.07       155
          4      0.59      0.34      0.43      5600
         41      0.63      0.49      0.55      6218
         42      0.00      0.00      0.00       102
         49      0.00      0.00      0.00        11
          5      0.90      0.06      0.12      2010
         50      0.00      0.00      0.00         5
         51      0.96      0.07      0.13      1267
         58      1.00      0.01      0.02       180
         59      0.37      0.80      0.51      8127
          7      0.91      0.05      0.10       579
          8      0.50      0.56      0.53      7555
  avg/total      0.59      0.48      0.45     35919

I was wondering whether there is any way to get the report output into a standard CSV file with regular column headers.

When I redirect the command-line output to a CSV file, or copy and paste the screen output into a spreadsheet (OpenOffice Calc or Excel), it lumps the results into a single column. [Screenshot: all results lumped into one spreadsheet column]

Help appreciated. Thanks!

Answer by janus235

As of scikit-learn v0.20, the easiest way to convert a classification report to a pandas DataFrame is to have the report returned as a dict:

report = classification_report(y_test, y_pred, output_dict=True)

and then construct a DataFrame and transpose it:

df = pandas.DataFrame(report).transpose()

From here on, you are free to use the standard pandas methods to generate your desired output format (CSV, HTML, LaTeX, ...).

See also the documentation at https://scikit-learn.org/0.20/modules/generated/sklearn.metrics.classification_report.html
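
As a minimal sketch of the steps above, assuming scikit-learn 0.20 or later and made-up labels (y_true and y_pred here are illustrative, not the asker's data), the dict report can be turned into CSV text like this:

```python
import pandas as pd
from sklearn.metrics import classification_report

# Illustrative labels only (an assumption, not data from the question)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# output_dict=True returns nested dicts instead of a formatted string
report = classification_report(y_true, y_pred, output_dict=True)

# Keys of the dict become columns, so transpose to get one row per class
df = pd.DataFrame(report).transpose()

# With no path argument, to_csv returns the CSV text as a string;
# pass a filename instead to write a file
csv_text = df.to_csv(index_label="class")
print(csv_text)
```

The summary entries of the report (for example the weighted average) end up as extra rows below the per-class rows, so they can be kept or dropped before writing.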

Answer by kindHymanet

If you want the individual per-class scores, this should do the job just fine.

import pandas as pd
from sklearn.metrics import classification_report

def classification_report_csv(report):
    report_data = []
    lines = report.split('\n')
    # per-class rows start after the header and a blank line,
    # and end at the next blank line (before the summary rows)
    for line in lines[2:]:
        if not line.strip():
            break
        # split from the right so class names containing spaces survive
        row_data = line.rsplit(None, 4)
        row = {}
        row['class'] = row_data[0].strip()
        row['precision'] = float(row_data[1])
        row['recall'] = float(row_data[2])
        row['f1_score'] = float(row_data[3])
        row['support'] = float(row_data[4])
        report_data.append(row)
    dataframe = pd.DataFrame(report_data)
    dataframe.to_csv('classification_report.csv', index=False)

report = classification_report(y_true, y_pred)
classification_report_csv(report)

Answer by PankajKabra

We can get the actual values from the precision_recall_fscore_support function and then put them into a DataFrame. The code below gives the same kind of report, but as a pandas DataFrame.

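import pandas as pd
from sklearn import metrics

# `true` and `pred` are the true and predicted labels; `nb` is the fitted classifier
clf_rep = metrics.precision_recall_fscore_support(true, pred)
out_dict = {
    "precision": clf_rep[0].round(2),
    "recall": clf_rep[1].round(2),
    "f1-score": clf_rep[2].round(2),
    "support": clf_rep[3],
}
out_df = pd.DataFrame(out_dict, index=nb.classes_)
# summary row: unweighted mean of each score and the sum of the support
# (the avg / total row of classification_report is support-weighted instead)
avg_tot = (out_df.apply(lambda x: round(x.mean(), 2) if x.name != "support" else round(x.sum(), 2)).to_frame().T)
avg_tot.index = ["avg/total"]
out_df = pd.concat([out_df, avg_tot])  # DataFrame.append was removed in pandas 2.0
print(out_df)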

Answer by Kam Sen

While the previous answers probably all work, I found them a bit verbose. The following stores the individual class results as well as the summary line in a single DataFrame. It is not very robust to changes in the report format, but it did the trick for me.

#init snippet and fake data
from io import StringIO
import re
import pandas as pd
from sklearn import metrics
true_label = [1,1,2,2,3,3]
pred_label = [1,2,2,3,3,1]

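def report_to_df(report):
    # collapse runs of spaces so the report parses as space-separated values;
    # this relies on the summary row being labelled "avg / total"
    report = re.sub(r" +", " ", report).replace("avg / total", "avg/total").replace("\n ", "\n")
    report_df = pd.read_csv(StringIO("Classes" + report), sep=' ', index_col=0)
    return report_df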

#txt report to df
report = metrics.classification_report(true_label, pred_label)
report_df = report_to_df(report)

#store, print, copy...
print (report_df)

This gives the desired output:

Classes precision   recall  f1-score    support
1   0.5 0.5 0.5 2
2   0.5 0.5 0.5 2
3   0.5 0.5 0.5 2
avg/total   0.5 0.5 0.5 6

Answer by Samuel Nde

Just import pandas as pd and make sure that you set the output_dict parameter, which is False by default, to True when computing the classification_report. This yields a classification_report dictionary, which you can then pass to the pandas DataFrame constructor. You may want to transpose the resulting DataFrame to fit the output format that you want. The resulting DataFrame may then be written to a CSV file as you wish.

clsf_report = pd.DataFrame(classification_report(y_true=your_y_true, y_pred=your_y_preds, output_dict=True)).transpose()
clsf_report.to_csv('Your Classification Report Name.csv', index=True)

I hope this helps.

Answer by Yash Nag

It's obviously a better idea to just output the classification report as a dict:

sklearn.metrics.classification_report(y_true, y_pred, output_dict=True)

But here's a function I made to convert the results for all classes (classes only, no summary rows) to a pandas DataFrame.

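import pandas as pd

def report_to_df(report):
    # tokenize the text report line by line
    report = [x.split(' ') for x in report.split('\n')]
    header = ['Class Name']+[x for x in report[0] if x!='']
    values = []
    # skip the header; the -5 slice assumes the report ends with a blank line,
    # three summary rows, and a trailing newline
    for row in report[1:-5]:
        row = [value for value in row if value!='']
        if row!=[]:
            values.append(row)
    df = pd.DataFrame(data = values, columns = header)
    return df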

Hope this works fine for you.
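
For comparison, a classes-only table can also be built from the dict output mentioned above, without parsing text. This is a minimal sketch with made-up string labels (y_true, y_pred and the "Class Name" index label are illustrative assumptions); it keeps only the entries whose keys are actual class labels:

```python
import pandas as pd
from sklearn.metrics import classification_report

# Illustrative labels only (an assumption, not data from the question)
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]

report = classification_report(y_true, y_pred, output_dict=True)

# Per-class entries are keyed by the class label; selecting only the labels
# that occur in the data skips the summary entries of the report
labels = sorted(set(y_true) | set(y_pred))
per_class = pd.DataFrame({label: report[label] for label in labels}).transpose()
per_class.index.name = "Class Name"
print(per_class)
```

Because the rows are looked up by label rather than by position, this does not depend on how many summary lines the text report has.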

Answer by Raul

As mentioned in one of the posts here, precision_recall_fscore_support is analogous to classification_report.

It then suffices to use the Python library pandas to format the data in a columnar layout, similar to what classification_report does. Here is an example:

import numpy as np
import pandas as pd

from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_fscore_support

np.random.seed(0)

y_true = np.array([0]*400 + [1]*600)
y_pred = np.random.randint(2, size=1000)

def pandas_classification_report(y_true, y_pred):
    metrics_summary = precision_recall_fscore_support(
            y_true=y_true, 
            y_pred=y_pred)

    avg = list(precision_recall_fscore_support(
            y_true=y_true, 
            y_pred=y_pred,
            average='weighted'))

    metrics_sum_index = ['precision', 'recall', 'f1-score', 'support']
    class_report_df = pd.DataFrame(
        list(metrics_summary),
        index=metrics_sum_index)

    support = class_report_df.loc['support']
    total = support.sum() 
    avg[-1] = total

    class_report_df['avg / total'] = avg

    return class_report_df.T

With classification_report you'll get something like:

print(classification_report(y_true=y_true, y_pred=y_pred, digits=6))

Output:

             precision    recall  f1-score   support

          0   0.379032  0.470000  0.419643       400
          1   0.579365  0.486667  0.528986       600

avg / total   0.499232  0.480000  0.485248      1000

Then with our custom function pandas_classification_report:

df_class_report = pandas_classification_report(y_true=y_true, y_pred=y_pred)
print(df_class_report)

Output:

             precision    recall  f1-score  support
0             0.379032  0.470000  0.419643    400.0
1             0.579365  0.486667  0.528986    600.0
avg / total   0.499232  0.480000  0.485248   1000.0

Then just save it in CSV format (other separators can be set through the sep argument, e.g. sep=';'):

df_class_report.to_csv('my_csv_file.csv',  sep=',')

I opened my_csv_file.csv with LibreOffice Calc (although you could use any tabular/spreadsheet editor, such as Excel). [Image: result opened with LibreOffice]
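
For the tab-delimited variant, the same to_csv call can be given sep='\t'. The sketch below is only illustrative: it rebuilds a stand-in for df_class_report from the values printed above, writes to an in-memory buffer instead of a file, and reads the result back to check it. The "class" index label and the 4-decimal float format are assumptions.

```python
import io
import pandas as pd

# Stand-in for df_class_report, using the values printed above
df_class_report = pd.DataFrame(
    {
        "precision": [0.379032, 0.579365, 0.499232],
        "recall": [0.470000, 0.486667, 0.480000],
        "f1-score": [0.419643, 0.528986, 0.485248],
        "support": [400.0, 600.0, 1000.0],
    },
    index=["0", "1", "avg / total"],
)

# StringIO stands in for a file path such as 'my_tsv_file.tsv'
buffer = io.StringIO()
df_class_report.to_csv(buffer, sep="\t", index_label="class", float_format="%.4f")
tsv_text = buffer.getvalue()

# Read it back with the same separator to confirm the columns survive
round_trip = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col="class")
print(round_trip)
```

A tab separator avoids clashes with commas inside class names, and spreadsheet editors typically split tab-delimited text into columns on paste.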

Answer by elz

I also found some of the answers a bit verbose. Here is my three-line solution, using precision_recall_fscore_support as others have suggested.

import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

report = pd.DataFrame(list(precision_recall_fscore_support(y_true, y_pred)),
            index=['Precision', 'Recall', 'F1-score', 'Support']).T

# Now add the 'Avg/Total' row
report.loc['Avg/Total', :] = precision_recall_fscore_support(y_true, y_pred,
    average='weighted')
report.loc['Avg/Total', 'Support'] = report['Support'].sum()
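
For reference, the scores that average='weighted' puts in the 'Avg/Total' row are the support-weighted means of the per-class scores, and the support is returned as None in that mode, which is why it is filled in separately. A minimal sketch with made-up labels (y_true and y_pred below are illustrative) that checks this by hand:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Illustrative labels only (an assumption, not data from the question)
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]

# Per-class arrays: one entry per label
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)

# Weighted summary: three scalars; the fourth value (support) is None
w_precision, w_recall, w_f1, w_support = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")

# Recompute the summary as a mean weighted by each class's support
manual = [np.average(scores, weights=support) for scores in (precision, recall, f1)]
print(manual)
print([w_precision, w_recall, w_f1])
```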

Answer by Surya

Along with an example input and output, here's another function, metrics_report_to_df(). Building it on precision_recall_fscore_support from sklearn.metrics should do:

# Generates classification metrics using precision_recall_fscore_support:
from sklearn import metrics
import pandas as pd
import numpy as np

# Simulating true and predicted labels as test dataset: 
np.random.seed(10)
y_true = np.array([0]*300 + [1]*700)
y_pred = np.random.randint(2, size=1000)

# Here's the custom function returning classification report dataframe:
def metrics_report_to_df(ytrue, ypred):
    precision, recall, fscore, support = metrics.precision_recall_fscore_support(ytrue, ypred)
    classification_report = pd.concat(map(pd.DataFrame, [precision, recall, fscore, support]), axis=1)
    classification_report.columns = ["precision", "recall", "f1-score", "support"]
    # add an "avg/Total" row
    classification_report.loc['avg/Total', :] = metrics.precision_recall_fscore_support(ytrue, ypred, average='weighted')
    classification_report.loc['avg/Total', 'support'] = classification_report['support'].sum()
    return classification_report

# Provide input as true_label and predicted label (from classifier)
classification_report = metrics_report_to_df(y_true, y_pred)

# Here's the output (metrics report transformed to dataframe )
In [1047]: classification_report
Out[1047]: 
           precision    recall  f1-score  support
0           0.300578  0.520000  0.380952    300.0
1           0.700624  0.481429  0.570703    700.0
avg/Total   0.580610  0.493000  0.513778   1000.0

Answer by Karel Macek

Another option is to calculate the underlying data and compose the report yourself. You can get all the statistics with:

precision_recall_fscore_support
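
A minimal sketch of composing the report by hand, assuming made-up labels and illustrative names (report_df, the "class" index label): the four arrays returned by precision_recall_fscore_support become the columns, and the labels become the index.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

# Illustrative labels only (an assumption, not data from the question)
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 3, 3, 1]

# Fix the label order explicitly so the rows can be named reliably
labels = np.unique(np.concatenate([y_true, y_pred]))

precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels)

report_df = pd.DataFrame(
    {"precision": precision, "recall": recall, "f1-score": f1, "support": support},
    index=labels,
)
report_df.index.name = "class"

# Tab-delimited text; pass a filename to to_csv to write a file instead
tsv_text = report_df.to_csv(sep="\t")
print(tsv_text)
```

Passing labels explicitly keeps the row names in step with the returned arrays, which matters when some classes never appear in the test split.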