pandas 数组维度为 3 时的混淆矩阵错误
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/44054534/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Confusion matrix error when array dimensions are of size 3
提问by blue-sky
This code:
这段代码:
from pandas_ml import ConfusionMatrix
y_actu = [1,2]
y_pred = [1,2]
cm = ConfusionMatrix(y_actu, y_pred)
cm.print_stats()
prints:
打印输出:
population: 2
P: 1
N: 1
PositiveTest: 1
NegativeTest: 1
TP: 1
TN: 1
FP: 0
FN: 0
TPR: 1.0
TNR: 1.0
PPV: 1.0
NPV: 1.0
FPR: 0.0
FDR: 0.0
FNR: 0.0
ACC: 1.0
F1_score: 1.0
MCC: 1.0
informedness: 1.0
markedness: 1.0
prevalence: 0.5
LRP: inf
LRN: 0.0
DOR: inf
FOR: 0.0
/opt/conda/lib/python3.5/site-packages/pandas_ml/confusion_matrix/bcm.py:332: RuntimeWarning: divide by zero encountered in double_scalars
return(np.float64(self.TPR) / self.FPR)
This is expected: FPR is 0, so computing LR+ = TPR/FPR triggers the divide-by-zero warning.
这是意料之中的:FPR 为 0,因此计算 LR+ = TPR/FPR 时会触发除零警告。
Modifying the code to:
修改代码为:
from pandas_ml import ConfusionMatrix
y_actu = [1,2,3]
y_pred = [1,2,3]
cm = ConfusionMatrix(y_actu, y_pred)
cm.print_stats()
The change I made is:
我所做的改变是:
y_actu = [1,2,3]
y_pred = [1,2,3]
results in this error:
导致错误:
OrderedDict([('Accuracy', 1.0), ('95% CI', (0.29240177382128668, nan)), ('No Information Rate', 'ToDo'), ('P-Value [Acc > NIR]', 0.29629629629629622), ('Kappa', 1.0), ("Mcnemar's Test P-Value", 'ToDo')])
ValueErrorTraceback (most recent call last)
<ipython-input-30-d8c5dc2bea73> in <module>()
3 y_pred = [1,2,3]
4 cm = ConfusionMatrix(y_actu, y_pred)
----> 5 cm.print_stats()
/opt/conda/lib/python3.5/site-packages/pandas_ml/confusion_matrix/abstract.py in print_stats(self, lst_stats)
446 Prints statistics
447 """
--> 448 print(self._str_stats(lst_stats))
449
450 def get(self, actual=None, predicted=None):
/opt/conda/lib/python3.5/site-packages/pandas_ml/confusion_matrix/abstract.py in _str_stats(self, lst_stats)
427 }
428
--> 429 stats = self.stats(lst_stats)
430
431 d_stats_str = collections.OrderedDict([
/opt/conda/lib/python3.5/site-packages/pandas_ml/confusion_matrix/abstract.py in stats(self, lst_stats)
390 d_stats = collections.OrderedDict()
391 d_stats['cm'] = self
--> 392 d_stats['overall'] = self.stats_overall
393 d_stats['class'] = self.stats_class
394 return(d_stats)
/opt/conda/lib/python3.5/site-packages/pandas_ml/confusion_matrix/cm.py in __getattr__(self, attr)
33 Returns (weighted) average statistics
34 """
---> 35 return(self._avg_stat(attr))
/opt/conda/lib/python3.5/site-packages/pandas_ml/confusion_matrix/abstract.py in _avg_stat(self, stat)
509 v = getattr(binary_cm, stat)
510 print(v)
--> 511 s_values[cls] = v
512 value = (s_values * self.true).sum() / self.population
513 return(value)
/opt/conda/lib/python3.5/site-packages/pandas/core/series.py in __setitem__(self, key, value)
771 # do the setitem
772 cacher_needs_updating = self._check_is_chained_assignment_possible()
--> 773 setitem(key, value)
774 if cacher_needs_updating:
775 self._maybe_update_cacher()
/opt/conda/lib/python3.5/site-packages/pandas/core/series.py in setitem(key, value)
767 pass
768
--> 769 self._set_with(key, value)
770
771 # do the setitem
/opt/conda/lib/python3.5/site-packages/pandas/core/series.py in _set_with(self, key, value)
809 if key_type == 'integer':
810 if self.index.inferred_type == 'integer':
--> 811 self._set_labels(key, value)
812 else:
813 return self._set_values(key, value)
/opt/conda/lib/python3.5/site-packages/pandas/core/series.py in _set_labels(self, key, value)
826 if mask.any():
827 raise ValueError('%s not contained in the index' % str(key[mask]))
--> 828 self._set_values(indexer, value)
829
830 def _set_values(self, key, value):
/opt/conda/lib/python3.5/site-packages/pandas/core/series.py in _set_values(self, key, value)
831 if isinstance(key, Series):
832 key = key._values
--> 833 self._data = self._data.setitem(indexer=key, value=value)
834 self._maybe_update_cacher()
835
/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in setitem(self, **kwargs)
3166
3167 def setitem(self, **kwargs):
-> 3168 return self.apply('setitem', **kwargs)
3169
3170 def putmask(self, **kwargs):
/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
3054
3055 kwargs['mgr'] = self
-> 3056 applied = getattr(b, f)(**kwargs)
3057 result_blocks = _extend_blocks(applied, result_blocks)
3058
/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in setitem(self, indexer, value, mgr)
685 indexer.dtype == np.bool_ and
686 len(indexer[indexer]) == len(value)):
--> 687 raise ValueError("cannot set using a list-like indexer "
688 "with a different length than the value")
689
ValueError: cannot set using a list-like indexer with a different length than the value
Reading "Assignment to containers in Pandas", which states "Using endemic lists is not allowed on assignment and is not recommended to do this at all.", have I created an endemic list? What is an endemic list?
阅读《Assignment to containers in Pandas》(Pandas 中的容器赋值),其中写道“在赋值时不允许使用 endemic list(地方性列表),而且根本不建议这样做”。我是否创建了一个 endemic list?什么是 endemic list?
采纳答案by spies006
I would recommend using confusion_matrix from scikit-learn. The other metrics that you mention, such as Precision, Recall and F1-score, are also available from sklearn.metrics.
我建议使用 scikit-learn 中的 confusion_matrix。您提到的其他指标(例如精确率、召回率、F1 分数)也可以从 sklearn.metrics 中获得。
>>> from sklearn.metrics import confusion_matrix
>>> y_actu = [1,2,3]
>>> y_pred = [1,2,3]
>>> confusion_matrix(y_actu, y_pred)
array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1]])
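As a minimal sketch (not part of the original answer), the precision, recall and F1 metrics mentioned above can be printed with classification_report from sklearn.metrics:
作为一个简单的补充示例(非原答案内容),上面提到的精确率、召回率和 F1 分数可以用 sklearn.metrics 中的 classification_report 打印出来:
>>> from sklearn.metrics import classification_report
>>> print(classification_report(y_actu, y_pred))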
回答by Max Power
I also use and recommend the sklearn confusion_matrix function. Personally I also keep a "pretty-print confusion matrix" function handy with a few extra conveniences:
我也使用并推荐 sklearn 的 confusion_matrix 函数。就我个人而言,我还准备了一个“pretty-print confusion matrix”(美化打印混淆矩阵)函数,带有一些额外的便利功能:
- class labels printed along the confusion matrix axes
- confusion matrix stats normalized so that all cells sum to 1
- confusion matrix cell colors scaled according to cell-value
- Additional metrics like F-score, etc. printed below the confusion matrix.
- 沿混淆矩阵轴打印的类标签
- 混淆矩阵统计归一化,以便所有单元格总和为 1
- 根据单元格值缩放的混淆矩阵单元格颜色
- 其他指标,如 F-score 等,印在混淆矩阵下方。
Like this (example plot image in the original answer):
像这样(原答案中的示例图片):
Here's the plotting function, based largely on this example from the Scikit-Learn documentation
这是绘图函数,主要基于Scikit-Learn 文档中的这个示例
import itertools

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

def pretty_print_conf_matrix(y_true, y_pred,
                             classes,
                             normalize=False,
                             title='Confusion matrix',
                             cmap=plt.cm.Blues):
    """
    Mostly stolen from: http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py

    Normalization changed, classification_report stats added below plot
    """
    cm = confusion_matrix(y_true, y_pred)

    # Configure Confusion Matrix Plot Aesthetics (no text yet)
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title, fontsize=14)
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    plt.ylabel('True label', fontsize=12)
    plt.xlabel('Predicted label', fontsize=12)

    # Calculate normalized values (so all cells sum to 1) if desired
    if normalize:
        cm = np.round(cm.astype('float') / cm.sum(), 2)  # (axis=1)[:, np.newaxis]

    # Place Numbers as Text on Confusion Matrix Plot
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black",
                 fontsize=12)

    # Add Precision, Recall, F-1 Score as Captions Below Plot
    rpt = classification_report(y_true, y_pred)
    rpt = rpt.replace('avg / total', ' avg')
    rpt = rpt.replace('support', 'N Obs')
    plt.annotate(rpt,
                 xy=(0, 0),
                 xytext=(-50, -140),
                 xycoords='axes fraction', textcoords='offset points',
                 fontsize=12, ha='left')

    # Plot
    plt.tight_layout()
And here's the example with the iris data used to generate the plot image:
下面是使用鸢尾花(iris)数据生成绘图图像的示例:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# assumes the pretty_print_conf_matrix block above has already been run
# (it provides plt and the plotting function)

# get data, make predictions
(X, y) = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5)

clf = SVC()
clf.fit(X_train, y_train)
y_test_pred = clf.predict(X_test)

# Plot Confusion Matrix
plt.style.use('classic')
plt.figure(figsize=(3, 3))
pretty_print_conf_matrix(y_test, y_test_pred,
                         classes=['0', '1', '2'],
                         normalize=True,
                         title='Confusion Matrix')
回答by Ajax1234
Interestingly, when I run your code, I do not get the error that you received, and the code runs perfectly. I suggest you upgrade the pandas_ml library by running:
有趣的是,当我运行您的代码时,我没有收到您收到的错误,并且代码运行完美。我建议您通过运行以下命令升级 pandas_ml 库:
pip install --upgrade pandas_ml
Also, you need to upgrade pandas by running:
此外,您需要通过运行以下命令升级Pandas:
pip install --upgrade pandas
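After upgrading, a quick sanity check of the installed versions looks like this (a sketch; pandas definitely exposes __version__, and pandas_ml is assumed to do the same):
升级后,可以用下面的小片段检查已安装的版本(示例;pandas 一定有 __version__,这里假设 pandas_ml 也有):
import pandas
import pandas_ml

print(pandas.__version__)     # should be a recent release, e.g. 0.20.x
print(pandas_ml.__version__)  # assumed attribute; use `pip show pandas_ml` if it is missing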
If that does not work, you can use pandas itself to create a confusion matrix:
如果这不起作用,您可以使用 Pandas 本身来创建一个混淆矩阵:
import pandas as pd
y_actu = pd.Series([1, 2, 3], name='Actual')
y_pred = pd.Series([1, 2, 3], name='Predicted')
df_confusion = pd.crosstab(y_actu, y_pred)
print(df_confusion)
This will give you the table you are looking for.
这样就能得到你想要的表格。
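As a side note (not part of the original answer), pd.crosstab can also append row and column totals via margins=True, which is often handy for a confusion matrix:
补充一点(非原答案内容),pd.crosstab 还可以通过 margins=True 附加行/列合计,对混淆矩阵很实用:
# margins=True adds an "All" row and column containing the totals
df_confusion = pd.crosstab(y_actu, y_pred, margins=True)
print(df_confusion)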
回答by Alexey Trofimov
It seems the error is not caused by the array dimension:
似乎错误不是因为数组维度:
from pandas_ml import ConfusionMatrix
y_actu = [1,2,2]
y_pred = [1,1,2]
cm = ConfusionMatrix(y_actu, y_pred)
cm.print_stats()
This (a binary classification problem) works fine.
这个(二元分类问题)运行正常。
Maybe the confusion matrix for multiclass classification problems is just broken.
也许多分类问题的混淆矩阵本身就有问题。
Update: I just took these steps:
更新:我刚刚执行了以下步骤:
conda update pandas
to get pandas 0.20.1, and then
得到 pandas 0.20.1,然后
pip install -U pandas_ml
now everything is fine with the multiclass confusion matrix:
现在多分类混淆矩阵一切正常:
from pandas_ml import ConfusionMatrix
y_actu = [1,2,3]
y_pred = [1,2,3]
cm = ConfusionMatrix(y_actu, y_pred)
cm.print_stats()
I got the output:
我得到了输出:
Class Statistics:
Classes 1 2 3
Population 3 3 3
P: Condition positive 1 1 1
N: Condition negative 2 2 2
Test outcome positive 1 1 1
Test outcome negative 2 2 2
TP: True Positive 1 1 1
TN: True Negative 2 2 2
FP: False Positive 0 0 0
FN: False Negative 0 0 0
TPR: (Sensitivity, hit rate, recall) 1 1 1
TNR=SPC: (Specificity) 1 1 1
PPV: Pos Pred Value (Precision) 1 1 1
NPV: Neg Pred Value 1 1 1
FPR: False-out 0 0 0
FDR: False Discovery Rate 0 0 0
FNR: Miss Rate 0 0 0
ACC: Accuracy 1 1 1
F1 score 1 1 1
MCC: Matthews correlation coefficient 1 1 1
Informedness 1 1 1
Markedness 1 1 1
Prevalence 0.333333 0.333333 0.333333
LR+: Positive likelihood ratio inf inf inf
LR-: Negative likelihood ratio 0 0 0
DOR: Diagnostic odds ratio inf inf inf
FOR: False omission rate 0 0 0