Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/43162506/
UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples
Asked by Sticky
I'm getting this weird error:
classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)
but then it also prints the f-score the first time I run:
metrics.f1_score(y_test, y_pred, average='weighted')
The second time I run, it provides the score without error. Why is that?
>>> y_pred = test.predict(X_test)
>>> y_test
array([ 1, 10, 35, 9, 7, 29, 26, 3, 8, 23, 39, 11, 20, 2, 5, 23, 28,
30, 32, 18, 5, 34, 4, 25, 12, 24, 13, 21, 38, 19, 33, 33, 16, 20,
18, 27, 39, 20, 37, 17, 31, 29, 36, 7, 6, 24, 37, 22, 30, 0, 22,
11, 35, 30, 31, 14, 32, 21, 34, 38, 5, 11, 10, 6, 1, 14, 12, 36,
25, 8, 30, 3, 12, 7, 4, 10, 15, 12, 34, 25, 26, 29, 14, 37, 23,
12, 19, 19, 3, 2, 31, 30, 11, 2, 24, 19, 27, 22, 13, 6, 18, 20,
6, 34, 33, 2, 37, 17, 30, 24, 2, 36, 9, 36, 19, 33, 35, 0, 4,
1])
>>> y_pred
array([ 1, 10, 35, 7, 7, 29, 26, 3, 8, 23, 39, 11, 20, 4, 5, 23, 28,
30, 32, 18, 5, 39, 4, 25, 0, 24, 13, 21, 38, 19, 33, 33, 16, 20,
18, 27, 39, 20, 37, 17, 31, 29, 36, 7, 6, 24, 37, 22, 30, 0, 22,
11, 35, 30, 31, 14, 32, 21, 34, 38, 5, 11, 10, 6, 1, 14, 30, 36,
25, 8, 30, 3, 12, 7, 4, 10, 15, 12, 4, 22, 26, 29, 14, 37, 23,
12, 19, 19, 3, 25, 31, 30, 11, 25, 24, 19, 27, 22, 13, 6, 18, 20,
6, 39, 33, 9, 37, 17, 30, 24, 9, 36, 39, 36, 19, 33, 35, 0, 4,
1])
>>> metrics.f1_score(y_test, y_pred, average='weighted')
C:\Users\Michael\Miniconda3\envs\snowflakes\lib\site-packages\sklearn\metrics\classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)
0.87282051282051276
>>> metrics.f1_score(y_test, y_pred, average='weighted')
0.87282051282051276
>>> metrics.f1_score(y_test, y_pred, average='weighted')
0.87282051282051276
Also, why is there a trailing 'precision', 'predicted', average, warn_for) in the error message? There is no matching open parenthesis, so why does it end with a closing parenthesis? I am running sklearn 0.18.1 using Python 3.6.0 in a conda environment on Windows 10.
I also looked at here and I don't know if it's the same bug. This SO post doesn't have a solution either.
Answered by Shovalt
As mentioned in the comments, some labels in y_true don't appear in y_pred. Specifically in this case, label '2' is never predicted:
>>> set(y_test) - set(y_pred)
{2}
This means that there is no F-score to calculate for this label, and thus the F-score for this case is considered to be 0.0. Since you requested an average of the score, you must take into account that a score of 0 was included in the calculation, and this is why scikit-learn is showing you that warning.
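To see the zero explicitly, you can ask for the per-label scores with average=None. The arrays below are a minimal made-up sketch, not the question's data:

```python
import numpy as np
from sklearn import metrics

y_test = np.array([0, 1, 2, 2, 1])  # label 0 appears in the truth...
y_pred = np.array([1, 1, 2, 2, 1])  # ...but is never predicted

# One F-score per label; the unpredicted label gets 0.0, which then
# drags down any average computed over all labels.
per_label = metrics.f1_score(y_test, y_pred, average=None, labels=[0, 1, 2])
```

Here per_label comes out as [0.0, 0.8, 1.0]: the first entry is the ill-defined score that triggers the warning.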
This brings us to why you don't see the warning the second time. As I mentioned, this is a warning, which is treated differently from an error in Python. The default behavior in most environments is to show a specific warning only once. This behavior can be changed:
import warnings
warnings.filterwarnings('always') # "error", "ignore", "always", "default", "module" or "once"
If you set this before importing the other modules, you will see the warning every time you run the code.
There is no way to avoid seeing this warning the first time, aside from setting warnings.filterwarnings('ignore'). What you can do is decide that you are not interested in the scores of labels that were not predicted, and then explicitly specify the labels you are interested in (i.e. labels that were predicted at least once):
>>> metrics.f1_score(y_test, y_pred, average='weighted', labels=np.unique(y_pred))
0.91076923076923078
The warning is not shown in this case.
Answered by normanius
The accepted answer already explains well why the warning occurs. If you simply want to control the warnings, you can use precision_recall_fscore_support. It offers a (semi-official) argument warn_for that can be used to mute the warnings.
(_, _, f1, _) = metrics.precision_recall_fscore_support(y_test, y_pred,
average='weighted',
warn_for=tuple())
As mentioned already in some comments, use this with care.
Answered by Amir Md Amiruzzaman
Alternatively you could use the following lines of code
import numpy as np
from sklearn.metrics import f1_score
f1_score(y_test, y_pred, average='weighted', labels=np.unique(y_pred))
This should remove the warning and give you the result you wanted.
Answered by petty.cf
The same problem happened to me when training my classification model. As the warning message says, the cause is "labels with no predicted samples", which leads to a zero division when computing the F1-score. I found another solution while reading the sklearn.metrics.f1_score docs; there is a note as follows:
When true positive + false positive == 0, precision is undefined; When true positive + false negative == 0, recall is undefined. In such cases, by default the metric will be set to 0, as will f-score, and UndefinedMetricWarning will be raised. This behavior can be modified with zero_division
The zero_division default value is "warn"; you can set it to 0 or 1 to avoid the UndefinedMetricWarning. It worked for me ;) Oh wait, there was another problem: when I used zero_division, my sklearn reported no such keyword argument, because I was running scikit-learn 0.21.3. Just update your sklearn to the latest version by running pip install scikit-learn -U
Answered by Manula Vishvajith
As I have noticed, this error occurs under two circumstances:
- If you have used train_test_split() to split your data, you have to make sure you reset the index of the data (especially when it was taken from a pandas Series object): the y_train and y_test indices should be reset. The problem is that when you try to use one of the scores from sklearn.metrics, such as precision_score, it will try to match the shuffled indices of the y_test that you got from train_test_split().
So use either np.array(y_test) as y_true in the scores, or y_test.reset_index(drop=True).
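A short sketch of that suggestion; the Series and split below are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data held in pandas objects
X = pd.DataFrame({'feature': np.arange(6)})
y = pd.Series([0, 1, 0, 1, 1, 0], name='target')

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# train_test_split keeps the original (shuffled) row labels; reset them
# before any element-wise comparison against a fresh prediction array
y_test = y_test.reset_index(drop=True)
```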
- You can also get this error if your predicted 'True Positives' count is 0, which is used for precision, recall and F1 scores. You can visualize this using a confusion_matrix. If the classification is multilabel and you set average='weighted'/'micro'/'macro', you will get an answer as long as no entry on the diagonal of the matrix is 0.
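A sketch of what that check can look like (the arrays are made up for illustration):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 2, 2]
y_pred = [1, 1, 1, 2, 2]  # class 0 has no true positives

cm = confusion_matrix(y_true, y_pred)
# A 0 on the diagonal means that class has no true positives, so its
# precision, recall and F1 are ill-defined and trigger the warning.
```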
Hope this helps.
Answered by Tw UxTLi51Nus
As the error message states, the method used to get the F-score comes from the "Classification" part of sklearn - hence the talk about "labels".
Do you have a regression problem? Sklearn provides a "F score" method for regression under the "feature selection" group: http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_regression.html
In case you do have a classification problem, @Shovalt's answer seems correct to me.