Python scikit-learn: random forest class_weight and sample_weight parameters

Question

Asked by user36047

I have a class imbalance problem and have been experimenting with a weighted Random Forest using the implementation in scikit-learn (>= 0.16).

I have noticed that the implementation takes a class_weight parameter in the tree constructor and a sample_weight parameter in the fit method to help deal with class imbalance. The two, however, seem to be multiplied together to produce a final weight.

I have trouble understanding the following:

In which stages of tree construction/training/prediction are these weights used? I have seen some papers on weighted trees, but I am not sure what scikit-learn implements.
What exactly is the difference between class_weight and sample_weight?

Answer 1

Accepted answer by Andreus

Random forests are built on decision trees, which are very well documented. Check how trees use sample weighting:

User guide on decision trees: describes exactly which algorithm is used.
Decision tree API: explains how trees use sample_weight (which, for random forests, is, as you have determined, the product of class_weight and sample_weight).
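To make "how trees use the weights" a little more concrete: as far as I understand scikit-learn's CART-style trees, the weights are used only at training time. Each candidate split is scored with a weighted impurity, in which an example contributes its weight instead of a count of 1. Each leaf's class probabilities (what predict_proba reports) are weighted class shares, and weight-based options such as min_weight_fraction_leaf also use them. predict itself takes no weights. The sketch below is a hypothetical pure-Python illustration of the weighted Gini impurity. The helper names weighted_gini and split_impurity are made up for this example, and this is not scikit-learn's actual (Cython) implementation.

```python
from collections import defaultdict


def weighted_gini(labels, weights):
    """Gini impurity in which each example counts with its weight instead of 1."""
    totals = defaultdict(float)  # class -> summed weight of its examples
    for label, w in zip(labels, weights):
        totals[label] += w
    total_weight = sum(totals.values())
    if total_weight == 0:
        return 0.0
    return 1.0 - sum((t / total_weight) ** 2 for t in totals.values())


def split_impurity(labels, weights, go_left):
    """Weighted impurity of a candidate split (lower is better).

    go_left holds one boolean per example. Each child's impurity is scaled by
    its share of the total sample weight, not its share of the example count.
    """
    total = sum(weights)
    score = 0.0
    for side in (True, False):
        ys = [y for y, g in zip(labels, go_left) if g == side]
        ws = [w for w, g in zip(weights, go_left) if g == side]
        if ws:
            score += sum(ws) / total * weighted_gini(ys, ws)
    return score


labels = [0, 0, 0, 0, 1]
print(round(weighted_gini(labels, [1, 1, 1, 1, 1]), 4))  # 0.32
# Upweighting the single minority example makes the node look maximally mixed.
print(round(weighted_gini(labels, [1, 1, 1, 1, 4]), 4))  # 0.5
```

Raising the weight of minority-class examples therefore changes which splits look best, which is how class_weight and sample_weight influence the shape of the trees.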

As for the difference between class_weight and sample_weight: much of it follows simply from their data types. sample_weight is a 1D array of length n_samples that assigns an explicit weight to each example used for training. class_weight is either a dictionary mapping each class to a single weight that applies uniformly to every example of that class (e.g., {1: .9, 2: .5, 3: .01}), or a string telling sklearn how to determine this dictionary automatically.
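To illustrate the two data types, here is a small sketch with made-up labels. For the automatic case it reproduces the formula the scikit-learn documentation gives for class_weight="balanced", n_samples / (n_classes * np.bincount(y)), in plain Python. The helper name balanced_class_weight is hypothetical. scikit-learn's own function for this is sklearn.utils.class_weight.compute_class_weight.

```python
from collections import Counter


def balanced_class_weight(y):
    """{class: weight} using the formula scikit-learn documents for
    class_weight="balanced": n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


y = [1, 1, 1, 1, 1, 1, 2, 2, 3, 3]

# sample_weight: one explicit weight per training example (length n_samples).
sample_weight = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0]

# class_weight: one weight per class, written by hand ...
manual_class_weight = {1: 0.9, 2: 0.5, 3: 0.01}
# ... or derived from the class frequencies, which is what "balanced" asks for.
auto_class_weight = balanced_class_weight(y)
print(auto_class_weight)  # roughly {1: 0.556, 2: 1.667, 3: 1.667}
```

With "balanced", every class ends up with the same total weight (n_samples / n_classes), so the rare classes are no longer drowned out by the common one.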

So the training weight of a given example is the product of its explicitly given sample_weight (or 1 if sample_weight is not provided) and its class_weight (or 1 if class_weight is not provided).
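One way to check the product rule empirically is sketched below. It assumes scikit-learn and NumPy are installed, and the toy data and particular weights are arbitrary choices for illustration. One DecisionTreeClassifier gets class_weight plus sample_weight. A second gets only a sample_weight array in which the class weights have already been multiplied in by hand. If the weights really are just multiplied, both trees should produce the same probabilities. A single tree is used instead of a forest to keep the comparison simple, since, as described above, the forest passes the combined weights on to its trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight

# An imbalanced toy problem (roughly 90% / 10%).
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)

class_weight = {0: 1.0, 1: 5.0}
rng = np.random.default_rng(0)
sample_weight = rng.uniform(0.5, 2.0, size=len(y))

# Tree A: pass both parameters and let scikit-learn combine them.
tree_a = DecisionTreeClassifier(class_weight=class_weight, random_state=0)
tree_a.fit(X, y, sample_weight=sample_weight)

# Tree B: fold the class weights into the per-example weights by hand.
combined = sample_weight * np.array([class_weight[label] for label in y])
tree_b = DecisionTreeClassifier(random_state=0)
tree_b.fit(X, y, sample_weight=combined)

# compute_sample_weight expands a class_weight dict (or "balanced") into one
# weight per example, so multiplying by it gives the same product.
expanded = compute_sample_weight(class_weight, y)
print(np.allclose(combined, sample_weight * expanded))
print(np.allclose(tree_a.predict_proba(X), tree_b.predict_proba(X)))
```

For RandomForestClassifier the same product is formed before the trees are grown. As I understand it, with bootstrap=True each tree additionally multiplies these weights by the number of times each example was drawn into its bootstrap sample. class_weight="balanced_subsample" (an option of the forest classifiers) computes the "balanced" weights from each tree's bootstrap sample instead of from the full training set.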
