Python: Feature selection using scikit-learn

Question

Asked by sara

I'm new to machine learning. I'm preparing my data for classification using a scikit-learn SVM. To select the best features, I used the following method:

SelectKBest(chi2, k=10).fit_transform(A1, A2)

Since my dataset contains negative values, I get the following error:

ValueError                                Traceback (most recent call last)

/media/5804B87404B856AA/TFM_UC3M/test2_v.py in <module>()
----> 1 
      2 
      3 
      4 
      5 

/usr/local/lib/python2.6/dist-packages/sklearn/base.pyc in fit_transform(self, X, y, **fit_params)
    427         else:
    428             # fit method of arity 2 (supervised transformation)

--> 429             return self.fit(X, y, **fit_params).transform(X)
    430 
    431 

/usr/local/lib/python2.6/dist-packages/sklearn/feature_selection/univariate_selection.pyc in fit(self, X, y)
    300         self._check_params(X, y)
    301 
--> 302         self.scores_, self.pvalues_ = self.score_func(X, y)
    303         self.scores_ = np.asarray(self.scores_)
    304         self.pvalues_ = np.asarray(self.pvalues_)

/usr/local/lib/python2.6/dist-packages/sklearn/feature_selection/univariate_selection.pyc in chi2(X, y)
    190     X = atleast2d_or_csr(X)
    191     if np.any((X.data if issparse(X) else X) < 0):
--> 192         raise ValueError("Input X must be non-negative.")
    193 
    194     Y = LabelBinarizer().fit_transform(y)

ValueError: Input X must be non-negative.

Can someone tell me how I can transform my data?

Answer 1

Answered by Maxim

The error message "Input X must be non-negative" says it all: Pearson's chi-square test (goodness of fit) does not apply to negative values. This is logical, because the chi-square test assumes a frequency distribution, and a frequency can't be a negative number. Consequently, sklearn.feature_selection.chi2 asserts that the input is non-negative.
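The check described above can be reproduced with a minimal sketch. The tiny array below is made up for illustration (it is not the asker's data); the only thing it shows is that chi2 refuses any input containing a negative entry.

```python
import numpy as np
from sklearn.feature_selection import chi2

# Hypothetical toy data: the second column contains a negative value.
X = np.array([[1.0, -2.0],
              [3.0, 0.5],
              [0.0, 4.0],
              [2.0, 1.0]])
y = np.array([0, 1, 0, 1])

try:
    chi2(X, y)
    raised = False
except ValueError as exc:
    # chi2 validates its input before computing any statistic.
    raised = True
    print("chi2 rejected the input:", exc)
```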

You say that your features are "min, max, mean, median and FFT of accelerometer signal". In many cases, it is quite safe to simply shift each feature so that all of its values are positive, or even to normalize it to the [0, 1] interval, as suggested by EdChum.

If data transformation is for some reason not possible (e.g. a negative value is an important factor), you should pick another statistic to score your features:

sklearn.feature_selection.f_classif computes the ANOVA F-value
sklearn.feature_selection.mutual_info_classif computes the mutual information
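Either score function can be passed to SelectKBest in place of chi2, and neither requires non-negative input. The following is a rough sketch on made-up data (X and y are placeholders, not the asker's A1 and A2). Wrapping mutual_info_classif in functools.partial to fix random_state is an assumption made here only to keep the result reproducible.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

# Hypothetical stand-in data that contains negative values.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)

# ANOVA F-value: works directly on the unscaled, signed data.
X_f = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Mutual information: the estimator is randomized, so fix the seed.
mi_score = partial(mutual_info_classif, random_state=0)
X_mi = SelectKBest(mi_score, k=10).fit_transform(X, y)

print(X_f.shape, X_mi.shape)
```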

Since the whole point of this procedure is to prepare the features for another method, it's not a big deal which one you pick; the end result is usually the same or very close.
