Python: the easiest way to get feature names after running SelectKBest in Scikit Learn
Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, link back to the original post, and attribute it to the original authors (not me): StackOverflow
Original post: http://stackoverflow.com/questions/39839112/
The easiest way to get feature names after running SelectKBest in Scikit Learn
Asked by Aviade
I would like to do supervised learning.
So far, I know how to apply supervised learning to all of the features.
However, I would also like to run an experiment with only the K best features.
I read the documentation and found that Scikit Learn provides a SelectKBest method.
Unfortunately, I am not sure how to create a new dataframe after finding those best features:
Let's assume I would like to run an experiment with the 5 best features:
from sklearn.feature_selection import SelectKBest, f_classif
select_k_best_classifier = SelectKBest(score_func=f_classif, k=5).fit_transform(features_dataframe, targeted_class)
Now, if I add the next line:
dataframe = pd.DataFrame(select_k_best_classifier)
I will get a new dataframe without feature names (only column indices from 0 to 4).
I should replace it with:
dataframe = pd.DataFrame(fit_transformed_features, columns=features_names)
My question is: how do I create the features_names list?
I know that I should use: select_k_best_classifier.get_support()
which returns an array of boolean values, where a True value marks a column that was selected.
How should I use this boolean array together with the array of all feature names, which I can get via:
feature_names = list(features_dataframe.columns.values)
Accepted answer by MMF
You can do the following:
mask = select_k_best_classifier.get_support()  # list of booleans
new_features = []  # The list of your K best features
for bool, feature in zip(mask, feature_names):
    if bool:
        new_features.append(feature)
Then change the names of your features:
dataframe = pd.DataFrame(fit_transformed_features, columns=new_features)
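Note that get_support() is a method of the fitted SelectKBest object, not of the array returned by fit_transform, so the selector needs to be kept separately from the one-liner in the question. A minimal end-to-end sketch, reusing the names from the question:
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
# Keep the fitted selector itself so get_support() can be queried afterwards
selector = SelectKBest(score_func=f_classif, k=5)
fit_transformed_features = selector.fit_transform(features_dataframe, targeted_class)
mask = selector.get_support()  # one boolean per original column
new_features = [name for name, kept in zip(features_dataframe.columns, mask) if kept]
dataframe = pd.DataFrame(fit_transformed_features, columns=new_features)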
Answered by Reimar
This doesn't require loops.
# Create and fit selector
selector = SelectKBest(f_classif, k=5)
selector.fit(features_df, target)
# Get columns to keep and create new dataframe with those only
cols = selector.get_support(indices=True)
features_df_new = features_df.iloc[:,cols]
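As a usage note, the fitted selector can be reused on held-out data with the same columns. A minimal sketch, assuming pandas is imported as pd and using a hypothetical test frame features_test_df:
# Apply the already-fitted selector to unseen data (features_test_df is hypothetical)
# and restore the kept column labels
features_test_new = pd.DataFrame(selector.transform(features_test_df),
                                 columns=features_df.columns[cols],
                                 index=features_test_df.index)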
Answered by Dmitriy Apollonin
For me this code works fine and is more 'pythonic':
mask = select_k_best_classifier.get_support()
new_features = features_dataframe.columns[mask]
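The same boolean mask can also be used to slice the original dataframe directly, without going through the transformed array. A minimal follow-up sketch (selected_df is just an illustrative name):
# Select the kept columns straight from the original dataframe
selected_df = features_dataframe.loc[:, mask]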
Answered by Shah Muhammad Hamdi
The following code will help you find the top K features along with their F-scores. Let X be the pandas dataframe whose columns are all the features, and let y be the list of class labels.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
#Suppose, we select 5 features with top 5 Fisher scores
selector = SelectKBest(f_classif, k = 5)
#New dataframe with the selected features for later use in the classifier. fit() method works too, if you want only the feature names and their corresponding scores
X_new = selector.fit_transform(X, y)
names = X.columns.values[selector.get_support()]
scores = selector.scores_[selector.get_support()]
names_scores = list(zip(names, scores))
ns_df = pd.DataFrame(data = names_scores, columns=['Feat_names', 'F_Scores'])
#Sort the dataframe for better visualization
ns_df_sorted = ns_df.sort_values(['F_Scores', 'Feat_names'], ascending = [False, True])
print(ns_df_sorted)
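The fitted selector also exposes the ANOVA p-values computed by f_classif, which can be attached to the same summary frame. A minimal follow-up sketch:
#Attach the corresponding p-values (same order as the names and scores above)
ns_df['P_values'] = selector.pvalues_[selector.get_support()]
print(ns_df.sort_values('F_Scores', ascending=False))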
Answered by Anuj Sharma
There is another alternative method which, however, is not as fast as the solutions above.
# Use the selector to retrieve the best features
X_new = select_k_best_classifier.fit_transform(train[feature_cols], train['is_attributed'])
# Get back the kept features as a DataFrame with dropped columns as all 0s
selected_features = pd.DataFrame(select_k_best_classifier.inverse_transform(X_new),
                                 index=train.index,
                                 columns=feature_cols)
selected_columns = selected_features.columns[selected_features.var() != 0]
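Because inverse_transform maps the reduced array back into the original column layout and fills the dropped columns with zeros, the columns with non-zero variance correspond to the kept features, and their names end up in selected_columns. A minimal follow-up sketch (selected_features_named is just an illustrative name):
# Reduced dataframe containing only the kept features, with their original names
selected_features_named = selected_features[selected_columns]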

