LabelEncoder specify classes in DataFrame

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/38893374/

Time: 2020-09-14 01:47:41  Source: igfitidea

python pandas machine-learning scikit-learn

Asked by gbhrea

I'm applying a LabelEncoder to a pandas DataFrame, df:

Feat1  Feat2  Feat3  Feat4  Feat5
  A      A      A      A      E
  B      B      C      C      E
  C      D      C      C      E
  D      A      C      D      E

I'm applying the label encoder to the DataFrame like this:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
intIndexed = df.apply(le.fit_transform)

This is how the labels are mapped:

A = 0
B = 1
C = 2
D = 3
E = 0

I'm guessing that E isn't given the value 4 because it doesn't appear in any column other than Feat5.

I want E to be given the value 4, but I don't know how to do this in a DataFrame.
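The behaviour described above can be reproduced with a short sketch. It assumes the DataFrame is rebuilt by hand from the table in the question; the helper `encode` and the dictionary `per_column_classes` are hypothetical names introduced only for illustration. Because `apply` calls `fit_transform` once per column, the encoder is refit each time and only sees the values of the current column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# DataFrame rebuilt from the table in the question (an assumption of this sketch)
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

le = LabelEncoder()
per_column_classes = {}  # hypothetical helper: records what the encoder learned per column

def encode(col):
    # fit_transform refits the encoder on this column only
    codes = le.fit_transform(col)
    per_column_classes[col.name] = le.classes_.tolist()
    return pd.Series(codes, index=col.index)

encoded = df.apply(encode)

# Feat5 contains only 'E', so 'E' is the first (and only) class there
print(per_column_classes["Feat5"])
print(encoded["Feat5"].tolist())
```

Inspecting `per_column_classes` shows that every column gets its own mapping, which is why a label's code can differ from one column to the next.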

Answered by Nickil Maveli

You could fit the label encoder first and later transform the labels to their normalized encoding, as follows:

In [4]: from sklearn import preprocessing
   ...: import numpy as np

In [5]: le = preprocessing.LabelEncoder()

In [6]: le.fit(np.unique(df.values))
Out[6]: LabelEncoder()

In [7]: list(le.classes_)
Out[7]: ['A', 'B', 'C', 'D', 'E']

In [8]: df.apply(le.transform)
Out[8]: 
   Feat1  Feat2  Feat3  Feat4  Feat5
0      0      0      0      0      4
1      1      1      2      2      4
2      2      3      2      2      4
3      3      0      2      3      4
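As a self-contained script, the same idea might look like the sketch below. The DataFrame is again rebuilt from the question's table, and the round trip through `inverse_transform` is an addition that is not part of the original answer.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# DataFrame rebuilt from the table in the question (an assumption of this sketch)
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

# Fit once on every value in the frame so all columns share one mapping
le = LabelEncoder()
le.fit(df.to_numpy().ravel())

# Transform column by column without refitting
encoded = df.apply(lambda col: pd.Series(le.transform(col), index=col.index))

# Map the integer codes back to the original labels
decoded = encoded.apply(
    lambda col: pd.Series(le.inverse_transform(col), index=col.index)
)

print(le.classes_.tolist())
print(encoded)
```

Fitting once and only transforming afterwards is what keeps the code for a given label identical in every column.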


One way to specify the labels up front would be:

In [9]: labels = ['A', 'B', 'C', 'D', 'E']

In [10]: enc = le.fit(labels)

In [11]: enc.classes_                       # sorts the labels in alphabetical order
Out[11]: 
array(['A', 'B', 'C', 'D', 'E'], 
      dtype='<U1')

In [12]: enc.transform(['E'])               # recent scikit-learn expects a 1-D array-like, not a bare string
Out[12]: array([4])
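Since `LabelEncoder` always stores its classes in sorted order, the order of the list passed to `fit` does not control the codes. If a specific label-to-code assignment is needed, one alternative is to swap the encoder for a plain dictionary and `Series.map`. This is a different technique from the one in the answer, sketched here under the assumption that the full label list is known in advance; `mapping` is a hypothetical name.

```python
import pandas as pd

# DataFrame rebuilt from the table in the question (an assumption of this sketch)
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

# The position in this list decides the code, so any order can be chosen
labels = ["A", "B", "C", "D", "E"]
mapping = {label: code for code, label in enumerate(labels)}

# Series.map looks each value up in the dictionary; unknown labels become NaN
encoded = df.apply(lambda col: col.map(mapping))

print(mapping)
print(encoded)
```

Unlike `LabelEncoder.transform`, which raises an error on a label it has not seen, `map` silently yields NaN for it, so a missing-value check may be worth adding.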

Answered by Anvesh_vs

You can fit and transform in a single statement. Here is the code for encoding a single column and assigning it back to the DataFrame:

from sklearn.preprocessing import LabelEncoder
df[columnName] = LabelEncoder().fit_transform(df[columnName])
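Extended to several columns, the single-statement approach could be sketched as below. Note that a fresh encoder per column gives each column its own mapping rather than the shared one asked for in the question. The dictionary `encoders` is a hypothetical addition that keeps each fitted encoder so the codes can be decoded later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# DataFrame rebuilt from the table in the question (an assumption of this sketch)
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

encoders = {}          # hypothetical: one fitted encoder per column name
encoded = df.copy()

for name in df.columns:
    enc = LabelEncoder()
    encoded[name] = enc.fit_transform(df[name])  # fit and transform in one call
    encoders[name] = enc                         # keep it for inverse_transform

# Decode one column with the encoder that produced it
restored = encoders["Feat2"].inverse_transform(encoded["Feat2"])
print(restored.tolist())
```

Keeping the encoders around matters because the integer codes alone are ambiguous once each column has been fit separately.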