pandas LabelEncoder: specify classes in a DataFrame
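Disclaimer: This page is a Chinese-English translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow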
Original URL: http://stackoverflow.com/questions/38893374/
Question by gbhrea
I'm applying a LabelEncoder to a pandas DataFrame, df
Feat1 Feat2 Feat3 Feat4 Feat5
A A A A E
B B C C E
C D C C E
D A C D E
I'm applying the label encoder to the DataFrame like this:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
intIndexed = df.apply(le.fit_transform)
This is how the labels are mapped
A = 0
B = 1
C = 2
D = 3
E = 0
I'm guessing that E isn't given the value 4 because it doesn't appear in any column other than Feat5.
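A likely explanation is that df.apply(le.fit_transform) calls fit_transform separately for each column, so the encoder is re-fitted every time and only ever sees one column's values. The minimal sketch below rebuilds the example frame (assuming the cells are plain strings) to show this. It also shows that the same letter can receive different codes in different columns: D becomes 2 in Feat2, because that column only contains A, B and D.

```python
import pandas as pd
from sklearn import preprocessing

# The example frame from the question (cells assumed to be plain strings).
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

le = preprocessing.LabelEncoder()

# df.apply passes one column at a time to le.fit_transform, so the encoder is
# re-fitted on each column and only learns the labels present in that column.
intIndexed = df.apply(le.fit_transform)
print(intIndexed)

# After apply returns, the encoder only remembers the last column it was fitted on.
print(le.classes_.tolist())
```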
I want E to be given the value 4, but I don't know how to do this in a DataFrame.
Answer by Nickil Maveli
You could fit the label encoder once on all the values in the DataFrame, and then transform each column to its normalized encoding, as follows:
In [4]: from sklearn import preprocessing
...: import numpy as np
In [5]: le = preprocessing.LabelEncoder()
In [6]: le.fit(np.unique(df.values))
Out[6]: LabelEncoder()
In [7]: list(le.classes_)
Out[7]: ['A', 'B', 'C', 'D', 'E']
In [8]: df.apply(le.transform)
Out[8]:
Feat1 Feat2 Feat3 Feat4 Feat5
0 0 0 0 0 4
1 1 1 2 2 4
2 2 3 2 2 4
3 3 0 2 3 4
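For readers who prefer a standalone script to an IPython session, here is a minimal sketch of the same fit-once, transform-per-column approach. It rebuilds the question's frame (assuming string cells) and adds a hypothetical mapping dict, built from le.classes_, so the shared label-to-integer mapping can be inspected directly.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# The example frame from the question (cells assumed to be plain strings).
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat2": ["A", "B", "D", "A"],
    "Feat3": ["A", "C", "C", "C"],
    "Feat4": ["A", "C", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

le = preprocessing.LabelEncoder()
# Fit once on every distinct value in the whole frame, so all columns share
# a single label -> integer mapping.
le.fit(np.unique(df.values))

# transform (not fit_transform) applies that shared mapping to each column.
encoded = df.apply(le.transform)
print(encoded)

# The learned mapping as a plain dict, for inspection only.
mapping = dict(zip(le.classes_.tolist(), le.transform(le.classes_).tolist()))
print(mapping)
```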
One way to specify the labels explicitly, instead of deriving them from the data, would be:
In [9]: labels = ['A', 'B', 'C', 'D', 'E']
In [10]: enc = le.fit(labels)
In [11]: enc.classes_ # fit() stores the labels in sorted (alphabetical) order
Out[11]:
array(['A', 'B', 'C', 'D', 'E'],
dtype='<U1')
In [12]: enc.transform(['E']) # transform expects a 1-D array-like, not a bare string
Out[12]: array([4])
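Fitting on a fixed list means every label gets a stable code, even one that never appears in the data. The minimal sketch below assumes a trimmed-down two-column frame and a hypothetical label list that includes an unused "F". It also checks two consequences: LabelEncoder sorts its classes, so the order of the list you pass in does not control the codes, and transform raises ValueError for any label outside the fitted set.

```python
import pandas as pd
from sklearn import preprocessing

# Trimmed-down version of the question's frame (assumed string cells).
df = pd.DataFrame({
    "Feat1": ["A", "B", "C", "D"],
    "Feat5": ["E", "E", "E", "E"],
})

# Hypothetical fixed label set: "F" never occurs in df but still gets a code.
labels = ["A", "B", "C", "D", "E", "F"]
le = preprocessing.LabelEncoder().fit(labels)  # fit returns the encoder itself

encoded = df.apply(le.transform)
print(encoded)
print(le.transform(["F"]))  # pass a 1-D list, not the bare string "F"

# Classes are always sorted, so a reversed input list yields the same order.
reordered = preprocessing.LabelEncoder().fit(["E", "D", "C", "B", "A"])
print(reordered.classes_.tolist())

# Labels outside the fitted set are rejected instead of getting a new code.
unseen_rejected = False
try:
    le.transform(["Z"])
except ValueError as err:
    unseen_rejected = True
    print("rejected:", err)
```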
Answer by Anvesh_vs
You can fit and transform in a single statement. Here is code that encodes a single column and assigns the result back to the DataFrame:
from sklearn.preprocessing import LabelEncoder
df[columnName] = LabelEncoder().fit_transform(df[columnName])
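A runnable sketch of the single-column approach is below, assuming a small two-column frame and a hypothetical column_name. It keeps a reference to the fitted encoder so inverse_transform can map the integers back to the original labels. Because the encoder only sees that one column, this alone does not give E the value 4; that still requires fitting on the full label set, as in the previous answer.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small example frame (assumed string cells).
df = pd.DataFrame({
    "Feat2": ["A", "B", "D", "A"],
    "Feat5": ["E", "E", "E", "E"],
})

column_name = "Feat2"  # hypothetical choice of the column to encode
le = LabelEncoder()

# fit_transform learns this column's classes and encodes it in one call;
# the result overwrites the column, other columns are left untouched.
df[column_name] = le.fit_transform(df[column_name])
print(df)

# The fitted encoder can map the integer codes back to the original labels.
decoded = le.inverse_transform(df[column_name])
print(decoded)
```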