Is there any way to get the mapping of a label encoder in Python pandas?

Question

Asked by Gingerbread

I am converting strings to categorical values in my dataset using the following piece of code.

# (older pandas: pd.Categorical.from_array(data.weekday).labels)
data['weekday'] = pd.Categorical(data.weekday).codes

For example:

index    weekday
0        Sunday
1        Sunday
2        Wednesday
3        Monday
4        Monday
5        Thursday
6        Tuesday

After encoding the weekday column, my dataset looks like this:

index    weekday
    0       3
    1       3
    2       6
    3       1
    4       1
    5       4
    6       5

Is there any way I can know that Sunday has been mapped to 3, Wednesday to 6 and so on?

Answer 1

Accepted answer by Algor Troy

The best way to do this is probably to use the LabelEncoder from the sklearn library.

Something like this:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])
list(le.classes_)                          # sorted classes: amsterdam, paris, tokyo
le.transform(["tokyo", "tokyo", "paris"])  # array([2, 2, 1])
list(le.inverse_transform([2, 2, 1]))      # back to: tokyo, tokyo, paris
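
Applied to the question's data, a minimal sketch shows why Sunday became 3: `LabelEncoder` sorts the classes alphabetically, and a label's position in `classes_` is its code. It assumes the full `weekday` column contains all seven day names; the sample list is made up for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample: every weekday appears at least once
weekdays = ["Sunday", "Sunday", "Wednesday", "Monday", "Monday", "Thursday",
            "Tuesday", "Friday", "Saturday"]

le = LabelEncoder()
codes = le.fit_transform(weekdays)
print(codes.tolist()[:7])  # [3, 3, 6, 1, 1, 4, 5], as in the question

# classes_ is sorted; each label's index in it is its integer code
mapping = {label: code for code, label in enumerate(le.classes_.tolist())}
print(mapping)
# {'Friday': 0, 'Monday': 1, 'Saturday': 2, 'Sunday': 3, 'Thursday': 4, 'Tuesday': 5, 'Wednesday': 6}

# inverse_transform turns codes back into the original strings
print(le.inverse_transform([3, 6]).tolist())  # ['Sunday', 'Wednesday']
```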

Answer 2

Answered by chinskiy

You can create an additional dictionary that holds the mapping:

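import pandas as pd
from sklearn import preprocessing

data = pd.DataFrame({'name': ['Tom', 'Nick', 'Kate', 'Tom']})  # sample data
le = preprocessing.LabelEncoder()
le.fit(data['name'])
# classes_ is sorted, so codes follow alphabetical order; tolist() gives plain str/int
le_name_mapping = dict(zip(le.classes_.tolist(), le.transform(le.classes_).tolist()))
print(le_name_mapping)
{'Kate': 0, 'Nick': 1, 'Tom': 2}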

Answer 3

Answered by Abhishek

A simple and elegant way to do the same:

import pandas as pd
cat_list = ['Sun', 'Sun', 'Wed', 'Mon', 'Mon']
encoded_data, mapping_index = pd.Series(cat_list).factorize()

And you're done. Check the results:

print(encoded_data)                  # [0 0 1 2 2]
print(mapping_index)                 # uniques in order of first appearance: Sun, Wed, Mon
print(mapping_index.get_loc("Mon"))  # 2
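
By default `factorize` numbers values in order of first appearance, so the codes depend on row order. A minimal sketch, reusing the answer's sample list, of passing `sort=True` to get alphabetical codes (the same ordering `LabelEncoder` uses) and turning the returned uniques into a plain dict:

```python
import pandas as pd

cat_list = ['Sun', 'Sun', 'Wed', 'Mon', 'Mon']

# sort=True sorts the uniques before codes are assigned
codes, uniques = pd.factorize(pd.Series(cat_list), sort=True)

# label -> code, read off the position of each unique value
mapping = {label: code for code, label in enumerate(uniques)}
print(codes.tolist())  # [1, 1, 2, 0, 0]
print(mapping)         # {'Mon': 0, 'Sun': 1, 'Wed': 2}
```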

Answer 4

Answered by John Zwinck

First, make a categorical series:

import pandas as pd
weekdays = pd.Series(['Sun', 'Sun', 'Wed', 'Mon', 'Mon']).astype('category')

Then, inspect its "categories":

weekdays.cat.categories.get_loc('Sun')  # 1 (categories are sorted: Mon, Sun, Wed)
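
To get the whole lookup table at once rather than one label at a time, a minimal sketch can enumerate `cat.categories`, whose positions are exactly the values stored in `cat.codes`. It reuses the answer's sample list; the dict names are illustrative.

```python
import pandas as pd

weekdays = pd.Series(['Sun', 'Sun', 'Wed', 'Mon', 'Mon']).astype('category')

# code -> label and label -> code, both read off the categories Index
code_to_label = dict(enumerate(weekdays.cat.categories))
label_to_code = {label: code for code, label in code_to_label.items()}

print(weekdays.cat.codes.tolist())  # [1, 1, 2, 0, 0]
print(label_to_code)                # {'Mon': 0, 'Sun': 1, 'Wed': 2}
```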

Answer 5

Answered by ssm

There are many ways of doing this. You can consider pd.factorize, sklearn.preprocessing.LabelEncoder, etc. However, in this specific case, you have two options that will suit you best:

Following your own method, you can pass the categories explicitly:

pd.Categorical(df.weekday, categories=[
    'Sunday', 'Monday', 'Tuesday',
    'Wednesday', 'Thursday', 'Friday',
    'Saturday']).codes
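
On current pandas the codes live in `.codes` (`.labels` was removed). A minimal sketch of the explicit-categories approach, with a made-up sample frame: the category list fixes the mapping up front, and any value not in it (here the hypothetical `'Holiday'`) is coded as `-1`.

```python
import pandas as pd

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
df = pd.DataFrame({'weekday': ['Sunday', 'Wednesday', 'Monday', 'Holiday']})

# codes are positions in `days`; values outside the list get -1
cat = pd.Categorical(df.weekday, categories=days)
df['weekday_code'] = cat.codes

# the mapping is simply the category list enumerated
code_to_day = dict(enumerate(cat.categories))

print(df['weekday_code'].tolist())  # [0, 3, 1, -1]
```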

The other option is to map values directly using a dict:

df.weekday.map({
    'Sunday': 0,
    'Monday': 1,
    # ... and so on. You get the idea ...
})
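
Rather than typing the dict by hand, one option is to build it with `enumerate`, which also makes the reverse mapping trivial. This is a sketch: the order of the `days` list is an assumption that decides the codes, and the sample frame is made up.

```python
import pandas as pd

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
day_to_code = {day: i for i, day in enumerate(days)}
code_to_day = {i: day for day, i in day_to_code.items()}

df = pd.DataFrame({'weekday': ['Sunday', 'Sunday', 'Wednesday', 'Monday']})
df['weekday_code'] = df.weekday.map(day_to_code)  # unmapped values would become NaN
df['decoded'] = df.weekday_code.map(code_to_day)  # back to the original names

print(df['weekday_code'].tolist())  # [0, 0, 3, 1]
```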

Answer 6

Answered by Vikas Gupta

If your DataFrame has both numerical and categorical columns, you can use the following (here X is a DataFrame containing both categorical and numerical variables):

from sklearn import preprocessing
le = preprocessing.LabelEncoder()

# encode every object (string) column in place; other columns are skipped
for col in X.columns:
    if X[col].dtype == 'object':
        X[col] = le.fit_transform(X[col])

Or you can try this:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# fit_transform runs on every column, numeric ones included
data = data.apply(le.fit_transform)

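Note: These techniques are fine only if you don't need to convert the values back. The same encoder is refit on every column, so afterwards it only remembers the last column's classes.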
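
If you do need the mappings later, one approach (a minimal sketch, not part of the original answer; the sample frame `X` is made up) is to fit a separate `LabelEncoder` per non-numeric column and keep them in a dict keyed by column name:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample frame with two text columns and one numeric column
X = pd.DataFrame({
    'weekday': ['Sun', 'Wed', 'Mon', 'Sun'],
    'city': ['paris', 'tokyo', 'paris', 'amsterdam'],
    'sales': [10, 20, 15, 30],
})

encoders = {}  # column name -> its own fitted LabelEncoder
for col in X.columns:
    if pd.api.types.is_numeric_dtype(X[col]):
        continue  # leave numeric columns untouched
    enc = LabelEncoder()
    X[col] = enc.fit_transform(X[col])
    encoders[col] = enc

# label -> code mapping for each encoded column
mappings = {
    col: {label: code for code, label in enumerate(enc.classes_.tolist())}
    for col, enc in encoders.items()
}
print(mappings['weekday'])  # {'Mon': 0, 'Sun': 1, 'Wed': 2}

# any column can be decoded with its own encoder
print(encoders['city'].inverse_transform(X['city']).tolist())
# ['paris', 'tokyo', 'paris', 'amsterdam']
```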

Answer 7

Answered by Alexandr Kosolapov

mapping = {value: rank for rank, value in enumerate(train['cat'].value_counts().index)}  # most frequent -> 0
train['cat'] = train['cat'].map(mapping)
