pandas: convert a categorical variable from String to int representation

Question

Asked by Abhi

I have a NumPy array of text class labels stored as strings, e.g. y_train = ['A', 'B', 'A', 'C', ...]. I am trying to apply the sklearn multinomial NB algorithm to predict classes for the entire dataset.

I want to convert the string classes into integers so that I can feed them to the algorithm, i.e. convert ['A', 'B', 'A', 'C', ...] into ['1', '2', '1', '3', ...].

I can write a for loop that goes through the array and builds a new one with int class labels, but is there a direct function to achieve this?

Answer 1

Accepted answer by Ted Petrou

If you are using sklearn, I would suggest sticking with the methods in that library that do these things for you. Sklearn has a number of ways of preprocessing data, such as encoding labels. One of them is the sklearn.preprocessing.LabelEncoder class.

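from sklearn.preprocessing import LabelEncoder

# Sample labels matching the example in the question
y_train = ['A', 'B', 'A', 'C']

le = LabelEncoder()
le.fit_transform(y_train)  # learn the label-to-integer mapping and encode y_train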

This outputs:

array([0, 1, 0, 2])

Use le.inverse_transform([0, 1, 2]) to map the integers back to the original labels.
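As a rough sketch of the full round trip described in this answer, the snippet below encodes the labels, inspects the learned mapping through the `classes_` attribute, and decodes again. The sample labels and the variable names (`y_encoded`, `y_one_based`, `y_decoded`) are assumptions made for illustration. The shift by one is only there because the question asks for labels starting at 1.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels, mirroring the example in the question.
y_train = np.array(['A', 'B', 'A', 'C'])

le = LabelEncoder()
# LabelEncoder sorts the distinct classes and numbers them from 0.
y_encoded = le.fit_transform(y_train)

print(le.classes_)  # the learned classes, in the order of their integer codes
print(y_encoded)    # the integer code for each element of y_train

# Optional: shift to 1-based labels as requested in the question.
y_one_based = y_encoded + 1

# inverse_transform expects the original 0-based codes, so undo the shift first.
y_decoded = le.inverse_transform(y_one_based - 1)
print(y_decoded)
```

Keeping the fitted `le` object around matters: it is what lets predictions made on the integer codes be translated back into the original string classes later.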

Answer 2

Answer by MaxU

Try the pd.factorize method:

In [264]: y_train = pd.Series(['A', 'B', 'A', 'C'])

In [265]: y_train
Out[265]:
0    A
1    B
2    A
3    C
dtype: object

In [266]: pd.factorize(y_train)
Out[266]: (array([0, 1, 0, 2], dtype=int64), Index(['A', 'B', 'C'], dtype='object'))

Demo (adding 1 so that the codes start at 1, as asked in the question):

In [271]: fct = pd.factorize(y_train)[0]+1

In [272]: fct
Out[272]: array([1, 2, 1, 3], dtype=int64)
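One detail that may be worth checking when choosing between the two answers: by default `pd.factorize` numbers labels in order of first appearance, whereas `LabelEncoder` sorts them first. The sketch below uses deliberately unsorted, made-up labels to show the difference. It also shows how the second return value can be used to map the codes back, and a category-dtype alternative. The variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical labels, deliberately not in alphabetical order.
y_train = pd.Series(['C', 'A', 'C', 'B'])

# Default behaviour: codes follow the order of first appearance.
codes, uniques = pd.factorize(y_train)
print(codes)    # integer code per element
print(uniques)  # position i holds the label for code i

# Map the codes back to labels by indexing into uniques.
decoded = uniques.take(codes)

# sort=True numbers the labels in sorted order instead.
sorted_codes, sorted_uniques = pd.factorize(y_train, sort=True)

# Alternative: a categorical dtype also assigns codes in sorted order.
cat_codes = y_train.astype('category').cat.codes
```

If the integer codes need to agree with those from another encoder or another dataset, using a sorted variant (or reusing the same `uniques`) is probably the safer choice.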
