pandas / sklearn: changing string class labels to int

Question

Asked by Lukasz

I have a pandas dataframe and I'm trying to change the values in a given column which are represented by strings into integers. For instance:

df = index    fruit   quantity   price 
         0    apple          5    0.99
         1    apple          2    0.99
         2   orange          4    0.89
         4   banana          1    1.64
       ...
     10023     kiwi         10    0.92

I would like it to look like:

df = index    fruit   quantity   price 
         0        1          5    0.99
         1        1          2    0.99
         2        2          4    0.89
         4        3          1    1.64
       ...
     10023        5         10    0.92

I can do this using

df["fruit"] = df["fruit"].map({"apple": 1, "orange": 2,...})

which works if I have a small list to change, but I'm looking at a column with over 500 different labels. Is there any way of changing this from a string to an int?

Answer 1

Answered by jezrael

Use factorize and then convert to categorical if necessary:

df.fruit = pd.factorize(df.fruit)[0]
print (df)
   fruit  quantity  price
0      0         5   0.99
1      0         2   0.99
2      1         4   0.89
3      2         1   1.64
4      3        10   0.92

df.fruit = pd.Categorical(pd.factorize(df.fruit)[0])
print (df)
  fruit  quantity  price
0     0         5   0.99
1     0         2   0.99
2     1         4   0.89
3     2         1   1.64
4     3        10   0.92

print (df.dtypes)
fruit       category
quantity       int64
price        float64
dtype: object

If you need the codes to start from 1:

df.fruit = pd.Categorical(pd.factorize(df.fruit)[0] + 1)
print (df)
  fruit  quantity  price
0     1         5   0.99
1     1         2   0.99
2     2         4   0.89
3     3         1   1.64
4     4        10   0.92
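
Since pd.factorize returns both the integer codes and the unique labels, the mapping can be kept and reversed later. The following is a minimal sketch, not part of the original answer; the sample frame is rebuilt from the question's data, and the names fruit_code, fruit_restored and label_of are illustrative assumptions.

```python
import pandas as pd

# Sample data rebuilt from the question (only the rows that were shown).
df = pd.DataFrame({
    "fruit": ["apple", "apple", "orange", "banana", "kiwi"],
    "quantity": [5, 2, 4, 1, 10],
    "price": [0.99, 0.99, 0.89, 1.64, 0.92],
})

# factorize returns (codes, uniques); codes are assigned in order of first appearance.
codes, uniques = pd.factorize(df["fruit"])

# Shift by one so the labels start at 1, as in the question.
df["fruit_code"] = codes + 1

# Hypothetical helper dict: integer code -> original string label.
label_of = dict(enumerate(uniques, start=1))

# Map the integer codes back to the original strings.
df["fruit_restored"] = df["fruit_code"].map(label_of)

print(df)
print(label_of)
```

Keeping uniques around avoids writing the 500-entry dictionary by hand while still letting you decode the integers afterwards.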

Answer 2

Answered by MaxU

You can use the factorize method:

In [13]: df['fruit'] = pd.factorize(df['fruit'])[0].astype(np.uint16)

In [14]: df
Out[14]:
   index  fruit  quantity  price
0      0      0         5   0.99
1      1      0         2   0.99
2      2      1         4   0.89
3      4      2         1   1.64
4  10023      3        10   0.92

In [15]: df.dtypes
Out[15]:
index         int64
fruit        uint16
quantity      int64
price       float64
dtype: object

Alternatively, you can do it this way:

In [21]: df['fruit'] = df.fruit.astype('category').cat.codes

In [22]: df
Out[22]:
   index  fruit  quantity  price
0      0      0         5   0.99
1      1      0         2   0.99
2      2      3         4   0.89
3      4      1         1   1.64
4  10023      2        10   0.92

In [23]: df.dtypes
Out[23]:
index         int64
fruit          int8
quantity      int64
price       float64
dtype: object
