Python: get the mapping of categorical variables in pandas

Question

Asked by Bob

I'm doing this to turn categorical variables into numbers:


>>> import pandas as pd
>>> df = pd.DataFrame({'x': ['good', 'bad', 'good', 'great']}, dtype='category')
>>> df

       x
0   good
1    bad
2   good
3  great

How can I get the mapping between the original values and the new values?

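The "new values" here are the integer codes pandas stores for a categorical column, exposed through the `.cat.codes` accessor, while the original labels live in `.cat.categories`. A minimal sketch that rebuilds the frame above and prints both, assuming the categories are inferred from the data (in which case pandas sorts the string labels):

```python
import pandas as pd

df = pd.DataFrame({'x': ['good', 'bad', 'good', 'great']}, dtype='category')

# Categories inferred from string data come out sorted
print(list(df['x'].cat.categories))  # ['bad', 'good', 'great']

# Each row stores an integer code: its position in the categories
print(df['x'].cat.codes.tolist())  # [1, 0, 1, 2]
```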

Answer 1

Answered by JohnE

Method 1


You can create the dictionary mapping by enumerating the categories (just as you can build a dictionary from a list by using the list indices as keys):


dict(enumerate(df['x'].cat.categories))

# {0: 'bad', 1: 'good', 2: 'great'}
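If you also need the opposite direction (label to code, e.g. to encode new data the same way), the same enumeration can be inverted. A small sketch; the names `code_to_value` and `value_to_code` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'x': ['good', 'bad', 'good', 'great']}, dtype='category')
categories = df['x'].cat.categories

# code -> label (Method 1)
code_to_value = dict(enumerate(categories))
# label -> code: the same pairs with key and value swapped
value_to_code = {value: code for code, value in enumerate(categories)}

print(code_to_value)  # {0: 'bad', 1: 'good', 2: 'great'}
print(value_to_code)  # {'bad': 0, 'good': 1, 'great': 2}

# Encoding the column with the reverse mapping reproduces .cat.codes
encoded = [value_to_code[v] for v in df['x']]
print(encoded == df['x'].cat.codes.tolist())  # True
```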

Method 2


Alternatively, you could pair up the code and the value in every row:


dict(zip(df['x'].cat.codes, df['x']))

# {1: 'good', 0: 'bad', 2: 'great'}  (same pairs as Method 1; keys appear in order of first occurrence)
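Since Python 3.7 dictionaries keep insertion order, so the Method 2 dictionary lists codes in the order they first occur in the column. If you prefer the same key order as Method 1, one option (a sketch, not the only way) is to sort the code/value pairs before building the dict:

```python
import pandas as pd

df = pd.DataFrame({'x': ['good', 'bad', 'good', 'great']}, dtype='category')

# Sorting the (code, value) pairs orders the keys by code
ordered = dict(sorted(zip(df['x'].cat.codes, df['x'])))
print(ordered)  # {0: 'bad', 1: 'good', 2: 'great'}
```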

It's a little more transparent what is happening here, and arguably safer for that reason. It is also much less efficient: the arguments to zip() have length len(df), whereas df['x'].cat.categories holds only the unique values and is generally much shorter than len(df).

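The two methods can also disagree in edge cases. Method 2 only sees codes that actually occur, so a declared but unused category is missing from its result, and a missing value (stored as code -1) shows up as an extra `-1` key. A sketch with a hypothetical column that has both:

```python
import pandas as pd

# Hypothetical column: 'great' is declared but never used, and row 2 is missing
s = pd.Series(['good', None, 'bad'],
              dtype=pd.CategoricalDtype(['bad', 'good', 'great']))

print(s.cat.codes.tolist())  # [1, -1, 0]

method1 = dict(enumerate(s.cat.categories))  # every declared category
method2 = dict(zip(s.cat.codes, s))           # only codes present in the data

print(method1)  # {0: 'bad', 1: 'good', 2: 'great'}
# method2 has no key 2 for 'great', but has a key -1 for the missing value
print(sorted(method2))  # [-1, 0, 1]
```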

Additional Discussion


The reason Method 1 works is that the categories have type Index:


type( df['x'].cat.categories )

# pandas.core.indexes.base.Index

and each code is simply a position in that Index, so you look up values in it just as you would in a list.

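To see the "codes are positions" idea directly, you can index the categories with the whole array of codes at once, which decodes every row. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'x': ['good', 'bad', 'good', 'great']}, dtype='category')
categories = df['x'].cat.categories
codes = df['x'].cat.codes.to_numpy()

# Integer-array indexing on an Index works like list lookups, one per code
decoded = categories[codes]
print(list(decoded))  # ['good', 'bad', 'good', 'great']
```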

There are a couple of ways to verify that Method 1 works. First, you can just check that a round trip retains the correct values:


(df['x'] == df['x'].cat.codes.map(
    dict(enumerate(df['x'].cat.categories))).astype('category')).all()
# True
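pandas also provides a constructor, `pd.Categorical.from_codes`, that performs the same decoding from codes plus categories. A round trip through it is another way to confirm that the codes and the categories together carry the full mapping (sketch):

```python
import pandas as pd

df = pd.DataFrame({'x': ['good', 'bad', 'good', 'great']}, dtype='category')

# Rebuild the column from nothing but its codes and its categories
rebuilt = pd.Categorical.from_codes(
    df['x'].cat.codes.to_numpy(), categories=df['x'].cat.categories
)

print(list(rebuilt) == df['x'].tolist())  # True
print(list(rebuilt.categories) == list(df['x'].cat.categories))  # True
```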

or you can check that Method 1 and Method 2 give the same answer:


dict(enumerate(df['x'].cat.categories)) == dict(zip(df['x'].cat.codes, df['x']))

# True
