Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/51102205/

Time: 2020-09-14 05:45:44  Source: igfitidea

How to know the labels assigned by astype('category').cat.codes?

Tags: python, pandas, dataframe, categorical-data

Asked by Marisa

I have the following dataframe called language

      lang         level
0  english  intermediate
1  spanish  intermediate
2  spanish         basic
3  english         basic
4  english      advanced
5  spanish  intermediate
6  spanish         basic
7  spanish      advanced

I categorized each of my variables into numbers by using

language.lang.astype('category').cat.codes

and

language.level.astype('category').cat.codes

respectively, obtaining the following data frame:

      lang   level
0      0       2
1      1       2
2      1       1
3      0       1
4      0       0
5      1       2
6      1       1
7      1       0

Now, I would like to know whether there is a way to find out which original value corresponds to each code. For example, I'd like to know that the value 0 in the lang column corresponds to english, and so on.

Is there any function that allows me to get back this information?

Answered by jezrael

You can generate a dictionary:

c = language.lang.astype('category')

d = dict(enumerate(c.cat.categories))
print (d)
{0: 'english', 1: 'spanish'}

Then, if necessary, you can map the codes back to labels:

language['code'] = language.lang.astype('category').cat.codes

language['level_back'] = language['code'].map(d)
print (language)
      lang         level  code level_back
0  english  intermediate     0    english
1  spanish  intermediate     1    spanish
2  spanish         basic     1    spanish
3  english         basic     0    english
4  english      advanced     0    english
5  spanish  intermediate     1    spanish
6  spanish         basic     1    spanish
7  spanish      advanced     1    spanish
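
The same idea extends to every column at once. The following is a minimal sketch, assuming the question's data is rebuilt by hand; the name `mappings` is illustrative and not part of the original answer. Because `astype('category')` sorts string categories, the labels come out in alphabetical order.

```python
import pandas as pd

# Rebuild the question's example frame (values copied from the question).
language = pd.DataFrame({
    "lang": ["english", "spanish", "spanish", "english",
             "english", "spanish", "spanish", "spanish"],
    "level": ["intermediate", "intermediate", "basic", "basic",
              "advanced", "intermediate", "basic", "advanced"],
})

# One code -> label dictionary per column ("mappings" is a hypothetical name).
mappings = {
    col: dict(enumerate(language[col].astype("category").cat.categories))
    for col in language.columns
}

print(mappings["lang"])   # {0: 'english', 1: 'spanish'}
print(mappings["level"])  # {0: 'advanced', 1: 'basic', 2: 'intermediate'}
```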

Answered by Scott Boston

You can index into .cat.categories, like this:

language.lang.astype('category').cat.categories[0]

Output:

'english'

Answered by piRSquared

The categorical type works by factorization: each unique value, or category, is assigned an integer code, counting up from zero.

For example:

c = language.lang.astype('category')

You've got codes in

codes = c.cat.codes

And categories in

cats = c.cat.categories

This design lets you leverage NumPy array indexing: you can recover the label for every row via

cats[codes]

Index(['english', 'spanish', 'spanish', 'english', 'english', 'spanish',
       'spanish', 'spanish'],
      dtype='object')

There is no need to construct a dictionary to look it up when you are already given a construct to look it up quite efficiently.
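
One caveat with this lookup is worth checking against your own data: missing values receive the code -1, which plain positional indexing reads as "last category". The sketch below uses made-up data with one missing entry (not from the question) and rebuilds the labels with `pd.Categorical.from_codes`, which treats -1 as missing.

```python
import pandas as pd

# Made-up data with one missing entry (not from the question).
s = pd.Series(["english", None, "spanish"]).astype("category")
codes = s.cat.codes.to_numpy()   # the missing value is coded as -1
cats = s.cat.categories

print(codes.tolist())            # [0, -1, 1]

# from_codes interprets -1 as missing instead of wrapping around
# to the last category, as plain positional indexing would.
restored = pd.Series(pd.Categorical.from_codes(codes, categories=cats))
print(restored.isna().tolist())  # [False, True, False]
```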

As a further example, this is how we can replicate it with pd.factorize:

codes, cats = pd.factorize(language.lang)

print(cats, codes, cats[codes], sep='\n\n')

Index(['english', 'spanish'], dtype='object')

[0 1 1 0 0 1 1 1]

Index(['english', 'spanish', 'spanish', 'english', 'english', 'spanish',
       'spanish', 'spanish'],
      dtype='object')
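
The two approaches agree here only because english happens to appear first. As a small sketch on made-up data (not from the question): `pd.factorize` numbers values in order of first appearance by default, while `astype('category')` sorts the categories; passing `sort=True` to `pd.factorize` makes the two line up.

```python
import pandas as pd

# Made-up series where the first value is not the alphabetically first one.
s = pd.Series(["spanish", "english", "spanish"])

codes_f, uniques_f = pd.factorize(s)             # order of first appearance
codes_s, uniques_s = pd.factorize(s, sort=True)  # sorted uniques
cat = s.astype("category")                       # sorted categories

print(codes_f.tolist(), list(uniques_f))   # [0, 1, 0] ['spanish', 'english']
print(codes_s.tolist(), list(uniques_s))   # [1, 0, 1] ['english', 'spanish']
print(cat.cat.codes.tolist(), list(cat.cat.categories))  # [1, 0, 1] ['english', 'spanish']
```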