How to know the labels assigned by astype('category').cat.codes?
Original question: http://stackoverflow.com/questions/51102205/
This content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me).
Asked by Marisa
I have the following DataFrame called language:
lang level
0 english intermediate
1 spanish intermediate
2 spanish basic
3 english basic
4 english advanced
5 spanish intermediate
6 spanish basic
7 spanish advanced
I converted each of my columns to numeric codes using
language.lang.astype('category').cat.codes
and
language.level.astype('category').cat.codes
respectively, obtaining the following data frame:
lang level
0 0 1
1 1 1
2 1 0
3 0 0
4 0 2
5 1 1
6 1 0
7 1 2
Now, I would like to know if there is a way to find out which original value corresponds to each code. For example, I'd like to know that the value 0 in the lang column corresponds to english, and so on.
Is there any function that lets me get this information back?
Answered by jezrael
You can generate a dictionary that maps each code to its label:
c = language.lang.astype('category')
d = dict(enumerate(c.cat.categories))
print (d)
{0: 'english', 1: 'spanish'}
Then, if necessary, you can use map to translate the codes back into labels:
language['code'] = language.lang.astype('category').cat.codes
language['level_back'] = language['code'].map(d)
print (language)
lang level code level_back
0 english intermediate 0 english
1 spanish intermediate 1 spanish
2 spanish basic 1 spanish
3 english basic 0 english
4 english advanced 0 english
5 spanish intermediate 1 spanish
6 spanish basic 1 spanish
7 spanish advanced 1 spanish
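The dictionary approach can be applied to every column at once. The following is a minimal sketch, assuming the question's data is rebuilt by hand; the name mappings is made up for illustration. Note that when pandas infers categories from strings it sorts them, so the codes follow alphabetical order.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
language = pd.DataFrame({
    "lang": ["english", "spanish", "spanish", "english",
             "english", "spanish", "spanish", "spanish"],
    "level": ["intermediate", "intermediate", "basic", "basic",
              "advanced", "intermediate", "basic", "advanced"],
})

# One {code: label} dictionary per column (hypothetical helper name).
mappings = {
    col: dict(enumerate(language[col].astype("category").cat.categories))
    for col in language.columns
}

print(mappings["lang"])   # {0: 'english', 1: 'spanish'}
print(mappings["level"])  # {0: 'advanced', 1: 'basic', 2: 'intermediate'}
```

Given a column of codes, codes.map(mappings[col]) then recovers the labels, as in the answer above.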
Answered by Scott Boston
You can index into .cat.categories with a code, like this:
language.lang.astype('category').cat.categories[0]
Output:
'english'
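Note that the .cat accessor is only available on a Series with category dtype. A minimal sketch, assuming a small made-up Series named lang rather than the question's DataFrame:

```python
import pandas as pd

lang = pd.Series(["english", "spanish", "spanish", "english"])

# On a plain string column the .cat accessor is not available.
try:
    lang.cat.categories
except AttributeError as exc:
    print("not categorical yet:", exc)

# Convert once, keep the result, and look labels up by code.
lang_cat = lang.astype("category")
print(lang_cat.cat.categories[0])  # english
print(lang_cat.cat.categories[1])  # spanish
```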
Answered by piRSquared
The categorical type is the result of a factorization process: each unique value, or category, is assigned an integer code, incrementing from zero.
For example:
c = language.lang.astype('category')
You've got the codes in
codes = c.cat.codes
And the categories in
cats = c.cat.categories
This is designed so that you can leverage NumPy array indexing: you can recover the label for every row via
cats[codes]
Index(['english', 'spanish', 'spanish', 'english', 'english', 'spanish',
'spanish', 'spanish'],
dtype='object')
There is no need to construct a dictionary for lookups when you already have a structure that does the lookup quite efficiently.
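One caveat when indexing the categories with the codes: pandas encodes missing values as -1, and plain array indexing treats -1 as "last element". The sketch below uses a small made-up Series s (not the question's data) to show the problem, and uses pd.Categorical.from_codes, which treats -1 as missing, as one possible way around it.

```python
import pandas as pd

# Made-up data with one missing value.
s = pd.Series(["english", None, "spanish"]).astype("category")

codes = s.cat.codes.to_numpy()
cats = s.cat.categories
print(codes)  # the missing value is encoded as -1

# Plain indexing silently maps -1 to the last category.
naive = cats[codes]
print(list(naive))

# from_codes interprets -1 as a missing value instead.
safe = pd.Categorical.from_codes(codes, categories=cats)
print(list(safe))
```

If the column has no missing values, as in the question, cats[codes] is fine as written.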
As a further example, this is how we can replicate the same thing with pd.factorize:
codes, cats = pd.factorize(language.lang)
print(cats, codes, cats[codes], sep='\n\n')
Index(['english', 'spanish'], dtype='object')
[0 1 1 0 0 1 1 1]
Index(['english', 'spanish', 'spanish', 'english', 'english', 'spanish',
'spanish', 'spanish'],
dtype='object')
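By default pd.factorize numbers values in order of first appearance, whereas astype('category') sorts the inferred categories. The two agree for lang only because english happens to appear first. A minimal sketch, assuming the level column is rebuilt by hand, showing that sort=True makes the codes match:

```python
import pandas as pd

level = pd.Series(["intermediate", "intermediate", "basic", "basic",
                   "advanced", "intermediate", "basic", "advanced"])

# Default: codes follow the order of first appearance.
codes_seen, cats_seen = pd.factorize(level)
print(list(cats_seen))    # ['intermediate', 'basic', 'advanced']

# sort=True: uniques are sorted, like inferred categories.
codes_sorted, cats_sorted = pd.factorize(level, sort=True)
print(list(cats_sorted))  # ['advanced', 'basic', 'intermediate']

# The sorted variant matches the categorical codes.
cat_codes = level.astype("category").cat.codes
print(list(codes_sorted) == list(cat_codes))  # True
```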