Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/25963431/

Date: 2020-09-13 22:29:38  Source: igfitidea

Convert pandas series from string to unique int ids

python pandas

Asked by Dave Novelli

I have a categorical variable in a series. I want to assign integer ids to each unique value and create a new series with the ids, effectively turning a string variable into an integer variable. What is the most compact/efficient way to do this?

Answered by unutbu

You could use pandas.factorize:

In [32]: s = pd.Series(['a','b','c'])

In [33]: labels, levels = pd.factorize(s)

In [35]: labels
Out[35]: array([0, 1, 2])
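For a series with repeated values, the same call can be sketched as below. The sample data and the names `codes`, `uniques`, and `ids` are made up for illustration and are not from the original answer. By default the ids follow order of first appearance; passing `sort=True` numbers them by sorted value instead.

```python
import pandas as pd

# Hypothetical sample data with repeated values (not from the original post).
s = pd.Series(['b', 'a', 'b', 'c', 'a'])

# factorize returns (codes, uniques); codes follow order of first appearance.
codes, uniques = pd.factorize(s)

# Wrap the codes in a new Series that keeps the original index.
ids = pd.Series(codes, index=s.index)
print(ids.tolist())    # [0, 1, 0, 2, 1]
print(list(uniques))   # ['b', 'a', 'c']

# With sort=True the ids follow the sorted order of the unique values.
sorted_codes, sorted_uniques = pd.factorize(s, sort=True)
print(sorted_codes.tolist())   # [1, 0, 1, 2, 0]
print(list(sorted_uniques))    # ['a', 'b', 'c']
```

`uniques[codes]` recovers the original strings, so the mapping is reversible.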

Answered by Daniel Golden

Example using the new pandas categorical type in pandas 0.15+

http://pandas.pydata.org/pandas-docs/version/0.16.2/categorical.html

In [553]: x = pd.Series(['a', 'a', 'a', 'b', 'b', 'c']).astype('category')

In [554]: x
Out[554]: 
0    a
1    a
2    a
3    b
4    b
5    c
dtype: category
Categories (3, object): [a, b, c]

In [555]: x.cat.codes
Out[555]: 
0    0
1    0
2    0
3    1
4    1
5    2
dtype: int8
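As a self-contained script, the categorical approach might look like the sketch below. The sample data, including the missing value, is an assumption added for illustration. One thing worth knowing: `cat.codes` marks missing values with -1, and `cat.categories` holds the lookup table for mapping ids back to strings.

```python
import pandas as pd

# Hypothetical sample data; the None entry shows how missing values are coded.
x = pd.Series(['a', 'a', 'b', None, 'c']).astype('category')

codes = x.cat.codes            # integer Series aligned with x; missing -> -1
categories = x.cat.categories  # Index of the unique values, in sorted order

print(codes.tolist())     # [0, 0, 1, -1, 2]
print(list(categories))   # ['a', 'b', 'c']

# Rebuild the categorical data from the integer ids and the lookup table.
restored = pd.Categorical.from_codes(codes.to_numpy(), categories=categories)
print(restored[2])        # b
```

Unlike `pd.factorize` with its default settings, the codes here follow the sorted order of the categories rather than order of first appearance.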