Original question: http://stackoverflow.com/questions/15723628/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas - make a column dtype object or Factor
Question by N. McA.
In pandas, how can I convert a column of a DataFrame into dtype object?
Or better yet, into a factor? (For those who speak R: in Python, how do I do as.factor()?)
Also, what's the difference between pandas.Factor and pandas.Categorical?
Accepted answer by Andy Hayden
You can use the astype method to cast a Series (one column):
df['col_name'] = df['col_name'].astype(object)
Or the entire DataFrame:
df = df.astype(object)
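The following is a minimal, self-contained sketch of both casts, assuming pandas is installed. The DataFrame and its column names `n` and `s` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data: one integer column and one string column
df = pd.DataFrame({"n": [1, 2, 3], "s": ["x", "y", "x"]})

# Cast a single column (a Series) to the generic object dtype
df["n"] = df["n"].astype(object)
print(df.dtypes)

# Cast every column of the DataFrame to object
df = df.astype(object)
print(df.dtypes)
```

The values themselves do not change. Only the dtype reported by `df.dtypes` does.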
Update
Since version 0.15, you can use the category datatype in a Series/column:
df['col_name'] = df['col_name'].astype('category')
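The sketch below shows what the category dtype stores, reusing the yes/no/absent values from the answers further down. The distinct values are available through the `.cat.categories` accessor, which is roughly R's `levels()`. The per-row integer codes are available through `.cat.codes`. Calling this a close analogue of R's `as.factor()` is an informal comparison, not an exact equivalence.

```python
import pandas as pd

df = pd.DataFrame({"b": ["yes", "no", "yes", "no", "absent"]})

# Convert the column to the category dtype (roughly R's as.factor())
df["b"] = df["b"].astype("category")

# The distinct values ("levels" in R terms); sorted by default
print(df["b"].cat.categories)
# The integer code stored for each row
print(df["b"].cat.codes.tolist())  # [2, 1, 2, 1, 0]
```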
Note: pd.Factor was deprecated and has since been removed in favor of pd.Categorical.
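pd.Categorical can also be built directly. Its `categories` and `ordered` arguments play a role similar to `levels` and `ordered` in R's `factor()`, though the correspondence is only an analogy. The example values below are made up for illustration.

```python
import pandas as pd

values = ["low", "high", "medium", "low"]

# Explicit categories fix the level order, similar to R's factor(x, levels = ...)
cat = pd.Categorical(values, categories=["low", "medium", "high"], ordered=True)

print(cat.categories.tolist())  # ['low', 'medium', 'high']
print(cat.codes.tolist())       # [0, 2, 1, 0]

# Because ordered=True, comparisons follow the category order
print((cat > "low").tolist())   # [False, True, True, False]

# Values not listed in categories become missing (code -1)
partial = pd.Categorical(["low", "extreme"], categories=["low", "medium", "high"])
print(partial.codes.tolist())   # [0, -1]
```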
Answer by herrfz
Factor and Categorical are the same, as far as I know. I think it was initially called Factor and later renamed Categorical. To convert to Categorical, maybe you can use pandas.Categorical.from_array, something like this:
In [27]: df = pd.DataFrame({'a' : [1, 2, 3, 4, 5], 'b' : ['yes', 'no', 'yes', 'no', 'absent']})
In [28]: df
Out[28]:
   a       b
0  1     yes
1  2      no
2  3     yes
3  4      no
4  5  absent
In [29]: df['c'] = pd.Categorical.from_array(df.b).labels
In [30]: df
Out[30]:
   a       b  c
0  1     yes  2
1  2      no  1
2  3     yes  2
3  4      no  1
4  5  absent  0
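`Categorical.from_array` and the `.labels` attribute used above are no longer available in current pandas. `.labels` was renamed `.codes`, and the plain `pd.Categorical` constructor takes the place of `from_array`. The following sketch of the same conversion with the current API should reproduce the `c` column shown above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': ['yes', 'no', 'yes', 'no', 'absent']})

# pd.Categorical replaces Categorical.from_array, and .codes replaces .labels.
# Categories are sorted, so 'absent' -> 0, 'no' -> 1, 'yes' -> 2.
df['c'] = pd.Categorical(df['b']).codes
print(df)
```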
Answer by piggybox
There is also the pd.factorize function:
# use the df data from @herrfz
In [150]: pd.factorize(df.b)
Out[150]: (array([0, 1, 0, 1, 2]), array(['yes', 'no', 'absent'], dtype=object))
In [152]: df['c'] = pd.factorize(df.b)[0]
In [153]: df
Out[153]:
   a       b  c
0  1     yes  0
1  2      no  1
2  3     yes  0
3  4      no  1
4  5  absent  2
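The codes here differ from the Categorical-based ones in the previous answer. `pd.factorize` numbers values in order of first appearance, while Categorical sorts its categories first. Passing `sort=True` (an existing `factorize` parameter) makes the two agree, as the sketch below shows. For a Series input, current pandas returns the uniques as an Index rather than the array shown above.

```python
import pandas as pd

s = pd.Series(['yes', 'no', 'yes', 'no', 'absent'])

# Default: codes follow order of first appearance ('yes' -> 0, 'no' -> 1, 'absent' -> 2)
codes, uniques = pd.factorize(s)
print(codes.tolist(), list(uniques))

# sort=True: uniques are sorted first ('absent' -> 0, 'no' -> 1, 'yes' -> 2)
sorted_codes, sorted_uniques = pd.factorize(s, sort=True)
print(sorted_codes.tolist(), list(sorted_uniques))
```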

