Python Pandas - make a column dtype object or Factor

Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow original URL: http://stackoverflow.com/questions/15723628/

Time: 2020-08-18 20:47:47  Source: igfitidea

Tags: python, pandas

Asked by N. McA.

In pandas, how can I convert a column of a DataFrame into dtype object? Or better yet, into a factor? (For those who speak R: how do I do as.factor() in Python?)

Also, what's the difference between pandas.Factor and pandas.Categorical?

Accepted answer by Andy Hayden

You can use the astype method to cast a Series (one column):

df['col_name'] = df['col_name'].astype(object)

Or the entire DataFrame:

df = df.astype(object)
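
A minimal sketch of both casts, assuming a small made-up DataFrame (the columns col_name and other and the variable df_all are illustrative only), so the resulting dtypes can be checked:

```python
import pandas as pd

# Hypothetical example data; 'col_name' mirrors the placeholder used in the answer.
df = pd.DataFrame({"col_name": [1, 2, 3], "other": [0.5, 1.5, 2.5]})

# Cast a single column to the generic Python-object dtype.
df["col_name"] = df["col_name"].astype(object)
print(df.dtypes)  # col_name is now object; other is still float64

# Cast every column of the DataFrame at once.
df_all = df.astype(object)
print(df_all.dtypes)  # both columns are object
```

Object dtype only means the column holds arbitrary Python objects; it adds no factor-like behaviour such as levels or integer codes.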


Update

Since version 0.15, you can use the category datatype in a Series/column:

df['col_name'] = df['col_name'].astype('category')
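
As a rough analogue of R's as.factor(), here is a minimal sketch assuming a made-up yes/no/absent column: after the cast, the inferred categories play the role of R's levels (sorted, because none were given explicitly) and cat.codes holds the integer code of each row.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"col_name": ["yes", "no", "yes", "absent"]})
df["col_name"] = df["col_name"].astype("category")

print(df["col_name"].dtype)                     # category
print(df["col_name"].cat.categories.tolist())   # ['absent', 'no', 'yes']
print(df["col_name"].cat.codes.tolist())        # [2, 1, 2, 0]
```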

Note: pd.Factor has been deprecated and removed in favor of pd.Categorical.
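
Since pd.Categorical is the replacement, here is a sketch of constructing one directly; the values and the explicit categories list are made up for illustration. Passing categories fixes the level order (similar to R's factor(x, levels = ...)), and a value that is not among the categories becomes NaN with code -1.

```python
import pandas as pd

# Hypothetical values; "huge" is deliberately not one of the listed categories.
sizes = pd.Categorical(
    ["small", "large", "medium", "small", "huge"],
    categories=["small", "medium", "large"],
)

print(sizes.categories.tolist())  # ['small', 'medium', 'large']
print(sizes.codes.tolist())       # [0, 2, 1, 0, -1]  (-1 marks the missing "huge")
```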

Answer by herrfz

Factor and Categorical are the same, as far as I know. I think it was initially called Factor and then renamed to Categorical. To convert to Categorical, maybe you can use pandas.Categorical (pandas.Categorical.from_array in older versions) and take its integer codes (called labels in older versions), something like this:

In [27]: df = pd.DataFrame({'a' : [1, 2, 3, 4, 5], 'b' : ['yes', 'no', 'yes', 'no', 'absent']})

In [28]: df
Out[28]: 
   a       b
0  1     yes
1  2      no
2  3     yes
3  4      no
4  5  absent

In [29]: df['c'] = pd.Categorical(df.b).codes  # older pandas: pd.Categorical.from_array(df.b).labels

In [30]: df
Out[30]: 
   a       b  c
0  1     yes  2
1  2      no  1
2  3     yes  2
3  4      no  1
4  5  absent  0
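
A small sketch, reusing the DataFrame from this answer, of how the integer codes map back to the labels: each code indexes into the sorted categories. The names cat and restored are illustrative. It also checks that the codes match the astype('category') route from the accepted answer.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": ["yes", "no", "yes", "no", "absent"]})

cat = pd.Categorical(df["b"])
df["c"] = cat.codes  # integer codes into the sorted categories ['absent', 'no', 'yes']

# Each code is a position in cat.categories, so take() recovers the labels.
restored = cat.categories.take(cat.codes)
print(restored.tolist())  # ['yes', 'no', 'yes', 'no', 'absent']

# The category dtype produces the same codes.
print((df["b"].astype("category").cat.codes == df["c"]).all())  # True
```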

Answer by piggybox

There's also the pd.factorize function:

# use the df data from @herrfz

In [150]: pd.factorize(df.b)
Out[150]: (array([0, 1, 0, 1, 2]), array(['yes', 'no', 'absent'], dtype=object))
In [152]: df['c'] = pd.factorize(df.b)[0]

In [153]: df
Out[153]: 
   a       b  c
0  1     yes  0
1  2      no  1
2  3     yes  0
3  4      no  1
4  5  absent  2
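
A sketch, assuming the same b values as above, contrasting factorize's default ordering with sort=True. By default the codes follow the order of first appearance; with sort=True the uniques are sorted, which reproduces the codes from the Categorical answer. The variable names are illustrative, and uniques.take(codes) rebuilds the original values.

```python
import pandas as pd

s = pd.Series(["yes", "no", "yes", "no", "absent"])

# Default: codes are assigned in order of first appearance.
codes, uniques = pd.factorize(s)
print(codes.tolist(), uniques.tolist())  # [0, 1, 0, 1, 2] ['yes', 'no', 'absent']

# sort=True: uniques are sorted, so codes match pd.Categorical's codes.
codes_sorted, uniques_sorted = pd.factorize(s, sort=True)
print(codes_sorted.tolist(), uniques_sorted.tolist())  # [2, 1, 2, 1, 0] ['absent', 'no', 'yes']

# Round trip: indexing the uniques with the codes gives back the original values.
print(uniques.take(codes).tolist())  # ['yes', 'no', 'yes', 'no', 'absent']
```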