Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/30991532/

Date: 2020-09-13 23:30:58  Source: igfitidea

Converting multiple columns to categories in Pandas. apply?

Tags: python, pandas

Asked by Amelio Vazquez-Reina

Consider a DataFrame. I want to convert a set of columns, to_convert, to categories.

I can certainly do the following:

for col in to_convert:
  df[col] = df[col].astype('category')

but I was surprised that the following does not return a dataframe:

df[to_convert].apply(lambda x: x.astype('category'), axis=0)

which of course makes the following not work:

df[to_convert] = df[to_convert].apply(lambda x: x.astype('category'), axis=0)

Why does apply(axis=0) return a Series even though it is supposed to act on the columns one by one?

Accepted answer by Jeff

This was just fixed in master, and so will be in 0.17.0, see the issue here

In [6]: from pandas import DataFrame

In [7]: df = DataFrame({'A' : list('aabbcd'), 'B' : list('ffghhe')})

In [8]: df
Out[8]: 
   A  B
0  a  f
1  a  f
2  b  g
3  b  h
4  c  h
5  d  e

In [9]: df.dtypes
Out[9]: 
A    object
B    object
dtype: object

In [10]: df.apply(lambda x: x.astype('category'))       
Out[10]: 
   A  B
0  a  f
1  a  f
2  b  g
3  b  h
4  c  h
5  d  e

In [11]: df.apply(lambda x: x.astype('category')).dtypes
Out[11]: 
A    category
B    category
dtype: object
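
For readers who want to try this outside an interactive session, here is a minimal sketch of the same apply approach restricted to a subset of columns, as in the question. It assumes a pandas version that includes the fix mentioned above (0.17.0 or later); the extra column C and the contents of to_convert are illustrative and not part of the original answer.

```python
import pandas as pd

# Illustrative frame: columns A and B hold strings, C holds integers.
df = pd.DataFrame({
    "A": list("aabbcd"),
    "B": list("ffghhe"),
    "C": range(6),
})

# Hypothetical column list standing in for the question's to_convert.
to_convert = ["A", "B"]

# apply calls the lambda once per column (axis=0 is the default);
# the result is a DataFrame whose columns are categorical.
converted = df[to_convert].apply(lambda col: col.astype("category"))

# Assign the converted columns back; column C is left untouched.
df[to_convert] = converted

print(df.dtypes)
```

The intermediate converted variable is only there to make the return type of apply easy to inspect; the one-line assignment from the question should behave the same way.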

Answer by joelostblom

Note that since pandas 0.23.0 you no longer need apply to convert multiple columns to categorical data types. Now you can simply do df[to_convert].astype('category') instead (where to_convert is a set of columns as defined in the question).
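
A short sketch of the astype approach, assuming pandas 0.23.0 or later. The sample data, the extra column C, and the contents of to_convert are made up for illustration.

```python
import pandas as pd

# Illustrative frame; only A and B should become categorical.
df = pd.DataFrame({
    "A": list("aabbcd"),
    "B": list("ffghhe"),
    "C": range(6),
})

# Hypothetical column list standing in for the question's to_convert.
to_convert = ["A", "B"]

# astype converts every selected column in one call, with no apply needed.
df[to_convert] = df[to_convert].astype("category")

print(df.dtypes)

# Each column gets its own categories, inferred from its own values.
print(df["A"].cat.categories.tolist())  # ['a', 'b', 'c', 'd']
```

Because the conversion is done column by column, A and B end up with different category sets rather than one shared set.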
