Pandas cast all object columns to category

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39904889/

Time: 2020-09-14 02:09:34  Source: igfitidea

Tags: python, pandas, casting, categorical-data

Asked by Georg Heiler

I want an elegant function that casts all object columns in a pandas DataFrame to categories.

df[x] = df[x].astype("category") performs the type cast, and df.select_dtypes(include=['object']) would sub-select all object columns. However, this results in a loss of the other columns, so a manual merge is required. Is there a solution which "just works in place" or does not require a manual cast?

edit

I am looking for something similar to http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.convert_objects.html, but for a conversion to categorical data.

Answer by piRSquared

Use apply and pd.Series.astype with dtype='category'.

Consider the pd.DataFrame df:

import pandas as pd

df = pd.DataFrame(dict(
        A=[1, 2, 3, 4],
        B=list('abcd'),
        C=[2, 3, 4, 5],
        D=list('defg')
    ))
df  # in a notebook, this displays the DataFrame

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
A    4 non-null int64
B    4 non-null object
C    4 non-null int64
D    4 non-null object
dtypes: int64(2), object(2)
memory usage: 200.0+ bytes

Let's use select_dtypes to include all 'object' types for conversion, and recombine them with a second select_dtypes call that excludes them.

df = pd.concat([
        df.select_dtypes(exclude=['object']),  # non-object columns, unchanged
        df.select_dtypes(include=['object']).apply(pd.Series.astype, dtype='category')
        ], axis=1).reindex(columns=df.columns)  # older pandas: .reindex_axis(df.columns, axis=1)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
A    4 non-null int64
B    4 non-null category
C    4 non-null int64
D    4 non-null category
dtypes: category(2), int64(2)
memory usage: 208.0 bytes
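A shorter variant, not part of the original answer: DataFrame.astype also accepts a dict mapping column names to dtypes, so only the listed object columns are converted and the column order is left untouched, which makes the concat/reindex step unnecessary. This is a minimal sketch assuming a reasonably recent pandas; the object columns are created with dtype=object explicitly so the example does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": pd.Series(list("abcd"), dtype=object),
    "C": [2, 3, 4, 5],
    "D": pd.Series(list("defg"), dtype=object),
})

# Map every object column to 'category'; columns missing from the dict keep their dtype
obj_cols = df.select_dtypes(include="object").columns
df = df.astype({col: "category" for col in obj_cols})

print(df.dtypes)
```

Because astype returns a new DataFrame with the same column order, no reindexing is needed afterwards.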

Answer by KG in Chicago

I think that this is a more elegant way:

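import pandas as pd

df = pd.DataFrame(dict(
        A=[1, 2, 3, 4],
        B=list('abcd'),
        C=[2, 3, 4, 5],
        D=list('defg')
    ))

df.info()

# Select the object columns and convert each of them to 'category'.
# Assign through df[cols] rather than df.loc[:, mask]: in recent pandas
# versions (2.x), full-column .loc assignment tries to write the values
# in place, which can leave the columns with object dtype.
obj_cols = df.columns[df.dtypes == 'object']
df[obj_cols] = df[obj_cols].apply(lambda x: x.astype('category'))

df.info()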
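The question asks for something that "just works in place". A minimal sketch of such a function, assuming a reasonably recent pandas; cast_objects_to_category is a hypothetical name, not part of pandas. It modifies the DataFrame it receives, so no reassignment is needed, and it assigns through df[cols] so the columns are replaced by new categorical arrays rather than overwritten value by value.

```python
import pandas as pd


def cast_objects_to_category(df: pd.DataFrame) -> None:
    """Hypothetical helper: convert every object column of df to 'category', mutating df."""
    obj_cols = df.select_dtypes(include="object").columns
    if len(obj_cols) > 0:
        # Assigning via df[cols] replaces the column arrays, so the new dtype sticks
        df[obj_cols] = df[obj_cols].astype("category")


df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": pd.Series(list("abcd"), dtype=object),
    "C": [2, 3, 4, 5],
    "D": pd.Series(list("defg"), dtype=object),
})
cast_objects_to_category(df)  # no return value; df itself now holds the categorical columns
print(df.dtypes)
```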

Answer by a Data Head

Wish I could add this as a comment, but can't.

The accepted answer doesn't work for pandas version 0.25 and higher. Use .reindex instead of reindex_axis. See here for more information: https://github.com/scikit-hep/root_pandas/issues/82
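For reference, a minimal sketch of the accepted answer's recombination step written with reindex: pd.concat places the non-object columns first, and reindex(columns=...) puts them back in the original order. It only uses reindex, so it does not depend on reindex_axis being available; the explicit dtype=object columns are an assumption to keep the example independent of string-inference defaults.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": pd.Series(list("abcd"), dtype=object),
    "C": [2, 3, 4, 5],
    "D": pd.Series(list("defg"), dtype=object),
})

combined = pd.concat([
    df.select_dtypes(exclude=["object"]),                     # A, C unchanged
    df.select_dtypes(include=["object"]).astype("category"),  # B, D converted
], axis=1)
print(list(combined.columns))  # ['A', 'C', 'B', 'D']

# reindex(columns=...) does what reindex_axis(df.columns, axis=1) did in the accepted answer
result = combined.reindex(columns=df.columns)
print(list(result.columns))  # ['A', 'B', 'C', 'D']
```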

Answer by Anton Golubev

Often the order of categories has meaning; for example, T-shirt sizes 'S', 'M', 'L', 'XL' are ordered categories (ordinals in SPSS). If you are interested in creating ordered categories from strings, you can use this code:

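import pandas as pd

# Convert every object column to an ordered Categorical; other columns are kept as-is
df = pd.concat([
        df.select_dtypes(exclude=['object']),
        df.select_dtypes(include=['object']).apply(pd.Categorical, ordered=True)
        ], axis=1).reindex(df.columns, axis=1)  # restore the original column order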

In the resulting DataFrame, the categorical columns can be sorted by value in the same order as the original strings, because the inferred categories are ordered lexically.
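Note that pd.Categorical(..., ordered=True) without an explicit categories list orders the categories lexically, so sizes such as 'S', 'M', 'L', 'XL' come out as L < M < S < XL. When the meaningful order differs from the alphabetical one, it can be spelled out with pd.CategoricalDtype. A minimal sketch; the column name size and its values are hypothetical.

```python
import pandas as pd

# Hypothetical column of T-shirt sizes
df = pd.DataFrame({"size": pd.Series(["M", "XL", "S", "L", "M"], dtype=object)})

# Inferred categories are sorted lexically, not by size
lexical = pd.Categorical(df["size"], ordered=True)
print(list(lexical.categories))  # ['L', 'M', 'S', 'XL']

# Spell out the intended order explicitly
size_type = pd.CategoricalDtype(categories=["S", "M", "L", "XL"], ordered=True)
df["size"] = df["size"].astype(size_type)

print(df.sort_values("size")["size"].tolist())  # ['S', 'M', 'M', 'L', 'XL']
print((df["size"] > "M").tolist())              # [False, True, False, True, False]
```

With an explicit ordered dtype, both sorting and comparisons such as > follow the declared size order rather than string order.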

在生成的 DataFrame 中,分类列可以按照与用于对字符串进行排序相同的方式按值进行排序。