Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must keep it under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/27468892/

Time: 2020-09-13 22:45:41  Source: igfitidea

Pandas get_dummies to output dtype integer/bool instead of float

python pandas

Asked by queise

I would like to know if I could ask the get_dummies function in pandas to output the dummies dataframe with a dtype lighter than the default float64.

So, for a sample dataframe with categorical columns:

In []: import numpy as np
In []: import pandas as pd
In []: df = pd.DataFrame([('blue','wood'),('blue','metal'),('red','wood')],
                         columns=['C1','C2'])
In []: df
Out[]:
    C1      C2
0   blue    wood
1   blue    metal
2   red     wood

After getting the dummies, it looks like:

In []: df = pd.get_dummies(df)
In []: df
Out[]:
   C1_blue  C1_red  C2_metal  C2_wood
0        1       0         0        1
1        1       0         1        0
2        0       1         0        1

This is perfectly fine. However, by default the 1s and 0s are float64:

In []: df.dtypes
Out[]: 
C1_blue     float64
C1_red      float64
C2_metal    float64
C2_wood     float64
dtype: object

I know I can change the dtype afterwards with astype:

In []: df = pd.get_dummies(df).astype(np.int8)

But I don't want to hold the dataframe in memory as floats, because I am dealing with a big dataframe (from a CSV of about 5 GB). I would like to get the dummies directly as integers.
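
As a rough illustration of the memory concern (a sketch added for clarity, not part of the original question), the snippet below compares the column storage of the same dummies held as float64 and as int8. Both dtypes are forced with astype so the comparison does not depend on what a given pandas version returns by default; the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('blue', 'wood'), ('blue', 'metal'), ('red', 'wood')],
                  columns=['C1', 'C2'])

dummies = pd.get_dummies(df)

# Force both representations explicitly so the result is independent of
# the default dtype of the installed pandas version.
as_float = dummies.astype(np.float64)
as_int8 = dummies.astype(np.int8)

# Bytes used by the column data only (index excluded).
float_bytes = as_float.memory_usage(index=False).sum()
int8_bytes = as_int8.memory_usage(index=False).sum()

print(float_bytes, int8_bytes)  # 96 12 (3 rows x 4 columns x 8 bytes vs 1 byte)
```

The 8x ratio per cell is what makes the intermediate float64 frame costly for a multi-gigabyte input, even if it is converted right afterwards.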

Accepted answer by queise

The float issue is now solved. As of pandas version 0.19, the pd.get_dummies function returns dummy-encoded columns as small integers (uint8).

See: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#get-dummies-now-returns-integer-dtypes
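
A minimal sketch of how this can be checked, and how a specific dtype can be requested directly. It assumes a pandas version that supports the dtype keyword of get_dummies (to my knowledge 0.23 or later, which is newer than the version this answer describes); the default dtype itself varies by version (uint8 from 0.19, bool in pandas 2.x), so the code only asserts nothing about it beyond printing it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('blue', 'wood'), ('blue', 'metal'), ('red', 'wood')],
                  columns=['C1', 'C2'])

# Default behaviour: the resulting dtype depends on the pandas version,
# but it is no longer float64.
default_dummies = pd.get_dummies(df)
print(default_dummies.dtypes)

# Request a specific small integer dtype up front, so no wider
# intermediate frame is created and converted afterwards.
# Assumption: the installed pandas accepts the `dtype` keyword.
int8_dummies = pd.get_dummies(df, dtype=np.int8)
print(int8_dummies.dtypes)
```

Passing the dtype at creation time avoids the separate astype step mentioned in the question.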

Answer by Jeff

There is an open issue about this; see here: https://github.com/pydata/pandas/issues/8725