Notice: this content comes from a StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/51119808/

Convert dataframe columns of object type to float

Tags: python, pandas, machine-learning

Asked by avik

I want to convert all the non-float columns of my dataframe to float. Is there any way I can do it? It would be great if I could do it in one go. Below are the column types:

longitude            - float64
latitude             - float64
housing_median_age   - float64
total_rooms          - float64
total_bedrooms       - object
population           - float64
households           - float64
median_income        - float64
rooms_per_household  - float64
category_<1H OCEAN   - uint8
category_INLAND      - uint8
category_ISLAND      - uint8
category_NEAR BAY    - uint8
category_NEAR OCEAN  - uint8

Below is a snippet of my code:

import pandas as pd
import numpy as np
from sklearn.model_selection import KFold

df = pd.DataFrame(housing)  # `housing` is defined elsewhere (not shown)
df['ocean_proximity'] = pd.Categorical(df['ocean_proximity'])  # cast to categorical
# One-hot encode the category column and append the dummy columns
dfDummies = pd.get_dummies(df['ocean_proximity'], prefix='category')
df = pd.concat([df, dfDummies], axis=1)
print(df.head())
housingdata = df
# Split into features (hf) and label (hl)
hf = housingdata.drop(['median_house_value', 'ocean_proximity'], axis=1)
hl = housingdata[['median_house_value']]
# Fill missing values with column means; mean must be called, not passed as a method
hf = hf.fillna(hf.mean(numeric_only=True))
hl = hl.fillna(hl.mean())

Answered by jpp

A quick and easy method, if you don't need specific control over downcasting or error-handling, is to use df = df.astype(float).
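
As a minimal sketch of the one-call approach (the frames and column names here are made up for illustration), astype(float) converts numeric strings and integers to float64, but it has no option to skip bad values, so a string that cannot be parsed raises a ValueError:

```python
import pandas as pd

# Hypothetical frame: numeric strings, floats and ints
df = pd.DataFrame({'col1': ['656', '545'],
                   'col2': [341.341, 4325.132],
                   'col3': [4535, 562]})

# Cast every column to float64 in one call
converted = df.astype(float)
print(converted.dtypes)

# astype cannot coerce bad values: a non-numeric string raises ValueError
bad = pd.DataFrame({'col1': ['656', 'n/a']})
try:
    bad.astype(float)
    raised = False
except ValueError:
    raised = True
print(raised)
```

If the data may contain such values, pd.to_numeric with errors='coerce' is the safer route.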

For more control, you can use pd.DataFrame.select_dtypes to select columns by dtype, then apply pd.to_numeric to that subset of columns.

Setup

df = pd.DataFrame([['656', 341.341, 4535],
                   ['545', 4325.132, 562]],
                  columns=['col1', 'col2', 'col3'])

print(df.dtypes)

col1     object
col2    float64
col3      int64
dtype: object

Solution

cols = df.select_dtypes(exclude=['float']).columns

df[cols] = df[cols].apply(pd.to_numeric, downcast='float', errors='coerce')

Result

print(df.dtypes)

col1    float32
col2    float64
col3    float32
dtype: object

print(df)

    col1      col2    col3
0  656.0   341.341  4535.0
1  545.0  4325.132   562.0
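
To illustrate what errors='coerce' does when a value cannot be parsed, here is a rough sketch on made-up data loosely modelled on the question's columns. The values, and the choice to fill the resulting NaN with the column mean, are assumptions for illustration only. It skips downcast and casts to float64 explicitly so that every column ends up with the same dtype:

```python
import pandas as pd

# Hypothetical data: 'total_bedrooms' holds strings, one of them unparseable
df = pd.DataFrame({'total_bedrooms': ['129', 'unknown', '190'],
                   'population': [322.0, 2401.0, 496.0],
                   'category_INLAND': pd.Series([0, 1, 0], dtype='uint8')})

# Pick every column that is not already float64
cols = df.select_dtypes(exclude=['float']).columns

# Unparseable strings become NaN; astype then makes all of them float64
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce').astype('float64')

# Count the values that were coerced to NaN
n_missing = int(df['total_bedrooms'].isna().sum())

# Fill the coerced NaN with the column mean
df = df.fillna(df.mean())
print(df.dtypes)
print(df)
```

Checking n_missing before filling is a cheap way to see how many values were silently turned into NaN.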