Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/43442337/

Date: 2020-09-14 03:24:20  Source: igfitidea

Better way to convert pandas dataframe columns to numeric

Tags: python, pandas, dataframe, type-conversion

Question by Sveinn

I have a DataFrame in which some columns have dtype object because of a few odd data entries (e.g. a . or something similar).
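A minimal reproduction of the problem, assuming the data comes from a CSV file (the file contents below are made up for illustration): a single `.` in an otherwise numeric column is enough to stop pandas from parsing that column as numbers.

```python
import io

import pandas as pd

# Hypothetical CSV: column "x" contains a stray "." placeholder
csv_text = "x,y\n1.5,1\n2,2\n.,3\n"
df = pd.read_csv(io.StringIO(csv_text))

# "." is not one of read_csv's default NA markers, so "x" cannot be parsed
# as floats and is kept as text (object dtype in pandas 2.x and earlier);
# "y" is parsed normally as an integer column
print(df.dtypes)
```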

I have been able to correct this by identifying the object columns and then doing this:

obj_cols = df.loc[:, df.dtypes == object]
conv_cols = obj_cols.convert_objects(convert_numeric='force')

This works fine and lets me run the regression I need, but it generates this warning:

FutureWarning: convert_objects is deprecated.

Is there a better way to do this that avoids the warning? I also tried writing a lambda function, but that didn't work.

Answer by Vaishali

convert_objects is deprecated; use pd.to_numeric instead. Adding the parameter errors='coerce' converts invalid (non-numeric) values to NaN.

conv_cols = obj_cols.apply(pd.to_numeric, errors='coerce')

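apply calls pd.to_numeric on each column of obj_cols in turn. With errors='coerce', no column is left alone: every entry that cannot be parsed as a number (for example a non-digit string or a date string) becomes NaN, so a column with no numeric values at all comes back entirely NaN. Leaving unconvertible columns unchanged is the behavior of errors='ignore', which is deprecated since pandas 2.2.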
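A small sketch of the errors='coerce' behavior on a hypothetical frame. It picks the non-numeric columns with `select_dtypes(exclude="number")`, a swapped-in alternative to the question's `df.dtypes == object` mask, converts them, and writes the result back into `df`, since `apply` returns a new DataFrame rather than modifying `df` in place. The column names and values are made up.

```python
import pandas as pd

# Hypothetical frame: "a" is mostly numeric text, "b" has no numbers at all
df = pd.DataFrame({
    "a": ["1", "2.5", "."],
    "b": ["x", "y", "z"],
    "c": [10, 20, 30],
})

# Non-numeric columns (an alternative to the question's df.dtypes == object)
obj_cols = df.select_dtypes(exclude="number")

# Convert each column; entries that cannot be parsed become NaN
conv_cols = obj_cols.apply(pd.to_numeric, errors="coerce")

# apply returns a new DataFrame, so write the converted columns back into df
df[conv_cols.columns] = conv_cols

print(df.dtypes)
```

Column `b` ends up entirely NaN, which is worth checking before running a regression on the result. Note that `exclude="number"` would also select boolean or datetime columns, so narrow the selection if the frame contains any.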

Answer by MissBleu

If you have a sample data frame:

import pandas as pd

sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 'f', 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 'e', 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 'g'}]
df = pd.DataFrame(sales)

and want to replace the strings in the columns that should be numeric, you can do so with pd.to_numeric:

cols = ['Jan', 'Feb', 'Mar']
# Convert column by column (the default axis); values that are not numbers become NaN
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

The resulting DataFrame has NaN in place of the invalid entries ('f', 'e' and 'g'), and Jan, Feb and Mar become float64 columns.
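Before discarding the invalid entries, it can help to see which ones were coerced. A sketch reusing the `sales` data above: an entry is flagged when it was present before conversion but is NaN afterwards. The names `converted`, `bad` and `offenders` are illustrative only.

```python
import pandas as pd

# Same sample data as above
sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 'f', 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 'e', 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 'g'}]
df = pd.DataFrame(sales)
cols = ['Jan', 'Feb', 'Mar']

# Convert without overwriting df, so the original values stay available
converted = df[cols].apply(pd.to_numeric, errors='coerce')

# True where a value existed before conversion but is NaN afterwards
bad = converted.isna() & df[cols].notna()

# (account, column, original value) for every coerced entry, column by column
offenders = [
    (df.at[idx, 'account'], col, df.at[idx, col])
    for col in cols
    for idx in df.index[bad[col].to_numpy()]
]
print(offenders)
```

Once the list looks right, assign `converted` back with `df[cols] = converted`.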