Python: Convert number strings with commas in a pandas DataFrame to float

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/22137723/

Time: 2020-08-19 00:20:48  Source: igfitidea

Convert number strings with commas in pandas DataFrame to float

python pandas

Asked by pheon

I have a DataFrame that contains numbers as strings with commas for the thousands marker. I need to convert them to floats.

import pandas

a = [['1,200', '4,200'], ['7,000', '-0.03'], ['5', '0']]
df = pandas.DataFrame(a)

I am guessing I need to use locale.atof. Indeed

df[0].apply(locale.atof)

works as expected. I get a Series of floats.

But when I apply it to the DataFrame, I get an error.

df.apply(locale.atof)

TypeError: ("cannot convert the series to <type 'float'>", u'occurred at index 0')

and

df[0:1].apply(locale.atof)

gives another error:

ValueError: ('invalid literal for float(): 1,200', u'occurred at index 0')

So, how do I convert this DataFrame of strings to a DataFrame of floats?

Accepted answer by Andy Hayden

If you're reading in from a CSV file, you can use the thousands argument:

df = pandas.read_csv('foo.tsv', sep='\t', thousands=',')

This method is likely to be more efficient than performing the operation as a separate step.
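
As a minimal sketch of the thousands argument, the snippet below parses an in-memory string instead of a file. The text in tsv_text and its column names a and b are made up for illustration and stand in for foo.tsv.

```python
import io

import pandas as pd

# Hypothetical tab-separated data standing in for 'foo.tsv'.
tsv_text = "a\tb\n1,200\t4,200\n7,000\t-0.03\n5\t0\n"

# thousands=',' tells the parser to drop the commas while reading.
parsed = pd.read_csv(io.StringIO(tsv_text), sep="\t", thousands=",")

# A column holding only whole numbers may come back as integers,
# so cast explicitly if floats are required.
parsed = parsed.astype(float)
print(parsed.dtypes)
```

The final astype(float) is only needed when every column must be a float regardless of what the parser inferred.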

To use locale.atof instead, you need to set the locale first:

In [ 9]: import locale

In [10]: from locale import atof

In [11]: locale.setlocale(locale.LC_NUMERIC, '')
Out[11]: 'en_GB.UTF-8'

In [12]: df.applymap(atof)
Out[12]:
      0        1
0  1200  4200.00
1  7000    -0.03
2     5     0.00
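
The question's df.apply(locale.atof) fails because DataFrame.apply hands each whole column (a Series) to the function, while locale.atof expects a single string. The sketch below applies the parser element by element instead. It assumes an en_US.UTF-8 locale is installed and falls back to a hypothetical helper, strip_commas, when it is not; neither the locale name nor the helper comes from the original answer.

```python
import locale

import pandas as pd


def strip_commas(text):
    """Hypothetical fallback: drop commas, then parse as float."""
    return float(text.replace(",", ""))


try:
    # Assumption: an en_US.UTF-8 locale is installed on this machine.
    locale.setlocale(locale.LC_NUMERIC, "en_US.UTF-8")
    parse = locale.atof
except locale.Error:
    parse = strip_commas

df = pd.DataFrame([["1,200", "4,200"], ["7,000", "-0.03"], ["5", "0"]])

# Series.map calls parse once per string, one column at a time.
floats = df.apply(lambda column: column.map(parse))
print(floats)
```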

Answer by shen ke

You may use the pandas.Series.str.replace method:

df.apply(lambda col: col.str.replace(',', '').astype(float))

This method can remove or replace the comma in the string.
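
A runnable sketch of this idea on the question's data follows. Because the .str accessor is defined on Series rather than on DataFrame, the replacement is applied to each column in turn; the variable name converted is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame([["1,200", "4,200"], ["7,000", "-0.03"], ["5", "0"]])

# The .str accessor belongs to Series, so handle one column at a time.
# regex=False treats ',' as a plain character rather than a pattern.
converted = df.apply(
    lambda column: column.str.replace(",", "", regex=False).astype(float)
)
print(converted.dtypes)
```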

Answer by ghollah kioko

You can convert one column at a time like this:

df['colname'] = df['colname'].str.replace(',', '').astype(float)
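
If a column might also contain values that are not numbers, astype(float) raises an error. One possible variation, sketched below, uses pandas.to_numeric with errors="coerce" so that such values become NaN. The column name price and the 'n/a' entry are hypothetical and not part of the original answer.

```python
import pandas as pd

# 'price' is a hypothetical column name; 'n/a' stands in for a value
# that cannot be parsed as a number.
prices = pd.DataFrame({"price": ["1,200", "7,000", "n/a"]})

cleaned = prices["price"].str.replace(",", "", regex=False)

# errors="coerce" turns unparseable strings into NaN instead of raising.
prices["price"] = pd.to_numeric(cleaned, errors="coerce")
print(prices)
```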