Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/11763204/

Time: 2020-09-13 15:47:14  Source: igfitidea

How to efficiently handle European decimal separators using the pandas read_csv function?

python csv decimal pandas

Asked by THM

I'm using read_csv to read CSV files into Pandas data frames. My CSV files contain large numbers of decimals/floats. The numbers are encoded using the European decimal notation:

1.234.456,78

This means that '.' is used as the thousands separator and ',' is the decimal mark.

Pandas 0.8 provides a read_csv argument called 'thousands' to set the thousands separator. Is there an additional argument to specify the decimal mark as well? If not, what is the most efficient way to parse a European-style decimal number?

Currently I'm using string replacement, which I consider a significant performance penalty. The code I'm using is this:

# Change the decimal mark from ',' to '.', then convert to float
f = lambda x: float(x.replace(u',', u'.'))
df['MyColumn'] = df['MyColumn'].map(f)

Any help is appreciated.

Answer by lbolla

You can use the converters keyword argument of read_csv. Given /tmp/data.csv like this:

"x","y"                                                                         
"one","1.234,56"                                                                
"two","2.000,00"   

you can do:

In [20]: pandas.read_csv('/tmp/data.csv', converters={'y': lambda x: float(x.replace('.','').replace(',','.'))})
Out[20]: 
     x        y
0  one  1234.56
1  two  2000.00
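
The converter approach above can be tried without a file on disk. This is only a minimal sketch: the in-memory CSV built with io.StringIO and the helper name parse_european are assumptions made for illustration and are not part of the original answer.

```python
import io

import pandas as pd

# Sample data mirroring /tmp/data.csv from the answer, held in memory here.
CSV_TEXT = '"x","y"\n"one","1.234,56"\n"two","2.000,00"\n'


def parse_european(text):
    """Drop '.' thousands separators, then turn the ',' decimal mark into '.'."""
    return float(text.replace('.', '').replace(',', '.'))


# The converter is called once per cell of column 'y' with the raw string.
df = pd.read_csv(io.StringIO(CSV_TEXT), converters={'y': parse_european})
print(df)
```

Because the converter is a Python function called for every value, this is flexible but not necessarily fast on large files.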

Answer by joshlk

For European-style numbers, use the thousands and decimal parameters of pandas.read_csv.

For example:

pandas.read_csv('data.csv', thousands='.', decimal=',')

From the docs:

thousands:

str, optional. Thousands separator.

decimal:

str, default '.'. Character to recognize as decimal point (e.g. use ',' for European data).
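
A small runnable sketch of the two parameters described above. The sample data and the ';' field delimiter (passed through sep) are assumptions chosen for illustration, so that the ',' decimal mark needs no quoting; they do not come from the original answer.

```python
import io

import pandas as pd

# Hypothetical sample data: ';' separates fields, '.' groups thousands,
# and ',' is the decimal mark.
CSV_TEXT = "x;y\none;1.234.456,78\ntwo;2.000,00\n"

# thousands and decimal tell the parser how to read the numeric column.
df = pd.read_csv(io.StringIO(CSV_TEXT), sep=';', thousands='.', decimal=',')
print(df['y'].tolist())
print(df['y'].dtype)
```

With these parameters the parsing happens inside read_csv itself, so no per-value Python function is needed.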
