Pandas read csv file with float values results in weird rounding and decimal digits
Source: http://stackoverflow.com/questions/47368296/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow.
Question by beta
I have a CSV file containing numerical values such as `1524.449677`. There are always exactly 6 decimal places.
When I import the CSV file (along with other columns) via pandas `read_csv`, the column automatically gets the dtype `object`. My issue is that values are shown as `2470.6911370000003` when they should be `2470.691137`; likewise, the value `2484.30691` is shown as `2484.3069100000002`.
This seems to be a datatype issue in some way. I tried to explicitly provide the data type when importing via `read_csv` by passing the `dtype` argument as `{'columnname': np.float64}`, but the issue did not go away.
How can I get the values imported and shown exactly as they are in the source csv file?
Answer by Paula Livingstone
Pandas uses a dedicated decimal-to-binary converter that sacrifices accuracy in favour of speed.
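To see why a small accuracy loss in the converter shows up as these long tails, here is a minimal sketch in plain Python. It does not use pandas' converter at all; it only assumes that the faulty value is a neighbouring double. Python's built-in `float()` is correctly rounded and `repr()` prints the shortest string that maps back to the same double, so a six-decimal value comes back unchanged, while a double just one unit in the last place (ULP) away needs many more digits to print. `math.nextafter` and `math.ulp` require Python 3.9+.

```python
import math

# Python's float() is correctly rounded, and repr() prints the shortest string
# that converts back to the same double, so a six-decimal value survives intact.
exact = float("2470.691137")
print(repr(exact))  # 2470.691137

# A converter that is off by a single unit in the last place (ULP) lands on the
# neighbouring double, whose shortest round-trip string needs many more digits.
neighbour = math.nextafter(exact, math.inf)  # Python 3.9+
print(repr(neighbour))
print(neighbour - exact == math.ulp(exact))  # True
```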
Passing `float_precision='round_trip'` to `read_csv` fixes this.
Check out this page for more detail on this.
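A minimal sketch of this fix, reading a hypothetical in-memory CSV through `io.StringIO` instead of a real file. As I understand it, `float_precision='round_trip'` makes the C engine parse each number with Python's own string-to-double routine, so the values print exactly as written. Note that, per the pandas 1.3 release notes, the default C-engine converter was switched to the high-precision one (the old behaviour became `float_precision='legacy'`), so recent pandas versions may not reproduce the original problem at all.

```python
import io

import pandas as pd

# Hypothetical sample data: six decimal places, like the question's file.
csv_text = "value\n1524.449677\n2470.691137\n2484.306910\n"

# 'round_trip' makes the C engine parse floats with Python's own
# string-to-double routine instead of its fast converter.
df = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")

print(df["value"].dtype)     # float64
print(df["value"].tolist())  # [1524.449677, 2470.691137, 2484.30691]
```

The trailing zero of `2484.306910` disappears only because `repr` prints the shortest form of the float; the stored value is the correctly rounded one.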
After processing your data, if you want to save it back to a CSV file, you can pass `float_format="%.nf"` (where `n` is the number of decimal places you want) to `to_csv`.
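A minimal sketch of the output side, assuming `n = 6` to match the question's six decimal places and writing into an `io.StringIO` buffer instead of a file. The input values are the long-tailed ones reported in the question; `"%.6f"` rounds them back to six decimals and keeps trailing zeros.

```python
import io

import pandas as pd

# Long-tailed values like the ones reported in the question.
df = pd.DataFrame({"value": [2470.6911370000003, 2484.3069100000002]})

buf = io.StringIO()
# "%.6f" writes every float with exactly six decimal places.
df.to_csv(buf, index=False, float_format="%.6f")
print(buf.getvalue())
```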
A full example:
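```python
import pandas as pd

source_file = "input.csv"   # hypothetical input path
target_file = "output.csv"  # hypothetical output path

df_in = pd.read_csv(source_file, float_precision='round_trip')
df_out = df_in.copy()  # replace with your own processing of df_in
df_out.to_csv(target_file, float_format="%.3f")  # for 3 decimal places
```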
Answer by Holzner
I realise this is an old question, but maybe this will help someone else:
I had a similar problem, but couldn't use quite the same solution. Unfortunately, the `float_precision` option only exists for the C engine, not the Python engine. So if you have to use the Python engine for some other reason (for example, because the C engine can't handle regex delimiters), this little "trick" worked for me:
In the `pd.read_csv` arguments, set `dtype='str'` and then convert your DataFrame to whatever dtype you want, e.g. `df = df.astype('float64')`.
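A minimal sketch of this workaround, assuming a hypothetical file whose `;` delimiter is surrounded by inconsistent whitespace, so a regex separator (and therefore `engine="python"`) is needed. Every column is read as text, and the float conversion happens afterwards with `astype`, outside `read_csv`'s own float parsing.

```python
import io

import pandas as pd

# Hypothetical file: ';' delimiter with inconsistent spacing around it,
# which needs a regex separator and therefore the Python engine.
csv_text = "id ; value\n1 ;1524.449677\n2;  2470.691137\n"

# Read every column as text so read_csv does no float parsing itself.
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s*;\s*", engine="python", dtype=str)

# Convert to float64 afterwards, as the answer suggests.
df = df.astype("float64")
print(df["value"].tolist())
```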
Bit of a hack, but it seems to work. If anyone has any suggestions on how to solve this in a better way, let me know.