Pandas read csv file with float values results in weird rounding and decimal digits

Question

Asked by beta

I have a csv file containing numerical values such as 1524.449677. There are always exactly 6 decimal places.

When I import the csv file (and other columns) via pandas read_csv, the column automatically gets the datatype object. My issue is that the values are shown as 2470.6911370000003 when they should actually be 2470.691137. Or the value 2484.30691 is shown as 2484.3069100000002.

This seems to be a datatype issue in some way. I tried to explicitly provide the data type when importing via read_csv by giving the dtype argument as {'columnname': np.float64}. Still, the issue did not go away.

How can I get the values imported and shown exactly as they are in the source csv file?

Answer 1

Answered by Paula Livingstone

Pandas uses a dedicated decimal-to-binary converter that sacrifices accuracy in favor of speed.
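
To illustrate what a last-bit parsing error looks like, here is a small sketch using only the standard library. It does not reproduce the pandas converter itself; it assumes the fast converter landed on the floating-point number next to the correctly rounded one, and uses math.nextafter (Python 3.9+) to step to that neighbour.

```python
import math
from decimal import Decimal

text = "2470.691137"

# Python's float() is correctly rounded: it picks the double closest to the text.
exact = float(text)

# Assumption for illustration: a faster parser ends up one representable
# double above the correctly rounded value.
neighbour = math.nextafter(exact, math.inf)

# repr() prints the shortest string that uniquely identifies each double,
# so the neighbour needs far more digits than the original text.
print(repr(exact))
print(repr(neighbour))

# The exact binary values behind both doubles.
print(Decimal(exact))
print(Decimal(neighbour))
```

The two doubles differ only in the last bit, yet the second one can no longer be printed with six decimals, which matches the kind of output described in the question.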

Passing float_precision='round_trip' to read_csv fixes this.
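
A minimal sketch of this option, using an in-memory buffer in place of a real file. The column name "value" and the sample rows are assumptions taken from the numbers in the question. Depending on the pandas version, the default converter may already parse these values correctly, so the sketch only checks the round_trip result.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real source file.
csv_text = "value\n2470.691137\n2484.30691\n"

# Ask the C engine for the round-trip converter instead of the default one.
df = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")

# Compare against Python's own correctly rounded float parsing.
expected = [float("2470.691137"), float("2484.30691")]
print(df["value"].dtype)
print(df["value"].tolist() == expected)
```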

Check out this page for more detail on this.

After processing your data, if you want to save it back to a csv file, you can pass
float_format="%.nf" (where n is the number of decimal places) to the corresponding method.

A full example:

import pandas as pd

# 'round_trip' selects the slower converter that avoids last-bit parsing errors
df_in = pd.read_csv(source_file, float_precision='round_trip')
df_out = ... # some processing of df_in
df_out.to_csv(target_file, float_format="%.3f") # for 3 decimal places
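
For anyone who wants to try the read-and-write cycle without files on disk, here is a self-contained sketch under a few assumptions: the CSV text, the column name "value", and the copy() call standing in for real processing are all made up for illustration. It writes six decimals to match the source data in the question.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for source_file.
csv_text = "value\n2470.691137\n2484.30691\n"

df_in = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")

# Placeholder for real processing of df_in.
df_out = df_in.copy()

# With no path given, to_csv returns the CSV as a string.
csv_out = df_out.to_csv(index=False, float_format="%.6f")
print(csv_out)
```

Note that a fixed float_format pads every value to the same width, so 2484.30691 is written as 2484.306910.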

Answer 2

Answered by Holzner

I realise this is an old question, but maybe this will help someone else:

I had a similar problem, but couldn't quite use the same solution. Unfortunately the float_precision option only exists when using the C engine and not with the python engine. So if you have to use the python engine for some other reason (for example because the C engine can't deal with regex literals as delimiters), this little "trick" worked for me:

In the pd.read_csv arguments, define dtype='str' and then convert your dataframe to whatever dtype you want, e.g. df = df.astype('float64').
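
A rough sketch of that workaround, assuming a made-up file with a regex delimiter (a semicolon surrounded by optional whitespace) and hypothetical column names "id" and "value". Only the float column is converted here; the answer's df.astype('float64') would convert every column.

```python
import io

import pandas as pd

# Hypothetical CSV content whose delimiter needs a regex, forcing the python engine.
csv_text = "id ; value\n1 ; 2470.691137\n2 ; 2484.30691\n"

# Read everything as strings so no float parsing happens inside read_csv.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=r"\s*;\s*",
    engine="python",
    dtype=str,
)

# Convert afterwards; the strings are parsed to float64 at this step.
df["value"] = df["value"].astype("float64")

expected = [float("2470.691137"), float("2484.30691")]
print(df["value"].tolist() == expected)
```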

Bit of a hack, but it seems to work. If anyone has any suggestions on how to solve this in a better way, let me know.
