Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/36909368/

Time: 2020-09-14 01:08:29  Source: igfitidea

Precision lost while using read_csv in pandas

python csv pandas numpy floating-accuracy

Asked by Krishna Sangeeth K S

I have records in the format below in a text file, which I am trying to read into a pandas DataFrame.

895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425|0.8224291972|0.8652525793|0.6829942860|0.5139162227|

As you can see, there are 10 digits after the decimal point in the input file.

import pandas as pd
df = pd.read_csv('mockup.txt', header=None, delimiter='|')

When I read it into a DataFrame, the last 4 digits are missing:

df[5].head()

0    0.467798
1    0.258165
2    0.860384
3    0.803388
4    0.249820
Name: 5, dtype: float64

How can I keep the full precision present in the input file? I have some matrix operations that need to be performed, so I cannot cast the values to strings.

I figured out that I have to do something with dtype, but I am not sure where I should use it.

Answered by jezrael

It is only a display problem; see the docs:

# temporarily set display precision
with pd.option_context('display.precision', 10):
    print(df)

     0          1   2      3   4             5            6             7   \
0  895  2015-4-23  19  10000  LA  0.4677978806  0.477346934  0.4089938425   

             8             9            10            11  12  
0  0.8224291972  0.8652525793  0.682994286  0.5139162227 NaN    
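
To check that the digits are stored and only hidden by the default display, here is a minimal sketch. It assumes the sample record from the question is fed in through io.StringIO instead of mockup.txt, and the variable names (raw, value, shown_default, shown_full) are only illustrative:

```python
import io

import pandas as pd

# One record in the question's format; the trailing '|' produces an empty last column.
raw = (
    "895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425|"
    "0.8224291972|0.8652525793|0.6829942860|0.5139162227|\n"
)

df = pd.read_csv(io.StringIO(raw), header=None, delimiter="|")

# The default repr rounds to display.precision digits (6 unless changed).
shown_default = str(df[5])

# Formatting the stored float64 directly shows the digits were kept.
value = df[5].iloc[0]
shown_full = format(value, ".10f")

print(shown_default)
print(shown_full)
```

If the second print shows all 10 decimals, the column can be used for matrix operations as-is; only the printed representation was shortened.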

EDIT (thank you, Mark Dickinson):

Pandas uses a dedicated decimal-to-binary converter that sacrifices perfect accuracy for the sake of speed. Passing float_precision='round_trip' to read_csv fixes this. See the documentation for more.
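
A minimal sketch of the float_precision='round_trip' option, again assuming the sample record is read from an in-memory string rather than mockup.txt (raw, parsed and expected are illustrative names). It compares the parsed value with what Python's own float() produces for the same text:

```python
import io

import pandas as pd

# Sample record from the question, held in memory for the sake of the example.
raw = (
    "895|2015-4-23|19|10000|LA|0.4677978806|0.4773469340|0.4089938425|"
    "0.8224291972|0.8652525793|0.6829942860|0.5139162227|\n"
)

# 'round_trip' asks the C parser to use the slower, exact string-to-double conversion.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    delimiter="|",
    float_precision="round_trip",
)

parsed = df[5].iloc[0]
expected = float("0.4677978806")  # Python's own conversion of the same text

print(parsed == expected)
```

This option only matters when bit-for-bit agreement with the text is needed; it does not change how many digits are printed, which is still governed by display.precision.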