Python float64 with pandas to_csv
Original URL: http://stackoverflow.com/questions/12877189/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverflow
Question by avances123
I'm reading a CSV file with floating-point numbers like this:
Bob,0.085
Alice,0.005
Then I import it into a DataFrame and write that DataFrame to a new location:
import pandas as pd
df = pd.read_csv(orig)   # orig: path of the input CSV
df.to_csv(pandasfile)    # pandasfile: path of the output CSV
Now pandasfile contains:
Bob,0.085000000000000006
Alice,0.0050000000000000001
What happened? Maybe I have to cast to a different type, like float32 or something?
I'm using pandas 0.9.0 and numpy 1.6.2.
Accepted answer by bmu
As mentioned in the comments, this is a general floating-point problem.
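To see where the extra digits come from, here is a small standard-library sketch. It prints the exact binary value stored for 0.085 and 0.005, the same values formatted with 17 significant digits ('%.17g'), and their shortest round-tripping repr. The idea that pandas 0.9.0 wrote floats with 17 significant digits is an assumption inferred from the question's output, not something confirmed from the pandas source.

```python
from decimal import Decimal

for value in (0.085, 0.005):
    # Exact decimal expansion of the binary double that stores `value`.
    exact = Decimal(value)
    # 17 significant digits identify any double uniquely, but they also
    # expose the tiny binary representation error.
    long_form = "%.17g" % value
    # repr() gives the shortest string that parses back to the same double
    # (Python 2.7 / 3.1 and later).
    short_form = repr(value)
    print(exact, long_form, short_form)
    # Both strings parse back to exactly the same double.
    assert float(long_form) == float(short_form) == value
```

The two strings describe the same number; only the formatting differs, which is why a formatting option is enough to hide the digits.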
However, you can use the float_format keyword argument of to_csv to hide it:
df.to_csv('pandasfile.csv', float_format='%.3f')
or, if you don't want 0.0001 to be rounded to zero:
df.to_csv('pandasfile.csv', float_format='%g')
will give you:
Bob,0.085
Alice,0.005
in your output file.
For an explanation of %g, see Format Specification Mini-Language.
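A minimal sketch comparing the two float_format values on data like the question's. The extra row "Carol" with 0.0001 is hypothetical, added only to show the rounding difference; calling to_csv without a path returns the CSV text as a string, which keeps the example file-free.

```python
import pandas as pd

# Data mirroring the question, plus a hypothetical small value.
df = pd.DataFrame({"name": ["Bob", "Alice", "Carol"],
                   "value": [0.085, 0.005, 0.0001]})

# With no path argument, to_csv returns the CSV text as a string.
fixed = df.to_csv(index=False, header=False, float_format="%.3f")
general = df.to_csv(index=False, header=False, float_format="%g")

print(fixed)    # Carol's 0.0001 is rounded to 0.000
print(general)  # Carol's 0.0001 keeps its value

# '%g' switches to exponent notation once the exponent drops below -4.
print("%g" % 0.00001)  # 1e-05
```

Note that '%g' defaults to 6 significant digits, so values needing more precision than that would be rounded too.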
Answer by Richard Gomes
UPDATE: This answer was accurate at the time of writing, and full floating-point precision is still not something you get by default with to_csv/read_csv (a precision-performance tradeoff; the defaults favor performance).
Nowadays there is the float_format argument available for pandas.DataFrame.to_csv and the float_precision argument available for pandas.read_csv.
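A minimal sketch of float_precision on read_csv, assuming a current pandas release (0.9.0 predates this argument). 'round_trip' selects the round-trip converter of the C parser, which is meant to give the same double that Python's float() would; the in-memory CSV stands in for a real file.

```python
import io
import pandas as pd

csv_text = "Bob,0.085\nAlice,0.005\n"

# Ask the C parser for its round-trip float converter.
df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=["name", "value"], float_precision="round_trip")
print(df["value"].tolist())

# Writing back without float_format: current pandas writes the
# shortest string that round-trips, so no extra digits appear.
out = df.to_csv(index=False, header=False)
print(out)
```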
The original answer below is still worth reading for a better grasp of the problem.
It was a bug in pandas, not only in the to_csv function but in read_csv too. It's not a general floating-point issue, although it's true that floating-point arithmetic is a subject that demands some care from the programmer. The article below clarifies this subject a bit:
http://docs.python.org/2/tutorial/floatingpoint.html
A classic one-liner which shows the "problem" is ...
>>> 0.1 + 0.1 + 0.1
0.30000000000000004
... which does not display 0.3 as one would expect. On the other hand, if you do the calculation in fixed-point (integer) arithmetic and switch to floating-point arithmetic only in the last step, it works as you expect. See this:
>>> (1 + 1 + 1) * 1.0 / 10
0.3
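A short standard-library sketch of why the first one-liner misbehaves: the literal 0.1 is stored as the nearest binary double, which is slightly more than one tenth, and the three small errors add up. Comparing with a tolerance via math.isclose (Python 3.5+) is a common way to cope; this comparison is an illustration and was not part of the original answer.

```python
import math
from decimal import Decimal

# Exact value of the double nearest to 0.1 (slightly above one tenth).
print(Decimal(0.1))

total = 0.1 + 0.1 + 0.1
print(total == 0.3)              # False: the small errors accumulate
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead

# Integer arithmetic first, one floating-point division at the end.
print((1 + 1 + 1) * 1.0 / 10 == 0.3)  # True
```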
If you desperately need to circumvent this problem, I recommend creating another CSV file that contains all figures as integers, for example multiplied by 100, 1000, or whatever factor turns out to be convenient. Inside your application, read the CSV file as usual and you will get those integer figures back. Then convert those values to floating point by dividing by the same factor you multiplied by before.
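A minimal sketch of the scaled-integer approach, assuming three decimal places are enough (the factor SCALE = 1000 is a hypothetical choice) and using in-memory buffers in place of real files. The rounding step before the integer cast is an addition of this sketch: the product of a float and the factor can carry a tiny binary error, so truncating it directly could be off by one.

```python
import io
import pandas as pd

SCALE = 1000  # hypothetical factor: three decimal places

df = pd.DataFrame({"name": ["Bob", "Alice"], "value": [0.085, 0.005]})

# Store the figures as integers: multiply, round, then cast.
scaled = df.assign(value=(df["value"] * SCALE).round().astype("int64"))
buf = io.StringIO()
scaled.to_csv(buf, index=False, header=False)
print(buf.getvalue())  # two lines: Bob,85 and Alice,5

# In the application: read the integers back, then divide by the same factor.
back = pd.read_csv(io.StringIO(buf.getvalue()), header=None,
                   names=["name", "value"])
back["value"] = back["value"] / SCALE
print(back["value"].tolist())
```

Dividing an exact integer by the factor is a single correctly rounded operation, so each value comes back as the same double the decimal literal would produce.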

