Notice: this page is derived from a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/34013790/

Date: 2020-09-14 00:17:59  Source: igfitidea

Pandas read scientific notation and change

Tags: python, csv, pandas

Asked by hselbie

I have a dataframe in pandas that I'm reading in from a CSV.

One of my columns has values that include NaN, floats, and scientific notation, e.g. 5.3e-23.

My trouble is that as I read in the CSV, pandas treats this column as an object dtype rather than the float32 that it should be. I guess this is because it thinks the scientific notation entries are strings.

I've tried converting the dtype with df['speed'].astype(float) after the data has been read in, and I've tried specifying the dtype while reading with df = pd.read_csv('path/test.csv', dtype={'speed': np.float64}, na_values=['n/a']). The read_csv call throws the error ValueError: cannot safely convert passed user dtype of <f4 for object dtyped data in column ...

So far neither of these methods has worked. Am I missing an incredibly easy fix?

This question seems to suggest I can specify known numbers that might throw an error, but I'd prefer to convert the scientific notation back to a float if possible.

EDITED TO SHOW DATA FROM CSV AS REQUESTED IN COMMENTS

7425616,12375,28,2015-08-09 11:07:56,0,-8.18644,118.21463,2,0,2
7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2
7425617,12375,28,2015-08-09 11:09:38,0,-8.18644,118.2145,2,0.14,2
7425592,12375,28,2015-08-09 10:36:34,0,-8.18663,118.2157,2,0.05,2
65999,1021,29,2015-01-30 21:43:26,0,-8.36728,118.29235,1,0.206836151554794,2
204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7
384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4
275929,1160,30,2015-02-17 03:13:51,1,-8.36248,118.28656,1,113.318511172611,5
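One way to find out which entries keep a column from being numeric is to read it as text and compare it with a numeric conversion. The following is only a sketch under assumptions: the sample rows are inlined, the file is assumed to have no header row, and the problem column is assumed to sit at position 8. The names CSV_TEXT, raw, converted and suspects are illustrative.

```python
import io

import numpy as np
import pandas as pd

# The sample rows from the question, inlined so the sketch is self-contained.
CSV_TEXT = """\
7425616,12375,28,2015-08-09 11:07:56,0,-8.18644,118.21463,2,0,2
7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2
7425617,12375,28,2015-08-09 11:09:38,0,-8.18644,118.2145,2,0.14,2
7425592,12375,28,2015-08-09 10:36:34,0,-8.18663,118.2157,2,0.05,2
65999,1021,29,2015-01-30 21:43:26,0,-8.36728,118.29235,1,0.206836151554794,2
204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7
384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4
275929,1160,30,2015-02-17 03:13:51,1,-8.36248,118.28656,1,113.318511172611,5
"""

# Assumption: no header row, problem column at position 8.
# dtype=str keeps that column as raw text so nothing is interpreted yet.
df = pd.read_csv(io.StringIO(CSV_TEXT), header=None, dtype={8: str})
raw = df[8]

# Convert what can be converted; anything unparseable becomes NaN.
converted = pd.to_numeric(raw, errors="coerce").astype("float64")

# Flag entries that were present in the file but did not become a finite number.
suspects = raw[raw.notna() & ~np.isfinite(converted)]
print(suspects.tolist())
```

Checking for finite values rather than only for failed conversions is deliberate: whether pandas parses the text Infinity as a float appears to depend on the version, and this check flags the entry either way. The scientific notation value converts without trouble.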

Accepted answer by hselbie

I realised it was the Infinity value in my data that was causing the issue. Removing it with a find and replace worked.
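A possible alternative to editing the file by hand is to tell read_csv to treat the offending token as missing. This is a sketch, not the method used in the answer: it assumes the token is spelled exactly Infinity, that the file has no header row, and that the column of interest is at position 8. CSV_TEXT and speed are illustrative names.

```python
import io

import pandas as pd

# Three rows taken from the question's sample; column 8 is the problem column.
CSV_TEXT = """\
7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2
204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7
384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4
"""

# na_values adds "Infinity" to the tokens that are read as missing (NaN).
df = pd.read_csv(io.StringIO(CSV_TEXT), header=None, na_values=["Infinity"])
speed = df[8]

print(speed.dtype)         # dtype of the parsed column
print(speed.isna().sum())  # number of entries read as missing
```

Note that this discards the information that the value was infinite. If that distinction matters, replacing the token with a spelling the parser accepts may be a better fit than marking it as missing.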

@Anton Protopopov's answer also works, as did @DSM's comment pointing out that I had not assigned the result back: df['speed'] = df['speed'].astype(float).
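The point about assignment can be illustrated with a small sketch. The data here is made up for the example: astype returns a new Series and leaves the original column untouched unless the result is assigned back.

```python
import pandas as pd

# Hypothetical column of numeric strings, including scientific notation.
df = pd.DataFrame({"speed": ["0", "0.14", "9.49242000872744e-05"]})

# astype returns a new Series; without assignment the DataFrame is unchanged.
df["speed"].astype(float)
unchanged = df["speed"].dtype  # still the original, non-float dtype

# Assigning the result back replaces the column with the converted values.
df["speed"] = df["speed"].astype(float)
print(unchanged, "->", df["speed"].dtype)
```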

Thanks for the help.

Answer by Anton Protopopov

It's hard to say without seeing your data, but it seems the problem is that your rows contain something other than numbers and 'n/a' values. You could load your dataframe and then convert it to numeric, as shown in the answers to that question. If you have pandas version >= 0.17.0, you could use the following:

# errors='coerce' turns values that cannot be parsed into NaN
df1 = df.apply(pd.to_numeric, errors='coerce')

Then you could drop rows with NA values using dropna, or fill them with zeros using fillna.
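The coerce-then-clean steps might look like the sketch below. The DataFrame is hypothetical, and only the problem column is converted, on the assumption that other columns (such as timestamps) should be left alone.

```python
import pandas as pd

# Hypothetical data: two valid numbers, one junk token, one missing value.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "speed": ["0.14", "5.3e-23", "oops", None],
})

# Convert only the problem column; values that cannot be parsed become NaN.
df["speed"] = pd.to_numeric(df["speed"], errors="coerce")

dropped = df.dropna(subset=["speed"])  # keep only rows with a usable speed
filled = df.fillna({"speed": 0})       # or replace missing speeds with zero
print(len(dropped), filled["speed"].tolist())
```

Whether dropping or zero-filling is appropriate depends on what a missing speed means in the data; zero is a real measurement in the question's sample, so filling with it may hide the difference.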
