
Disclaimer: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/41157981/

Date: 2020-09-14 02:37:54  Source: igfitidea

Pandas convert float in scientific notation to string

python pandas

Question by Cheng

I used read_csv() to load a dataset that looks like this:

userid
NaN
1.091178e+11
1.137856e+11

I want to convert the user ids to strings. One solution is to add keep_default_na=False to read_csv(), as suggested in this SO question: Converting long integers to strings in pandas (to avoid scientific notation)
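
The reason keep_default_na=False helps can be sketched with a small in-memory CSV. This is a minimal sketch, assuming a made-up two-column file (the ids and the `name` column are hypothetical, not from the question): by default the empty field becomes NaN, which forces the whole column to float64; with keep_default_na=False nothing is treated as missing, so the column is read as text and never turns into floats.

```python
import io

import pandas as pd

# Hypothetical CSV modelled on the question; the first userid is empty.
csv_text = "userid,name\n,alice\n109117812345,bob\n113785612345,carol\n"

# Default parsing: the empty field becomes NaN, so the column is float64
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default["userid"].dtype)  # float64

# keep_default_na=False: no string is parsed as NaN, the empty field stays "",
# and the column cannot be numeric, so every value is kept as text
df_text = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(df_text["userid"].tolist())  # ['', '109117812345', '113785612345']
```

The trade-off is that genuinely missing ids now show up as empty strings rather than NaN.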

Let's say I don't want to use keep_default_na=False. Is there any way to convert the user id column to str?
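
A different route, not used in the answers below, is to stop read_csv from parsing the column as float at all by passing its `dtype` parameter. This is a sketch under the same assumption of a hypothetical in-memory CSV; unlike keep_default_na=False, the empty field is still reported as missing.

```python
import io

import pandas as pd

# Hypothetical CSV; the first userid is empty.
csv_text = "userid,name\n,alice\n109117812345,bob\n113785612345,carol\n"

# dtype={"userid": str} keeps the digits as text (no float round-trip),
# while the empty field is still detected as a missing value
df = pd.read_csv(io.StringIO(csv_text), dtype={"userid": str})
print(df["userid"].tolist())
```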

I tried df.userid.astype(str) and got 1.091178e+11 back. I was expecting the result in expanded form, not scientific notation.

What should I do?

Accepted answer by jezrael

You can use map or apply, as mentioned in this comment:

print (df.userid.map(lambda x: '{:.0f}'.format(x)))
0             nan
1    109117800000
2    113785600000
Name: userid, dtype: object


df.userid = df.userid.map(lambda x: '{:.0f}'.format(x))
print (df)
         userid
0           nan
1  109117800000
2  113785600000
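
Note that this formatting turns the missing value into the literal string 'nan'. If you would rather keep it as a real missing value, a minimal variant is to pass `na_action="ignore"` to Series.map, which skips NaN and leaves it in place. The sample Series below is a hypothetical stand-in for the question's column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.userid from the question
s = pd.Series([np.nan, 1.091178e+11, 1.137856e+11], name="userid")

# na_action="ignore" skips NaN, so only real numbers are formatted
as_text = s.map(lambda x: "{:.0f}".format(x), na_action="ignore")
print(as_text.tolist())  # [nan, '109117800000', '113785600000']
```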

I wondered whether map would be faster than apply, but the timings are the same:

#[300000 rows x 1 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
#print (df)

In [40]: %timeit (df.userid.map(lambda x: '{:.0f}'.format(x)))
1 loop, best of 3: 211 ms per loop

In [41]: %timeit (df.userid.apply(lambda x: '{:.0f}'.format(x)))
1 loop, best of 3: 210 ms per loop

Another solution is to_string, but it is slow:

print(df.userid.to_string(float_format='{:.0f}'.format))
0            nan
1   109117800000
2   113785600000

In [41]: %timeit (df.userid.to_string(float_format='{:.0f}'.format))
1 loop, best of 3: 2.52 s per loop
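
Besides being slower, to_string does not give you a column of strings: it renders the whole Series, index included, as one block of text. A small check on a hypothetical sample Series:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.userid from the question
s = pd.Series([np.nan, 1.091178e+11, 1.137856e+11], name="userid")

# The result is one str for the whole Series, not a Series of str
text = s.to_string(float_format="{:.0f}".format)
print(type(text))  # <class 'str'>
print(text)
```

So it suits printing or logging, but you cannot assign the result back to df.userid.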

Answer by Douglas Navarro

I just stumbled upon this problem after reading a dataframe from a JSON file with the read_json method, which unfortunately does not have a keep_default_na parameter.

The solution was to convert the long floats to np.int64 before converting them to str.

In [53]: tweet_id_sample = tweets.iloc[0]['id']
         tweet_id_sample
Out[53]: 8.924206435553362e+17

In [54]: tweet_id_sample.astype(str)
Out[54]: '8.924206435553362e+17'

In [55]: tweet_id_sample.astype(np.int64).astype(str)
Out[55]: '892420643555336192'

In [56]: # This overflows because the default int here is 32-bit
         tweet_id_sample.astype(int)
Out[56]: -2147483648
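
Two caveats when applying this to a whole column, sketched below under stated assumptions. First, float64 holds integers exactly only up to 2**53, so an 18-digit id may already have lost its last digits by the time it is a float; the id 892420643555336193 below is hypothetical, chosen one above the value shown. Second, Series.astype(np.int64) raises on a column that contains NaN, so this sketch converts value by value and skips missing entries with `na_action="ignore"`.

```python
import numpy as np
import pandas as pd

# Hypothetical exact id, one more than the value shown above
exact_id = 892420643555336193
as_float = np.float64(exact_id)

# Near 9e17 neighbouring float64 values are 128 apart, so the last digit is lost
print(str(as_float.astype(np.int64)))  # 892420643555336192

# Hypothetical column with a missing value: convert each non-missing float
# to np.int64 and then to str, leaving NaN untouched
ids = pd.Series([np.nan, as_float], name="id")
id_text = ids.map(lambda x: str(np.int64(x)), na_action="ignore")
print(id_text.tolist())  # [nan, '892420643555336192']
```

If the exact ids matter, they have to be kept as text when the file is read; converting after the fact can only recover the rounded float value.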