pandas: Converting floats in scientific notation to strings

Question

Asked by Cheng

I used read_csv() to load a dataset that looks like this:

userid
NaN
1.091178e+11
1.137856e+11

I want to convert the user IDs to strings. One solution is to add keep_default_na=False to read_csv(), as suggested in this SO question: Converting long integers to strings in pandas (to avoid scientific notation)

Suppose I don't want to use keep_default_na=False. Is there any other way to convert the userid column to str?

I tried df.userid.astype(str) and got 1.091178e+11 back. I was expecting the expanded form, not scientific notation.

What should I do?

Answer 1

Accepted answer by jezrael

You can use map or apply, as mentioned in this comment:

print (df.userid.map(lambda x: '{:.0f}'.format(x)))
0             nan
1    109117800000
2    113785600000
Name: userid, dtype: object

df.userid = df.userid.map(lambda x: '{:.0f}'.format(x))
print (df)
         userid
0           nan
1  109117800000
2  113785600000
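
Note that the output above turns the missing value into the literal string 'nan'. If that is not wanted, one option is a small helper that passes missing values through unchanged. This is only a minimal sketch built on the sample data from the question; float_id_to_str is a hypothetical name, not part of pandas:

```python
import numpy as np
import pandas as pd

# Sample data from the question: float IDs with one missing value.
df = pd.DataFrame({"userid": [np.nan, 1.091178e+11, 1.137856e+11]})

def float_id_to_str(x):
    """Hypothetical helper: format a float ID without an exponent,
    leaving missing values as missing instead of the string 'nan'."""
    if pd.isna(x):
        return None
    return "{:.0f}".format(x)

ids = df["userid"].map(float_id_to_str)
print(ids)
```

The missing entry then stays detectable with isna() instead of needing a comparison against the text 'nan'.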

I wondered whether map would be faster than apply, but the timings are the same:

#[300000 rows x 1 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
#print (df)

In [40]: %timeit (df.userid.map(lambda x: '{:.0f}'.format(x)))
1 loop, best of 3: 211 ms per loop

In [41]: %timeit (df.userid.apply(lambda x: '{:.0f}'.format(x)))
1 loop, best of 3: 210 ms per loop

Another solution is to_string, but it is much slower (and it returns one formatted string rather than a Series):

print(df.userid.to_string(float_format='{:.0f}'.format))
0            nan
1   109117800000
2   113785600000

In [42]: %timeit (df.userid.to_string(float_format='{:.0f}'.format))
1 loop, best of 3: 2.52 s per loop

Answer 2

Answer by Douglas Navarro

I ran into this problem after reading a DataFrame from a JSON file with the read_json method, which unfortunately has no keep_default_na parameter.

The solution was to convert the long floats to np.int64 before converting them to str.

In [53]: tweet_id_sample = tweets.iloc[0]['id']
         tweet_id_sample
Out[53]: 8.924206435553362e+17

In [54]: tweet_id_sample.astype(str)
Out[54]: '8.924206435553362e+17'

In [55]: tweet_id_sample.astype(np.int64).astype(str)
Out[55]: '892420643555336192'

In [56]: # This overflows
         tweet_id_sample.astype(int)
Out[56]: -2147483648
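
The np.int64 cast above is applied to a single value. On a whole column that contains NaN, astype(np.int64) raises, because NaN has no integer representation. One possible alternative is to go through the nullable Int64 dtype first. The sketch below assumes a reasonably recent pandas (the "Int64" and "string" dtypes are not available in old releases) and reuses the sample data from the question:

```python
import numpy as np
import pandas as pd

# Sample data from the question: float IDs with one missing value.
s = pd.Series([np.nan, 1.091178e+11, 1.137856e+11], name="userid")

# float64 -> nullable integer -> nullable string; NaN becomes <NA>
# instead of raising or turning into the text 'nan'.
as_text = s.astype("Int64").astype("string")
print(as_text)

# Caveat: float64 cannot distinguish neighbouring integers above 2**53,
# so very large IDs may have lost digits before any conversion happens.
same = float(2**53) == float(2**53 + 1)
print(same)  # True
```

Because of that precision limit, IDs as large as the tweet ID above may not round-trip exactly once they have been parsed as floats, whichever conversion is used afterwards.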
