Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/13550940/

Date: 2020-09-13 20:30:23  Source: igfitidea

Python pandas insert long integer

Tags: python, numpy, pandas

Asked by Tom

I'm trying to insert long integers into a pandas DataFrame:

import numpy as np
from pandas import DataFrame

data_scores = [(6311132704823138710, 273), (2685045978526272070, 23), (8921811264899370420, 45), (17019687244989530680, 270), (9930107427299601010, 273)]
# Structured array with two unsigned 64-bit fields
dtype = [('uid', 'u8'), ('score', 'u8')]
data = np.zeros((len(data_scores),), dtype=dtype)
data[:] = data_scores
df_crawls = DataFrame(data)
print(df_crawls.head())

But when I look at the DataFrame, the last values, which are the long ones, are now negative:

                       uid  score
0  6311132704823138710    273
1  2685045978526272070     23
2  8921811264899370420     45
3 -1427056828720020936    270
4 -8516636646409950606    273

The uids are 64-bit unsigned ints, so 'u8' should be the correct dtype? Any ideas?
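The negative numbers are consistent with the unsigned 64-bit values being reinterpreted as signed int64 (two's complement), so anything at or above 2**63 comes out as value - 2**64. A minimal NumPy sketch that reproduces the two negative uids from the output above, assuming that reinterpretation is what happened inside pandas:

```python
import numpy as np

# The two uids that came out negative in the DataFrame above.
uids = np.array([17019687244989530680, 9930107427299601010], dtype=np.uint64)

# Reinterpret the same 64 bits as signed int64 (no conversion, just a new view).
as_signed = uids.view(np.int64)
print(as_signed.tolist())  # [-1427056828720020936, -8516636646409950606]

# Same result with plain integer arithmetic: values >= 2**63 wrap around by 2**64.
wrapped = [u - 2**64 if u >= 2**63 else u for u in uids.tolist()]
print(wrapped == as_signed.tolist())  # True
```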

Answered by Wes McKinney

Yes, it's a present limitation of pandas; we do plan to add support for unsigned integer dtypes in the future. An error message would be much better:

http://github.com/pydata/pandas/issues/2355

For now, you can make the column dtype=object as a workaround.
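A minimal sketch of that workaround, assuming you build the frame from Python lists rather than from the structured array: keeping the uid column as dtype=object stores each value as an arbitrary-precision Python int, so nothing is squeezed into int64. The column names come from the question; the list-based construction is just one way to do it.

```python
import pandas as pd

data_scores = [(6311132704823138710, 273), (2685045978526272070, 23),
               (8921811264899370420, 45), (17019687244989530680, 270),
               (9930107427299601010, 273)]

df_crawls = pd.DataFrame({
    # dtype=object keeps the original Python ints; no 64-bit cast takes place.
    "uid": pd.Series([uid for uid, _ in data_scores], dtype=object),
    # Scores are small, so a regular int64 column is fine.
    "score": pd.Series([score for _, score in data_scores], dtype="int64"),
})

print(df_crawls.dtypes)
print(df_crawls["uid"].tolist()[3])  # 17019687244989530680
```

The trade-off is speed: an object column holds one Python object per cell, so operations on it are slower than on a native integer column.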

EDIT 2012-11-27

pandas now detects the overflow, though the column will become dtype=object for now, until DataFrame has better support for unsigned data types.

In [3]: df_crawls
Out[3]: 
                    uid  score
0   6311132704823138710    273
1   2685045978526272070     23
2   8921811264899370420     45
3  17019687244989530680    270
4   9930107427299601010    273

In [4]: df_crawls.dtypes
Out[4]: 
uid      object
score     int64
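pandas did later gain native unsigned integer dtypes. Assuming a reasonably recent pandas release (well after the versions discussed in this thread), the question's original structured-array approach can be re-checked like this; if the uid column comes back as uint64 and the values match, no workaround is needed:

```python
import numpy as np
import pandas as pd

data_scores = [(6311132704823138710, 273), (2685045978526272070, 23),
               (8921811264899370420, 45), (17019687244989530680, 270),
               (9930107427299601010, 273)]

# Same structured dtype as in the question: two unsigned 64-bit fields.
data = np.array(data_scores, dtype=[("uid", "u8"), ("score", "u8")])
df_crawls = pd.DataFrame(data)

# Inspect the column dtypes and confirm the uids survived unchanged.
print(df_crawls.dtypes)
print(df_crawls["uid"].tolist() == [uid for uid, _ in data_scores])
```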

Answered by deinonychusaur

This won't tell you what to do, except to try on a 64-bit computer or to contact the pandas developers (or patch the problem yourself...). But at any rate, this seems to be your problem:

The problem is that DataFrame does not understand 64-bit unsigned ints, at least on a 32-bit machine.

I changed the values of your data_scores to be better able to track what was happening:

data_scores = [(2**31 + 1, 273), (2 ** 31 - 1, 23), (2 ** 32 + 1, 45), (2 ** 63 - 1, 270), (2 ** 63 + 1, 273)]
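As a quick check of which of these probe values actually exceed what a signed int64 column can hold, each one can be compared with np.iinfo(np.int64).max. This is an illustrative sketch, not part of the original answer; only 2**63 + 1 should fall outside the range.

```python
import numpy as np

# Probe values from the answer: around 2**31, 2**32 and 2**63.
data_scores = [(2**31 + 1, 273), (2**31 - 1, 23), (2**32 + 1, 45),
               (2**63 - 1, 270), (2**63 + 1, 273)]

int64_max = int(np.iinfo(np.int64).max)  # 2**63 - 1

for uid, _ in data_scores:
    # Anything above int64_max cannot be stored in a signed 64-bit column.
    print(uid, "fits int64" if uid <= int64_max else "overflows int64")

overflowing = [uid for uid, _ in data_scores if uid > int64_max]
print(overflowing)  # [9223372036854775809]
```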

Then I tried:

In [92]: data.dtype
Out[92]: dtype([('uid', '<u8'), ('score', '<u8')])

In [93]: data
Out[93]: 
array([(2147483649L, 273L), (2147483647L, 23L), (4294967297L, 45L),
       (9223372036854775807L, 270L), (9223372036854775809L, 273L)], 
      dtype=[('uid', '<u8'), ('score', '<u8')])

In [94]: df = DataFrame(data, dtype='uint64')

In [95]: df.values
Out[95]: 
array([[2147483649,                  273],
       [2147483647,                   23],
       [4294967297,                   45],
       [9223372036854775807,                  270],
       [-9223372036854775807,                  273]], dtype=int64)

Notice how the dtype of the DataFrame doesn't match the one requested in row 94. Also, as I wrote in the comment above, the numpy array works perfectly. Further, if you specify uint32 in row 94, the DataFrame values still get a dtype of int64. However, that doesn't give you negative overflows, probably because uint32 fits inside the positive values of int64.
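The uint32 observation can be checked directly in NumPy. This sketch assumes pandas was effectively casting the requested unsigned type to int64: it widens the largest uint32 value to int64, which is lossless, and then casts a uint64 value above 2**63 - 1, which wraps to the same negative number shown in Out[95].

```python
import numpy as np

# Largest uint32 value: widening to int64 is lossless because it is below 2**63 - 1.
u32 = np.array([np.iinfo(np.uint32).max], dtype=np.uint32)
widened = u32.astype(np.int64)

# A uint64 value above 2**63 - 1 does not fit; astype's default unsafe casting wraps it.
u64 = np.array([2**63 + 1], dtype=np.uint64)
wrapped = u64.astype(np.int64)

print(widened.tolist())  # [4294967295]
print(wrapped.tolist())  # [-9223372036854775807]
```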

请注意dtypeof如何与DataFrame第 94 行中请求的不匹配。另外,正如我在上面的评论中所写的那样,numpy 数组工作得很好。此外,如果您uint32在第 94 行中指定,它仍会为值指定一个dtypeof 。但是它不会给你负溢出,可能是因为适合.int64DataFrameuint32int64