pandas: Python pandas inserting long integers

Question

Asked by Tom

I'm trying to insert long integers into a pandas DataFrame:

import numpy as np
from pandas import DataFrame

data_scores = [(6311132704823138710, 273), (2685045978526272070, 23), (8921811264899370420, 45), (17019687244989530680L, 270), (9930107427299601010L, 273)]
dtype = [('uid', 'u8'), ('score', 'u8')]
data = np.zeros((len(data_scores),),dtype=dtype)
data[:] = data_scores
df_crawls = DataFrame(data)
print df_crawls.head()

But when I look at the DataFrame, the last two values, which are longs, are now negative:

                       uid  score
0  6311132704823138710    273
1  2685045978526272070     23
2  8921811264899370420     45
3 -1427056828720020936    270
4 -8516636646409950606    273

The uids are 64-bit unsigned integers, so 'u8' should be the correct dtype, shouldn't it? Any ideas?

Answer 1

Answered by Wes McKinney

Yes, this is a current limitation of pandas. We do plan to add support for unsigned integer dtypes in the future. An error message would be much better than a silent overflow:

http://github.com/pydata/pandas/issues/2355

For now you can make the column dtype=object as a workaround.

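A minimal sketch of the dtype=object workaround, assuming Python 3 and a reasonably recent pandas (the original post used Python 2). The data and column names come from the question; building the column through a Series with dtype=object is just one way to apply the workaround, not necessarily the exact form the answer had in mind.

```python
import pandas as pd

# Sample data from the question; the last two uids exceed the int64 maximum.
data_scores = [
    (6311132704823138710, 273),
    (2685045978526272070, 23),
    (8921811264899370420, 45),
    (17019687244989530680, 270),
    (9930107427299601010, 273),
]

uids = [uid for uid, _ in data_scores]
scores = [score for _, score in data_scores]

# Storing uid as dtype=object keeps each value as a plain Python int,
# so nothing has to fit into a signed 64-bit slot.
df_crawls = pd.DataFrame({
    "uid": pd.Series(uids, dtype=object),
    "score": scores,
})

print(df_crawls)
print(df_crawls.dtypes)
```

The trade-off is that an object column holds Python objects, so numeric operations on it are slower than on a native integer column.
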
EDIT 2012-11-27

Overflows are now detected. For the time being the column becomes dtype=object, until DataFrame has better support for unsigned data types.

In [3]: df_crawls
Out[3]: 
                    uid  score
0   6311132704823138710    273
1   2685045978526272070     23
2   8921811264899370420     45
3  17019687244989530680    270
4   9930107427299601010    273

In [4]: df_crawls.dtypes
Out[4]: 
uid      object
score     int64

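The behaviour described in the edit (detect the overflow, then fall back to dtype=object) can be illustrated with a small stand-alone sketch. The helper name choose_int_dtype is hypothetical and this is not pandas' actual implementation; it only shows the kind of range check involved.

```python
# Bounds of a signed 64-bit integer.
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def choose_int_dtype(values):
    """Return 'int64' if every value fits in a signed 64-bit integer,
    otherwise 'object' (hypothetical helper, for illustration only)."""
    if all(INT64_MIN <= v <= INT64_MAX for v in values):
        return "int64"
    return "object"


uids = [6311132704823138710, 2685045978526272070, 8921811264899370420,
        17019687244989530680, 9930107427299601010]
scores = [273, 23, 45, 270, 273]

print(choose_int_dtype(uids))    # object
print(choose_int_dtype(scores))  # int64
```
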
Answer 2

Answered by deinonychusaur

This won't tell you what to do, other than to try on a 64-bit computer or contact the pandas developers (or patch the problem yourself...). But at any rate, this seems to be your problem:

The problem is that DataFrame does not understand unsigned 64-bit integers, at least on a 32-bit machine.

I changed the values in your data_scores to make it easier to track what was happening:

data_scores = [(2**31 + 1, 273), (2 ** 31 - 1, 23), (2 ** 32 + 1, 45), (2 ** 63 - 1, 270), (2 ** 63 + 1, 273)]

Then I tried:

In [92]: data.dtype
Out[92]: dtype([('uid', '<u8'), ('score', '<u8')])

In [93]: data
Out[93]: 
array([(2147483649L, 273L), (2147483647L, 23L), (4294967297L, 45L),
       (9223372036854775807L, 270L), (9223372036854775809L, 273L)], 
      dtype=[('uid', '<u8'), ('score', '<u8')])

In [94]: df = DataFrame(data, dtype='uint64')

In [95]: df.values
Out[95]: 
array([[2147483649,                  273],
       [2147483647,                   23],
       [4294967297,                   45],
       [9223372036854775807,                  270],
       [-9223372036854775807,                  273]], dtype=int64)

Notice how the dtype of the DataFrame doesn't match the one requested in In [94]. Also, as I wrote in the comment above, the numpy array works perfectly. Further, if you specify uint32 in In [94], it still reports a dtype of int64 for the DataFrame values. However, it doesn't give you negative overflows, probably because uint32 fits inside the positive range of int64.

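The negative numbers in both the question and the output above are what you get when the eight bytes of an unsigned 64-bit value are read back as a signed 64-bit value. Here is a stdlib-only sketch of that reinterpretation; the function name is made up for illustration, and it mimics the effect rather than pandas' internal code path.

```python
import struct


def u64_as_i64(value):
    """Reinterpret the 8 bytes of an unsigned 64-bit integer as a signed one."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


# Values at or above 2**63 wrap around to value - 2**64.
print(u64_as_i64(17019687244989530680))  # -1427056828720020936
print(u64_as_i64(9930107427299601010))   # -8516636646409950606
print(u64_as_i64(2**63 + 1))             # -9223372036854775807

# Values below 2**63 are unchanged, which is why any uint32 survives intact.
print(u64_as_i64(2**32 - 1))             # 4294967295
```
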