Handling huge numbers in numpy or pandas

Question

Asked by Terence Chow

I am doing a competition where I am provided data that is anonymized. Quite a few of the columns have HUGE values. The largest was 40 digits long! I used pd.read_csv, but as a result those columns were converted to object dtype.

My original plan was to scale the data down but since they are seen as objects I can't do arithmetic on these.

Does anyone have a suggestion on how to handle huge numbers in Pandas or Numpy?

Note that I've tried converting the values to uint64, with no luck. I get the error "long too big to convert".

Answer 1

Answered by DSM

If you have a mixed-type column -- some integers, some strings -- stored in a dtype=object column, you can still convert to ints and perform arithmetic. Starting from a mixed-type column:

>>> df = pd.DataFrame({"A": [11**44, "11"*22]})
>>> df
                                                A
0  6626407607736641103900260617069258125403649041
1    11111111111111111111111111111111111111111111

[2 rows x 1 columns]
>>> df.dtypes, list(map(type, df.A))
(A    object
dtype: object, [<type 'long'>, <type 'str'>])

We can convert to ints:

>>> df["A"] = df["A"].apply(int)
>>> df.dtypes, list(map(type, df.A))
(A    object
dtype: object, [<type 'long'>, <type 'long'>])
>>> df
                                                A
0  6626407607736641103900260617069258125403649041
1    11111111111111111111111111111111111111111111

[2 rows x 1 columns]

And then perform arithmetic:

>>> df // 11
                                               A
0  602400691612421918536387328824478011400331731
1    1010101010101010101010101010101010101010101

[2 rows x 1 columns]
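
The session above was recorded on Python 2, where big integers print as `long`. On Python 3 there is only `int`, which is already arbitrary precision. A minimal sketch of the same steps, assuming Python 3 and a reasonably recent pandas (the name `quotient` is only illustrative):

```python
import pandas as pd

# Mixed column: one Python int and one numeric string (same data as above).
df = pd.DataFrame({"A": [11**44, "11" * 22]})

# int() accepts both ints and digit strings, so every cell becomes a Python int.
df["A"] = df["A"].apply(int)

# The values do not fit in int64, so the column keeps dtype=object.
print(df["A"].dtype)

# Arithmetic on an object column is carried out by Python's own int,
# element by element, so the results are exact.
quotient = df["A"] // 11
print(quotient.tolist())
```

Because the work happens in Python objects rather than in a native numpy array, this is exact but slower than arithmetic on an int64 or float64 column.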

Answer 2

Answered by dawg

You can use the converters argument of pd.read_csv to call int, or some other custom converter function, on the strings as they are being imported:

import pandas as pd
from io import StringIO  # on Python 2: from StringIO import StringIO

txt='''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,9999999999999999999999999999999999999999,"That is an even BIGGER number"
3,1,"Tiny"
4,-9999999999999999999999999999999999999999,"Really negative"
'''

# converters maps a column name to a function applied to each raw string
df=pd.read_csv(StringIO(txt), converters={'Big_Num':int})

print(df)

Prints:

   line                                    Big_Num                           text
0     1   1234567890123456789012345678901234567890      That sure is a big number
1     2   9999999999999999999999999999999999999999  That is an even BIGGER number
2     3                                          1                           Tiny
3     4  -9999999999999999999999999999999999999999                Really negative

Now test arithmetic:

n=df["Big_Num"][1]
print(n, n+1)

Prints:

9999999999999999999999999999999999999999 10000000000000000000000000000000000000000

If you have any values in the column that might cause int to croak, you can do this:

txt='''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,9999999999999999999999999999999999999999,"That is an even BIGGER number"
3,0.000000000000000001,"Tiny"
4,"a string","Use 0 for strings"
'''

def conv(s):
    # Try an exact int first, then a float; anything else becomes 0
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return 0

df=pd.read_csv(StringIO(txt), converters={'Big_Num':conv})
print(df)

Prints:

   line                                   Big_Num                           text
0     1  1234567890123456789012345678901234567890      That sure is a big number
1     2  9999999999999999999999999999999999999999  That is an even BIGGER number
2     3                                     1e-18                           Tiny
3     4                                         0              Use 0 for strings

Then every value in the column will be either a Python int or a float and will support arithmetic.
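
Once the column holds real numbers, the scaling the question was after becomes possible. A rough sketch, assuming Python 3, a column that contains only ints, and a hypothetical helper `scale_to_unit` (not part of pandas) that divides by the largest magnitude so the result fits comfortably in float64:

```python
from io import StringIO

import pandas as pd

txt = """\
line,Big_Num
1,1234567890123456789012345678901234567890
2,9999999999999999999999999999999999999999
3,1
4,-9999999999999999999999999999999999999999
"""

# Import the column as exact Python ints via a converter.
df = pd.read_csv(StringIO(txt), converters={"Big_Num": int})


def scale_to_unit(column):
    """Hypothetical helper: divide every value by the largest magnitude."""
    biggest = max(abs(v) for v in column)  # assumes at least one non-zero value
    # int / int is true division on Python 3 and returns an ordinary float.
    return column.map(lambda v: v / biggest).astype("float64")


df["Big_Num_scaled"] = scale_to_unit(df["Big_Num"])
print(df["Big_Num_scaled"].tolist())
```

The scaled column is ordinary float64, so it only keeps about 15 to 17 significant digits of each ratio. That is usually enough for modelling, but keep the original object column if the exact values matter.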

Answer 3

Answered by Andy Hayden

Edit: These can't be (accurately) represented as floats either; it just doesn't raise when you try. It's probably best to use object dtype and longs, as in DSM's answer.

But you can do it inaccurately (using @DSM's example):

In [11]: df = pd.DataFrame({"A": [11**44, "11"*22]}).astype(float)

In [12]: df
Out[12]: 
              A
0  6.626408e+45
1  1.111111e+43

[2 rows x 1 columns]

In [13]: df.dtypes
Out[13]: 
A    float64
dtype: object

But it may not be what you want...

In [21]: df.iloc[0, 0]
Out[21]: 6.6264076077366411e+45

In [22]: long(df.iloc[0, 0])
Out[22]: 6626407607736641089115845702792172379125579776L

In [23]: 11 ** 44
Out[23]: 6626407607736641103900260617069258125403649041L

As DSM suggests, convert to long (plain int on Python 3) and keep the object dtype so as not to lose accuracy:

In [31]: df = pd.DataFrame({"A": [11**44, "11"*22]}).apply(long, 1)

In [32]: df
Out[32]: 
0    6626407607736641103900260617069258125403649041
1      11111111111111111111111111111111111111111111
dtype: object
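
To see why neither the uint64 attempt from the question nor the float route keeps these values intact, the limits can be checked directly. A small sketch, assuming Python 3 and numpy (the names `exact`, `uint64_max`, and `rounded` are only illustrative):

```python
import numpy as np

exact = 11**44  # Python ints have arbitrary precision

# uint64 tops out at 2**64 - 1, which has only 20 digits.
uint64_max = int(np.iinfo(np.uint64).max)
print(uint64_max, exact > uint64_max)

# float64 stores 52 explicit significand bits (53 with the implicit one),
# so a large integer is rounded to the nearest representable value.
print(np.finfo(np.float64).nmant)

rounded = int(float(exact))
print(rounded == exact)       # the low-order digits do not survive
print(abs(rounded - exact))   # size of the rounding error
```

The float keeps the magnitude and roughly the first 16 digits, which is why the float column above looks plausible while the digits after that differ from 11 ** 44.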
