Python 快速素数分解模块

Question

提问by orlp

I am looking for an implementationor clear algorithmfor getting the prime factorization of Nin either python, pseudocode or anything else well-readable. There are a few demands/facts:

我正在寻找一种实现或清晰的算法，用于在 python、伪代码或任何其他可读性良好的东西中获得N的质因数分解。有一些要求/事实：

Nis between 1 and ~20 digits
No pre-calculated lookup table, memoization is fine though.
Need not to be mathematically proven (e.g. could rely on the Goldbach conjecture if needed)
Need not to be precise, is allowed to be probabilistic/deterministic if needed

N介于 1 到 ~20 位数字之间
没有预先计算的查找表，不过记忆很好。
不需要数学证明（例如，如果需要，可以依赖哥德巴赫猜想）
不需要精确，如果需要，可以是概率性/确定性

I need a fast prime factorization algorithm, not only for itself, but for usage in many other algorithms like calculating the Euler phi(n).

我需要一个快速的素数分解算法，不仅用于它本身，还用于许多其他算法，例如计算 Euler phi(n)。

I have tried other algorithms from Wikipedia and such but either I couldn't understand them (ECM) or I couldn't create a working implementation from the algorithm (Pollard-Brent).

我已经尝试过维基百科中的其他算法等，但要么我无法理解它们（ECM），要么我无法从该算法（Pollard-Brent）创建一个工作实现。

I am really interested in the Pollard-Brent algorithm, so any more information/implementations on it would be really nice.

我真的对 Pollard-Brent 算法很感兴趣，所以关于它的更多信息/实现会非常好。

Thanks!

谢谢！

EDIT

编辑

After messing around a little I have created a pretty fast prime/factorization module. It combines an optimized trial division algorithm, the Pollard-Brent algorithm, a miller-rabin primality test and the fastest primesieve I found on the internet. gcd is a regular Euclid's GCD implementation (binary Euclid's GCD is muchslower then the regular one).

在稍微弄乱之后，我创建了一个非常快的素数/分解模块。它结合了优化的试除算法、Pollard-Brent 算法、miller-rabin 素性测试和我在互联网上找到的最快的素筛。gcd 是常规 Euclid 的 GCD 实现（二进制 Euclid 的 GCD比常规GCD慢得多）。

Bounty

赏金

Oh joy, a bounty can be acquired! But how can I win it?

哦，喜悦，可以获得赏金！但是我怎么能赢呢？

Find an optimalization or bug in my module.
Provide alternative/better algorithms/implementations.

在我的模块中找到优化或错误。
提供替代/更好的算法/实现。

The answer which is the most complete/constructive gets the bounty.

最完整/最具建设性的答案将获得奖励。

And finally the module itself:

最后是模块本身：

import random

def primesbelow(N):
    # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    #""" Input N>=6, Returns a list of primes, 2 <= p < N """
    correction = N % 6 > 1
    N = {0:N, 1:N-1, 2:N+4, 3:N+3, 4:N+2, 5:N+1}[N%6]
    sieve = [True] * (N // 3)
    sieve[0] = False
    for i in range(int(N ** .5) // 3 + 1):
        if sieve[i]:
            k = (3 * i + 1) | 1
            sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
            sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
    return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]

smallprimeset = set(primesbelow(100000))
_smallprimeset = 100000
def isprime(n, precision=7):
    # http://en.wikipedia.org/wiki/Miller-Rabin_primality_test#Algorithm_and_running_time
    if n < 1:
        raise ValueError("Out of bounds, first argument must be > 0")
    elif n <= 3:
        return n >= 2
    elif n % 2 == 0:
        return False
    elif n < _smallprimeset:
        return n in smallprimeset


    d = n - 1
    s = 0
    while d % 2 == 0:
        d //= 2
        s += 1

    for repeat in range(precision):
        a = random.randrange(2, n - 2)
        x = pow(a, d, n)

        if x == 1 or x == n - 1: continue

        for r in range(s - 1):
            x = pow(x, 2, n)
            if x == 1: return False
            if x == n - 1: break
        else: return False

    return True

# https://comeoncodeon.wordpress.com/2010/09/18/pollard-rho-brent-integer-factorization/
def pollard_brent(n):
    if n % 2 == 0: return 2
    if n % 3 == 0: return 3

    y, c, m = random.randint(1, n-1), random.randint(1, n-1), random.randint(1, n-1)
    g, r, q = 1, 1, 1
    while g == 1:
        x = y
        for i in range(r):
            y = (pow(y, 2, n) + c) % n

        k = 0
        while k < r and g==1:
            ys = y
            for i in range(min(m, r-k)):
                y = (pow(y, 2, n) + c) % n
                q = q * abs(x-y) % n
            g = gcd(q, n)
            k += m
        r *= 2
    if g == n:
        while True:
            ys = (pow(ys, 2, n) + c) % n
            g = gcd(abs(x - ys), n)
            if g > 1:
                break

    return g

smallprimes = primesbelow(1000) # might seem low, but 1000*1000 = 1000000, so this will fully factor every composite < 1000000
def primefactors(n, sort=False):
    factors = []

    for checker in smallprimes:
        while n % checker == 0:
            factors.append(checker)
            n //= checker
        if checker > n: break

    if n < 2: return factors

    while n > 1:
        if isprime(n):
            factors.append(n)
            break
        factor = pollard_brent(n) # trial division did not fully factor, switch to pollard-brent
        factors.extend(primefactors(factor)) # recurse to factor the not necessarily prime factor returned by pollard-brent
        n //= factor

    if sort: factors.sort()

    return factors

def factorization(n):
    factors = {}
    for p1 in primefactors(n):
        try:
            factors[p1] += 1
        except KeyError:
            factors[p1] = 1
    return factors

totients = {}
def totient(n):
    if n == 0: return 1

    try: return totients[n]
    except KeyError: pass

    tot = 1
    for p, exp in factorization(n).items():
        tot *= (p - 1)  *  p ** (exp - 1)

    totients[n] = tot
    return tot

def gcd(a, b):
    if a == b: return a
    while b > 0: a, b = b, a % b
    return a

def lcm(a, b):
    return abs((a // gcd(a, b)) * b)

Answer 1

回答by milkypostman

You should probably do some prime detection which you could look here, Fast algorithm for finding prime numbers?

您可能应该进行一些素数检测，您可以在这里查看，查找素数的快速算法？

You should read that entire blog though, there is a few algorithms that he lists for testing primality.

不过，您应该阅读整篇博客，他列出了一些用于测试素性的算法。

Answer 2

回答by Kabie

Even on the current one, there are several spots to be noticed.

即使在目前的情况下，也有几个地方需要注意。

Don't do checker*checkerevery loop, use s=ceil(sqrt(num))and checher < s
checher should plus 2 each time, ignore all even numbers except 2
Use divmodinstead of %and //

不要做checker*checker每一个循环，使用s=ceil(sqrt(num))和checher < s
checher 每次都要加 2，忽略除 2 以外的所有偶数
使用divmod代替%和//

Answer 3

回答by Rozuur

There is no need to calculate smallprimesusing primesbelow, use smallprimesetfor that.

无需计算smallprimesusing primesbelow，smallprimeset为此使用。

smallprimes = (2,) + tuple(n for n in xrange(3,1000,2) if n in smallprimeset)

Divide your primefactorsinto two functions for handling smallprimesand other for pollard_brent, this can save a couple of iterations as all the powers of smallprimes will be divided from n.

将您的primefactors函数分为两个函数用于处理smallprimes和其他函数pollard_brent，这可以节省几次迭代，因为 smallprimes 的所有幂都将除以 n。

def primefactors(n, sort=False):
    factors = []

    limit = int(n ** .5) + 1
    for checker in smallprimes:
        print smallprimes[-1]
        if checker > limit: break
        while n % checker == 0:
            factors.append(checker)
            n //= checker


    if n < 2: return factors
    else : 
        factors.extend(bigfactors(n,sort))
        return factors

def bigfactors(n, sort = False):
    factors = []
    while n > 1:
        if isprime(n):
            factors.append(n)
            break
        factor = pollard_brent(n) 
        factors.extend(bigfactors(factor,sort)) # recurse to factor the not necessarily prime factor returned by pollard-brent
        n //= factor

    if sort: factors.sort()    
    return factors

By considering verified results of Pomerance, Selfridge and Wagstaff and Jaeschke, you can reduce the repetitions in isprimewhich uses Miller-Rabin primality test. From Wiki.

通过考虑 Pomerance、Selfridge 和 Wagstaff 以及 Jaeschke 的验证结果，您可以减少isprime使用 Miller-Rabin 素性检验的重复次数。来自维基。

if n < 1,373,653, it is enough to test a = 2 and 3;
if n < 9,080,191, it is enough to test a = 31 and 73;
if n < 4,759,123,141, it is enough to test a = 2, 7, and 61;
if n < 2,152,302,898,747, it is enough to test a = 2, 3, 5, 7, and 11;
if n < 3,474,749,660,383, it is enough to test a = 2, 3, 5, 7, 11, and 13;
if n < 341,550,071,728,321, it is enough to test a = 2, 3, 5, 7, 11, 13, and 17.

如果 n < 1,373,653，则足以测试 a = 2 和 3；
如果 n < 9,080,191，则足以测试 a = 31 和 73；
如果 n < 4,759,123,141，则足以测试 a = 2、7 和 61；
如果 n < 2,152,302,898,747，则足以测试 a = 2、3、5、7 和 11；
如果 n < 3,474,749,660,383，则足以测试 a = 2、3、5、7、11 和 13；
如果 n < 341,550,071,728,321，则足以测试 a = 2、3、5、7、11、13 和 17。

Edit 1: Corrected return call of if-elseto append bigfactors to factors in primefactors.

编辑 1：更正if-else了将 bigfactors 附加到primefactors.

Answer 4

回答by Ehtesh Choudhury

There's a python library with a collection of primality tests (including incorrect ones for what not to do). It's called pyprimes. Figured it's worth mentioning for posterity's purpose. I don't think it includes the algorithms you mentioned.

有一个 python 库，其中包含一组素性测试（包括不正确的测试）。它被称为pyprimes。认为为了后代的目的值得一提。我认为它不包括您提到的算法。

Answer 5

回答by Quuxplusone

I just ran into a bug in this code when factoring the number 2**1427 * 31.

在分解数字时，我刚刚在这段代码中遇到了一个错误2**1427 * 31。

  File "buckets.py", line 48, in prettyprime
    factors = primefactors.primefactors(n, sort=True)
  File "/private/tmp/primefactors.py", line 83, in primefactors
    limit = int(n ** .5) + 1
OverflowError: long int too large to convert to float

This code snippet:

此代码片段：

limit = int(n ** .5) + 1
for checker in smallprimes:
    if checker > limit: break
    while n % checker == 0:
        factors.append(checker)
        n //= checker
        limit = int(n ** .5) + 1
        if checker > limit: break

should be rewritten into

应该改写成

for checker in smallprimes:
    while n % checker == 0:
        factors.append(checker)
        n //= checker
    if checker > n: break

which will likely perform faster on realistic inputs anyway. Square root is slow — basically the equivalent of many multiplications —, smallprimesonly has a few dozen members, and this way we remove the computation of n ** .5from the tight inner loop, which is certainly helpful when factoring numbers like 2**1427. There's simply no reason to compute sqrt(2**1427), sqrt(2**1426), sqrt(2**1425), etc. etc., when all we care about is "does the [square of the] checker exceed n".

无论如何，这可能会在实际输入上执行得更快。平方根很慢——基本上相当于许多乘法——smallprimes只有几十个成员，这样我们n ** .5从紧密的内部循环中删除了的计算，这在分解像2**1427. 根本没有理由计算sqrt(2**1427)、sqrt(2**1426)、sqrt(2**1425)等，当我们只关心“检查器的 [平方] 是否超过n”时。

The rewritten code doesn't throw exceptions when presented with big numbers, andis roughly twice as fast according to timeit(on sample inputs 2and 2**718 * 31).

重写的代码在出现大数字时不会抛出异常，并且根据timeit（样本输入2和2**718 * 31）大约快两倍。

Also notice that isprime(2)returns the wrong result, but this is okay as long as we don't rely on it. IMHO you should rewrite the intro of that function as

还要注意isprime(2)返回错误的结果，但只要我们不依赖它就可以了。恕我直言，您应该将该函数的介绍重写为

if n <= 3:
    return n >= 2
...

Answer 6

回答by Antoni Gual Via

You could factorize up to a limit then use brent to get higher factors

您可以分解到一个限制，然后使用布伦特来获得更高的因子

from fractions import gcd
from random import randint

def brent(N):
   if N%2==0: return 2
   y,c,m = randint(1, N-1),randint(1, N-1),randint(1, N-1)
   g,r,q = 1,1,1
   while g==1:             
       x = y
       for i in range(r):
          y = ((y*y)%N+c)%N
       k = 0
       while (k<r and g==1):
          ys = y
          for i in range(min(m,r-k)):
             y = ((y*y)%N+c)%N
             q = q*(abs(x-y))%N
          g = gcd(q,N)
          k = k + m
       r = r*2
   if g==N:
       while True:
          ys = ((ys*ys)%N+c)%N
          g = gcd(abs(x-ys),N)
          if g>1:  break
   return g

def factorize(n1):
    if n1==0: return []
    if n1==1: return [1]
    n=n1
    b=[]
    p=0
    mx=1000000
    while n % 2 ==0 : b.append(2);n//=2
    while n % 3 ==0 : b.append(3);n//=3
    i=5
    inc=2
    while i <=mx:
       while n % i ==0 : b.append(i); n//=i
       i+=inc
       inc=6-inc
    while n>mx:
      p1=n
      while p1!=p:
          p=p1
          p1=brent(p)
      b.append(p1);n//=p1 
    if n!=1:b.append(n)   
    return sorted(b)

from functools import reduce
#n= 2**1427 * 31 #
n= 67898771  * 492574361 * 10000223 *305175781* 722222227*880949 *908909
li=factorize(n)
print (li)
print (n - reduce(lambda x,y :x*y ,li))

Answer 7

回答by Colonel Panic

If you don't want to reinvent the wheel, use the library sympy

如果您不想重新发明轮子，请使用库sympy

pip install sympy

Use the function sympy.ntheory.factorint

使用功能 sympy.ntheory.factorint

>>> from sympy.ntheory import factorint
>>> factorint(10**20+1)
{73: 1, 5964848081: 1, 1676321: 1, 137: 1}

You can factor some very large numbers:

您可以分解一些非常大的数字：

>>> factorint(10**100+1)
{401: 1, 5964848081: 1, 1676321: 1, 1601: 1, 1201: 1, 137: 1, 73: 1, 129694419029057750551385771184564274499075700947656757821537291527196801: 1}

Python 快速素数分解模块

提问by orlp

Bounty

赏金

回答by milkypostman

回答by Kabie

回答by Rozuur

回答by Ehtesh Choudhury

回答by Quuxplusone

回答by Antoni Gual Via

回答by Colonel Panic

相关推荐

最近更新

标签

Python 快速素数分解模块

提问by orlp

Bounty

赏金

回答by milkypostman

回答by Kabie

回答by Rozuur

回答by Ehtesh Choudhury

回答by Quuxplusone

回答by Antoni Gual Via

回答by Colonel Panic

相关推荐

是否有可能在交互模式下执行 Python 脚本

Python 在一维 numpy 数组中使用 Numpy 查找局部最大值/最小值

Python 打印一个字符串，在字符之间有一点延迟

在 Python 中生成与 RegEx 匹配的字符串

相关推荐

最近更新

标签