Python 快速素数分解模块
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4643647/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Fast prime factorization module
提问by orlp
I am looking for an implementationor clear algorithmfor getting the prime factorization of Nin either python, pseudocode or anything else well-readable. There are a few demands/facts:
我正在寻找一种实现或清晰的算法,用于在 python、伪代码或任何其他可读性良好的东西中获得N的质因数分解。有一些要求/事实:
- Nis between 1 and ~20 digits
- No pre-calculated lookup table, memoization is fine though.
- Need not to be mathematically proven (e.g. could rely on the Goldbach conjecture if needed)
- Need not to be precise, is allowed to be probabilistic/deterministic if needed
- N介于 1 到 ~20 位数字之间
- 没有预先计算的查找表,不过记忆很好。
- 不需要数学证明(例如,如果需要,可以依赖哥德巴赫猜想)
- 不需要精确,如果需要,可以是概率性/确定性
I need a fast prime factorization algorithm, not only for itself, but for usage in many other algorithms like calculating the Euler phi(n).
我需要一个快速的素数分解算法,不仅用于它本身,还用于许多其他算法,例如计算 Euler phi(n)。
I have tried other algorithms from Wikipedia and such but either I couldn't understand them (ECM) or I couldn't create a working implementation from the algorithm (Pollard-Brent).
我已经尝试过维基百科中的其他算法等,但要么我无法理解它们(ECM),要么我无法从该算法(Pollard-Brent)创建一个工作实现。
I am really interested in the Pollard-Brent algorithm, so any more information/implementations on it would be really nice.
我真的对 Pollard-Brent 算法很感兴趣,所以关于它的更多信息/实现会非常好。
Thanks!
谢谢!
EDIT
编辑
After messing around a little I have created a pretty fast prime/factorization module. It combines an optimized trial division algorithm, the Pollard-Brent algorithm, a miller-rabin primality test and the fastest primesieve I found on the internet. gcd is a regular Euclid's GCD implementation (binary Euclid's GCD is muchslower then the regular one).
在稍微弄乱之后,我创建了一个非常快的素数/分解模块。它结合了优化的试除算法、Pollard-Brent 算法、miller-rabin 素性测试和我在互联网上找到的最快的素筛。gcd 是常规 Euclid 的 GCD 实现(二进制 Euclid 的 GCD比常规GCD慢得多)。
Bounty
赏金
Oh joy, a bounty can be acquired! But how can I win it?
哦,喜悦,可以获得赏金!但是我怎么能赢呢?
- Find an optimalization or bug in my module.
- Provide alternative/better algorithms/implementations.
- 在我的模块中找到优化或错误。
- 提供替代/更好的算法/实现。
The answer which is the most complete/constructive gets the bounty.
最完整/最具建设性的答案将获得奖励。
And finally the module itself:
最后是模块本身:
import random
def primesbelow(N):
# http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
#""" Input N>=6, Returns a list of primes, 2 <= p < N """
correction = N % 6 > 1
N = {0:N, 1:N-1, 2:N+4, 3:N+3, 4:N+2, 5:N+1}[N%6]
sieve = [True] * (N // 3)
sieve[0] = False
for i in range(int(N ** .5) // 3 + 1):
if sieve[i]:
k = (3 * i + 1) | 1
sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]
smallprimeset = set(primesbelow(100000))
_smallprimeset = 100000
def isprime(n, precision=7):
# http://en.wikipedia.org/wiki/Miller-Rabin_primality_test#Algorithm_and_running_time
if n < 1:
raise ValueError("Out of bounds, first argument must be > 0")
elif n <= 3:
return n >= 2
elif n % 2 == 0:
return False
elif n < _smallprimeset:
return n in smallprimeset
d = n - 1
s = 0
while d % 2 == 0:
d //= 2
s += 1
for repeat in range(precision):
a = random.randrange(2, n - 2)
x = pow(a, d, n)
if x == 1 or x == n - 1: continue
for r in range(s - 1):
x = pow(x, 2, n)
if x == 1: return False
if x == n - 1: break
else: return False
return True
# https://comeoncodeon.wordpress.com/2010/09/18/pollard-rho-brent-integer-factorization/
def pollard_brent(n):
if n % 2 == 0: return 2
if n % 3 == 0: return 3
y, c, m = random.randint(1, n-1), random.randint(1, n-1), random.randint(1, n-1)
g, r, q = 1, 1, 1
while g == 1:
x = y
for i in range(r):
y = (pow(y, 2, n) + c) % n
k = 0
while k < r and g==1:
ys = y
for i in range(min(m, r-k)):
y = (pow(y, 2, n) + c) % n
q = q * abs(x-y) % n
g = gcd(q, n)
k += m
r *= 2
if g == n:
while True:
ys = (pow(ys, 2, n) + c) % n
g = gcd(abs(x - ys), n)
if g > 1:
break
return g
smallprimes = primesbelow(1000) # might seem low, but 1000*1000 = 1000000, so this will fully factor every composite < 1000000
def primefactors(n, sort=False):
factors = []
for checker in smallprimes:
while n % checker == 0:
factors.append(checker)
n //= checker
if checker > n: break
if n < 2: return factors
while n > 1:
if isprime(n):
factors.append(n)
break
factor = pollard_brent(n) # trial division did not fully factor, switch to pollard-brent
factors.extend(primefactors(factor)) # recurse to factor the not necessarily prime factor returned by pollard-brent
n //= factor
if sort: factors.sort()
return factors
def factorization(n):
factors = {}
for p1 in primefactors(n):
try:
factors[p1] += 1
except KeyError:
factors[p1] = 1
return factors
totients = {}
def totient(n):
if n == 0: return 1
try: return totients[n]
except KeyError: pass
tot = 1
for p, exp in factorization(n).items():
tot *= (p - 1) * p ** (exp - 1)
totients[n] = tot
return tot
def gcd(a, b):
if a == b: return a
while b > 0: a, b = b, a % b
return a
def lcm(a, b):
return abs((a // gcd(a, b)) * b)
回答by milkypostman
You should probably do some prime detection which you could look here, Fast algorithm for finding prime numbers?
您可能应该进行一些素数检测,您可以在这里查看, 查找素数的快速算法?
You should read that entire blog though, there is a few algorithms that he lists for testing primality.
不过,您应该阅读整篇博客,他列出了一些用于测试素性的算法。
回答by Kabie
Even on the current one, there are several spots to be noticed.
即使在目前的情况下,也有几个地方需要注意。
- Don't do
checker*checkerevery loop, uses=ceil(sqrt(num))andchecher < s - checher should plus 2 each time, ignore all even numbers except 2
- Use
divmodinstead of%and//
- 不要做
checker*checker每一个循环,使用s=ceil(sqrt(num))和checher < s - checher 每次都要加 2,忽略除 2 以外的所有偶数
- 使用
divmod代替%和//
回答by Rozuur
There is no need to calculate smallprimesusing primesbelow, use smallprimesetfor that.
无需计算smallprimesusing primesbelow,smallprimeset为此使用。
smallprimes = (2,) + tuple(n for n in xrange(3,1000,2) if n in smallprimeset)
smallprimes = (2,) + tuple(n for n in xrange(3,1000,2) if n in smallprimeset)
Divide your primefactorsinto two functions for handling smallprimesand other for pollard_brent, this can save a couple of iterations as all the powers of smallprimes will be divided from n.
将您的primefactors函数分为两个函数用于处理smallprimes和其他函数pollard_brent,这可以节省几次迭代,因为 smallprimes 的所有幂都将除以 n。
def primefactors(n, sort=False):
factors = []
limit = int(n ** .5) + 1
for checker in smallprimes:
print smallprimes[-1]
if checker > limit: break
while n % checker == 0:
factors.append(checker)
n //= checker
if n < 2: return factors
else :
factors.extend(bigfactors(n,sort))
return factors
def bigfactors(n, sort = False):
factors = []
while n > 1:
if isprime(n):
factors.append(n)
break
factor = pollard_brent(n)
factors.extend(bigfactors(factor,sort)) # recurse to factor the not necessarily prime factor returned by pollard-brent
n //= factor
if sort: factors.sort()
return factors
By considering verified results of Pomerance, Selfridge and Wagstaff and Jaeschke, you can reduce the repetitions in isprimewhich uses Miller-Rabin primality test. From Wiki.
通过考虑 Pomerance、Selfridge 和 Wagstaff 以及 Jaeschke 的验证结果,您可以减少isprime使用 Miller-Rabin 素性检验的重复次数。来自维基。
- if n < 1,373,653, it is enough to test a = 2 and 3;
- if n < 9,080,191, it is enough to test a = 31 and 73;
- if n < 4,759,123,141, it is enough to test a = 2, 7, and 61;
- if n < 2,152,302,898,747, it is enough to test a = 2, 3, 5, 7, and 11;
- if n < 3,474,749,660,383, it is enough to test a = 2, 3, 5, 7, 11, and 13;
- if n < 341,550,071,728,321, it is enough to test a = 2, 3, 5, 7, 11, 13, and 17.
- 如果 n < 1,373,653,则足以测试 a = 2 和 3;
- 如果 n < 9,080,191,则足以测试 a = 31 和 73;
- 如果 n < 4,759,123,141,则足以测试 a = 2、7 和 61;
- 如果 n < 2,152,302,898,747,则足以测试 a = 2、3、5、7 和 11;
- 如果 n < 3,474,749,660,383,则足以测试 a = 2、3、5、7、11 和 13;
- 如果 n < 341,550,071,728,321,则足以测试 a = 2、3、5、7、11、13 和 17。
Edit 1: Corrected return call of if-elseto append bigfactors to factors in primefactors.
编辑 1:更正if-else了将 bigfactors 附加到primefactors.
回答by Ehtesh Choudhury
There's a python library with a collection of primality tests (including incorrect ones for what not to do). It's called pyprimes. Figured it's worth mentioning for posterity's purpose. I don't think it includes the algorithms you mentioned.
有一个 python 库,其中包含一组素性测试(包括不正确的测试)。它被称为pyprimes。认为为了后代的目的值得一提。我认为它不包括您提到的算法。
回答by Quuxplusone
I just ran into a bug in this code when factoring the number 2**1427 * 31.
在分解数字时,我刚刚在这段代码中遇到了一个错误2**1427 * 31。
File "buckets.py", line 48, in prettyprime
factors = primefactors.primefactors(n, sort=True)
File "/private/tmp/primefactors.py", line 83, in primefactors
limit = int(n ** .5) + 1
OverflowError: long int too large to convert to float
This code snippet:
此代码片段:
limit = int(n ** .5) + 1
for checker in smallprimes:
if checker > limit: break
while n % checker == 0:
factors.append(checker)
n //= checker
limit = int(n ** .5) + 1
if checker > limit: break
should be rewritten into
应该改写成
for checker in smallprimes:
while n % checker == 0:
factors.append(checker)
n //= checker
if checker > n: break
which will likely perform faster on realistic inputs anyway. Square root is slow — basically the equivalent of many multiplications —, smallprimesonly has a few dozen members, and this way we remove the computation of n ** .5from the tight inner loop, which is certainly helpful when factoring numbers like 2**1427. There's simply no reason to compute sqrt(2**1427), sqrt(2**1426), sqrt(2**1425), etc. etc., when all we care about is "does the [square of the] checker exceed n".
无论如何,这可能会在实际输入上执行得更快。平方根很慢——基本上相当于许多乘法——smallprimes只有几十个成员,这样我们n ** .5从紧密的内部循环中删除了 的计算,这在分解像2**1427. 根本没有理由计算sqrt(2**1427)、sqrt(2**1426)、sqrt(2**1425)等,当我们只关心“检查器的 [平方] 是否超过n”时。
The rewritten code doesn't throw exceptions when presented with big numbers, andis roughly twice as fast according to timeit(on sample inputs 2and 2**718 * 31).
重写的代码在出现大数字时不会抛出异常,并且根据timeit(样本输入2和2**718 * 31)大约快两倍。
Also notice that isprime(2)returns the wrong result, but this is okay as long as we don't rely on it. IMHO you should rewrite the intro of that function as
还要注意isprime(2)返回错误的结果,但只要我们不依赖它就可以了。恕我直言,您应该将该函数的介绍重写为
if n <= 3:
return n >= 2
...
回答by Antoni Gual Via
You could factorize up to a limit then use brent to get higher factors
您可以分解到一个限制,然后使用布伦特来获得更高的因子
from fractions import gcd
from random import randint
def brent(N):
if N%2==0: return 2
y,c,m = randint(1, N-1),randint(1, N-1),randint(1, N-1)
g,r,q = 1,1,1
while g==1:
x = y
for i in range(r):
y = ((y*y)%N+c)%N
k = 0
while (k<r and g==1):
ys = y
for i in range(min(m,r-k)):
y = ((y*y)%N+c)%N
q = q*(abs(x-y))%N
g = gcd(q,N)
k = k + m
r = r*2
if g==N:
while True:
ys = ((ys*ys)%N+c)%N
g = gcd(abs(x-ys),N)
if g>1: break
return g
def factorize(n1):
if n1==0: return []
if n1==1: return [1]
n=n1
b=[]
p=0
mx=1000000
while n % 2 ==0 : b.append(2);n//=2
while n % 3 ==0 : b.append(3);n//=3
i=5
inc=2
while i <=mx:
while n % i ==0 : b.append(i); n//=i
i+=inc
inc=6-inc
while n>mx:
p1=n
while p1!=p:
p=p1
p1=brent(p)
b.append(p1);n//=p1
if n!=1:b.append(n)
return sorted(b)
from functools import reduce
#n= 2**1427 * 31 #
n= 67898771 * 492574361 * 10000223 *305175781* 722222227*880949 *908909
li=factorize(n)
print (li)
print (n - reduce(lambda x,y :x*y ,li))
回答by Colonel Panic
If you don't want to reinvent the wheel, use the library sympy
如果您不想重新发明轮子,请使用库sympy
pip install sympy
Use the function sympy.ntheory.factorint
>>> from sympy.ntheory import factorint
>>> factorint(10**20+1)
{73: 1, 5964848081: 1, 1676321: 1, 137: 1}
You can factor some very large numbers:
您可以分解一些非常大的数字:
>>> factorint(10**100+1)
{401: 1, 5964848081: 1, 1676321: 1, 1601: 1, 1201: 1, 137: 1, 73: 1, 129694419029057750551385771184564274499075700947656757821537291527196801: 1}

