埃拉托色尼筛法 - 寻找素数 Python

Question

提问by Srikar Appalaraju

Just to clarify, this is not a homework problem :)

澄清一下，这不是家庭作业问题:)

I wanted to find primes for a math application I am building & came across Sieve of Eratosthenesapproach.

我想为我正在构建的数学应用程序找到质数，并遇到了 Eratosthenes 筛法。

I have written an implementation of it in Python. But it's terribly slow. For say, if I want to find all primes less than 2 million. It takes > 20 mins. (I stopped it at this point). How can I speed this up?

我已经用 Python 编写了它的实现。但它非常慢。例如，如果我想找到所有小于 200 万的素数。需要> 20分钟。（我在这一点上停止了它）。我怎样才能加快速度？

def primes_sieve(limit):
    limitn = limit+1
    primes = range(2, limitn)

    for i in primes:
        factors = range(i, limitn, i)
        for f in factors[1:]:
            if f in primes:
                primes.remove(f)
    return primes

print primes_sieve(2000)

UPDATE:I ended up doing profiling on this code & found that quite a lot of time was spent on removing an element from the list. Quite understandable considering it has to traverse the entire list (worst-case) to find the element & then remove it and then readjust the list (maybe some copy goes on?). Anyway, I chucked out list for dictionary. My new implementation -

更新：我最终对这段代码进行了分析，发现从列表中删除一个元素花费了大量时间。考虑到它必须遍历整个列表（最坏的情况）才能找到元素，然后将其删除，然后重新调整列表（也许继续进行某些复制？），这是可以理解的。无论如何，我扔掉了字典列表。我的新实现 -

def primes_sieve1(limit):
    limitn = limit+1
    primes = dict()
    for i in range(2, limitn): primes[i] = True

    for i in primes:
        factors = range(i,limitn, i)
        for f in factors[1:]:
            primes[f] = False
    return [i for i in primes if primes[i]==True]

print primes_sieve1(2000000)

Answer 1

采纳答案by Pi Delport

You're not quite implementing the correct algorithm:

你没有完全实现正确的算法：

In your first example, primes_sievedoesn't maintain a list of primality flags to strike/unset (as in the algorithm), but instead resizes a list of integers continuously, which is very expensive: removing an item from a list requires shifting all subsequent items down by one.

在您的第一个示例中，primes_sieve不维护要删除/取消设置的素数标志列表（如在算法中），而是连续调整整数列表的大小，这非常昂贵：从列表中删除项目需要移动所有后续项目下降了一个。

In the second example, primes_sieve1maintains a dictionaryof primality flags, which is a step in the right direction, but it iterates over the dictionary in undefined order, and redundantly strikes out factors of factors (instead of only factors of primes, as in the algorithm). You could fix this by sorting the keys, and skipping non-primes (which already makes it an order of magnitude faster), but it's still much more efficient to just use a list directly.

在第二个示例中，primes_sieve1维护一个素数标志字典，这是朝着正确方向迈出的一步，但它以未定义的顺序遍历字典，并冗余删除了因子的因子（而不是仅素数的因子，如在算法中）。您可以通过对键进行排序并跳过非素数来解决此问题（这已经使其速度提高了一个数量级），但是直接使用列表仍然效率更高。

The correct algorithm (with a list instead of a dictionary) looks something like:

正确的算法（使用列表而不是字典）看起来像：

def primes_sieve2(limit):
    a = [True] * limit                          # Initialize the primality list
    a[0] = a[1] = False

    for (i, isprime) in enumerate(a):
        if isprime:
            yield i
            for n in range(i*i, limit, i):     # Mark factors non-prime
                a[n] = False

(Note that this also includes the algorithmic optimization of starting the non-prime marking at the prime's square (i*i) instead of its double.)

（请注意，这还包括在素数平方 ( i*i) 而不是其双精度数处开始非素数标记的算法优化。）

Answer 2

回答by Glenn Maynard

Removing from the beginning of an array (list) requires moving all of the items after it down. That means that removing every element from a list in this way starting from the front is an O(n^2) operation.

从数组（列表）的开头删除需要将其后的所有项目向下移动。这意味着以这种方式从列表中删除每个元素是一个 O(n^2) 操作。

You can do this much more efficiently with sets:

你可以用集合更有效地做到这一点：

def primes_sieve(limit):
    limitn = limit+1
    not_prime = set()
    primes = []

    for i in range(2, limitn):
        if i in not_prime:
            continue

        for f in range(i*2, limitn, i):
            not_prime.add(f)

        primes.append(i)

    return primes

print primes_sieve(1000000)

... or alternatively, avoid having to rearrange the list:

... 或者，避免重新排列列表：

def primes_sieve(limit):
    limitn = limit+1
    not_prime = [False] * limitn
    primes = []

    for i in range(2, limitn):
        if not_prime[i]:
            continue
        for f in xrange(i*2, limitn, i):
            not_prime[f] = True

        primes.append(i)

    return primes

Answer 3

回答by Saurabh Rana

def eratosthenes(n):
    multiples = []
    for i in range(2, n+1):
        if i not in multiples:
            print (i)
            for j in range(i*i, n+1, i):
                multiples.append(j)

eratosthenes(100)

Answer 4

回答by Paul Gardiner

I realise this isn't really answering the question of how to generate primes quickly, but perhaps some will find this alternative interesting: because python provides lazy evaluation via generators, eratosthenes' sieve can be implemented exactly as stated:

我意识到这并没有真正回答如何快速生成素数的问题，但也许有些人会发现这种替代方法很有趣：因为 python 通过生成器提供惰性求值，eratosthenes 的筛选可以完全按照说明实现：

def intsfrom(n):
    while True:
        yield n
        n += 1

def sieve(ilist):
    p = next(ilist)
    yield p
    for q in sieve(n for n in ilist if n%p != 0):
        yield q


try:
    for p in sieve(intsfrom(2)):
        print p,

    print ''
except RuntimeError as e:
    print e

The try block is there because the algorithm runs until it blows the stack and without the try block the backtrace is displayed pushing the actual output you want to see off screen.

try 块在那里，因为算法会一直运行直到它炸毁堆栈，并且如果没有 try 块，则显示回溯将您想要在屏幕外看到的实际输出。

Answer 5

回答by Paul Gardiner

A simple speed hack: when you define the variable "primes," set the step to 2 to skip all even numbers automatically, and set the starting point to 1.

一个简单的速度技巧：当您定义变量“素数”时，将步长设置为 2 以自动跳过所有偶数，并将起点设置为 1。

Then you can further optimize by instead of for i in primes, use for i in primes[:round(len(primes) ** 0.5)]. That will dramatically increase performance. In addition, you can eliminate numbers ending with 5 to further increase speed.

然后你可以进一步优化而不是 for i in primes，使用 for i in primes[:round(len(primes) ** 0.5)]。这将显着提高性能。此外，您可以消除以 5 结尾的数字以进一步提高速度。

Answer 6

回答by MrHIDEn

Much faster:

快多了：

import time
def get_primes(n):
  m = n+1
  #numbers = [True for i in range(m)]
  numbers = [True] * m #EDIT: faster
  for i in range(2, int(n**0.5 + 1)):
    if numbers[i]:
      for j in range(i*i, m, i):
        numbers[j] = False
  primes = []
  for i in range(2, m):
    if numbers[i]:
      primes.append(i)
  return primes

start = time.time()
primes = get_primes(10000)
print(time.time() - start)
print(get_primes(100))

Answer 7

回答by SilentDirge

My implementation:

我的实现：

import math
n = 100
marked = {}
for i in range(2, int(math.sqrt(n))):
    if not marked.get(i):
        for x in range(i * i, n, i):
            marked[x] = True

for i in range(2, n):
    if not marked.get(i):
        print i

Answer 8

回答by Ajay

By combining contributions from many enthusiasts (including Glenn Maynard and MrHIDEn from above comments), I came up with following piece of code in python 2:

通过结合许多爱好者的贡献（包括上述评论中的 Glenn Maynard 和 MrHIDEn），我在 python 2 中提出了以下代码：

def simpleSieve(sieveSize):
    #creating Sieve.
    sieve = [True] * (sieveSize+1)
    # 0 and 1 are not considered prime.
    sieve[0] = False
    sieve[1] = False
    for i in xrange(2,int(math.sqrt(sieveSize))+1):
        if sieve[i] == False:
            continue
        for pointer in xrange(i**2, sieveSize+1, i):
            sieve[pointer] = False
    # Sieve is left with prime numbers == True
    primes = []
    for i in xrange(sieveSize+1):
        if sieve[i] == True:
            primes.append(i)
    return primes

sieveSize = input()
primes = simpleSieve(sieveSize)

Time taken for computation on my machine for different inputs in power of 10 is:

在我的机器上计算 10 次幂的不同输入所花费的时间是：

3 : 0.3 ms
4 : 2.4 ms
5 : 23 ms
6 : 0.26 s
7 : 3.1 s
8 : 33 s

3 : 0.3 毫秒
4 : 2.4 毫秒
5 : 23 毫秒
6 : 0.26 秒
7 : 3.1 秒
8 : 33 秒

Answer 9

回答by Jules May

Here's a version that's a bit more memory-efficient (and: a proper sieve, not trial divisions). Basically, instead of keeping an array of all the numbers, and crossing out those that aren't prime, this keeps an array of counters - one for each prime it's discovered - and leap-frogging them ahead of the putative prime. That way, it uses storage proportional to the number of primes, not up to to the highest prime.

这是一个内存效率更高的版本（并且：适当的筛子，而不是试验分区）。基本上，不是保留所有数字的数组，并删除那些不是素数的数组，而是保留一个计数器数组 - 每个发现的素数一个计数器 - 并在假定的素数之前跳过它们。这样，它使用的存储与质数的数量成正比，而不是最高的质数。

import itertools

def primes():

    class counter:
        def __init__ (this,  n): this.n, this.current,  this.isVirgin = n, n*n,  True
            # isVirgin means it's never been incremented
        def advancePast (this,  n): # return true if the counter advanced
            if this.current > n:
                if this.isVirgin: raise StopIteration # if this is virgin, then so will be all the subsequent counters.  Don't need to iterate further.
                return False
            this.current += this.n # pre: this.current == n; post: this.current > n.
            this.isVirgin = False # when it's gone, it's gone
            return True

    yield 1
    multiples = []
    for n in itertools.count(2):
        isPrime = True
        for p in (m.advancePast(n) for m in multiples):
            if p: isPrime = False
        if isPrime:
            yield n
            multiples.append (counter (n))

You'll note that primes()is a generator, so you can keep the results in a list or you can use them directly. Here's the first nprimes:

您会注意到这primes()是一个生成器，因此您可以将结果保存在列表中，也可以直接使用它们。这是第一个n素数：

import itertools

for k in itertools.islice (primes(),  n):
    print (k)

And, for completeness, here's a timer to measure the performance:

而且，为了完整起见，这里有一个计时器来衡量性能：

import time

def timer ():
    t,  k = time.process_time(),  10
    for p in primes():
        if p>k:
            print (time.process_time()-t,  " to ",  p,  "\n")
            k *= 10
            if k>100000: return

Just in case you're wondering, I also wrote primes()as a simple iterator (using __iter__and __next__), and it ran at almost the same speed. Surprised me too!

以防万一你想知道，我还写primes()了一个简单的迭代器（使用__iter__和__next__），它的运行速度几乎相同。也让我吃惊！

Answer 10

回答by FooBar167

I prefer NumPy because of speed.

我更喜欢 NumPy 因为速度。

import numpy as np

# Find all prime numbers using Sieve of Eratosthenes
def get_primes1(n):
    m = int(np.sqrt(n))
    is_prime = np.ones(n, dtype=bool)
    is_prime[:2] = False  # 0 and 1 are not primes

    for i in range(2, m):
        if is_prime[i] == False:
            continue
        is_prime[i*i::i] = False

    return np.nonzero(is_prime)[0]

# Find all prime numbers using brute-force.
def isprime(n):
    ''' Check if integer n is a prime '''
    n = abs(int(n))  # n is a positive integer
    if n < 2:  # 0 and 1 are not primes
        return False
    if n == 2:  # 2 is the only even prime number
        return True
    if not n & 1:  # all other even numbers are not primes
        return False
    # Range starts with 3 and only needs to go up the square root
    # of n for all odd numbers
    for x in range(3, int(n**0.5)+1, 2):
        if n % x == 0:
            return False
    return True

# To apply a function to a numpy array, one have to vectorize the function
def get_primes2(n):
    vectorized_isprime = np.vectorize(isprime)
    a = np.arange(n)
    return a[vectorized_isprime(a)]

Check the output:

检查输出：

n = 100
print(get_primes1(n))
print(get_primes2(n))    
    [ 2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]
    [ 2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]

Compare the speed of Sieve of Eratosthenes and brute-force on Jupyter Notebook. Sieve of Eratosthenes in 539 times faster than brute-force for million elements.

在 Jupyter Notebook 上比较 Eratosthenes 筛分法和蛮力法的速度。埃拉托色尼筛法比百万元素的蛮力法快 539 倍。

%timeit get_primes1(1000000)
%timeit get_primes2(1000000)
4.79 ms ± 90.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.58 s ± 31.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

埃拉托色尼筛法 - 寻找素数 Python

提问by Srikar Appalaraju

采纳答案by Pi Delport

回答by Glenn Maynard

回答by Saurabh Rana

回答by Paul Gardiner

回答by Paul Gardiner

回答by MrHIDEn

回答by SilentDirge

回答by Ajay

回答by Jules May

回答by FooBar167

相关推荐

最近更新

标签

埃拉托色尼筛法 - 寻找素数 Python

提问by Srikar Appalaraju

采纳答案by Pi Delport

回答by Glenn Maynard

回答by Saurabh Rana

回答by Paul Gardiner

回答by Paul Gardiner

回答by MrHIDEn

回答by SilentDirge

回答by Ajay

回答by Jules May

回答by FooBar167

相关推荐

Python 在 PyInstaller 中找不到导入的模块

你如何在 Python 中将文件读入列表？

在python中按索引访问字典值

如何逐行分析 Python 代码？

相关推荐

最近更新

标签