Creating a random list of integers in Python

Question

Asked by Stiggo

I'd like to create a random list of integers for testing purposes. The distribution of the numbers is not important. The only thing that counts is time. I know generating random numbers is a time-consuming task, but there must be a better way.

Here's my current solution:

import random
import timeit

# Random lists from [0-999] interval
print [random.randint(0, 1000) for r in xrange(10)] # v1
print [random.choice([i for i in xrange(1000)]) for r in xrange(10)] # v2

# Measurement:
t1 = timeit.Timer('[random.randint(0, 1000) for r in xrange(10000)]', 'import random') # v1
t2 = timeit.Timer('random.sample(range(1000), 10000)', 'import random') # v2

print t1.timeit(1000)/1000
print t2.timeit(1000)/1000

v2 is faster than v1, but it does not work at such a large scale. It gives the following error:

ValueError: sample larger than population

Is there a fast, efficient solution that works at that scale?

Some results from the answers:

Andrew's: 0.000290962934494
gnibbler's: 0.0058455221653
KennyTM's: 0.00219276118279

NumPy came, saw, and conquered.

Answer 1

Accepted answer by Andrew Jaffe

It is not entirely clear what you want, but I would use numpy.random.randint:

import numpy.random as nprnd
import timeit

t1 = timeit.Timer('[random.randint(0, 1000) for r in xrange(10000)]', 'import random') # v1

### Change v2 so that it picks numbers in (0, 10000) and thus runs...
t2 = timeit.Timer('random.sample(range(10000), 10000)', 'import random') # v2
t3 = timeit.Timer('nprnd.randint(1000, size=10000)', 'import numpy.random as nprnd') # v3

print t1.timeit(1000)/1000
print t2.timeit(1000)/1000
print t3.timeit(1000)/1000

On my machine, this gives:

0.0233682730198
0.00781716918945
0.000147947072983
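
The snippet above is Python 2 (xrange, print statement). A minimal Python 3 port of the same measurement might look like the sketch below. The helper name per_call and the reduced repetition count are illustrative choices, not part of the original answer, and the NumPy variant is left out so that the sketch runs with the standard library alone.

```python
import timeit

# v1: one randint call per element (values 0..1000, upper bound inclusive)
t1 = timeit.Timer('[random.randint(0, 1000) for r in range(10000)]', 'import random')
# v2: sampling without replacement from a population as large as the sample
t2 = timeit.Timer('random.sample(range(10000), 10000)', 'import random')

def per_call(timer, number=20):
    """Average seconds per execution over `number` runs (hypothetical helper)."""
    return timer.timeit(number) / number

print(per_call(t1))
print(per_call(t2))
```

Absolute numbers depend on the machine and the Python version, so only the ratio between the two lines is worth comparing.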

Note that randint is very different from random.sample. For sample to work in your case I had to change the 1,000 to 10,000, as one of the commenters pointed out; if you really want the numbers from 0 to 1,000, you could divide by 10.
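
To make the "divide by 10" remark concrete, here is a small Python 3 sketch (standard library only; the variable names are illustrative). Integer division folds the 10,000 distinct sampled values onto 0 to 999, so every value appears exactly ten times in random order, which is not what independent randint draws would give.

```python
import random
from collections import Counter

# Sample all of 0..9999 without replacement, then fold onto 0..999.
values = [a // 10 for a in random.sample(range(10000), 10000)]

counts = Counter(values)
print(len(values), min(values), max(values))
# Every value in 0..999 occurs exactly 10 times, unlike independent draws.
print(set(counts.values()))
```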

And if you really don't care what distribution you are getting, then it is possible that you don't understand either your problem or random numbers very well (with apologies if that sounds rude).

Answer 2

Answer by John La Rooy

All the random methods end up calling random.random(), so the best way is to call it directly:

[int(1000*random.random()) for i in xrange(10000)]

For example,

random.randint calls random.randrange.
random.randrange has a bunch of overhead to check the range before returning istart + istep*int(self.random() * n).
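
As a rough Python 3 sketch of the two call patterns (range replaces xrange; the names direct and via_randrange are illustrative): both produce integers from 0 to 999. The description of randrange's internals above refers to the Python 2 implementation, and newer versions may be implemented differently, so it is worth measuring on your own interpreter.

```python
import random

N = 10000

# Direct call: scale a float in [0.0, 1.0) and truncate to an int in 0..999.
direct = [int(1000 * random.random()) for _ in range(N)]

# Same range through the higher-level helper.
via_randrange = [random.randrange(1000) for _ in range(N)]

print(min(direct), max(direct))
print(min(via_randrange), max(via_randrange))
```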

NumPy is much faster still, of course.

Answer 3

Answer by kennytm

Firstly, you should use randrange(0,1000) or randint(0,999), not randint(0,1000). The upper limit of randint is inclusive.
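
A quick way to see the inclusive and exclusive bounds, sketched in Python 3 with a deliberately tiny range so that the endpoint shows up (the set names are illustrative):

```python
import random

# randint(0, 1) can return both endpoints: 0 and 1.
seen_randint = {random.randint(0, 1) for _ in range(1000)}

# randrange(0, 1) excludes the upper limit, so only 0 is possible.
seen_randrange = {random.randrange(0, 1) for _ in range(1000)}

print(seen_randint)
print(seen_randrange)
```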

For efficiency, note that randint is simply a wrapper around randrange, which calls random, so you should just use random. Also, use xrange as the argument to sample, not range.

You could use

[a for a in sample(xrange(1000),1000) for _ in range(10000/1000)]

to generate 10,000 numbers in the range. Note that, as written, sample is called once and each sampled value is repeated 10 times in a row.
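
In Python 3 the same idea needs range and integer division (//). The sketch below, with illustrative names, also shows a variant that calls sample ten times, so that each block of 1,000 values is a fresh permutation instead of runs of one repeated value.

```python
import random
from collections import Counter

# Port of the expression above: sample is evaluated once and each
# sampled value is emitted 10 times in a row.
repeated = [a for a in random.sample(range(1000), 1000) for _ in range(10000 // 1000)]

# Variant: call sample 10 times and concatenate the 10 permutations.
reshuffled = [a for _ in range(10000 // 1000) for a in random.sample(range(1000), 1000)]

print(len(repeated), len(reshuffled))
print(repeated[:10])  # ten copies of the same value
print(set(Counter(reshuffled).values()))  # each value occurs exactly 10 times
```

Which variant is appropriate depends on whether the test cares about the order of the values or only about their count.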

(Of course this won't beat NumPy.)

$ python2.7 -m timeit -s 'from random import randrange' '[randrange(1000) for _ in xrange(10000)]'
10 loops, best of 3: 26.1 msec per loop

$ python2.7 -m timeit -s 'from random import sample' '[a%1000 for a in sample(xrange(10000),10000)]'
100 loops, best of 3: 18.4 msec per loop

$ python2.7 -m timeit -s 'from random import random' '[int(1000*random()) for _ in xrange(10000)]' 
100 loops, best of 3: 9.24 msec per loop

$ python2.7 -m timeit -s 'from random import sample' '[a for a in sample(xrange(1000),1000) for _ in range(10000/1000)]'
100 loops, best of 3: 3.79 msec per loop

$ python2.7 -m timeit -s 'from random import shuffle
> def samplefull(x):
>   a = range(x)
>   shuffle(a)
>   return a' '[a for a in samplefull(1000) for _ in xrange(10000/1000)]'
100 loops, best of 3: 3.16 msec per loop

$ python2.7 -m timeit -s 'from numpy.random import randint' 'randint(1000, size=10000)'
1000 loops, best of 3: 363 usec per loop

But since you don't care about the distribution of numbers, why not just use:

range(1000)*(10000/1000)

?

Answer 4

Answer by Colonel Panic

Your question about performance is moot: both functions are very fast. The speed of your code will be determined by what you do with the random numbers.

However, it's important that you understand the difference in behaviour between those two functions. One (the randint list comprehension) does random sampling with replacement; the other (random.sample) does random sampling without replacement.
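
The difference, and the ValueError from the question, can be reproduced with a short Python 3 sketch (standard library only; the variable names are illustrative). random.choices is the standard-library call for sampling with replacement in Python 3.6 and later.

```python
import random

population = range(1000)

# With replacement: duplicates are allowed, so any sample size works.
with_repl = random.choices(population, k=10000)

# Without replacement: every element is distinct, so k cannot exceed
# the population size.
without_repl = random.sample(population, 1000)

error = None
try:
    random.sample(population, 10000)  # 10,000 draws from 1,000 items
except ValueError as exc:
    error = str(exc)
    print("ValueError:", error)

print(len(with_repl), len(set(without_repl)))
```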
