Original URL: http://stackoverflow.com/questions/4172131/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Create random list of integers in Python
Asked by Stiggo
I'd like to create a random list of integers for testing purposes. The distribution of the numbers is not important. The only thing that counts is time. I know generating random numbers is a time-consuming task, but there must be a better way.
Here's my current solution:
import random
import timeit
# Random lists from [0-999] interval
print [random.randint(0, 1000) for r in xrange(10)] # v1
print [random.choice([i for i in xrange(1000)]) for r in xrange(10)] # v2
# Measurement:
t1 = timeit.Timer('[random.randint(0, 1000) for r in xrange(10000)]', 'import random') # v1
t2 = timeit.Timer('random.sample(range(1000), 10000)', 'import random') # v2
print t1.timeit(1000)/1000
print t2.timeit(1000)/1000
v2 is faster than v1, but it does not work at such a large scale. It gives the following error:
ValueError: sample larger than population
Is there a fast, efficient solution that works at that scale?
Some results from the answers:
Andrew's: 0.000290962934494
gnibbler's: 0.0058455221653
KennyTM's: 0.00219276118279
NumPy came, saw, and conquered.
Accepted answer by Andrew Jaffe
It is not entirely clear what you want, but I would use numpy.random.randint:
import numpy.random as nprnd
import timeit
t1 = timeit.Timer('[random.randint(0, 1000) for r in xrange(10000)]', 'import random') # v1
### Change v2 so that it picks numbers in (0, 10000) and thus runs...
t2 = timeit.Timer('random.sample(range(10000), 10000)', 'import random') # v2
t3 = timeit.Timer('nprnd.randint(1000, size=10000)', 'import numpy.random as nprnd') # v3
print t1.timeit(1000)/1000
print t2.timeit(1000)/1000
print t3.timeit(1000)/1000
which gives on my machine:
0.0233682730198
0.00781716918945
0.000147947072983
Note that randint is very different from random.sample (for it to work in your case I had to change the 1,000 to 10,000, as one of the commenters pointed out; if you really want them from 0 to 1,000 you could divide by 10).
And if you really don't care what distribution you are getting, then it is possible that you don't understand either your problem or random numbers very well (with apologies if that sounds rude).
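For readers on Python 3, the NumPy call above might look like the following. This is a minimal sketch, not part of the original answer: it assumes NumPy 1.17 or later for the Generator API, and the names `rng`, `values`, and `legacy_values`, as well as the seed, are illustrative choices.

```python
import numpy as np

# Newer Generator API; the seed is arbitrary and only makes the run repeatable.
rng = np.random.default_rng(12345)
values = rng.integers(0, 1000, size=10000)  # 0 <= value < 1000, drawn with replacement

# Legacy call matching the snippet above (the upper bound is exclusive here too).
legacy_values = np.random.randint(1000, size=10000)

print(values[:10])
print(values.tolist()[:3])  # convert to plain Python ints if a list is needed
```

Both calls return a NumPy array rather than a list; `tolist()` converts it when a plain list is required.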
Answer by John La Rooy
All the random methods end up calling random.random(), so the best way is to call it directly:
[int(1000*random.random()) for i in xrange(10000)]
For example,
random.randint calls random.randrange. random.randrange has a bunch of overhead to check the range before returning istart + istep*int(self.random() * n).
NumPy is much faster still, of course.
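A Python 3 version of the direct call could be sketched as below. The helper name `random_ints` and its parameters are hypothetical, introduced only for illustration. Note also that the description of randrange above reflects the Python 2 implementation; to my knowledge Python 3 builds randrange on getrandbits instead, so the relative timings may differ there.

```python
import random


def random_ints(count, upper):
    """Return `count` integers in [0, upper) by scaling random.random()."""
    rand = random.random  # bind locally to skip repeated attribute lookups
    return [int(upper * rand()) for _ in range(count)]


values = random_ints(10000, 1000)
print(values[:10])
```

Because random.random() returns a float in [0.0, 1.0), every result falls between 0 and `upper - 1`.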
Answer by kennytm
Firstly, you should use randrange(0,1000) or randint(0,999), not randint(0,1000). The upper limit of randint is inclusive.
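The difference between the two upper bounds can be checked with a tiny range. This is an illustrative Python 3 sketch; the seed and the variable names are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed only so the demonstration is repeatable
draws = 10000

# randint(0, 1) may return 0 or 1: the upper limit is included.
seen_randint = {random.randint(0, 1) for _ in range(draws)}

# randrange(0, 1) can only return 0: the upper limit is excluded.
seen_randrange = {random.randrange(0, 1) for _ in range(draws)}

print(sorted(seen_randint))
print(sorted(seen_randrange))
```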
For efficiency, note that randint is simply a wrapper around randrange, which calls random, so you should just use random. Also, use xrange as the argument to sample, not range.
You could use
[a for a in sample(xrange(1000),1000) for _ in range(10000/1000)]
to generate 10,000 numbers in the range (sample is called once, and each of its 1,000 values is repeated 10 times).
(Of course this won't beat NumPy.)
(当然这不会打败 NumPy。)
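In Python 3, xrange is gone and `/` on integers returns a float, so the comprehension needs small changes. The sketch below also swaps the loop order so that sample really is called once per round; the helper name `repeated_sample` and its parameters are hypothetical.

```python
from random import sample


def repeated_sample(population_size, total):
    """Concatenate total // population_size independent shuffles of the range."""
    rounds = total // population_size  # explicit integer division in Python 3
    return [
        a
        for _ in range(rounds)  # outer loop: one sample() call per round
        for a in sample(range(population_size), population_size)
    ]


numbers = repeated_sample(1000, 10000)
print(len(numbers))
```

Each value in the range still appears exactly `rounds` times, so the result is a permutation-based list rather than independent draws.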
$ python2.7 -m timeit -s 'from random import randrange' '[randrange(1000) for _ in xrange(10000)]'
10 loops, best of 3: 26.1 msec per loop
$ python2.7 -m timeit -s 'from random import sample' '[a%1000 for a in sample(xrange(10000),10000)]'
100 loops, best of 3: 18.4 msec per loop
$ python2.7 -m timeit -s 'from random import random' '[int(1000*random()) for _ in xrange(10000)]'
100 loops, best of 3: 9.24 msec per loop
$ python2.7 -m timeit -s 'from random import sample' '[a for a in sample(xrange(1000),1000) for _ in range(10000/1000)]'
100 loops, best of 3: 3.79 msec per loop
$ python2.7 -m timeit -s 'from random import shuffle
> def samplefull(x):
>   a = range(x)
>   shuffle(a)
>   return a' '[a for a in samplefull(1000) for _ in xrange(10000/1000)]'
100 loops, best of 3: 3.16 msec per loop
$ python2.7 -m timeit -s 'from numpy.random import randint' 'randint(1000, size=10000)'
1000 loops, best of 3: 363 usec per loop
But since you don't care about the distribution of numbers, why not just use:
range(1000)*(10000/1000)
?
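For reference, a Python 3 equivalent of that expression might look like this, since range no longer returns a list and `/` no longer truncates. The variable name `repeated` is illustrative.

```python
# Python 3: materialise the range as a list, then repeat it with integer division.
repeated = list(range(1000)) * (10000 // 1000)

print(len(repeated))
print(repeated[:5])
```

The result is not random at all: it is the sequence 0 to 999 repeated ten times.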
Answer by Colonel Panic
Your question about performance is moot: both functions are very fast. The speed of your code will be determined by what you do with the random numbers.
However, it's important that you understand the difference in behaviour of those two functions. One does random sampling with replacement, the other does random sampling without replacement.
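The distinction can be seen directly in a short Python 3 sketch. The variable names are illustrative, and randrange stands in here for the with-replacement approach.

```python
import random

population = range(1000)

# With replacement: values may repeat, so any number of draws is allowed.
with_replacement = [random.randrange(1000) for _ in range(10000)]

# Without replacement: every value is unique, so k cannot exceed the population size.
without_replacement = random.sample(population, 1000)

error_message = None
try:
    random.sample(population, 10000)  # asks for more values than exist
except ValueError as exc:
    error_message = str(exc)
    print("sample failed:", error_message)
```

This is why the sample-based version in the question raises ValueError once the requested count exceeds 1,000.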

