Python 生成具有给定(数字)分布的随机数
声明:本页面是 StackOverFlow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,并将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4265988/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
Generate random numbers with a given (numerical) distribution
提问by pafcu
I have a file with some probabilities for different values e.g.:
我有一个文件,其中包含不同值的一些概率,例如:
1 0.1
2 0.05
3 0.05
4 0.2
5 0.4
6 0.2
I would like to generate random numbers using this distribution. Is there an existing module that handles this? It's fairly simple to code on your own (build the cumulative distribution function, generate a random value in [0,1] and pick the corresponding value), but this seems like a common problem, and probably someone has already created a function/module for it.
我想使用这个分布生成随机数。是否已有处理此问题的现成模块?自己编写代码相当简单(构建累积分布函数,在 [0,1] 区间生成一个随机值并选择相应的值),但这似乎是一个常见问题,可能已经有人为它编写了函数/模块。
I need this because I want to generate a list of birthdays (which do not follow any distribution in the standard random module).
我需要这个,是因为我想生成一个生日列表(生日并不服从标准 random 模块中的任何分布)。
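For reference, the hand-rolled approach described above (build the CDF, draw a uniform value, pick the matching entry) might be sketched like this; the variable names are illustrative, not from any particular module:
作为参考,上面描述的自己动手的做法(构建 CDF,生成一个均匀随机值,选出对应的条目)大致可以写成下面这样;其中的变量名仅作说明之用:

```python
import bisect
import random

values = [1, 2, 3, 4, 5, 6]
probabilities = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]

# build the cumulative distribution once
cumulative = []
total = 0.0
for p in probabilities:
    total += p
    cumulative.append(total)

def draw():
    # the first cumulative weight >= the uniform draw marks the chosen value
    return values[bisect.bisect_left(cumulative, random.random())]

sample = [draw() for _ in range(10)]
```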
采纳答案by Sven Marnach
scipy.stats.rv_discrete might be what you want. You can supply your probabilities via the values parameter. You can then use the rvs() method of the distribution object to generate random numbers.
scipy.stats.rv_discrete 可能正是你想要的。您可以通过 values 参数提供概率,然后使用分布对象的 rvs() 方法生成随机数。
As pointed out by Eugene Pakhomov in the comments, you can also pass a p keyword parameter to numpy.random.choice(), e.g.
正如 Eugene Pakhomov 在评论中指出的那样,您还可以将 p 关键字参数传递给 numpy.random.choice(),例如
numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])
If you are using Python 3.6 or above, you can use random.choices() from the standard library – see the answer by Mark Dickinson.
如果您使用的是 Python 3.6 或更高版本,则可以使用标准库中的 random.choices(),请参阅 Mark Dickinson 的答案。
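For Python 3.6+, a minimal sketch of that standard-library route might look like this:
对于 Python 3.6 及以上版本,这条标准库路线的一个最小示意大致如下:

```python
import random

values = [1, 2, 3, 4, 5, 6]
weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]

# random.choices samples with replacement; the weights need not sum to 1
sample = random.choices(values, weights=weights, k=10)
```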
回答by Manuel Salvadores
You might want to have a look at NumPy's random sampling distributions.
你可能想看看 NumPy 的随机抽样分布。
回答by Marcelo Cantos
(OK, I know you are asking for a shrink-wrapped solution, but maybe those home-grown solutions just weren't succinct enough for your liking. :-)
(好吧,我知道您想要的是现成的方案,但也许那些自己动手写的解决方案不够简洁,不合您的口味。:-)
import random
pdf = [(1, 0.1), (2, 0.05), (3, 0.05), (4, 0.2), (5, 0.4), (6, 0.2)]
cdf = [(i, sum(p for j, p in pdf if j < i)) for i, _ in pdf]
R = max(i for r in [random.random()] for i, c in cdf if c <= r)
I pseudo-confirmed that this works by eyeballing the output of this expression:
我通过观察这个表达式的输出来伪确认这是有效的:
sorted(max(i for r in [random.random()] for i,c in cdf if c <= r)
for _ in range(1000))
回答by khachik
Make a list of items, based on their weights:
根据项目的权重(weights)生成一个项目列表:
items = [1, 2, 3, 4, 5, 6]
probabilities = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
# if the list of probs is normalized (sum(probs) == 1), omit this part
prob = sum(probabilities)  # find sum of probs, to normalize them
c = 1.0 / prob  # a multiplier to make a list of normalized probs
probabilities = [c * x for x in probabilities]
print(probabilities)
# the number of decimal places in the longest probability sets the scale factor
ml = max(probabilities, key=lambda x: len(str(x)) - str(x).find('.'))
ml = len(str(ml)) - str(ml).find('.') - 1
amounts = [int(x * (10 ** ml)) for x in probabilities]
itemsList = []
for i in range(len(items)):  # iterate through original items
    itemsList += items[i:i+1] * amounts[i]
# choose from itemsList randomly
print(itemsList)
An optimization may be to normalize amounts by the greatest common divisor, to make the target list smaller.
一种优化是用最大公约数去归一化这些数量,使目标列表更小。
Also, this might be interesting.
此外,这可能很有趣。
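The GCD optimization mentioned above might be sketched as follows, assuming the probabilities have already been scaled to integer amounts (here, the example probabilities times 100):
上面提到的最大公约数优化大致可以写成下面这样,这里假设概率已被放大为整数数量(即示例中的概率乘以 100):

```python
from functools import reduce
from math import gcd

items = [1, 2, 3, 4, 5, 6]
amounts = [10, 5, 5, 20, 40, 20]  # the example probabilities scaled by 100

g = reduce(gcd, amounts)  # greatest common divisor of all counts (5 here)
reduced = [a // g for a in amounts]

items_list = []
for item, count in zip(items, reduced):
    items_list.extend([item] * count)
# items_list now has sum(reduced) == 20 entries instead of 100
```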
回答by Lucas Moeskops
Another answer, probably faster :)
另一个答案,可能更快:)
import random
distribution = [(1, 0.2), (2, 0.3), (3, 0.5)]
# init distribution: build the cumulative distribution once
dlist = []
sumchance = 0
for value, chance in distribution:
    sumchance += chance
    dlist.append((value, sumchance))
assert abs(sumchance - 1.0) < 1e-9  # plain == is fragile for float sums
def pick():
    # get random value
    r = random.random()
    # for small distributions use linear search
    if len(dlist) < 64:  # exact crossover point not measured
        for value, sumchance in dlist:
            if r < sumchance:
                return value
    # else (not implemented): use a binary search instead
回答by sdcvvc
An advantage of generating the list using the CDF is that you can use binary search. While you need O(n) time and space for preprocessing, you can get k numbers in O(k log n). Since plain Python lists are inefficient, you can use the array module.
使用 CDF 生成列表的一个优点是您可以使用二分搜索。虽然您需要 O(n) 的时间和空间进行预处理,但您可以在 O(k log n) 内获得 k 个数字。由于普通的 Python 列表效率低下,您可以使用 array 模块。
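A minimal sketch of that CDF-plus-binary-search idea, using bisect and the array module (the names here are illustrative):
下面是"CDF 加二分搜索"思路的一个最小示意,使用了 bisect 和 array 模块(其中的名称仅作说明):

```python
import bisect
import random
from array import array

pairs = [(1, 0.1), (2, 0.05), (3, 0.05), (4, 0.2), (5, 0.4), (6, 0.2)]
values = [v for v, _ in pairs]

# O(n) preprocessing: store the cumulative sums compactly
cdf = array('d')
running = 0.0
for _, p in pairs:
    running += p
    cdf.append(running)

def draw():
    # each of the k samples costs O(log n) via binary search
    return values[bisect.bisect_left(cdf, random.random())]

samples = [draw() for _ in range(50)]
```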
If you insist on constant space, you can do the following; O(n) time, O(1) space.
如果你坚持不变的空间,你可以做以下事情;O(n) 时间,O(1) 空间。
import random
def random_distr(l):
    r = random.uniform(0, 1)
    s = 0
    for item, prob in l:
        s += prob
        if s >= r:
            return item
    return item  # might occur because of floating-point inaccuracies
回答by Cris Stringfellow
None of these answers is particularly clear or simple.
这些答案都不是特别清楚或简单。
Here is a clear, simple method that is guaranteed to work.
这是一个清晰、简单的方法,可以保证有效。
accumulate_normalize_values takes a dictionary p that maps symbols to probabilities OR frequencies. It outputs a usable list of tuples from which to do selection.
accumulate_normalize_values 接受一个将符号映射到概率或频率的字典 p,输出一个可用于选择的元组列表。
def accumulate_normalize_values(p):
    pi = p.items() if isinstance(p, dict) else p
    accum_pi = []
    accum = 0
    for i in pi:
        accum_pi.append((i[0], i[1] + accum))
        accum += i[1]
    if accum == 0:
        raise Exception("You are about to explode the universe. Continue? Y/N")
    normed_a = []
    for a in accum_pi:
        normed_a.append((a[0], a[1] * 1.0 / accum))
    return normed_a
Yields:
输出:
>>> accumulate_normalize_values( { 'a': 100, 'b' : 300, 'c' : 400, 'd' : 200 } )
[('a', 0.1), ('c', 0.5), ('b', 0.8), ('d', 1.0)]
Why it works
为什么有效
The accumulation step turns each symbol into an interval between itself and the previous symbol's cumulative probability or frequency (or 0 in the case of the first symbol). These intervals can be used to select from (and thus sample the provided distribution) by simply stepping through the list until the random number in the interval 0.0 -> 1.0 (prepared earlier) is less than or equal to the current symbol's interval end-point.
累积步骤将每个符号变成一个区间,区间的起点是前一个符号的累积概率或频率(第一个符号则为 0)。只需遍历列表,直到事先准备好的 0.0 -> 1.0 之间的随机数小于或等于当前符号的区间终点,就可以利用这些区间进行选择(从而对给定分布进行采样)。
The normalization step frees us from the need to make sure everything sums to some particular value. After normalization, the "vector" of probabilities sums to 1.0.
归一化步骤使我们不必确保所有值的总和为某个特定值。归一化之后,概率"向量"的总和为 1.0。
The rest of the code, for selection and for generating an arbitrarily long sample from the distribution, is below:
用于进行选择以及从该分布生成任意长样本的其余代码如下:
def select(symbol_intervals, random):
    i = 0
    while random > symbol_intervals[i][1]:
        i += 1
        if i >= len(symbol_intervals):
            raise Exception("What did you DO to that poor list?")
    return symbol_intervals[i][0]
def gen_random(alphabet, length, probabilities=None):
    from random import random
    from itertools import repeat
    if probabilities is None:
        probabilities = dict(zip(alphabet, repeat(1.0)))
    elif len(probabilities) > 0 and isinstance(probabilities[0], (int, float)):
        probabilities = dict(zip(alphabet, probabilities))  # ordered
    usable_probabilities = accumulate_normalize_values(probabilities)
    gen = []
    while len(gen) < length:
        gen.append(select(usable_probabilities, random()))
    return gen
Usage:
用法:
>>> gen_random (['a','b','c','d'],10,[100,300,400,200])
['d', 'b', 'b', 'a', 'c', 'c', 'b', 'c', 'c', 'c'] #<--- some of the time
回答by Ramon Martinez
Maybe it is kind of late. But you can use numpy.random.choice(), passing the p parameter:
也许有点晚了。但是你可以使用 numpy.random.choice(),并传入 p 参数:
val = numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])
回答by Saksham Varma
import random
from collections import Counter
def num_gen(num_probs):
    # calculate the minimum probability, to normalize against
    min_prob = min(prob for num, prob in num_probs)
    lst = []
    for num, prob in num_probs:
        # keep appending num to lst, proportionally to its probability in the distribution
        for _ in range(int(prob / min_prob)):
            lst.append(num)
    # all elements in lst occur in proportion to their distribution probabilities
    while True:
        # pick a random index from lst
        ind = random.randint(0, len(lst) - 1)
        yield lst[ind]
Verification:
验证:
gen = num_gen([(1, 0.1),
(2, 0.05),
(3, 0.05),
(4, 0.2),
(5, 0.4),
(6, 0.2)])
lst = []
times = 10000
for _ in range(times):
    lst.append(next(gen))
# Verify the created distribution:
for item, count in Counter(lst).items():
    print('%d has %f probability' % (item, count / times))
1 has 0.099737 probability
2 has 0.050022 probability
3 has 0.049996 probability
4 has 0.200154 probability
5 has 0.399791 probability
6 has 0.200300 probability
回答by Vaibhav
Here is a more effective way of doing this:
这是一种更有效的方法:
Just call the following function with your 'weights' array (assuming the indices are the corresponding items) and the number of samples needed. This function can easily be modified to handle ordered pairs.
只需用您的"weights"数组(假设索引即为对应的项目)以及所需的样本数量调用下面的函数即可。这个函数可以很容易地修改为处理有序对。
Returns indexes (or items) sampled/picked (with replacement) using their respective probabilities:
按各自的概率返回有放回地采样/挑选的索引(或项目):
import random
def resample(weights, n):
    beta = 0
    # Caveat: assign max weight * 2 to max_w for best results
    max_w = max(weights) * 2
    # Pick an item uniformly at random, to start with
    current_item = random.randint(0, n - 1)
    result = []
    for i in range(n):
        beta += random.uniform(0, max_w)
        while weights[current_item] < beta:
            beta -= weights[current_item]
            current_item = (current_item + 1) % n  # cyclic
        result.append(current_item)
    return result
A short note on the concept used in the while loop: beta is a cumulative value built up uniformly at random; we repeatedly subtract the current item's weight from it and advance the current index, until we reach an item whose weight covers the remaining beta.
关于 while 循环中所用思想的简短说明:beta 是一个均匀随机累加得到的累积值;我们不断从 beta 中减去当前项目的权重并前移当前索引,直到找到权重足以覆盖剩余 beta 的项目。
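A self-contained sketch of this "resampling wheel" idea; the names here are illustrative, and this version lets the sample size differ from the number of weights:
下面是这一"重采样轮"思路的一个自包含示意;其中的名称仅作说明之用,并且这一版本允许样本数量与权重数量不同:

```python
import random

def resample_wheel(weights, n):
    """Sample n indices with replacement, proportionally to weights."""
    result = []
    index = random.randrange(len(weights))  # start at a uniformly random spoke
    beta = 0.0
    max_w = max(weights)
    for _ in range(n):
        # advance the wheel by a random arc of at most 2 * max_w
        beta += random.uniform(0, 2.0 * max_w)
        # spend beta by walking over spokes until one absorbs the remainder
        while weights[index] < beta:
            beta -= weights[index]
            index = (index + 1) % len(weights)
        result.append(index)
    return result

weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
picks = resample_wheel(weights, 10)
```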

