Original URL: http://stackoverflow.com/questions/4113307/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Pythonic way to select list elements with different probability
Question by Christian
import random
pos = ["A", "B", "C"]
x = random.choice(pos)
This code gives me either "A", "B" or "C" with equal probability. Is there a nice way to express it when you want "A" with 30%, "B" with 40% and "C" with 30% probability?
Accepted answer by unutbu
Weights define a probability distribution function (pdf). Random numbers from any such pdf can be generated by applying its associated inverse cumulative distribution function to uniform random numbers between 0 and 1.
See also this SO explanation, or, as explained by Wikipedia:
If $Y$ has a $U[0,1]$ distribution then $F^{-1}(Y)$ is distributed as $F$. This is used in random number generation using the inverse transform sampling method.
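For a discrete population, the inverse CDF becomes a search over running totals. Let $w_1, \dots, w_n$ be the weights and $W = \sum_j w_j$ their total (this notation is introduced here for illustration). The sampled index is then the first $k$ whose cumulative share exceeds the uniform draw $U$:

```latex
F_k = \frac{1}{W} \sum_{j=1}^{k} w_j, \qquad
F^{-1}(U) = \min\{\, k : U < F_k \,\}, \qquad U \sim U[0,1)
```

With the question's weights $(0.3, 0.4, 0.3)$ this gives $F = (0.3, 0.7, 1.0)$. A draw of $U = 0.5$ therefore selects the second item, "B". Finding the first $F_k$ strictly greater than $U$ is exactly what bisect.bisect does in the code below.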
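import random
import bisect
import collections
def cdf(weights):
    # Running totals of the weights, normalized so the last value is 1.0
    total = sum(weights)
    result = []
    cumsum = 0
    for w in weights:
        cumsum += w
        result.append(cumsum / total)
    return result
def choice(population, weights):
    assert len(population) == len(weights)
    cdf_vals = cdf(weights)
    x = random.random()  # uniform in [0, 1)
    # Index of the first CDF value strictly greater than x (inverse CDF lookup).
    # hi=len-1 clamps the result so float rounding in the last CDF value
    # can never push idx past the end of population.
    idx = bisect.bisect(cdf_vals, x, 0, len(cdf_vals) - 1)
    return population[idx]
weights = [0.3, 0.4, 0.3]
population = 'ABC'
counts = collections.defaultdict(int)
for i in range(10000):
    counts[choice(population, weights)] += 1
print(counts)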
# % test.py
# defaultdict(<type 'int'>, {'A': 3066, 'C': 2964, 'B': 3970})
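The choice function above uses bisect.bisect, so each lookup of a weighted random variable takes O(log n) time, where n is the length of weights. As written, however, choice() also rebuilds the CDF on every call, which costs O(n). When drawing many samples, compute cdf(weights) once and reuse it.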
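Because choice() above rebuilds the cumulative table on every call, you can do the O(n) work once and then only bisect on each draw. The sketch below does this. make_sampler is a hypothetical helper name. It uses itertools.accumulate for the running totals and clamps the bisect range, roughly the way CPython's random.choices does. random.choices also accepts a precomputed cum_weights argument for the same reason.

```python
import bisect
import itertools
import random

def make_sampler(population, weights):
    """Return a zero-argument function that draws one weighted item.

    Hypothetical helper: the cumulative weights are built once (O(n)),
    so each draw costs only an O(log n) binary search.
    """
    cum_weights = list(itertools.accumulate(weights))
    total = cum_weights[-1]
    hi = len(cum_weights) - 1  # clamp so a rounding edge case can't index past the end

    def sample():
        # r is uniform in [0, total); bisect_right finds the first
        # cumulative weight strictly greater than r.
        r = random.random() * total
        return population[bisect.bisect_right(cum_weights, r, 0, hi)]

    return sample

draw = make_sampler("ABC", [0.3, 0.4, 0.3])
print(draw())
```

The weights are relative, so [30, 40, 30] behaves the same as [0.3, 0.4, 0.3].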
Note that as of version 1.7.0, NumPy has a Cythonized np.random.choice function. For example, this generates 1000 samples from the population [0,1,2,3] with weights [0.1, 0.2, 0.3, 0.4]:
import numpy as np
np.random.choice(4, 1000, p=[0.1, 0.2, 0.3, 0.4])
np.random.choice also has a replace parameter for sampling with or without replacement.
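If NumPy is not available, Python 3.6+ has a comparable standard-library function, random.choices. It always samples with replacement and has no replace option. Its weights are relative and need not sum to 1. This sketch mirrors the NumPy call above and then handles the question's single-pick case:

```python
import random
from collections import Counter

# 1000 draws, with replacement, from [0, 1, 2, 3] with relative weights
samples = random.choices([0, 1, 2, 3], weights=[0.1, 0.2, 0.3, 0.4], k=1000)
print(Counter(samples))

# The question's case: one pick, "B" twice as... no, 40/30 as likely as "A" or "C"
x = random.choices(["A", "B", "C"], weights=[30, 40, 30])[0]
print(x)
```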
A theoretically better algorithm is the Alias Method. It builds a table in O(n) time, after which samples can be drawn in O(1) time. So, if you need to draw many samples, in theory the Alias Method may be faster. There is a Python implementation of the Walker Alias Method here, and a numpy version here.
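Below is a minimal sketch of the alias method, assuming Vose's variant of the table construction. That choice is mine; the implementations linked above may differ in detail. AliasSampler is a hypothetical name. The constructor does the O(n) table build. Each sample() call makes one uniform index pick and one biased coin flip, so it takes O(1).

```python
import random

class AliasSampler:
    """Alias-method sampler (Vose's table construction); a sketch, not a library API."""

    def __init__(self, population, weights):
        n = len(weights)
        total = float(sum(weights))
        # Scale so the average column height is exactly 1.0
        scaled = [w * n / total for w in weights]
        self.population = list(population)
        self.prob = [0.0] * n  # chance of keeping column i itself
        self.alias = [0] * n   # column to use otherwise
        small = [i for i, p in enumerate(scaled) if p < 1.0]
        large = [i for i, p in enumerate(scaled) if p >= 1.0]
        while small and large:
            s = small.pop()
            lg = large.pop()
            # Column s keeps its own mass and is topped up by column lg
            self.prob[s] = scaled[s]
            self.alias[s] = lg
            scaled[lg] = (scaled[lg] + scaled[s]) - 1.0
            (small if scaled[lg] < 1.0 else large).append(lg)
        # Whatever remains is (up to rounding) exactly full
        for i in large + small:
            self.prob[i] = 1.0

    def sample(self):
        i = random.randrange(len(self.prob))  # pick a column uniformly
        if random.random() < self.prob[i]:    # biased coin for that column
            return self.population[i]
        return self.population[self.alias[i]]

sampler = AliasSampler("ABC", [0.3, 0.4, 0.3])
print([sampler.sample() for _ in range(10)])
```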
Answer by Ignacio Vazquez-Abrams
Not... so much...
import random
pos = ['A'] * 3 + ['B'] * 4 + ['C'] * 3
print(random.choice(pos))
or
pos = {'A': 3, 'B': 4, 'C': 3}
print(random.choice([x for x in pos for y in range(pos[x])]))
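The comprehension above rebuilds the expanded list on every call. The sketch below builds it once and reuses it for many draws, using the standard library's collections.Counter.elements() to repeat each key by its count. Memory still grows with the sum of the integer weights, so this only suits small integer weights.

```python
import random
from collections import Counter

weights = {"A": 3, "B": 4, "C": 3}
# Counter.elements() yields each key as many times as its count,
# giving the same 10 items as the list comprehension above.
expanded = list(Counter(weights).elements())

# Reuse the expanded list for several draws instead of rebuilding it.
picks = [random.choice(expanded) for _ in range(5)]
print(picks)
```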
Answer by Glenn Maynard
Here's a class to expose a bunch of items with relative probabilities, without actually expanding the list:
import bisect
class WeightedTuple(object):
    """
    >>> p = WeightedTuple({'A': 2, 'B': 1, 'C': 3})
    >>> len(p)
    6
    >>> p[0], p[1], p[2], p[3], p[4], p[5]
    ('A', 'A', 'B', 'C', 'C', 'C')
    >>> p[-1], p[-2], p[-3], p[-4], p[-5], p[-6]
    ('C', 'C', 'C', 'B', 'A', 'A')
    >>> p[6]
    Traceback (most recent call last):
        ...
    IndexError
    >>> p[-7]
    Traceback (most recent call last):
        ...
    IndexError
    """
    def __init__(self, items):
        self.indexes = []
        self.items = []
        next_index = 0
        for key in sorted(items.keys()):
            val = items[key]
            self.indexes.append(next_index)
            self.items.append(key)
            next_index += val
        self.len = next_index
    def __getitem__(self, n):
        if n < 0:
            n = self.len + n
        if n < 0 or n >= self.len:
            raise IndexError
        idx = bisect.bisect_right(self.indexes, n)
        return self.items[idx-1]
    def __len__(self):
        return self.len
Now, just say:
import random
data = WeightedTuple({'A': 30, 'B': 40, 'C': 30})
random.choice(data)
Answer by Jeff Bradberry
Try this:
import random
from decimal import Decimal
pos = {'A': Decimal("0.3"), 'B': Decimal("0.4"), 'C': Decimal("0.3")}
choice = random.random()
F_x = 0  # running cumulative probability
for k, p in pos.items():
    F_x += p
    if choice <= F_x:
        x = k
        break
Answer by jsbueno
You can also use this form, which does not create an arbitrarily big list (and works with either integer or decimal probabilities):
import random
pos = [("A", 30), ("B", 40), ("C", 30)]
def w_choice(seq):
    total_prob = sum(item[1] for item in seq)
    chosen = random.uniform(0, total_prob)
    cumulative = 0
    for item, probability in seq:
        cumulative += probability
        if cumulative > chosen:
            return item
    # random.uniform may return total_prob itself; fall back to the last item
    return seq[-1][0]
print(w_choice(pos))
Answer by Jeet
There are some good solutions offered here, but before choosing one I would suggest that you look at Eli Bendersky's thorough discussion of this issue, which compares various algorithms for doing this (with implementations in Python).

