A weighted version of random.choice in Python

Question

Asked by Colin

I needed to write a weighted version of random.choice (each element in the list has a different probability for being selected). This is what I came up with:

import random

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.

    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored.  The first item is the thing being chosen, the second item is
    its weight.  The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for choice, weight in choices:
        if weight > 0:
            space[current] = choice
            current += weight
    rand = random.uniform(0, current)
    for key in sorted(list(space) + [current]):
        if rand < key:
            return choice
        choice = space[key]
    return None

This function seems overly complex to me, and ugly. I'm hoping everyone here can offer some suggestions on improving it or alternate ways of doing this. Efficiency isn't as important to me as code cleanliness and readability.

Answer 1

Accepted answer by Ronan Paixão

Since version 1.7.0, NumPy has a choice function that supports probability distributions.

from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick,
              p=probability_distribution)

Note that probability_distribution is a sequence in the same order as list_of_candidates. You can also pass the keyword replace=False so that drawn items are not put back (sampling without replacement).
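
As a point of comparison that is not part of the original answer: on Python 3.6 and later, the standard library's random.choices accepts a weights sequence in the same order as the population, much like the p argument above. A minimal sketch, with the candidate list and weights chosen purely for illustration:

```python
import random

# Illustrative data (assumed, not taken from the answer).
list_of_candidates = ["WHITE", "RED", "GREEN"]
relative_weights = [90, 8, 2]  # relative weights; they need not sum to 1

# k is the number of items to pick; random.choices always samples
# with replacement.
draw = random.choices(list_of_candidates, weights=relative_weights, k=5)
print(draw)
```

Unlike NumPy's replace=False, random.choices has no option for drawing without replacement.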

Answer 2

Answered by Ned Batchelder

import random

def weighted_choice(choices):
    total = sum(w for c, w in choices)
    r = random.uniform(0, total)
    upto = 0
    for c, w in choices:
        if upto + w >= r:
            return c
        upto += w
    assert False, "Shouldn't get here"

Answer 3

Answered by PaulMcG

Crude, but may be sufficient:

import random
weighted_choice = lambda s : random.choice(sum(([v]*wt for v,wt in s),[]))

Does it work?

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

# initialize tally dict, keyed by the values being chosen
tally = dict.fromkeys((v for v, wt in choices), 0)

# tally up 1000 weighted choices
for i in range(1000):
    tally[weighted_choice(choices)] += 1

print(list(tally.items()))

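Prints something like this (the exact counts vary from run to run, and the item order depends on the Python version):

[('WHITE', 904), ('GREEN', 22), ('RED', 74)]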

Assumes that all weights are integers. They don't have to add up to 100, I just did that to make the test results easier to interpret. (If weights are floating point numbers, multiply them all by 10 repeatedly until all weights >= 1.)

weights = [.6, .2, .001, .199]
while any(w < 1.0 for w in weights):
    weights = [w*10 for w in weights]
# round before converting, so float error (e.g. 198.99999...) is not truncated
weights = [int(round(w)) for w in weights]
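
The loop above stops as soon as every weight reaches 1, so a weight such as 1.25 would still lose its fractional part. One possible alternative, sketched here under the assumption that the weights are short decimal values, converts them to exact fractions and scales by the least common denominator. The helper name integer_weights is hypothetical and not part of the answer:

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def integer_weights(weights):
    """Scale decimal weights to integers that keep the same ratios."""
    # Going through str() recovers the short decimal form (0.6 -> 3/5)
    # instead of the exact binary value of the float.
    fracs = [Fraction(str(w)) for w in weights]
    # Least common multiple of all denominators.
    common = reduce(lambda a, b: a * b // gcd(a, b),
                    (f.denominator for f in fracs), 1)
    return [int(f * common) for f in fracs]

print(integer_weights([.6, .2, .001, .199]))
print(integer_weights([0.5, 1.25]))
```

For [.6, .2, .001, .199] this yields [600, 200, 1, 199], and for [0.5, 1.25] it yields [2, 5].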

Answer 4

Answered by Raymond Hettinger

1. Arrange the weights into a cumulative distribution.
2. Use random.random() to pick a random float 0.0 <= x < total.
3. Search the distribution using bisect.bisect, as shown in the example at http://docs.python.org/dev/library/bisect.html#other-examples.

from random import random
from bisect import bisect

def weighted_choice(choices):
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    x = random() * total
    i = bisect(cum_weights, x)
    return values[i]

>>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
'WHITE'

If you need to make more than one choice, split this into two functions, one to build the cumulative weights and another to bisect to a random point.
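
A minimal sketch of that two-function split, assuming the same list of (value, weight) pairs as above. The names build_cumulative and pick are hypothetical, and itertools.accumulate is used here in place of the explicit loop:

```python
import random
from bisect import bisect
from itertools import accumulate

def build_cumulative(choices):
    """Do the O(n) setup once: split the pairs and accumulate the weights."""
    values, weights = zip(*choices)
    return values, list(accumulate(weights))

def pick(values, cum_weights, rng=random):
    """Make one O(log n) draw against precomputed cumulative weights."""
    x = rng.random() * cum_weights[-1]  # 0.0 <= x < total
    return values[bisect(cum_weights, x)]

values, cum_weights = build_cumulative([("WHITE", 90), ("RED", 8), ("GREEN", 2)])
draws = [pick(values, cum_weights) for _ in range(10)]
print(draws)
```

The cumulative list is built once, so each additional draw costs only a random number and a binary search.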

Answer 5

Answered by Tony Veijalainen

I looked at the other thread that was pointed to and came up with this variation in my own coding style. It returns the index of the chosen item, which is convenient for tallying, but it is simple to return the item itself instead (see the commented-out alternative return):

import random
import bisect

try:
    range = xrange  # Python 2: use the lazy range
except NameError:
    pass  # Python 3: range is already lazy

def weighted_choice(choices):
    total, cumulative = 0, []
    for c,w in choices:
        total += w
        cumulative.append((total, c))
    r = random.uniform(0, total)
    # return index
    return bisect.bisect(cumulative, (r,))
    # return item string
    #return choices[bisect.bisect(cumulative, (r,))][0]

# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

tally = [0 for item in choices]

n = 100000
# tally up n weighted choices
for i in range(n):
    tally[weighted_choice(choices)] += 1

print([100.0 * t / sum(tally) for t in tally])

Answer 6

Answered by Maxime

If you have a weighted dictionary instead of a list, you can write this:

import random

items = {"a": 10, "b": 5, "c": 1}
random.choice([k for k in items for dummy in range(items[k])])

Note that [k for k in items for dummy in range(items[k])] produces a list like ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'b', 'b', 'b', 'b', 'b'] (the order of the groups follows the dictionary's iteration order).
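
The same expansion can also be written with collections.Counter, whose elements() method repeats each key as many times as its count. A small sketch, assuming the weights are positive integers; the function name weighted_choice_from_dict is hypothetical:

```python
import random
from collections import Counter

def weighted_choice_from_dict(items, rng=random):
    """Pick a key from a {key: integer_weight} dict by expanding it."""
    # elements() yields each key repeated by its count and skips
    # counts below 1.
    pool = list(Counter(items).elements())
    return rng.choice(pool)

items = {"a": 10, "b": 5, "c": 1}
print(weighted_choice_from_dict(items))
```

As with the list comprehension, the expanded pool has one entry per unit of weight, so this suits small integer weights best.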

Answer 7

Answered by pweitzman

If you don't mind using numpy, you can use numpy.random.choice.

For example:

import numpy

items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]
elems = [i[0] for i in items]
probs = [i[1] for i in items]

trials = 1000
results = [0] * len(items)
for i in range(trials):
    # This is where the item is selected; choice needs the 1-D list of elements
    res = numpy.random.choice(elems, p=probs)
    results[elems.index(res)] += 1
results = [r / float(trials) for r in results]
print("item\texpected\tactual")
for i in range(len(probs)):
    print("%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i]))

If you know how many selections you need to make in advance, you can do it without a loop like this:

numpy.random.choice(elems, trials, p=probs)

Answer 8

Answered by Mark

A general solution:

import random
def weighted_choice(choices, weights):
    total = sum(weights)
    threshold = random.uniform(0, total)
    for k, weight in enumerate(weights):
        total -= weight
        if total < threshold:
            return choices[k]
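
A quick way to check a function like this is to tally many draws and compare the observed frequencies with the weights. The sketch below repeats the scan from the answer so that it runs on its own; the final fallback return and the helper observed_frequencies are additions for illustration, not part of the answer:

```python
import random
from collections import Counter

def weighted_choice(choices, weights):
    # Same subtract-from-total scan as in the answer.
    total = sum(weights)
    threshold = random.uniform(0, total)
    for k, weight in enumerate(weights):
        total -= weight
        if total < threshold:
            return choices[k]
    # Added guard (assumption): reached only in the rare case where the
    # threshold sits exactly on the lower boundary.
    return choices[-1]

def observed_frequencies(choices, weights, trials=10000):
    """Return the fraction of trials in which each choice was drawn."""
    counts = Counter(weighted_choice(choices, weights) for _ in range(trials))
    return {c: counts[c] / trials for c in choices}

print(observed_frequencies(["a", "b", "c"], [6, 3, 1]))
```

With weights 6, 3 and 1, the frequencies should come out near 0.6, 0.3 and 0.1.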

Answer 9

Answered by murphsp1

Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it will return an array of 0's containing a 1 indicating which bin was chosen. The code defaults to just making a single draw but you can pass in the number of draws to be made and the counts per bin drawn will be returned.

If the weights vector does not sum to 1, it will be normalized so that it does.

import numpy as np

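def weighted_choice(weights, n=1):
    # Accept plain lists as well as arrays, and normalize so the weights sum to 1
    weights = np.asarray(weights, dtype=float)
    weights = weights / np.sum(weights)

    # n uniform draws in [0, 1)
    draws = np.random.random_sample(size=n)

    # Bin edges: 0 followed by the cumulative weights
    weights = np.cumsum(weights)
    weights = np.insert(weights, 0, 0.0)

    # counts[0] holds the number of draws that fell into each bin
    counts = np.histogram(draws, bins=weights)
    return counts[0]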

Answer 10

Answered by whi

import numpy as np
w=np.array([ 0.4,  0.8,  1.6,  0.8,  0.4])
np.random.choice(w, p=w/sum(w))
