Pythonic way to select list elements with different probability

Question

Asked by Christian

import random
pos = ["A", "B", "C"]
x = random.choice(pos)

This code gives me either "A", "B" or "C" with equal probability. Is there a nice way to express it when you want "A" with 30%, "B" with 40% and "C" with 30% probability?

Answer 1

Accepted answer by unutbu

Weights define a probability distribution function (pdf). Random numbers from any such pdf can be generated by applying its associated inverse cumulative distribution function to uniform random numbers between 0 and 1.

See also this SO explanation, or, as explained by Wikipedia:

If Y has a U[0,1] distribution then F⁻¹(Y) is distributed as F. This is used in random number generation using the inverse transform sampling method.
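
As a small illustration of the quoted statement, here is a minimal sketch for a continuous case where the inverse CDF has a closed form. It assumes an exponential distribution with F(x) = 1 - exp(-rate * x), so F⁻¹(y) = -ln(1 - y) / rate. The function name sample_exponential is hypothetical and chosen only for this example; the discrete, weighted case is what the code that follows handles.

```python
import math
import random

def sample_exponential(rate):
    """Draw one sample from Exp(rate) by inverse transform sampling."""
    y = random.random()                # Y ~ U[0, 1)
    return -math.log(1.0 - y) / rate   # F^-1(y) for F(x) = 1 - exp(-rate * x)

samples = [sample_exponential(2.0) for _ in range(100000)]
mean = sum(samples) / len(samples)
print(mean)  # should be close to the theoretical mean 1 / rate = 0.5
```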

import random
import bisect
import collections

def cdf(weights):
    total = sum(weights)
    result = []
    cumsum = 0
    for w in weights:
        cumsum += w
        result.append(cumsum / total)
    return result

def choice(population, weights):
    assert len(population) == len(weights)
    cdf_vals = cdf(weights)
    x = random.random()
    idx = bisect.bisect(cdf_vals, x)
    return population[idx]

weights=[0.3, 0.4, 0.3]
population = 'ABC'
counts = collections.defaultdict(int)
for i in range(10000):
    counts[choice(population, weights)] += 1
print(counts)

# % test.py
# defaultdict(<type 'int'>, {'A': 3066, 'C': 2964, 'B': 3970})

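The choice function above uses bisect.bisect, so selection of a weighted random variable is done in O(log n), where n is the length of weights. Note that cdf(weights) is recomputed on every call, which costs O(n); to benefit from the O(log n) lookup when drawing many samples, compute the cumulative values once and reuse them.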
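
On Python 3.6 and later, the standard library's random.choices accepts either relative weights or precomputed cumulative weights, so the same idea can be written without a hand-rolled helper. The following is a minimal sketch, not part of the original answer; the variable names are chosen for illustration only.

```python
import collections
import itertools
import random

population = ["A", "B", "C"]
weights = [0.3, 0.4, 0.3]  # relative weights; they need not sum to 1

# random.choices always returns a list, so take element 0 for a single draw.
x = random.choices(population, weights=weights, k=1)[0]

# Precompute the cumulative weights once and reuse them for many draws.
cum_weights = list(itertools.accumulate(weights))
samples = random.choices(population, cum_weights=cum_weights, k=10000)

counts = collections.Counter(samples)
print(counts)  # roughly 30% A, 40% B, 30% C
```

Passing cum_weights avoids rebuilding the cumulative sums on each call, which matters mainly when the population is large.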

Note that as of version 1.7.0, NumPy has a Cythonized np.random.choice function. For example, this generates 1000 samples from the population [0, 1, 2, 3] with weights [0.1, 0.2, 0.3, 0.4]:

import numpy as np
np.random.choice(4, 1000, p=[0.1, 0.2, 0.3, 0.4])

np.random.choice also has a replace parameter for sampling with or without replacement.

A theoretically better algorithm is the Alias Method. It builds a table, which requires O(n) time, but after that, samples can be drawn in O(1) time. So, if you need to draw many samples, in theory the Alias Method may be faster. There is a Python implementation of the Walker Alias Method here, and a numpy version here.
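
To make the table-building step concrete, here is a minimal sketch of the alias method using Vose's O(n) construction. It is an illustration under stated assumptions, not the implementation the answer links to; the function names build_alias_table and alias_draw are hypothetical.

```python
import random

def build_alias_table(weights):
    """Build the (prob, alias) tables in O(n) time."""
    n = len(weights)
    total = sum(weights)
    # Scale so that the average weight is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # keep column s with this probability
        alias[s] = l          # otherwise fall through to l
        # Column l donated (1 - scaled[s]) of its mass to column s.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Anything left over is 1 up to floating-point error.
    for i in large + small:
        prob[i] = 1.0
    return prob, alias

def alias_draw(population, prob, alias):
    """Draw one element in O(1) time from prebuilt tables."""
    i = random.randrange(len(prob))
    if random.random() < prob[i]:
        return population[i]
    return population[alias[i]]

prob, alias = build_alias_table([0.3, 0.4, 0.3])
print(alias_draw("ABC", prob, alias))
```

Each draw picks a column uniformly and then flips one biased coin, which is where the O(1) sampling cost comes from.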

Answer 2

Answered by Ignacio Vazquez-Abrams

Not... so much...

import random

pos = ['A'] * 3 + ['B'] * 4 + ['C'] * 3
print(random.choice(pos))

or

pos = {'A': 3, 'B': 4, 'C': 3}
print(random.choice([x for x in pos for y in range(pos[x])]))

Answer 3

Answered by Glenn Maynard

Here's a class to expose a bunch of items with relative probabilities, without actually expanding the list:

import bisect
class WeightedTuple(object):
    """
    >>> p = WeightedTuple({'A': 2, 'B': 1, 'C': 3})
    >>> len(p)
    6
    >>> p[0], p[1], p[2], p[3], p[4], p[5]
    ('A', 'A', 'B', 'C', 'C', 'C')
    >>> p[-1], p[-2], p[-3], p[-4], p[-5], p[-6]
    ('C', 'C', 'C', 'B', 'A', 'A')
    >>> p[6]
    Traceback (most recent call last):
    ...
    IndexError
    >>> p[-7]
    Traceback (most recent call last):
    ...
    IndexError
    """
    def __init__(self, items):
        self.indexes = []
        self.items = []
        next_index = 0
        for key in sorted(items.keys()):
            val = items[key]
            self.indexes.append(next_index)
            self.items.append(key)
            next_index += val

        self.len = next_index

    def __getitem__(self, n):
        if n < 0:
            n = self.len + n
        if n < 0 or n >= self.len:
            raise IndexError

        idx = bisect.bisect_right(self.indexes, n)
        return self.items[idx-1]

    def __len__(self):
        return self.len

Now, just say:

import random

data = WeightedTuple({'A': 30, 'B': 40, 'C': 30})
random.choice(data)

Answer 4

Answered by Jeff Bradberry

Try this:

import random
from decimal import Decimal

pos = {'A': Decimal("0.3"), 'B': Decimal("0.4"), 'C': Decimal("0.3")}
choice = random.random()
F_x = 0
for k, p in pos.items():  # use pos.iteritems() on Python 2
    F_x += p
    if choice <= F_x:
        x = k
        break

Answer 5

Answered by jsbueno

You can also make use of this form, which does not create an arbitrarily big list (and can work with either integral or decimal probabilities):

pos = [("A", 30), ("B", 40), ("C", 30)]


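import random

def w_choice(seq):
    # seq is a list of (item, weight) pairs; weights may be ints or floats
    total_prob = sum(item[1] for item in seq)
    chosen = random.uniform(0, total_prob)
    cumulative = 0
    for item, probability in seq:
        cumulative += probability
        if cumulative > chosen:
            return item
    # Guard for the edge case chosen == total_prob, which uniform() can return
    return seq[-1][0]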

Answer 6

Answered by Jeet

There are some good solutions offered here, but before choosing one I would suggest that you look at Eli Bendersky's thorough discussion of this issue, which compares various algorithms for achieving this (with implementations in Python).
