Probability distribution in Python

Note: this page mirrors a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must attribute the original authors (not this site): http://stackoverflow.com/questions/526255/

Tags: python, algorithm, random, distribution, probability

Asked by Nicholas Leonard

I have a bunch of keys, each with an unlikeliness variable. I want to randomly choose one of these keys, yet I want keys with a higher unlikeliness value to be chosen less often than more likely ones. I am wondering if you have any suggestions, preferably an existing Python module I could use, else I will need to make it myself.

I have checked out the random module; it does not seem to provide this.

I have to make such choices many millions of times, for 1000 different sets of objects, each containing 2,455 objects. Each set will exchange objects with the others, so the random chooser needs to be dynamic. With 1000 sets of 2,433 objects, that is 2,433 million objects in total; low memory consumption is crucial. And since these choices are not the bulk of the algorithm, I need this process to be quite fast; CPU time is limited.

Thanks!

Update:

OK, I tried to consider your suggestions carefully, but time is so limited...

I looked at the binary search tree approach and it seems too risky (complex and convoluted). The other suggestions all resemble the ActiveState recipe. I took it and modified it a little in the hope of making it more efficient:

import random

def windex(d, total, max_w):
    '''An attempt at a weighted random.choice() that favours *unlikely*
    keys. Accepts a dictionary of item_key -> certainty_value pairs,
    e.g. {'one': 20, 'two': 2, 'three': 50}, the maximum certainty
    value (max_w) and the sum of all certainties (total).'''
    n = random.uniform(0, 1)
    inverted_total = max_w * len(d) - total  # sum of the inverted weights
    for key, certainty in d.iteritems():
        weight = float(max_w - certainty) / inverted_total
        if n < weight:
            break
        n = n - weight
    return key

I am hoping to get an efficiency gain from dynamically maintaining the sum of certainties and the maximum certainty. Any further suggestions are welcome. You guys save me so much time and effort, while increasing my effectiveness; it is crazy. Thanks! Thanks! Thanks!
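
(As a rough sketch, the bookkeeping for those running totals could look like the hypothetical helper below; the class and method names are mine, not from the post.)

class CertaintyTotals(object):
    """Hypothetical helper: maintain the sum and max of certainties
    incrementally, so windex() never has to rescan the whole dict."""
    def __init__(self):
        self.total = 0
        self.max_w = 0
    def add(self, certainty):
        self.total += certainty
        if certainty > self.max_w:
            self.max_w = certainty
    def remove(self, certainty):
        self.total -= certainty
        # if the removed value was the unique maximum, max_w is now stale
        # and needs a full rescan -- the same "costly but unlikely" case
        # that ProbDict in Update 3 handles with _findSeniors()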

Update 2:

I decided to make it more efficient by letting it make several choices at once. This results in an acceptable loss of precision in my algorithm, since it is dynamic in nature. Anyway, here is what I have now:

def weightedChoices(d, total, max_w, choices=10):
    '''Like windex() above, but draws several weighted choices in one
    pass over the dictionary. Accepts a dict of item_key ->
    certainty_value pairs, the maximum certainty value (max_w) and
    the sum of all certainties (total).'''
    samples = sorted(random.uniform(0, 1) for i in range(choices))
    (n, samples) = relavate(samples)
    keys = []
    inverted_total = max_w * len(d) - total
    for key, certainty in d.iteritems():
        weight = float(max_w - certainty) / inverted_total
        if n < weight:
            keys.append(key)
            if samples:
                (n, samples) = relavate(samples)
            else:
                break
        n = n - weight
    return keys

def relavate(samples):
    # rebase: return the smallest sample and the rest shifted down by it
    smallest = samples[0]
    rest = [s - smallest for s in samples[1:]]
    return (smallest, rest)

I haven't tried it out yet. If you have any comments/suggestions, please do not hesitate. Thanks!

Update 3:

I have been working all day on a task-tailored version of Rex Logan's answer. Instead of two arrays of objects and weights, it is actually a special dictionary class, which makes things quite complex since Rex's code generates a random index... I also coded a test case that roughly resembles what will happen in my algorithm (but I can't really know until I try!). The basic principle is: the more often a key is randomly chosen, the less likely it is to be chosen again:

import random, time
import psyco
psyco.full()

class ProbDict():
    """
    Modified version of Rex Logan's RandomObject class. The more often a key is
    randomly chosen, the less likely it is to be chosen again.
    """
    def __init__(self, keys_weights_values=None):
        self._kw = keys_weights_values if keys_weights_values is not None else {}
        self._keys=self._kw.keys()
        self._len=len(self._keys)
        self._findSeniors()
        self._effort = 0.15
        self._fails = 0
    def __iter__(self):
        return self.next()
    def __getitem__(self, key):
        return self._kw[key]
    def __setitem__(self, key, value):
        self.append(key, value)
    def __len__(self):
        return self._len
    def next(self):
        key=self._key()
        while key:
            yield key
            key = self._key()
    def __contains__(self, key):
        return key in self._kw
    def items(self):
        return self._kw.items()
    def pop(self, key):  
        try:
            (w, value) = self._kw.pop(key)
            self._len -=1
            if w == self._seniorW:
                self._seniors -= 1
                if not self._seniors:
                    #costly but unlikely:
                    self._findSeniors()
            return [w, value]
        except KeyError:
            return None
    def popitem(self):
        return self.pop(self._key())
    def values(self):
        values = []
        for key in self._keys:
            try:
                values.append(self._kw[key][1])
            except KeyError:
                pass
        return values
    def weights(self):
        weights = []
        for key in self._keys:
            try:
                weights.append(self._kw[key][0])
            except KeyError:
                pass
        return weights
    def keys(self, imperfect=False):
        if imperfect: return self._keys
        return self._kw.keys()
    def append(self, key, value=None):
        if key not in self._kw:
            self._len +=1
            self._kw[key] = [0, value]
            self._keys.append(key)
        else:
            self._kw[key][1]=value
    def _key(self):
        for i in range(int(self._effort*self._len)):
            ri=random.randint(0,self._len-1) #choose a random object
            rx=random.uniform(0,self._seniorW)
            rkey = self._keys[ri]
            try:
                w = self._kw[rkey][0]
                if rx >= w: # test to see if that is the value we want
                    w += 1
                    self._warnSeniors(w)
                    self._kw[rkey][0] = w
                    return rkey
            except KeyError:
                self._keys.pop(ri)
        # if the random probes all fail, fall back to the first key still present
        self._fails += 1 #for confirming effectiveness only
        for key in self._keys:
            if key in self._kw:
                w = self._kw[key][0] + 1
                self._warnSeniors(w)
                self._kw[key][0] = w
                return key
        return None
    def _findSeniors(self):
        '''this function finds the seniors, counts them and assess their age. It
        is costly but unlikely.'''
        seniorW = 0
        seniors = 0
        for w, value in self._kw.itervalues():
            if w >= seniorW:
                if w == seniorW:
                    seniors += 1
                else:
                    seniorW = w
                    seniors = 1
                    seniors = 1
        self._seniors = seniors
        self._seniorW = seniorW
    def _warnSeniors(self, w):
        #a weight can only be incremented...good
        if w >= self._seniorW:
            if w == self._seniorW:
                self._seniors+=1
            else:
                self._seniors = 1
                self._seniorW = w
def test():
    #test code
    iterations = 200000
    size = 2500
    nextkey = size

    pd = ProbDict(dict([(i,[0,i]) for i in xrange(size)]))
    start = time.clock()
    for i in xrange(iterations):
        key=pd._key()
        w=pd[key][0]
        if random.randint(0,1+pd._seniorW-w):
            #the heavier the object, the more unlikely it will be removed
            pd.pop(key)
        probAppend = float(500+(size-len(pd)))/1000
        if random.uniform(0,1) < probAppend:
            nextkey+=1
            pd.append(nextkey)
    print (time.clock()-start)*1000/iterations, "msecs / iteration with", pd._fails, "failures /", iterations, "iterations"
    weights = pd.weights()
    weights.sort()
    print "avg weight:", float(sum(weights))/pd._len, max(weights), pd._seniorW, pd._seniors, len(pd), len(weights)
    print weights
test()

Any comments are still welcome. @Darius: your binary trees are too complex and complicated for me, and I do not think their leaves can be removed efficiently... Thanks, all.

Answer by David

This ActiveState recipe gives an easy-to-follow approach, specifically the version in the comments that doesn't require you to pre-normalize your weights:

import random

def weighted_choice(items):
    """items is a list of tuples in the form (item, weight)"""
    weight_total = sum((item[1] for item in items))
    n = random.uniform(0, weight_total)
    for item, weight in items:
        if n < weight:
            return item
        n = n - weight
    return item

This will be slow if you have a large list of items. A binary search would probably be better in that case... but would also be more complicated to write, for little gain if you have a small sample size. Here's an example of the binary search approach in Python if you want to follow that route.

(I'd recommend doing some quick performance testing of both methods on your dataset. The performance of different approaches to this sort of algorithm is often a bit unintuitive.)



Edit: I took my own advice, since I was curious, and did a few tests.

I compared four approaches:

The weighted_choice function above.

A binary-search choice function like so:

import bisect

def weighted_choice_bisect(items):
    added_weights = []
    last_sum = 0

    for item, weight in items:
        last_sum += weight
        added_weights.append(last_sum)

    return items[bisect.bisect(added_weights, random.random() * last_sum)][0]

A compiling version of 1:

def weighted_choice_compile(items):
    """returns a function that fetches a random item from items

    items is a list of tuples in the form (item, weight)"""
    weight_total = sum((item[1] for item in items))
    def choice(uniform = random.uniform):
        n = uniform(0, weight_total)
        for item, weight in items:
            if n < weight:
                return item
            n = n - weight
        return item
    return choice

A compiling version of 2:

def weighted_choice_bisect_compile(items):
    """Returns a function that makes a weighted random choice from items."""
    added_weights = []
    last_sum = 0

    for item, weight in items:
        last_sum += weight
        added_weights.append(last_sum)

    def choice(rnd=random.random, bis=bisect.bisect):
        return items[bis(added_weights, rnd() * last_sum)][0]
    return choice

I then built a big list of choices like so:

choices = [(random.choice("abcdefg"), random.uniform(0,50)) for i in xrange(2500)]

And an excessively simple profiling function:

import time

def profiler(f, n, *args, **kwargs):
    start = time.time()
    for i in xrange(n):
        f(*args, **kwargs)
    return time.time() - start
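
(For reference, the timings below would be gathered with calls along these lines; this is a sketch, as the exact test harness isn't shown in the post.)

print profiler(weighted_choice, 1000, choices)         # simple uncompiled
print profiler(weighted_choice_bisect, 1000, choices)  # binary uncompiled
simple = weighted_choice_compile(choices)              # build each closure once,
binary = weighted_choice_bisect_compile(choices)       # then time only the calls
print profiler(simple, 1000)
print profiler(binary, 1000)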

The results:

(Seconds taken for 1,000 calls to the function.)

  • Simple uncompiled: 0.918624162674
  • Binary uncompiled: 1.01497793198
  • Simple compiled: 0.287325024605
  • Binary compiled: 0.00327413797379

The "compiled" results include the average time taken to compile the choice function once. (I timed 1,000 compiles, then divided that time by 1,000, and added the result to the choice function time.)

So: if you have a list of items + weights which change very rarely, the binary compiled method is by far the fastest.

Answer by Darius Bacon

In comments on the original post, Nicholas Leonard suggests that both the exchanging and the sampling need to be fast. Here's an idea for that case; I haven't tried it.

If only sampling had to be fast, we could use an array of the values together with the running sum of their probabilities, and do a binary search on the running sum (with key being a uniform random number) -- an O(log(n)) operation. But an exchange would require updating all of the running-sum values appearing after the entries exchanged -- an O(n) operation. (Could you choose to exchange only items near the end of their lists? I'll assume not.)

So let's aim for O(log(n)) in both operations. Instead of an array, keep a binary tree for each set to sample from. A leaf holds the sample value and its (unnormalized) probability. A branch node holds the total probability of its children.

To sample, generate a uniform random number x between 0 and the total probability of the root, and descend the tree. At each branch, go left if x is less than the left child's total probability; otherwise subtract the left child's probability from x and go right. Return the leaf value you reach.

To exchange, remove the leaf from its tree and adjust the branches that lead down to it (decreasing their total probability, and cutting out any single-child branch nodes). Insert the leaf into the destination tree: you have a choice of where to put it, so keep it balanced. Picking a random child at each level is probably good enough -- that's where I'd start. Increase each parent node's probability, back up to the root.

Now both sampling and exchange are O(log(n)) on average. (If you need guaranteed balance, a simple way is to add another field to the branch nodes holding the count of leaves in the whole subtree. When adding a leaf, at each level pick the child with fewer leaves. This leaves the possibility of a tree getting unbalanced solely by deletions; this can't be a problem if there's reasonably even traffic between the sets, but if it is, then choose rotations during deletion using the leaf-count information on each node in your traversal.)

Update: On request, here's a basic implementation. I haven't tuned it at all. Usage:

>>> t1 = build_tree([('one', 20), ('two', 2), ('three', 50)])
>>> t1
Branch(Leaf(20, 'one'), Branch(Leaf(2, 'two'), Leaf(50, 'three')))
>>> t1.sample()
Leaf(50, 'three')
>>> t1.sample()
Leaf(20, 'one')
>>> t2 = build_tree([('four', 10), ('five', 30)])
>>> t1a, t2a = transfer(t1, t2)
>>> t1a
Branch(Leaf(20, 'one'), Leaf(2, 'two'))
>>> t2a
Branch(Leaf(10, 'four'), Branch(Leaf(30, 'five'), Leaf(50, 'three')))

Code:

import random

def build_tree(pairs):
    tree = Empty()
    for value, weight in pairs:
        tree = tree.add(Leaf(weight, value))
    return tree

def transfer(from_tree, to_tree):
    """Given a nonempty tree and a target, move a leaf from the former to
    the latter. Return the two updated trees."""
    leaf, from_tree1 = from_tree.extract()
    return from_tree1, to_tree.add(leaf)

class Tree:
    def add(self, leaf):
        "Return a new tree holding my leaves plus the given leaf."
        raise NotImplementedError
    def sample(self):
        "Pick one of my leaves at random in proportion to its weight."
        return self.sampling(random.uniform(0, self.weight))
    def extract(self):
        """Pick one of my leaves and return it along with a new tree
        holding my leaves minus that one leaf."""
        return self.extracting(random.uniform(0, self.weight))        

class Empty(Tree):
    weight = 0
    def __repr__(self):
        return 'Empty()'
    def add(self, leaf):
        return leaf
    def sampling(self, weight):
        raise Exception("You can't sample an empty tree")
    def extracting(self, weight):
        raise Exception("You can't extract from an empty tree")

class Leaf(Tree):
    def __init__(self, weight, value):
        self.weight = weight
        self.value = value
    def __repr__(self):
        return 'Leaf(%r, %r)' % (self.weight, self.value)
    def add(self, leaf):
        return Branch(self, leaf)
    def sampling(self, weight):
        return self
    def extracting(self, weight):
        return self, Empty()

def combine(left, right):
    if isinstance(left, Empty): return right
    if isinstance(right, Empty): return left
    return Branch(left, right)

class Branch(Tree):
    def __init__(self, left, right):
        self.weight = left.weight + right.weight
        self.left = left
        self.right = right
    def __repr__(self):
        return 'Branch(%r, %r)' % (self.left, self.right)
    def add(self, leaf):
        # Adding to a random branch as a clumsy way to keep an
        # approximately balanced tree.
        if random.random() < 0.5:
            return combine(self.left.add(leaf), self.right)
        return combine(self.left, self.right.add(leaf))
    def sampling(self, weight):
        if weight < self.left.weight:
            return self.left.sampling(weight)
        return self.right.sampling(weight - self.left.weight)
    def extracting(self, weight):
        if weight < self.left.weight:
            leaf, left1 = self.left.extracting(weight)
            return leaf, combine(left1, self.right)
        leaf, right1 = self.right.extracting(weight - self.left.weight)
        return leaf, combine(self.left, right1)

Update 2: In answering another problem, Jason Orendorff points out that the binary trees can be kept perfectly balanced by representing them in an array, just like the classical heap structure. (This saves the space spent on pointers, too.) See my comments on that answer for how to adapt his code to this problem.
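
For the curious, here is a minimal sketch of that array representation (my own illustration, not Jason Orendorff's actual code): leaves live in the upper half of a flat array, every internal slot caches the total weight of its two children, and both sampling and updating cost O(log n):

import random

class SumHeap(object):
    """Sketch of an array-backed weight tree: leaves are stored in
    tree[n:2n]; internal slot i holds the sum of slots 2i and 2i+1."""
    def __init__(self, weights):
        self.n = n = len(weights)
        self.tree = [0] * n + list(weights)
        for i in range(n - 1, 0, -1):          # fill internal nodes bottom-up
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def sample(self):
        """Pick a leaf index with probability proportional to its weight."""
        x = random.uniform(0, self.tree[1])    # tree[1] is the grand total
        i = 1
        while i < self.n:                      # descend to a leaf
            i *= 2
            if x >= self.tree[i]:              # go right, discounting the left sum
                x -= self.tree[i]
                i += 1
        return i - self.n

    def update(self, index, weight):
        """Change one weight in O(log n), fixing the cached sums up to the root."""
        i = index + self.n
        self.tree[i] = weight
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

Exchanging an item between two such sets could then be an update(i, 0) in the source and an update(j, w) in the destination.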

Answer by Claudiu

Here is a classic way to do it, in pseudocode, where random.random() gives you a random float from 0 to 1.

let z = sum of all the convictions
let choice = random.random() * z 
iterate through your objects:
    choice = choice - the current object's conviction
    if choice <= 0, return this object
return the last object

For an example: imagine you have two objects, one with weight 2, another with weight 4. You generate a number from 0 to 6. If choice is between 0 and 2, which happens with probability 2/6 = 1/3, then 2 gets subtracted and the first object is chosen. If choice is between 2 and 6, which happens with probability 4/6 = 2/3, then the first subtraction still leaves choice > 0, and the second subtraction makes the second object get chosen.
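
The same thing as runnable Python (my direct rendering of the pseudocode above; items is assumed to be a list of (object, conviction) pairs):

import random

def choose_weighted(items):
    """items: list of (obj, conviction) pairs; returns obj with
    probability conviction / total."""
    z = sum(conviction for obj, conviction in items)
    choice = random.random() * z
    for obj, conviction in items:
        choice -= conviction
        if choice <= 0:
            return obj
    return items[-1][0]  # guard against floating-point leftovers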

Answer by Rex Logan

You want to give each object a weight. The bigger the weight, the more likely it will happen. More precisely, prob_x = weight_x / sum_of_all_weights.

Then generate a random number in the range 0 to sum_all_weights and map it to each object.

This code allows you to generate a random index; the mapping is built when the object is created, for speed. If all of your sets of objects have the same distribution, then you can get by with only one RandomIndex object.

import random

class RandomIndex:
    def __init__(self, wlist):
        self._wi=[]
        self._rsize=sum(wlist)-1
        self._m={}
        i=0
        s=wlist[i]
        for n in range(self._rsize+1):
            if n == s:
                i+=1
                s+=wlist[i]
            self._m[n]=i    

    def i(self):
        rn=random.randint(0,self._rsize)
        return self._m[rn]


wx=[1,10,100,1000] #weight list
ri=RandomIndex(wx)

cnt=[0,0,0,0]

for i in range(1000):
    cnt[ri.i()] +=1  #keep track of number of times each index was generated

print(cnt)  

Answer by David Raznick

I would use this recipe. You will need to add a weight to your objects, but that is just a simple ratio: put them in a list of tuples (object, conviction / (sum of convictions)). This should be easy to do using a list comprehension, as sketched below.
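
That normalization step might look like this (a sketch; the Thing class and its conviction attribute are stand-ins for whatever the real objects are):

class Thing(object):
    def __init__(self, name, conviction):
        self.name, self.conviction = name, conviction

objects = [Thing('one', 20), Thing('two', 2), Thing('three', 50)]

# (object, conviction / sum-of-convictions) pairs via a list comprehension
total = float(sum(obj.conviction for obj in objects))
weighted = [(obj, obj.conviction / total) for obj in objects]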

Answer by chaos

I suggest you port this PHP implementation of weighted random to Python. In particular, the binary-search-based second algorithm helps address your speed concerns.

Answer by ali_m

About 3 years later...

If you use numpy, perhaps the simplest option is to use np.random.choice, which takes a list of possible values and an optional sequence of probabilities associated with each value:

import numpy as np

values = ('A', 'B', 'C', 'D')
weights = (0.5, 0.1, 0.2, 0.2)

print ''.join(np.random.choice(values, size=60, replace=True, p=weights))
# ACCADAACCDACDBACCADCAAAAAAADACCDCAADDDADAAACCAAACBAAADCADABA

Answer by denis

(A year later) Walker's alias method for random objects with different probabilities is very fast and very simple.
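
For illustration, here is a compact sketch of the alias method (Vose's variant of Walker's scheme; my own rendering, not the code behind the link). Setup is O(n), and each draw costs one uniform index plus one biased coin flip, i.e. O(1):

import random

def build_alias(weights):
    """Vose's alias method: precompute probability and alias tables."""
    n = len(weights)
    total = float(sum(weights))
    prob = [w * n / total for w in weights]   # scaled so the mean is 1
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        alias[s] = l                          # s's leftover mass goes to l
        prob[l] -= (1.0 - prob[s])
        if prob[l] < 1.0:
            small.append(l)
        else:
            large.append(l)
    while large:                              # numerical leftovers: probability 1
        prob[large.pop()] = 1.0
    while small:
        prob[small.pop()] = 1.0
    return prob, alias

def alias_sample(prob, alias):
    """Draw one index: pick a column uniformly, then flip a biased coin."""
    i = random.randrange(len(prob))
    if random.random() < prob[i]:
        return i
    return alias[i]

For example, build_alias([1, 10, 100, 1000]) yields tables from which alias_sample returns index 3 roughly 90% of the time (1000/1111).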

Answer by Aaron Maenpaa

The simplest thing to do is to use random.choice (which uses a uniform distribution) and vary the frequency of occurrence of the object in the source collection.

>>> random.choice([1, 2, 3, 4])
4

... vs:

>>> random.choice([1, 1, 1, 1, 2, 2, 2, 3, 3, 4])
2

So your objects could have a base occurrence rate (n), and between 1 and n copies of each object are added to the source collection as a function of the conviction rate. This method is really simple; however, it can have significant overhead if the number of distinct objects is large or the conviction rate needs to be very fine grained.

Alternatively, if you generate more than one random number using a uniform distribution and sum them, numbers occurring near the mean are more probable than those occurring near the extremes (think of rolling two dice and the probability of getting 7 versus 12 or 2). You can then order the objects by conviction rate and generate a number using multiple die rolls, which you use to calculate an index into the objects. Use numbers near the mean to index low-conviction objects and numbers near the extremes to index high-conviction items. You can vary the precise probability that a given object will be selected by changing the "number of sides" and the number of your "dice" (it may be simpler to put the objects into buckets and use dice with a small number of sides rather than trying to associate each object with a specific result); a bucket-based sketch follows the dice example below:

>>> die = lambda sides : random.randint(1, sides)
>>> die(6)
3
>>> die(6) + die(6) + die(6)
10
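
To make the bucket idea concrete, here is a small sketch (my own illustration; the bucket contents are placeholders): choose a bucket by a 3d6 sum, so middle buckets are hit far more often than the edge ones:

import random

def dice_pick(buckets, dice=3, sides=6):
    """Choose a bucket by a dice-sum, then an item uniformly within it.
    Sums near the middle of the range occur far more often than the
    extremes, so middle buckets are selected more frequently."""
    roll = sum(random.randint(1, sides) for _ in range(dice))
    return random.choice(buckets[roll - dice])  # sums span dice..dice*sides

# 3d6 has 16 possible sums, so supply 16 buckets; place the objects you
# want chosen most often in the middle buckets, the rare ones at the edges
buckets = ([['rare-low']] * 2 +
           [['common-%d' % i] for i in range(12)] +
           [['rare-high']] * 2)
print dice_pick(buckets)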

Answer by supercheetah

A very easy and simple way of doing this is to set weights for each of the values, and it wouldn't require much memory.

You could probably use a hash/dictionary to do this.

What you'll want to do is to have the random number, x, multiplied and summed over the entire set of things you want selected, and divide that result over the number of objects in your set.

Pseudo-code:

objectSet = [(object1, weight1), ..., (objectN, weightN)]
sum = 0
rand = random()
for obj, weight in objectSet
    sum = sum+weight*rand
choice = objectSet[floor(sum/objectSet.size())]

EDIT: I just realized how slow my code would be with very large sets (it's O(n)). The following pseudo-code is O(log(n)), and basically uses a binary search.

objectSet = [(object1, weight1), ..., (objectN, weightN)]
sort objectSet from less to greater according to weights
choice = random() * N # where N is the number of objects in objectSet
do a binary search until you have just one answer

There are implementations of binary search in Python all over the net, so there's no need to repeat one here.