Random Python dictionary key, weighted by values

Question

Asked by hoju

I have a dictionary where each key maps to a list of variable length, e.g.:

d = {
 'a': [1, 3, 2],
 'b': [6],
 'c': [0, 0]
}

Is there a clean way to get a random dictionary key, weighted by the length of its value? random.choice(d.keys()) will weight the keys equally, but in the case above I want 'a' to be returned roughly half the time.

Answer 1

Accepted answer by sth

This would work:

random.choice([k for k in d for x in d[k]])
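
On Python 3.6 or later, the standard library's random.choices accepts a weights argument, which should give the same distribution without building the repeated list. A minimal sketch, assuming d is the dictionary from the question; the helper name weighted_key is hypothetical:

```python
import random

def weighted_key(d):
    """Return a random key of d, weighted by the length of its value."""
    keys = list(d)
    weights = [len(d[k]) for k in keys]
    # random.choices returns a list of k picks; take the single pick.
    return random.choices(keys, weights=weights, k=1)[0]

d = {'a': [1, 3, 2], 'b': [6], 'c': [0, 0]}
print(weighted_key(d))
```

Keys whose lists are empty get weight 0 and are never returned; if every list is empty there is nothing to weight by, so that case needs separate handling.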

Answer 2

Answer by James Thompson

Do you always know the total number of values in the dictionary? If so, this might be easy to do with the following algorithm, which can be used whenever you want to make a probabilistic selection of some items from an ordered list:

Iterate over your list of keys.
Generate a uniformly distributed random value between 0 and 1 (aka "roll the dice").
Assuming that this key has N_VALS values associated with it, accept this key with probability N_VALS / N_REMAINING, where N_REMAINING is the number of values not yet passed over (initially TOTAL_VALS, the total number of values in the entire dictionary).
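
The steps above can be sketched at the key level as follows. This is only an illustration under one reading of the description: N_REMAINING is taken to be the number of values belonging to keys not yet rejected, and the function name is hypothetical. It assumes Python 3, where / is true division.

```python
import random

def select_by_sequential_acceptance(d):
    """Visit keys in order, accepting each with probability n_vals / n_remaining."""
    n_remaining = sum(len(v) for v in d.values())  # values not yet passed over
    for key, values in d.items():
        n_vals = len(values)
        # Keys with no values are never accepted (this also avoids 0/0).
        if n_vals and random.random() < n_vals / n_remaining:
            return key
        n_remaining -= n_vals
    raise ValueError("dictionary has no values to weight by")

d = {'a': [1, 3, 2], 'b': [6], 'c': [0, 0]}
print(select_by_sequential_acceptance(d))
```

The last key that still has values is accepted with probability 1, so the loop always returns for a dictionary with at least one value. The rejection probabilities telescope, which leaves each key with an overall chance of N_VALS / TOTAL_VALS.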

This algorithm has the advantage of not having to generate any new lists, which is important if your dictionary is large. Your program only pays for one loop over the K keys to calculate the total, another loop over the keys that will on average end halfway through, and whatever it costs to generate a random number between 0 and 1. Generating such a random number is a very common operation in programming, so most languages have a fast implementation of such a function. In Python, the random number generator is a C implementation of the Mersenne Twister algorithm, which should be very fast. Additionally, the documentation claims that this implementation is thread-safe.

Here's the code. I'm sure that you can clean it up if you'd like to use more Pythonic features:

#!/usr/bin/python

import random

def select_weighted( d ):
   # calculate total
   total = 0
   for key in d:
      total = total + len(d[key])
   accept_prob = float( 1.0 / total )

   # pick a weighted value from d
   n_seen = 0
   for key in d:
      current_key = key
      for val in d[key]:
         dice_roll = random.random()
         accept_prob = float( 1.0 / ( total - n_seen ) )
         n_seen = n_seen + 1
         if dice_roll <= accept_prob:
            return current_key

dict = {
   'a': [1, 3, 2],
   'b': [6],
   'c': [0, 0]
}

counts = {}
for key in dict:
   counts[key] = 0

for s in range(1,100000):
   k = select_weighted(dict)
   counts[k] = counts[k] + 1

print(counts)

After running this 99,999 times (the loop is range(1,100000)), the keys are selected this many times:

{'a': 49801, 'c': 33548, 'b': 16650}

Those are fairly close to your expected values of:

{'a': 0.5, 'c': 0.33333333333333331, 'b': 0.16666666666666666}
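
For reference, those expected proportions are just each list's length divided by the total number of values. A small sketch, assuming the dictionary from the question and Python 3 division:

```python
d = {'a': [1, 3, 2], 'b': [6], 'c': [0, 0]}

total = sum(len(v) for v in d.values())  # 6 values in all
# Each key's expected share is len(value) / total: 3/6, 1/6 and 2/6.
expected = {k: len(v) / total for k, v in d.items()}
print(expected)
```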

Edit: Miles pointed out a serious error in my original implementation, which has since been corrected. Sorry about that!

Answer 3

Answer by sth

Without constructing a new, possibly big list with repeated values:

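import random

def select_weighted(d):
   # The weight of each key is the length of its list.
   # Pick an offset into the (virtual) concatenation of all the lists.
   offset = random.randint(0, sum(len(v) for v in d.values()) - 1)
   for k, v in d.items():
      if offset < len(v):
         return k
      offset -= len(v)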

Answer 4

Answer by A. Coady

Given that your dict fits in memory, the random.choice method should be reasonable. But assuming otherwise, the next technique is to use a list of increasing weights, and use bisect to find a randomly chosen weight.

>>> import random, bisect
>>> items, total = [], 0
>>> for key, value in d.items():
        total += len(value)
        items.append((total, key))


>>> items[bisect.bisect_left(items, (random.randint(1, total),))][1]
'a'
>>> items[bisect.bisect_left(items, (random.randint(1, total),))][1]
'c'
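
The same cumulative-weight idea can be wrapped in a reusable function. The following is a sketch rather than the answer's own code: make_picker is a hypothetical name, and it stores the cumulative weights in a separate list so that bisect compares plain integers.

```python
import bisect
import itertools
import random

def make_picker(d):
    """Precompute cumulative weights once; each pick is then a binary search."""
    keys = [k for k in d if d[k]]  # skip keys with empty values
    cumulative = list(itertools.accumulate(len(d[k]) for k in keys))
    if not cumulative:
        raise ValueError("dictionary has no values to weight by")
    total = cumulative[-1]

    def pick():
        # r is in 1..total; bisect_left finds the first cumulative weight >= r.
        r = random.randint(1, total)
        return keys[bisect.bisect_left(cumulative, r)]

    return pick

d = {'a': [1, 3, 2], 'b': [6], 'c': [0, 0]}
pick = make_picker(d)
print(pick())
```

Building the table costs one pass over the keys, and each later pick is O(log K). The table has to be rebuilt whenever the dictionary changes.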

Answer 5

Answer by David Seiler

Make a list in which each key is repeated a number of times equal to the length of its value. In your example: ['a', 'a', 'a', 'b', 'c', 'c']. Then use random.choice().

Edit: or, less elegantly but more efficiently, try this: take the sum of the lengths of all values in the dictionary, S (you can cache and invalidate this value, or keep it up to date as you edit the dictionary, depending on the exact usage pattern you anticipate). Generate a random integer from 0 up to but not including S, and do a linear search through the dictionary keys to find the range into which your random number falls.

I think that's the best you can do without changing or adding to your data representation.
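
The "keep it up to date as you edit the dictionary" option could look roughly like this. It is a sketch under assumptions: WeightedDict and its methods are hypothetical names, and values are copied on assignment so that the cached total S cannot drift if the caller later mutates the original list.

```python
import random

class WeightedDict:
    """Hypothetical wrapper that keeps the total length S up to date on edits."""

    def __init__(self):
        self._data = {}
        self._total = 0  # S: sum of len(value) over all keys

    def __setitem__(self, key, values):
        values = list(values)
        # Adjust S by the difference between the new and old list lengths.
        self._total += len(values) - len(self._data.get(key, ()))
        self._data[key] = values

    def __delitem__(self, key):
        self._total -= len(self._data.pop(key))

    def random_key(self):
        if self._total == 0:
            raise ValueError("no values to weight by")
        offset = random.randrange(self._total)  # integer in [0, S)
        # Linear search for the key whose range contains the offset.
        for key, values in self._data.items():
            if offset < len(values):
                return key
            offset -= len(values)

wd = WeightedDict()
wd['a'] = [1, 3, 2]
wd['b'] = [6]
wd['c'] = [0, 0]
print(wd.random_key())
```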

Answer 6

Answer by Rex Logan

Here is some code based on a previous answer I gave for probability distribution in Python, but using the length to set the weight. It uses an iterative Markov chain so that it does not need to know what the total of all of the weights is. Currently it calculates the max length, but if that is too slow just change

  self._maxw = 1

to

  self._maxw = max length

and remove

for k in self._odata:
     if len(self._odata[k])> self._maxw:
          self._maxw=len(self._odata[k])

Here is the code.

import random


class RandomDict:
    """
    The weight is the length of each object in the dict.
    """

    def __init__(self,odict,n=0):
        self._odata = odict
        self._keys = list(odict.keys())
        self._maxw = 1  # to increase speed set me to max length
        self._len=len(odict)
        if n==0:
            self._n=self._len
        else:
            self._n=n
        # to increase speed set above max value and comment out next 3 lines
        for k in self._odata:
            if len(self._odata[k])> self._maxw:
                self._maxw=len(self._odata[k])


    def __iter__(self):
        return self.next()

    def next(self):
        while (self._len > 0) and (self._n>0):
            self._n -= 1
            for i in range(100):
                k=random.choice(self._keys)
                rx=random.uniform(0,self._maxw)
                if rx <= len(self._odata[k]): # test to see if that is the value we want
                    break
            # if you do not find one after 100 tries then just get a random one
            yield k

    def GetRdnKey(self):
        for i in range(100):
            k=random.choice(self._keys)
            rx=random.uniform(0,self._maxw)

            if rx <= len(self._odata[k]): # test to see if that is the value we want
                break
        # if you do not find one after 100 tries then just get a random one
        return k



#test code

d = {
 'a': [1, 3, 2],
 'b': [6],
 'c': [0, 0]
}


rd=RandomDict(d)

dc = {
 'a': 0,
 'b': 0,
 'c': 0
}
for i in range(100000):
    k=rd.GetRdnKey()
    dc[k]+=1

print("Key count=",dc)



#iterate over the objects

dc = {
 'a': 0,
 'b': 0,
 'c': 0
}

for k in RandomDict(d,100000):
    dc[k]+=1

print("Key count=",dc)

Test results

Key count= {'a': 50181, 'c': 33363, 'b': 16456}
Key count= {'a': 50080, 'c': 33411, 'b': 16509}

Answer 7

Answer by hughdbrown

I'd say this:

random.choice("".join([k * len(d[k]) for k in d]))

This makes it clear that each k in d gets as many chances as the length of its value. Of course, it is relying on dictionary keys of length 1 that are characters....

Much later:

table = "".join([key * len(value) for key, value in d.items()])
random.choice(table)

Answer 8

Answer by Gattster

I modified some of the other answers to come up with this. It's a bit more configurable. It takes 2 arguments: a list and a lambda function that tells it how to compute each item's weight.

def select_weighted(lst, weight):
   """ Usage: select_weighted([0,1,10], weight=lambda x: x) """
   thesum = sum([weight(x) for x in lst])
   if thesum == 0:
      return random.choice(lst)
   offset = random.randint(0, thesum - 1)

   for k in lst:
      v = weight(k)
      if offset < v:
         return k
      offset -= v

Thanks to sth for the base code for this.

Answer 9

Answer by bcosta12

import numpy as np

my_dict = {
  "one": 5,
  "two": 1,
  "three": 25,
  "four": 14
}

probs = []

elements = [my_dict[x] for x in my_dict.keys()]
total = sum(elements)
probs[:] = [x / total for x in elements]
r = np.random.choice(len(my_dict), p=probs)

# r is an index into the dict's insertion order; print the key that was chosen
print(list(my_dict.keys())[r])
# e.g. three
