C++ weighted random numbers

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/1761626/

Date: 2020-08-27 21:06:25  Source: igfitidea

Weighted random numbers

Tags: c++, boost, random

Asked by nhaa123

I'm trying to implement weighted random numbers. I'm currently just banging my head against the wall and cannot figure this out.

In my project (Hold'em hand-ranges, subjective all-in equity analysis), I'm using Boost's random functions. So, let's say I want to pick a random number between 1 and 3 (so either 1, 2 or 3). Boost's Mersenne Twister generator works like a charm for this. However, I want the pick to be weighted, for example like this:

1 (weight: 90)
2 (weight: 56)
3 (weight:  4)

Does Boost have some sort of functionality for this?

Answered by Will

There is a straightforward algorithm for picking an item at random, where items have individual weights:

1) calculate the sum of all the weights

2) pick a random number that is 0 or greater and is less than the sum of the weights

3) go through the items one at a time, subtracting their weight from your random number, until you get the item where the random number is less than that item's weight

Pseudo-code illustrating this:

int sum_of_weight = 0;
for(int i=0; i<num_choices; i++) {
   sum_of_weight += choice_weight[i];
}
int rnd = random(sum_of_weight);
for(int i=0; i<num_choices; i++) {
  if(rnd < choice_weight[i])
    return i;
  rnd -= choice_weight[i];
}
assert(!"should never get here");

This should be straightforward to adapt to your boost containers and such.
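
As a rough sketch of that adaptation, the pseudo-code could look like this with the C++11 standard library in place of Boost. The name pick_weighted is made up for illustration, and the weights are assumed to be non-negative integers with a positive sum:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns an index in [0, weights.size()), chosen with
// probability proportional to weights[index].
// Assumes non-negative weights whose sum is positive.
template <class Gen>
std::size_t pick_weighted(const std::vector<int>& weights, Gen& gen)
{
    // 1) sum of all the weights
    const int sum_of_weight = std::accumulate(weights.begin(), weights.end(), 0);
    assert(sum_of_weight > 0);

    // 2) uniform integer in [0, sum_of_weight - 1]
    std::uniform_int_distribution<int> pick(0, sum_of_weight - 1);
    int rnd = pick(gen);

    // 3) walk the items, subtracting each weight until rnd falls inside one
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (rnd < weights[i])
            return i;
        rnd -= weights[i];
    }
    assert(!"should never get here");
    return weights.size() - 1;
}
```

With the weights from the question ({90, 56, 4}) and a std::mt19937 generator, adding 1 to the returned index gives the numbers 1 to 3.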

If your weights are rarely changed but you often pick one at random, and as long as your container is storing pointers to the objects or is more than a few dozen items long (basically, you have to profile to know if this helps or hinders), then there is an optimisation:

By storing the cumulative weight sum in each item you can use a binary search to pick the item corresponding to the pick weight.
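
One way this could be sketched is shown here. The running totals live in a separate vector rather than in each item, and the class name CumulativePicker is invented for this example. Integer weights with a positive sum are assumed:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: computes the running totals once (O(N)), then answers
// each pick with a binary search (O(log N)).
class CumulativePicker {
public:
    explicit CumulativePicker(const std::vector<int>& weights)
        : cumulative_(weights.size())
    {
        // cumulative_[i] = weights[0] + ... + weights[i]
        std::partial_sum(weights.begin(), weights.end(), cumulative_.begin());
        assert(!cumulative_.empty() && cumulative_.back() > 0);
    }

    template <class Gen>
    std::size_t operator()(Gen& gen) const
    {
        // Uniform integer in [0, total - 1]
        std::uniform_int_distribution<int> pick(0, cumulative_.back() - 1);
        const int rnd = pick(gen);
        // First running total strictly greater than rnd
        const auto it =
            std::upper_bound(cumulative_.begin(), cumulative_.end(), rnd);
        return static_cast<std::size_t>(it - cumulative_.begin());
    }

private:
    std::vector<int> cumulative_;
};
```

The totals must be rebuilt whenever a weight changes, which is why this only pays off when picks are much more frequent than weight updates.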

If you do not know the number of items in the list, then there's a very neat algorithm called reservoir sampling that can be adapted to be weighted.
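
The answer does not say which weighted variant it has in mind. A minimal sketch of one common adaptation for picking a single item keeps a running total and replaces the current choice with probability weight / total so far. The class name WeightedReservoir and its interface are assumptions made for this example:

```cpp
#include <cassert>
#include <random>

// Hypothetical single-item weighted reservoir: items arrive one at a time and
// the total number of items is never needed in advance.
template <class T>
class WeightedReservoir {
public:
    template <class Gen>
    void offer(const T& item, double weight, Gen& gen)
    {
        if (weight <= 0.0)
            return;  // items with no weight can never be picked
        total_ += weight;
        std::uniform_real_distribution<double> unit(0.0, 1.0);
        // Keep the new item with probability weight / total_
        if (unit(gen) * total_ < weight) {
            chosen_ = item;
            has_value_ = true;
        }
    }

    bool has_value() const { return has_value_; }

    const T& value() const
    {
        assert(has_value_);
        return chosen_;
    }

private:
    T chosen_{};
    double total_ = 0.0;
    bool has_value_ = false;
};
```

After every item has been offered, each one ends up selected with probability equal to its weight divided by the sum of all weights.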

Answered by Howard Hinnant

Updated answer to an old question. You can easily do this in C++11 with just the std::lib:

#include <iostream>
#include <random>
#include <iterator>
#include <ctime>
#include <type_traits>
#include <cassert>

int main()
{
    // Set up distribution
    double interval[] = {1,   2,   3,   4};
    double weights[] =  {  .90, .56, .04};
    std::piecewise_constant_distribution<> dist(std::begin(interval),
                                                std::end(interval),
                                                std::begin(weights));
    // Choose generator
    std::mt19937 gen(std::time(0));  // seed as wanted
    // Demonstrate with N randomly generated numbers
    const unsigned N = 1000000;
    // Collect number of times each random number is generated
    double avg[std::extent<decltype(weights)>::value] = {0};
    for (unsigned i = 0; i < N; ++i)
    {
        // Generate random number using gen, distributed according to dist
        unsigned r = static_cast<unsigned>(dist(gen));
        // Sanity check
        assert(interval[0] <= r && r <= *(std::end(interval)-2));
        // Save r for statistical test of distribution
        avg[r - 1]++;
    }
    // Compute averages for distribution
    for (double* i = std::begin(avg); i < std::end(avg); ++i)
        *i /= N;
    // Display distribution
    for (unsigned i = 1; i <= std::extent<decltype(avg)>::value; ++i)
        std::cout << "avg[" << i << "] = " << avg[i-1] << '\n';
}

Output on my system:

avg[1] = 0.600115
avg[2] = 0.373341
avg[3] = 0.026544

Note that most of the code above is devoted to just displaying and analyzing the output. The actual generation is just a few lines of code. The output demonstrates that the requested "probabilities" have been obtained. To compare, you have to divide the requested weights by 1.5, since that is what they add up to (.90/1.5 = 0.6, .56/1.5 ≈ 0.373, .04/1.5 ≈ 0.027).

Answered by mmdanziger

If your weights change more slowly than they are drawn, C++11 discrete_distribution is going to be the easiest:

#include <ctime>
#include <random>
#include <vector>

int main()
{
    std::vector<double> weights{90, 56, 4};
    std::discrete_distribution<int> dist(std::begin(weights), std::end(weights));
    std::mt19937 gen;
    gen.seed(std::time(0)); // if you want different results from different runs
    int N = 100000;
    std::vector<int> samples(N);
    for (auto& i : samples)
        i = dist(gen); // yields an index 0, 1 or 2; add 1 to get 1, 2 or 3
    // do something with your samples...
}

Note, however, that the C++11 discrete_distribution computes all of the cumulative sums on initialization. Usually, you want that because it speeds up the sampling time for a one-time O(N) cost. But for a rapidly changing distribution it will incur a heavy calculation (and memory) cost. For instance, if the weights represent how many items there are and every time you draw one you remove it, you will probably want a custom algorithm.

Will's answer https://stackoverflow.com/a/1761646/837451 avoids this overhead but will be slower to draw from than the C++11 version because it can't use binary search.

To see that it does this, you can see the relevant lines (/usr/include/c++/5/bits/random.tcc on my Ubuntu 16.04 + GCC 5.3 install):

  template<typename _IntType>
    void
    discrete_distribution<_IntType>::param_type::
    _M_initialize()
    {
      if (_M_prob.size() < 2)
        {
          _M_prob.clear();
          return;
        }

      const double __sum = std::accumulate(_M_prob.begin(),
                                           _M_prob.end(), 0.0);
      // Now normalize the probabilites.
      __detail::__normalize(_M_prob.begin(), _M_prob.end(), _M_prob.begin(),
                            __sum);
      // Accumulate partial sums.
      _M_cp.reserve(_M_prob.size());
      std::partial_sum(_M_prob.begin(), _M_prob.end(),
                       std::back_inserter(_M_cp));
      // Make sure the last cumulative probability is one.
      _M_cp[_M_cp.size() - 1] = 1.0;
    }

Answered by Chirry

What I do when I need to weight numbers is use a random number for the weight.

For example: I need to generate random numbers from 1 to 3 with the following weights:

  • 10% chance that the random number is 1
  • 30% chance that the random number is 2
  • 60% chance that the random number is 3

Then I use:

weight = rand() % 10;

switch( weight ) {

    case 0:
        randomNumber = 1;
        break;
    case 1:
    case 2:
    case 3:
        randomNumber = 2;
        break;
    case 4:
    case 5:
    case 6:
    case 7:
    case 8:
    case 9:
        randomNumber = 3;
        break;
}

With this, the result has a 10% probability of being 1, 30% of being 2 and 60% of being 3.

You can adapt it to your needs.

Hope I could help you, Good Luck!

Answered by Martin York

Build a bag (or std::vector) of all the items that can be picked.
Make sure that the number of copies of each item is proportional to its weighting.

Example:

  • 1 60%
  • 2 35%
  • 3 5%

So have a bag with 100 items with 60 1's, 35 2's and 5 3's.
Now randomly sort the bag (std::random_shuffle)

Pick elements from the bag sequentially until it is empty.
Once empty re-randomize the bag and start again.
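
A minimal sketch of such a bag follows. It uses std::shuffle, since std::random_shuffle was deprecated in C++14 and removed in C++17. The class name ShuffleBag and its interface are invented for this example, and the bag holds the values 1, 2, 3, ... as in the example above:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical shuffle bag: counts[i] copies of the value i + 1 go into the
// bag, which is reshuffled every time it has been fully drawn.
class ShuffleBag {
public:
    explicit ShuffleBag(const std::vector<std::size_t>& counts)
    {
        for (std::size_t i = 0; i < counts.size(); ++i)
            bag_.insert(bag_.end(), counts[i], static_cast<int>(i) + 1);
        assert(!bag_.empty());
        next_ = bag_.size();  // forces a shuffle on the first draw
    }

    template <class Gen>
    int draw(Gen& gen)
    {
        if (next_ == bag_.size()) {
            // Bag exhausted: re-randomize and start again
            std::shuffle(bag_.begin(), bag_.end(), gen);
            next_ = 0;
        }
        return bag_[next_++];
    }

private:
    std::vector<int> bag_;
    std::size_t next_ = 0;
};
```

Unlike independent weighted draws, every run of 100 draws from a {60, 35, 5} bag contains exactly 60 ones, 35 twos and 5 threes, so consecutive picks are not independent of each other.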

Answered by Jonathan Graehl

Choose a random number on [0,1), which should be the default operator() for a boost RNG. Choose the item with cumulative probability density function >= that number:

template <class It,class P>
It choose_p(It begin,It end,P const& p)
{
    if (begin==end) return end;
    double sum=0.;
    for (It i=begin;i!=end;++i)
        sum+=p(*i);
    double choice=sum*random01();
    for (It i=begin;;) {
        choice -= p(*i);
        It r=i;
        ++i;
        if (choice<0 || i==end) return r;
    }
    return begin; //unreachable
}

Where random01() returns a double >=0 and <1. Note that the above doesn't require the probabilities to sum to 1; it normalizes them for you.
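
The answer leaves random01() undefined. One possible stand-in, assuming the C++11 standard library rather than a Boost RNG, could be:

```cpp
#include <random>

// Hypothetical stand-in for random01(): a uniform double in [0, 1).
// The generator is seeded once from std::random_device on first use.
inline double random01()
{
    static std::mt19937 gen(std::random_device{}());
    static std::uniform_real_distribution<double> unit(0.0, 1.0);
    return unit(gen);
}
```

The function-local statics keep a single generator alive across calls, so successive calls continue the same random sequence instead of reseeding.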

p is just a function assigning a probability to an item in the collection [begin,end). You can omit it (or use an identity) if you just have a sequence of probabilities.

Answered by Leonid Ganeline

I've implemented several simple weighted random algorithms.
