Java: random numbers with probabilities

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question on StackOverflow: http://stackoverflow.com/questions/20327958/

Date: 2020-08-13 01:06:31  Source: igfitidea

Random number with Probabilities

Tags: java, random, probability

Asked by marc wellman

What would be the best way (e.g. in Java) to generate random numbers within a particular range, where each number has a given probability of occurring?

For example:

Generate random integers from within [1;3] with the following probabilities:

P(1) = 0.2
P(2) = 0.3
P(3) = 0.5

Right now I am considering generating a random integer within [0;100] and doing the following:

If it is within [0;20] --> I got my random number 1.
If it is within [21;50] --> I got my random number 2.
If it is within [51;100] --> I got my random number 3.

What would you say?

Accepted answer by usr2564301

Yours is a pretty good way already and works well with any range.

Just thinking: another possibility is to get rid of the fractions by multiplying with a constant multiplier, and then build an array with the size of this multiplier. Multiplying by 10, you get

P(1) = 2
P(2) = 3
P(3) = 5

Then you create an array with the inverse values -- '1' goes into elements 1 and 2, '2' into 3 to 5, '3' into 6 to 10:

P = (1,1, 2,2,2, 3,3,3,3,3);

and then you can pick a random element from this array instead.
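As a rough Java sketch of the array idea above (the class name ExpandedArrayPicker and the method next are made up for illustration, and the caller is assumed to supply the Random instance), picking a random element could look like this:

```java
import java.util.Random;

/** Sketch of the expanded-array idea: each value appears as often as its scaled weight. */
final class ExpandedArrayPicker {
    // Ten slots for P(1) = 0.2, P(2) = 0.3, P(3) = 0.5 (each probability times 10).
    static final int[] TABLE = {1, 1, 2, 2, 2, 3, 3, 3, 3, 3};

    /** Returns 1, 2 or 3 by picking one slot uniformly at random. */
    static int next(Random rng) {
        // nextInt(10) yields an index in 0..9, so every slot is equally likely.
        return TABLE[rng.nextInt(TABLE.length)];
    }
}
```

A call such as ExpandedArrayPicker.next(new Random()) then returns 1 about 20% of the time, 2 about 30% and 3 about 50%.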

(Addendum.) Using the probabilities from the example in kiruwka's comment:

int[] numsToGenerate           = new int[]    { 1,   2,    3,   4,    5   };
double[] discreteProbabilities = new double[] { 0.1, 0.25, 0.3, 0.25, 0.1 };

The smallest multiplier that makes every value an integer is 20, which gives you

2, 5, 6, 5, 2

and so the length of numsToGenerate would be 20, with the following values:

1 1
2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4
5 5

The distribution is exactly the same: the chance of '1', for example, is now 2 out of 20 -- still 0.1.

This is based on your original probabilities all adding up to 1. If they do not, multiply the total by this same factor (which is then going to be your array length as well).
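The expansion described above could be automated along these lines. This is only a sketch: TableBuilder and build are hypothetical names, and the multiplier (20 in the example) is assumed to be chosen by the caller so that every product is a whole number.

```java
/** Sketch: expand values into a lookup table according to scaled probabilities. */
final class TableBuilder {
    /**
     * values[i] is repeated round(probabilities[i] * multiplier) times.
     * The caller is assumed to pass a multiplier that makes each product an integer.
     */
    static int[] build(int[] values, double[] probabilities, int multiplier) {
        int[] counts = new int[values.length];
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            // Math.round absorbs tiny floating-point errors such as 5.999999.
            counts[i] = (int) Math.round(probabilities[i] * multiplier);
            total += counts[i];
        }
        int[] table = new int[total]; // total is the array length mentioned above
        int pos = 0;
        for (int i = 0; i < values.length; i++) {
            for (int c = 0; c < counts[i]; c++) {
                table[pos++] = values[i];
            }
        }
        return table;
    }
}
```

With the values 1 to 5, the probabilities 0.1, 0.25, 0.3, 0.25, 0.1 and a multiplier of 20, build returns a 20-element table holding two 1s, five 2s, six 3s, five 4s and two 5s; a random draw is then table[rng.nextInt(table.length)].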

Answer by TwoThe

You already wrote the implementation in your question. ;)

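final int ran = myRandom.nextInt(100); // uniform in 0..99
if (ran >= 50) { return 3; }           // 50..99: 50 values, P = 0.5
else if (ran >= 20) { return 2; }      // 20..49: 30 values, P = 0.3
else { return 1; }                     // 0..19:  20 values, P = 0.2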

You can speed this up for more complex implementations by pre-calculating the results in a lookup table like this:

t[0] = 1; t[1] = 1; // ... one for each possible result
return t[ran];

But this should only be used if this is a performance bottleneck and called several hundred times per second.
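A possible way to fill such a table once, sketched under the assumption that the buckets are 0..19, 20..49 and 50..99 (LookupTable, at and next are illustrative names, not part of the answer):

```java
import java.util.Random;

/** Sketch of a precomputed lookup table with one entry per possible roll. */
final class LookupTable {
    private static final int[] T = new int[100];

    static {
        // Fill the table once; afterwards each draw is a single array access.
        for (int ran = 0; ran < T.length; ran++) {
            T[ran] = ran >= 50 ? 3 : (ran >= 20 ? 2 : 1);
        }
    }

    /** Result for a given roll in 0..99. */
    static int at(int ran) {
        return T[ran];
    }

    /** Draws a roll and looks up its result. */
    static int next(Random rng) {
        return T[rng.nextInt(T.length)];
    }
}
```

The table costs 100 ints of memory, which is why it only pays off when the branching version is a measured bottleneck.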

Answer by chro

If you have a performance issue, then instead of searching through all n values, which is O(n), you could perform a binary search, which costs O(log n):

Random r=new Random();      
double[] weights=new double[]{0.2,0.2+0.3,0.2+0.3+0.5}; // cumulative sums of P(1), P(2), P(3); the last one must be 1
// end of init
double random=r.nextDouble();
// next perform the binary search in weights array

You only need to access about log2(weights.length) elements on average, which pays off if you have a lot of weights.
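The binary search step left open above might be written as follows. This is a sketch with hypothetical names (CumulativeSearch, indexFor, next); it assumes the array holds cumulative sums in ascending order and that the last entry is 1.

```java
import java.util.Random;

/** Sketch: binary search over cumulative weights. */
final class CumulativeSearch {
    /** Returns the index of the first cumulative weight strictly greater than u. */
    static int indexFor(double[] cumulative, double u) {
        int lo = 0;
        int hi = cumulative.length - 1; // clamps to the last index if u is too large
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (u < cumulative[mid]) {
                hi = mid;      // the answer is at mid or to its left
            } else {
                lo = mid + 1;  // the answer is to the right of mid
            }
        }
        return lo;
    }

    /** Draws u uniformly from [0, 1) and maps it to the matching value. */
    static int next(int[] values, double[] cumulative, Random rng) {
        return values[indexFor(cumulative, rng.nextDouble())];
    }
}
```

For the cumulative array {0.2, 0.5, 1.0}, a draw of 0.19 maps to index 0, 0.2 maps to index 1 and 0.5 maps to index 2.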

Answer by trylimits

Some time ago I wrote a helper class to solve this issue. The source code should show the concept clearly enough:

import java.util.HashMap;
import java.util.Map;

public class DistributedRandomNumberGenerator {

    private Map<Integer, Double> distribution;
    private double distSum;

    public DistributedRandomNumberGenerator() {
        distribution = new HashMap<>();
    }

    public void addNumber(int value, double distribution) {
        if (this.distribution.get(value) != null) {
            distSum -= this.distribution.get(value);
        }
        this.distribution.put(value, distribution);
        distSum += distribution;
    }

    public int getDistributedRandomNumber() {
        double rand = Math.random();
        double ratio = 1.0f / distSum;
        double tempDist = 0;
        for (Integer i : distribution.keySet()) {
            tempDist += distribution.get(i);
            if (rand / ratio <= tempDist) {
                return i;
            }
        }
        return 0;
    }

}

The usage of the class is as follows:

DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.3d); // Adds the numerical value 1 with a probability of 0.3 (30%)
// [...] Add more values

int random = drng.getDistributedRandomNumber(); // Generate a random number

Test driver to verify functionality:

    public static void main(String[] args) {
        DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
        drng.addNumber(1, 0.2d);
        drng.addNumber(2, 0.3d);
        drng.addNumber(3, 0.5d);

        int testCount = 1000000;

        HashMap<Integer, Double> test = new HashMap<>();

        for (int i = 0; i < testCount; i++) {
            int random = drng.getDistributedRandomNumber();
            test.put(random, (test.get(random) == null) ? (1d / testCount) : test.get(random) + 1d / testCount);
        }

        System.out.println(test.toString());
    }

Sample output for this test driver:

{1=0.20019100000017953, 2=0.2999349999988933, 3=0.4998739999935438}

Answer by pjs

Your approach is fine for the specific numbers you picked, although you could reduce storage by using an array of 10 instead of an array of 100. However, this approach doesn't generalize well to large numbers of outcomes or to outcomes with probabilities such as 1/e or 1/PI.

A potentially better solution is to use an alias table. The alias method takes O(n) work to set up the table for n outcomes, but then takes constant time to generate a value, regardless of how many outcomes there are.
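To make the idea concrete, here is a compact sketch of one common formulation of the alias method (usually attributed to Vose). It is not taken from the answer; the class name AliasTable and its members are illustrative, and the weights are assumed to be non-negative with a positive sum.

```java
import java.util.ArrayDeque;
import java.util.Random;

/** Sketch of an alias table: O(n) setup, O(1) per draw. */
final class AliasTable {
    private final double[] prob; // chance of keeping column i
    private final int[] alias;   // outcome used when column i is not kept

    AliasTable(double[] p) {
        int n = p.length;
        prob = new double[n];
        alias = new int[n];
        double sum = 0;
        for (double v : p) sum += v;

        double[] scaled = new double[n];
        ArrayDeque<Integer> small = new ArrayDeque<>();
        ArrayDeque<Integer> large = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            scaled[i] = p[i] * n / sum; // the average column height becomes 1
            if (scaled[i] < 1.0) small.push(i); else large.push(i);
        }
        while (!small.isEmpty() && !large.isEmpty()) {
            int s = small.pop();
            int l = large.pop();
            prob[s] = scaled[s];
            alias[s] = l;                            // l fills the rest of column s
            scaled[l] = scaled[l] + scaled[s] - 1.0; // what remains of l
            if (scaled[l] < 1.0) small.push(l); else large.push(l);
        }
        // Whatever is left is a full column, up to rounding error.
        while (!large.isEmpty()) prob[large.pop()] = 1.0;
        while (!small.isEmpty()) prob[small.pop()] = 1.0;
    }

    /** Picks a column uniformly, then keeps it or takes its alias. */
    int next(Random rng) {
        int column = rng.nextInt(prob.length);
        return rng.nextDouble() < prob[column] ? column : alias[column];
    }
}
```

The table returns an index, so new AliasTable(new double[] {0.2, 0.3, 0.5}).next(rng) + 1 would give the numbers 1 to 3 from the question. Because the weights are plain doubles, values such as 1/e need no special handling.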

Answer by E.R.Tan

I wrote this class for an interview after reading the paper that pjs pointed to in another post; the population of the base64 table can be further optimized. The result is amazingly fast. Initialization is slightly expensive, but if the probabilities do not change often, this is a good approach.

*For a duplicate key, the last probability is taken instead of being combined (slightly different from the behaviour of EnumeratedIntegerDistribution).

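import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// BaseRandomGen is not shown; it is expected to provide the randGen field used in nextNum().
public class RandomGen5 extends BaseRandomGen {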

    private int[] t_array = new int[4];
    private int sumOfNumerator;
    private final static int DENOM = (int) Math.pow(2, 24);
    private static final int[] bitCount = new int[] {18, 12, 6, 0};
    private static final int[] cumPow64 = new int[] {
            (int) ( Math.pow( 64, 3 ) + Math.pow( 64, 2 ) + Math.pow( 64, 1 ) + Math.pow( 64, 0 ) ),
            (int) ( Math.pow( 64, 2 ) + Math.pow( 64, 1 ) + Math.pow( 64, 0 ) ),
            (int) ( Math.pow( 64, 1 ) + Math.pow( 64, 0 ) ),
            (int) ( Math.pow( 64, 0 ) )
    };


    ArrayList[] base64Table = {new ArrayList<Integer>()
            , new ArrayList<Integer>()
            , new ArrayList<Integer>()
            , new ArrayList<Integer>()};

    public int nextNum() {
        int rand = (int) (randGen.nextFloat() * sumOfNumerator);

        for ( int x = 0 ; x < 4 ; x ++ ) {
                if (rand < t_array[x])
                    return x == 0 ? (int) base64Table[x].get(rand >> bitCount[x])
                            : (int) base64Table[x].get( ( rand - t_array[x-1] ) >> bitCount[x]) ;
        }
        return 0;
    }

    public void setIntProbList( int[] intList, float[] probList ) {
        Map<Integer, Float> map = normalizeMap( intList, probList );
        populateBase64Table( map );
    }

    private void clearBase64Table() {
        for ( int x = 0 ; x < 4 ; x++ ) {
            base64Table[x].clear();
        }
    }

    private void populateBase64Table( Map<Integer, Float> intProbMap ) {
        int startPow, decodedFreq, table_index;
        float rem;

        clearBase64Table();

        for ( Map.Entry<Integer, Float> numObj : intProbMap.entrySet() ) {
            rem = numObj.getValue();
            table_index = 3;
            for ( int x = 0 ; x < 4 ; x++ ) {
                decodedFreq = (int) (rem % 64);
                rem /= 64;
                for ( int y = 0 ; y < decodedFreq ; y ++ ) {
                    base64Table[table_index].add( numObj.getKey() );
                }
                table_index--;
            }
        }

        startPow = 3;
        for ( int x = 0 ; x < 4 ; x++ ) {
            t_array[x] = x == 0 ? (int) ( Math.pow( 64, startPow-- ) * base64Table[x].size() )
                    : ( (int) ( ( Math.pow( 64, startPow-- ) * base64Table[x].size() ) + t_array[x-1] ) );
        }

    }

    private Map<Integer, Float> normalizeMap( int[] intList, float[] probList ) {
        Map<Integer, Float> tmpMap = new HashMap<>();
        Float mappedFloat;
        int numerator;
        float normalizedProb, distSum = 0;

        //Remove duplicates, and calculate the sum of non-repeated keys
        for ( int x = 0 ; x < probList.length ; x++ ) {
            mappedFloat = tmpMap.get( intList[x] );
            if ( mappedFloat != null ) {
                // duplicate key: drop the earlier probability, the last one wins
                distSum -= mappedFloat;
            }
            distSum += probList[x];
            tmpMap.put( intList[x], probList[x] );
        }

        //Normalise the map to key -> corresponding numerator by multiplying with 2^24
        sumOfNumerator = 0;
        for ( Map.Entry<Integer, Float> intProb : tmpMap.entrySet() ) {
            normalizedProb = intProb.getValue() / distSum;
            numerator = (int) ( normalizedProb * DENOM );
            intProb.setValue( (float) numerator );
            sumOfNumerator += numerator;
        }

        return tmpMap;
    }
}

Answer by Giulio Pilotto

Try this: in this example I use an array of chars, but you can substitute your integer array.

The weight list contains the weight associated with each char. It represents the probability distribution of my charset.

In the weightsum list, for each char I store its own weight plus the sum of all preceding weights.

For example, in weightsum the third element, corresponding to 'C', is 65:
P('A') + P('B') + P('C') = P(X <= 'C')
10 + 30 + 25 = 65

So weightsum represents the cumulative distribution of my charset. weightsum contains the following values:

10, 40, 65, 125, 145, 215, 225, 305, 325, 355

It's easy to see that the 8th element, corresponding to H, has the largest gap to its predecessor (80, which is its weight), so H is the most likely outcome.

        List<Character> charset =   Arrays.asList('A','B','C','D','E','F','G','H','I','J');
        List<Integer> weight = Arrays.asList(10,30,25,60,20,70,10,80,20,30);
        List<Integer>  weightsum = new ArrayList<>();

        int i=0,j=0,k=0;
        Random Rnd = new Random();

        weightsum.add(weight.get(0));

        for (i = 1; i < 10; i++)
            weightsum.add(weightsum.get(i-1) + weight.get(i));

Then I use a loop to draw 30 random chars from charset, each one drawn according to the cumulative distribution.

In k I store a random number from 0 up to (but not including) the largest value in weightsum. Then I look in weightsum for the first number greater than k; the position of that number in weightsum is the position of the matching char in charset.

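   for (j = 0; j < 30; j++)
   {
   // k is uniform in [0, total weight); reuse the Random created above
   k =   Rnd.nextInt(weightsum.get(weightsum.size()-1));

   // advance to the first cumulative weight strictly greater than k
   for (i = 0; k >= weightsum.get(i); i++) ;
   System.out.print(charset.get(i));
   }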

The code prints a sequence of chars such as:

HHFAIIDFBDDDHFICJHACCDFJBGBHHB

Let's do the math!

A = 2
B = 4
C = 3
D = 5
E = 0
F = 4
G = 1
H = 6
I = 3
J = 2

Total: 30
As hoped, D and H occur most often (weights 60 and 80 out of 355).
E, on the other hand, did not come up at all (weight 20 out of 355).

Answer by Andrei Ciobanu

If you are not against adding a new library to your code, this feature is already implemented in MockNeat; check the probabilities() method.

Some examples directly from the wiki:

String s = mockNeat.probabilites(String.class)
                .add(0.1, "A") // 10% chance
                .add(0.2, "B") // 20% chance
                .add(0.5, "C") // 50% chance
                .add(0.2, "D") // 20% chance
                .val();

Or if you want to generate numbers within given ranges with a given probability you can do something like:

Integer x = m.probabilites(Integer.class)
             .add(0.2, m.ints().range(0, 100))
             .add(0.5, m.ints().range(100, 200))
             .add(0.3, m.ints().range(200, 300))
             .val();

Disclaimer: I am the author of the library, so I might be biased when I am recommending it.

Answer by Albert Chen

Here is some Python code, even though you asked for Java; the approach is very similar.

# weighted probability
import numpy as np

theta = np.array([0.1,0.25,0.6,0.05])
print(theta)

# prepend 0 so the cumulative sums mark both edges of every interval
sample_axis = np.hstack((np.zeros(1), np.cumsum(theta)))
print(sample_axis)

The second print shows [0. 0.1 0.35 0.95 1. ], which represents the cumulative distribution.

You can use a uniform distribution to draw a point in this unit range and then look up the index of the interval it falls into.

def binary_search(axis, q, s, e):
    # returns the index s with axis[s] <= q < axis[s+1]
    if e-s <= 1:
        return s
    else:
        m = (s+e)//2
        if q < axis[m]:
            return binary_search(axis, q, s, m)
        else:
            return binary_search(axis, q, m, e)


range_index = np.random.rand()  # uniform draw in [0, 1)
print(range_index)
q = range_index
s = 0
e = sample_axis.shape[0]-1
print(binary_search(sample_axis, q, s, e))

Answer by RoberMP

I also answered this here: find random country but probability of picking higher population country should be higher. Using TreeMap:

TreeMap<Integer, Integer> map = new TreeMap<>();
map.put(percent1, 1);
map.put(percent1 + percent2, 2);
// ...

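int random = (new Random()).nextInt(100); // uniform in 0..99, assuming the percentages add up to 100
int result = map.higherEntry(random).getValue(); // first cumulative bound strictly above random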
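A self-contained variant of the TreeMap idea might look like this. It is a sketch only: TreeMapPicker, add, pickFor and next are hypothetical names, and every weight is assumed to be a positive integer.

```java
import java.util.Random;
import java.util.TreeMap;

/** Sketch: cumulative integer weights stored as TreeMap keys. */
final class TreeMapPicker {
    private final TreeMap<Integer, Integer> map = new TreeMap<>();
    private int total = 0;

    /** Registers a value with a positive integer weight, e.g. a percentage. */
    void add(int weight, int value) {
        total += weight;
        map.put(total, value); // the key is the exclusive upper bound of this bucket
    }

    /** Maps a roll in 0..total-1 to the value whose bucket contains it. */
    int pickFor(int roll) {
        // higherEntry returns the entry with the least key strictly greater than roll
        return map.higherEntry(roll).getValue();
    }

    /** Draws a roll and returns the matching value. */
    int next(Random rng) {
        return pickFor(rng.nextInt(total));
    }
}
```

After add(20, 1), add(30, 2) and add(50, 3), rolls 0 to 19 map to 1, rolls 20 to 49 map to 2 and rolls 50 to 99 map to 3. Using the running total as the bound for nextInt means the weights do not have to add up to 100.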