Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/9701639/

Time: 2020-08-16 06:25:01  Source: igfitidea

Is Collections.shuffle() really random enough? Practical examples seem to deny this statement

Tags: java, arrays, collections, random, shuffle

Asked by basZero

I have 1000 unique objects in a java.util.List, each referring to a unique image, and I'd like to shuffle them so that I can take the first 20 objects and present them to the website user. The user can then click a button labeled "Shuffle", at which point I retrieve the 1000 images again from scratch and call shuffle() again. However, out of 1000 image objects, I very often see the same image again and again across the 20-image selections.

Something seems to be wrong. Any better suggestions or advice?

My code is very simple:

List<String> imagePaths = get1000Images();
Collections.shuffle(imagePaths);

int i = 0;
for (String path: imagePaths) {
  // ... do something with the path ...
  i++;
  if (i >= 20) break;
}

I know that Collections.shuffle() is well distributed; see for instance http://blog.ryanrampersad.com/2012/03/03/more-on-shuffling-an-array-correctly/

However, I just have the feeling that the probability of seeing the same image over and over again in a set of 20 images out of 1000 should be much less...

Input is highly appreciated.

Accepted answer by Dave Webb

If you're showing 20 images out of 1000, the probability of seeing any one of those 20 repeated in the next iteration is approximately 0.34, so you shouldn't be surprised to see images repeating.

The chance of seeing a specific image is still one in a thousand, but if you're looking for any of twenty images the chances are much higher.

We can calculate the probability of none of the previous 20 images repeating as:

 980   979         961
———— × ——— × ... × ——— ≈ 0.66
1000   999         981

And so the probability of seeing a repeat is one minus this, or approximately 0.34.

And the probability of seeing an image repeated in either of the next two iterations is:

1 - (0.66 × 0.66) ≈ 0.56

In other words, it's more likely than not that you'll see a repeated image over the two following cycles. (And this isn't including images repeated from the second cycle in the third which will only make it more likely.)

For what it's worth, here's some Java code to do the above calculation:

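// Probability that none of the 20 previously shown images appears in the next selection
double result = 1.0;
int totalImages = 1000;
int displayedImages = 20;

for (int i = 0; i < displayedImages; i++) {
  // pick i must avoid the 20 earlier images among the (totalImages - i) images still available
  result = result * (totalImages - displayedImages - i) / (totalImages - i);
}

// probability of no repeat; one minus this is the probability of at least one repeat
System.out.println(result);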
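
As a sanity check on the calculation above, here is a minimal simulation sketch. The class name RepeatSimulation, the method estimateRepeatProbability, the trial count and the fixed seed are illustrative assumptions and not part of the original answer; images are modeled as the integers 0..999. It reshuffles the list repeatedly, takes the first 20 entries each time, and counts how often a selection shares at least one entry with the previous one. The estimate should come out close to the analytic value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class RepeatSimulation {

    /**
     * Estimates the probability that a selection of `shown` items out of `total`
     * shares at least one item with the selection made just before it.
     */
    static double estimateRepeatProbability(int total, int shown, int trials, Random rnd) {
        List<Integer> images = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            images.add(i); // each integer stands in for one unique image
        }

        // initial selection, so the first trial has something to compare against
        Collections.shuffle(images, rnd);
        Set<Integer> previous = new HashSet<>(images.subList(0, shown));

        int repeats = 0;
        for (int t = 0; t < trials; t++) {
            Collections.shuffle(images, rnd);              // the user clicks "Shuffle"
            List<Integer> current = images.subList(0, shown);
            if (!Collections.disjoint(previous, current)) {
                repeats++;                                 // at least one image was seen last time
            }
            previous = new HashSet<>(current);             // copy before the next shuffle
        }
        return (double) repeats / trials;
    }

    public static void main(String[] args) {
        double estimate = estimateRepeatProbability(1000, 20, 20_000, new Random(42));
        System.out.printf("Estimated probability of a repeat: %.3f%n", estimate);
    }
}
```

Passing the Random instance in makes the run reproducible; dropping the seed gives a slightly different estimate each time.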

Answer by Graham Borland

With that code, if you're seeing the same image over and over, it means the same image exists many times in the list. Wherever you're getting your 1000 images from, there are duplicates.

Answer by amit

Your intuition is correct for a specific image [you are not likely to see a specific image over and over again], but not for a general image [you are likely to see some image repeating]. This is one of those places in probability where our automatic intuition is wrong...

This reminds me of the birthday paradox, which contradicts intuition: in a group of 23 people, the likelihood that 2 of them share a birthday is about 0.5, much more than intuition expects!
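
The birthday figure can be reproduced with a few lines of Java. This is a sketch under the usual simplifying assumption of 365 equally likely birthdays (no leap years); the class name BirthdayParadox and the method sharedBirthdayProbability are made up for illustration. It multiplies the chances that each successive person avoids all earlier birthdays, then takes the complement.

```java
class BirthdayParadox {

    /**
     * Probability that at least two of `people` share a birthday,
     * assuming 365 equally likely days (an idealization).
     */
    static double sharedBirthdayProbability(int people) {
        double allDistinct = 1.0;
        for (int i = 0; i < people; i++) {
            // person i must avoid the i birthdays already taken
            allDistinct *= (365.0 - i) / 365.0;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        System.out.printf("23 people: %.3f%n", sharedBirthdayProbability(23));
    }
}
```

The loop has the same shape as the image calculation: a product of "avoid everything already seen" factors, followed by one minus the result.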

Answer by AlexR

Following your question, I wrote the following program. I created a list of sequential integers and shuffled it 10, 100, 1000 and 10000 times. After each shuffle I checked the value of the element at index 5 of the list and updated an array of counters recording how many times each number appeared at that position.

Here is the program:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MyTest {
    public static void main(String[] args) {
        int n = 10;
        List<Integer> list = new ArrayList<Integer>();
        for (int i = 0;  i < n;  i++) {
            list.add(i);
        }

        // counters[v] = how many times value v landed at index 5
        int[] counters = new int[n];

        for (int shuffles : new int[] {10, 100, 1000, 10000}) {
            Arrays.fill(counters, 0);
            for (int i = 0;  i < shuffles; i++) {
                Collections.shuffle(list);
                // check the element at index 5 (the sixth element)
                int fifth = list.get(5);
                counters[fifth] = counters[fifth] + 1;
            }
            System.out.println(shuffles + ": " + Arrays.toString(counters));
        }
    }
}

And here are the results:

10: [0, 1, 1, 1, 2, 0, 0, 3, 2, 0]
100: [11, 9, 9, 7, 10, 12, 13, 13, 8, 8]
1000: [100, 101, 107, 101, 95, 96, 109, 83, 93, 115]
10000: [1015, 942, 990, 1003, 1015, 1037, 977, 1060, 950, 1011]

As you can see, the "randomness" depends on the number of shuffles. If you shuffle the array 10 times, the minimal counter is 0 and the maximal is 3. The difference between these values for 100 shuffles (in percent) is much smaller. The numbers are almost the same for 10000 shuffles.

I think that this test models your use case: you are showing images at a specific position of the shuffled collection.

Please see the post by @amit, which describes the meaning of shuffle.

So, the solution for you is to shuffle your array 10 times.

EDIT: @Dave Webb gave a perfect explanation for this case.

A second thought: you do not actually have to shuffle your list of 1000 elements to take the first 20 elements from it. It is enough to take 20 random elements. You get the same effect with a much more efficient solution:

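Set<Image> show = new HashSet<Image>();
Random r = new Random(System.currentTimeMillis());
// keep drawing until 20 distinct images are collected; the set ignores duplicates
while (show.size() < 20) {
    // bound the index to the list size (an unbounded nextInt() can be negative or too large)
    show.add(list.get(r.nextInt(list.size())));
}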

Answer by Peter Lawrey

It's human nature to see patterns which are not there. Many people see patterns in the planets and stars as guiding their life.

In the first 1000 digits of pi there are six nines in a row. Does that mean the digits of pi are not random? No. The pattern doesn't occur any more often than you might expect.

Having said that, Random is not completely random, and it will repeat after 2^48 calls (it uses a 48-bit seed). This means it's not possible to produce every possible long or double using it. If you want more randomness, you can use SecureRandom with shuffle instead.
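
For illustration, a minimal sketch of shuffling with SecureRandom. It relies on the two-argument overload Collections.shuffle(List, Random), which accepts a SecureRandom because SecureRandom extends Random. The class name SecureShuffleExample, the helper shuffledCopy and the generated file names are hypothetical stand-ins for the asker's real image paths.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SecureShuffleExample {

    /** Returns a shuffled copy of `items`, leaving the original list untouched. */
    static List<String> shuffledCopy(List<String> items, Random source) {
        List<String> copy = new ArrayList<>(items);
        Collections.shuffle(copy, source); // uses the supplied source of randomness
        return copy;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            paths.add("image-" + i + ".png"); // hypothetical paths for the demo
        }
        List<String> shuffled = shuffledCopy(paths, new SecureRandom());
        System.out.println(shuffled.subList(0, 20));
    }
}
```

Note that a stronger random source does not change the repeat probability discussed in the other answers; it only removes the limitations of the 48-bit generator.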

It sounds like what you want is something like this

List<String> imagePaths = new ArrayList<>();

// called repeatedly
if (imagePaths.size() <= 500) {
    imagePaths = get1000Images();
    Collections.shuffle(imagePaths);
}

for (String path: imagePaths.subList(0, 20)) {
  // ... do something with the path ...
}

imagePaths = imagePaths.subList(20, imagePaths.size());

This ensures that you don't see the same image twice between reloads: each shuffled list serves 25 calls of 20 images (500 images in total) before it is refilled.
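
The fragment above can be wrapped into a small self-contained class. This is only a sketch of the same idea: the class name ImageDealer, its constructor parameters and the generated file names are assumptions made for the example, and it assumes the source list is larger than the refill threshold, which in turn is at least the batch size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ImageDealer {
    private final List<String> allImages;   // the full, unique image list
    private final int batchSize;            // images shown per call (20 in the question)
    private final int refillThreshold;      // reshuffle once this few images remain
    private List<String> remaining = new ArrayList<>();

    ImageDealer(List<String> allImages, int batchSize, int refillThreshold) {
        this.allImages = new ArrayList<>(allImages);
        this.batchSize = batchSize;
        this.refillThreshold = refillThreshold;
    }

    /** Returns the next batch; images do not repeat until the list is refilled. */
    List<String> nextBatch() {
        if (remaining.size() <= refillThreshold) {
            remaining = new ArrayList<>(allImages);
            Collections.shuffle(remaining);
        }
        List<String> batch = new ArrayList<>(remaining.subList(0, batchSize));
        // drop the images just handed out
        remaining = new ArrayList<>(remaining.subList(batchSize, remaining.size()));
        return batch;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            paths.add("image-" + i + ".png"); // hypothetical paths for the demo
        }
        ImageDealer dealer = new ImageDealer(paths, 20, 500);
        Set<String> seen = new HashSet<>();
        for (int call = 0; call < 25; call++) {
            seen.addAll(dealer.nextBatch());
        }
        System.out.println("Distinct images over 25 calls: " + seen.size());
    }
}
```

Lowering the threshold lengthens the repeat-free run at the cost of a less varied pool near the end of each cycle.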

Answer by Nicholas

I did a 52-card shuffle four different times and marked every time an iteration repeated the exact same card in the exact same slot. That gave me approximately 14 out of 208 cards, which works out to approximately 93.3% random.
