JavaScript: sampling a random subset from an array

Question

Asked by Jeroen

What is a clean way of taking a random sample, without replacement, from an array in JavaScript? Suppose there is an array

x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]

and I want to randomly sample 5 unique values; i.e. generate a random subset of length 5. To generate one random sample one could do something like:

x[Math.floor(Math.random()*x.length)];

But if this is done multiple times, there is a risk of grabbing the same entry more than once.

Answer 1

Answered by Tim Down

I suggest shuffling a copy of the array using the Fisher-Yates shuffle and taking a slice:

function getRandomSubarray(arr, size) {
    var shuffled = arr.slice(0), i = arr.length, temp, index;
    while (i--) {
        index = Math.floor((i + 1) * Math.random());
        temp = shuffled[index];
        shuffled[index] = shuffled[i];
        shuffled[i] = temp;
    }
    return shuffled.slice(0, size);
}

var x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];
var fiveRandomMembers = getRandomSubarray(x, 5);

Note that this will not be the most efficient method for getting a small random subset of a large array because it shuffles the whole array unnecessarily. For better performance you could do a partial shuffle instead:

function getRandomSubarray(arr, size) {
    var shuffled = arr.slice(0), i = arr.length, min = i - size, temp, index;
    while (i-- > min) {
        index = Math.floor((i + 1) * Math.random());
        temp = shuffled[index];
        shuffled[index] = shuffled[i];
        shuffled[i] = temp;
    }
    return shuffled.slice(min);
}
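
The partial shuffle above assumes that size is no larger than arr.length. Here is a minimal sketch of the same idea with that case guarded. The name sampleWithoutReplacement and the optional random parameter (a stand-in for Math.random, so that runs can be made repeatable) are assumptions for illustration and are not part of the original answer.

```javascript
// Hypothetical helper: partial Fisher-Yates shuffle on a copy of the input.
// `random` is an assumed injection point; it must return a number in [0, 1).
function sampleWithoutReplacement(arr, size, random = Math.random) {
    const copy = arr.slice();
    // Clamp the requested size to the range [0, arr.length].
    const n = Math.max(0, Math.min(Math.floor(size), copy.length));
    const min = copy.length - n;
    // Fill positions length-1 down to min, each from the not-yet-fixed prefix.
    for (let i = copy.length - 1; i >= min; i--) {
        const j = Math.floor(random() * (i + 1));
        [copy[i], copy[j]] = [copy[j], copy[i]];
    }
    return copy.slice(min);
}

const x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];
const five = sampleWithoutReplacement(x, 5);
console.log(five.length, new Set(five).size); // both are 5: no repeats
```

Asking for more items than the array holds simply returns every item in random order, and the input array is left untouched.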

Answer 2

Answered by alengel

A little late to the party, but this could be solved with underscore's new sample method (underscore 1.5.2 - Sept 2013):

var x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];

var randomFiveNumbers = _.sample(x, 5);

Answer 3

Answered by ntalbs

Or... if you use underscore.js...

var _und = require('underscore');

...

function sample(a, n) {
    return _und.take(_und.shuffle(a), n);
}

Simple enough.

Answer 4

Answered by tkellehe

In my opinion, shuffling the entire deck is not necessary. You just need to make sure your sample is random, not your deck. What you can do is take size items from the front, then swap each of them with another random position in the sampling array. Because the array is modified in place, it gets more and more shuffled each time you sample from it.

function getRandom(length) { return Math.floor(Math.random()*(length)); }

function getRandomSample(array, size) {
    var length = array.length;

    for(var i = size; i--;) {
        var index = getRandom(length);
        var temp = array[index];
        array[index] = array[i];
        array[i] = temp;
    }

    return array.slice(0, size);
}

This algorithm takes only 2*size steps, if you include the slice method, to select the random sample.

More Random

To make the sample more random, we can randomly select the starting point of the sample. But it is a little more expensive to get the sample.

function getRandomSample(array, size) {
    var length = array.length, start = getRandom(length);

    for(var i = size; i--;) {
        var index = (start + i)%length, rindex = getRandom(length);
        var temp = array[rindex];
        array[rindex] = array[index];
        array[index] = temp;
    }
    var end = start + size, sample = array.slice(start, end);
    if(end > length)
        sample = sample.concat(array.slice(0, end - length));
    return sample;
}

What makes this more random is that, when you always shuffle just the front items, you tend not to get them very often in the sample if the sampling array is large and the sample is small. This would not be a problem if the array were not supposed to stay the same between calls. So, what this method does is change the position where the shuffled region starts.

No Replacement

To avoid copying the sampling array and still not worry about replacement, you can do the following, but it takes 3*size steps versus 2*size.

function getRandomSample(array, size) {
    var length = array.length, swaps = [], i = size, temp;

    while(i--) {
        var rindex = getRandom(length);
        temp = array[rindex];
        array[rindex] = array[i];
        array[i] = temp;
        swaps.push({ from: i, to: rindex });
    }

    var sample = array.slice(0, size);

    // Put everything back.
    i = size;
    while(i--) {
        var pop = swaps.pop();
        temp = array[pop.from];
        array[pop.from] = array[pop.to];
        array[pop.to] = temp;
    }

    return sample;
}

No Replacement and More Random

To apply the algorithm that gave a little bit more random samples to the no replacement function:

function getRandomSample(array, size) {
    var length = array.length, start = getRandom(length),
        swaps = [], i = size, temp;

    while(i--) {
        var index = (start + i)%length, rindex = getRandom(length);
        temp = array[rindex];
        array[rindex] = array[index];
        array[index] = temp;
        swaps.push({ from: index, to: rindex });
    }

    var end = start + size, sample = array.slice(start, end);
    if(end > length)
        sample = sample.concat(array.slice(0, end - length));

    // Put everything back.
    i = size;
    while(i--) {
        var pop = swaps.pop();
        temp = array[pop.from];
        array[pop.from] = array[pop.to];
        array[pop.to] = temp;
    }

    return sample;
}

Faster...

Like all of these posts, this uses the Fisher-Yates shuffle, but I removed the overhead of copying the array.

function getRandomSample(array, size) {
    var r, i = array.length, end = i - size, temp, swaps = getRandomSample.swaps;

    while (i-- > end) {
        r = getRandom(i + 1);
        temp = array[r];
        array[r] = array[i];
        array[i] = temp;
        swaps.push(i);
        swaps.push(r);
    }

    var sample = array.slice(end);

    while(size--) {
        i = swaps.pop();
        r = swaps.pop();
        temp = array[i];
        array[i] = array[r];
        array[r] = temp;
    }

    return sample;
}
getRandomSample.swaps = [];

Answer 5

Answered by Selfish

While I strongly support using the Fisher-Yates Shuffle, as suggested by Tim Down, here's a very short method for achieving a random subset as requested, mathematically correct, including the empty set, and the given set itself.

Note: this solution depends on lodash / underscore:

Lodash v4

const _ = require('lodash')

function subset(arr) {
    return _.sampleSize(arr, _.random(arr.length))
}

Lodash v3

const _ = require('lodash')

function subset(arr) {
    return _.sample(arr, _.random(arr.length));
}
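
For anyone who would rather not pull in lodash, the same idea (pick a random size between 0 and arr.length inclusive, then sample that many items without replacement) can be sketched in plain JavaScript. The function name randomSubset and the random parameter are hypothetical, and the partial shuffle here only stands in for _.sampleSize; it is not lodash's implementation.

```javascript
// Hypothetical dependency-free sketch; `random` defaults to Math.random
// and must return a number in [0, 1).
function randomSubset(arr, random = Math.random) {
    const copy = arr.slice();
    // Size is an integer in 0..arr.length inclusive, so the empty set
    // and the full set are both possible results.
    const size = Math.floor(random() * (copy.length + 1));
    // Partial Fisher-Yates: fix the first `size` positions of the copy.
    for (let i = 0; i < size; i++) {
        const j = i + Math.floor(random() * (copy.length - i));
        [copy[i], copy[j]] = [copy[j], copy[i]];
    }
    return copy.slice(0, size);
}

console.log(randomSubset([1, 2, 3, 4, 5])); // output varies from run to run
```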

Answer 6

Answered by Jesús López

Here is another implementation based on the Fisher-Yates shuffle. This one is optimized for the case where the sample size is significantly smaller than the array length. It neither scans the entire array nor allocates arrays as large as the original; it uses sparse arrays to reduce memory allocation.

function getRandomSample(array, count) {
    var indices = [];
    var result = new Array(count);
    for (let i = 0; i < count; i++ ) {
        let j = Math.floor(Math.random() * (array.length - i) + i);
        result[i] = array[indices[j] === undefined ? j : indices[j]];
        indices[j] = indices[i] === undefined ? i : indices[i];
    }
    return result;
}
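
The sparse-array bookkeeping above can be hard to follow: indices[k] records which original index currently sits at virtual position k, and a missing entry means position k still holds index k. A sketch of the same technique with a Map in place of the sparse array makes that explicit. The name sampleSparse and the random parameter are assumptions for illustration, and count is clamped to array.length, which the original does not do.

```javascript
// Hypothetical variant of the function above, using a Map for bookkeeping.
// `random` is an assumed injection point; it must return a number in [0, 1).
function sampleSparse(array, count, random = Math.random) {
    const n = Math.max(0, Math.min(count, array.length));
    // Virtual position -> original index currently stored at that position.
    const moved = new Map();
    const at = (k) => (moved.has(k) ? moved.get(k) : k);
    const result = new Array(n);
    for (let i = 0; i < n; i++) {
        // Pick a position j in [i, array.length).
        const j = i + Math.floor(random() * (array.length - i));
        result[i] = array[at(j)];
        // Virtual swap: position j now holds whatever was at position i.
        moved.set(j, at(i));
    }
    return result;
}

console.log(sampleSparse(["a", "b", "c", "d", "e", "f"], 3)); // output varies
```

At most count entries are ever stored in the Map, so memory use follows the sample size rather than the array length.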

Answer 7

Answered by chovy

If you're using lodash, the API changed in 4.x:

const oneItem = _.sample(arr);
const nItems = _.sampleSize(arr, n);

https://lodash.com/docs#sampleSize

Answer 8

Answered by Luis Marin

You can get a 5-element sample this way:

var sample = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
.map(a => [a,Math.random()])
.sort((a,b) => {return a[1] < b[1] ? -1 : 1;})
.slice(0,5)
.map(a => a[0]);

You can define it as a function to use in your code:

var randomSample = function(arr,num){ return arr.map(a => [a,Math.random()]).sort((a,b) => {return a[1] < b[1] ? -1 : 1;}).slice(0,num).map(a => a[0]); }

Or add it to the Array object itself:

    Array.prototype.sample = function(num){ return this.map(a => [a,Math.random()]).sort((a,b) => {return a[1] < b[1] ? -1 : 1;}).slice(0,num).map(a => a[0]); };

If you want, you can split the code into two separate functions (shuffle and sample):

    Array.prototype.shuffle = function(){ return this.map(a => [a,Math.random()]).sort((a,b) => {return a[1] < b[1] ? -1 : 1;}).map(a => a[0]); };
    Array.prototype.sample = function(num){ return this.shuffle().slice(0,num); };

Answer 9

Answered by AnyWhichWay

Perhaps I am missing something, but it seems there is a solution that does not require the complexity or potential overhead of a shuffle:

function sample(array,size) {
  const results = [],
    sampled = {};
  while(results.length<size && results.length<array.length) {
    const index = Math.trunc(Math.random() * array.length);
    if(!sampled[index]) {
      results.push(array[index]);
      sampled[index] = true;
    }
  }
  return results;
}

Answer 10

Answered by mamapitufo

You can remove the elements from a copy of the array as you select them. Performance is probably not ideal, but it might be OK for what you need:

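function getRandom(arr, size) {
  var copy = arr.slice(0), rand = [];
  // Fix the count up front: copy.length shrinks as elements are spliced out,
  // so testing i < copy.length inside the loop would stop too early.
  var count = Math.min(size, copy.length);
  for (var i = 0; i < count; i++) {
    var index = Math.floor(Math.random() * copy.length);
    rand.push(copy.splice(index, 1)[0]);
  }
  return rand;
}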
