java 如何在java中从一组大小为n的集合中迭代生成k个元素子集?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4504974/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-30 06:39:30  来源:igfitidea点击:

How to iteratively generate k elements subsets from a set of size n in java?

javasubset

提问by meriton

I'm working on a puzzle that involves analyzing all size k subsets and figuring out which one is optimal. I wrote a solution that works when the number of subsets is small, but it runs out of memory for larger problems. Now I'm trying to translate an iterative function written in python to java so that I can analyze each subset as it's created and get only the value that represents how optimized it is and not the entire set so that I won't run out of memory. Here is what I have so far and it doesn't seem to finish even for very small problems:

我正在解决一个难题,该难题涉及分析所有大小为 k 的子集并找出哪个是最佳的。我写了一个解决方案,当子集数量很少时,它可以工作,但它会因为更大的问题而耗尽内存。现在我正在尝试将用 python 编写的迭代函数转换为 java,以便我可以在创建每个子集时对其进行分析,并仅获取表示优化程度的值而不是整个集合,这样我就不会用完记忆。这是我到目前为止所拥有的,即使对于非常小的问题,它似乎也没有完成:

public static LinkedList<LinkedList<Integer>> getSets(int k, LinkedList<Integer> set)
{
    int N = set.size();
    int maxsets = nCr(N, k);
    LinkedList<LinkedList<Integer>> toRet = new LinkedList<LinkedList<Integer>>();

    int remains, thresh;
    LinkedList<Integer> newset; 
    for (int i=0; i<maxsets; i++)
    {
        remains = k;
        newset = new LinkedList<Integer>();
        for (int val=1; val<=N; val++)
        {
            if (remains==0)
                break;
            thresh = nCr(N-val, remains-1);
            if (i < thresh)
            {
                newset.add(set.get(val-1));
                remains --;
            }
            else 
            {
                i -= thresh;
            }
        }
        toRet.add(newset);
    }

    return toRet;

}

Can anybody help me debug this function or suggest another algorithm for iteratively generating size k subsets?

有人可以帮我调试这个函数或建议另一种算法来迭代生成大小为 k 的子集吗?

EDIT: I finally got this function working, I had to create a new variable that was the same as i to do the i and thresh comparison because python handles for loop indexes differently.

编辑:我终于让这个函数工作了,我必须创建一个与 i 相同的新变量来进行 i 和 thresh 比较,因为 python 处理循环索引的方式不同。

回答by meriton

First, if you intend to do random access on a list, you should pick a list implementation that supports that efficiently. From the javadoc on LinkedList:

首先,如果您打算对列表进行随机访问,您应该选择一个有效支持它的列表实现。来自 LinkedList 上的 javadoc:

All of the operations perform as could be expected for a doubly-linked list. Operations that index into the list will traverse the list from the beginning or the end, whichever is closer to the specified index.

对于双向链表,所有操作都按预期执行。索引到列表中的操作将从开头或结尾遍历列表,以更接近指定索引的为准。

An ArrayList is both more space efficient and much faster for random access. Actually, since you know the length beforehand, you can even use a plain array.

对于随机访问,ArrayList 的空间效率更高,速度也更快。实际上,由于您事先知道长度,您甚至可以使用普通数组。

To algorithms: Let's start simple: How would you generate all subsets of size 1? Probably like this:

对于算法:让我们从简单开始:您将如何生成大小为 1 的所有子集?大概是这样的:

for (int i = 0; i < set.length; i++) {
    int[] subset = {i};
    process(subset);
}

Where process is a method that does something with the set, such as checking whether it is "better" than all subsets processed so far.

其中 process 是一种对集合执行某些操作的方法,例如检查它是否比迄今为止处理的所有子集“更好”。

Now, how would you extend that to work for subsets of size 2? What is the relationship between subsets of size 2 and subsets of size 1? Well, any subset of size 2 can be turned into a subset of size 1 by removing its largest element. Put differently, each subset of size 2 can be generated by taking a subset of size 1 and adding a new element larger than all other elements in the set. In code:

现在,您将如何扩展它以适用于大小为 2 的子集?大小为 2 的子集和大小为 1 的子集之间有什么关系?好吧,任何大小为 2 的子集都可以通过删除其最大元素来转换为大小为 1 的子集。换句话说,每个大小为 2 的子集可以通过取一个大小为 1 的子集并添加一个比集合中所有其他元素都大的新元素来生成。在代码中:

processSubset(int[] set) {
    int subset = new int[2];
    for (int i = 0; i < set.length; i++) {
        subset[0] = set[i];
        processLargerSets(set, subset, i);
    }
}

void processLargerSets(int[] set, int[] subset, int i) {
    for (int j = i + 1; j < set.length; j++) {
        subset[1] = set[j];
        process(subset);
    }
}

For subsets of arbitrary size k, observe that any subset of size k can be turned into a subset of size k-1 by chopping of the largest element. That is, all subsets of size k can be generated by generating all subsets of size k - 1, and for each of these, and each value larger than the largest in the subset, add that value to the set. In code:

对于任意大小为 k 的子集,观察到任何大小为 k 的子集都可以通过截断最大元素而变成大小为 k-1 的子集。也就是说,所有大小为 k 的子集可以通过生成大小为 k - 1 的所有子集来生成,并且对于这些子集中的每一个,以及每个大于子集中最大的值,将该值添加到集合中。在代码中:

static void processSubsets(int[] set, int k) {
    int[] subset = new int[k];
    processLargerSubsets(set, subset, 0, 0);
}

static void processLargerSubsets(int[] set, int[] subset, int subsetSize, int nextIndex) {
    if (subsetSize == subset.length) {
        process(subset);
    } else {
        for (int j = nextIndex; j < set.length; j++) {
            subset[subsetSize] = set[j];
            processLargerSubsets(set, subset, subsetSize + 1, j + 1);
        }
    }
}

Test code:

测试代码:

static void process(int[] subset) {
    System.out.println(Arrays.toString(subset));
}


public static void main(String[] args) throws Exception {
    int[] set = {1,2,3,4,5};
    processSubsets(set, 3);
}

But before you invoke this on huge sets remember that the number of subsets can grow rather quickly.

但是在大型集合上调用它之前,请记住子集的数量可能会增长得相当快。

回答by Yoni Zohar

You can use org.apache.commons.math3.util.Combinations.

您可以使用 org.apache.commons.math3.util.Combinations

Example:

例子:

import java.util.Arrays;
import java.util.Iterator;

import org.apache.commons.math3.util.Combinations;

public class tmp {
    public static void main(String[] args) {
        for (Iterator<int[]> iter = new Combinations(5, 3).iterator(); iter.hasNext();) {
            System.out.println(Arrays.toString(iter.next()));
        }
    }

}

Output: [0, 1, 2] [0, 1, 3] [0, 2, 3] [1, 2, 3] [0, 1, 4] [0, 2, 4] [1, 2, 4] [0, 3, 4] [1, 3, 4] [2, 3, 4]

输出: [0, 1, 2] [0, 1, 3] [0, 2, 3] [1, 2, 3] [0, 1, 4] [0, 2, 4] [1, 2, 4 ] [0, 3, 4] [1, 3, 4] [2, 3, 4]

回答by kharole

Here is a combination iterator I wrote recetnly

这是我最近写的一个组合迭代器

package psychicpoker;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

import static com.google.common.base.Preconditions.checkArgument;

public class CombinationIterator<T> implements Iterator<Collection<T>> {

private int[] indices;
private List<T> elements;
private boolean hasNext = true;

public CombinationIterator(List<T> elements, int k) throws IllegalArgumentException {
    checkArgument(k<=elements.size(), "Impossible to select %d elements from hand of size %d", k, elements.size());
    this.indices = new int[k];
    for(int i=0; i<k; i++)
        indices[i] = k-1-i;
    this.elements = elements;
}

public boolean hasNext() {
    return hasNext;
}

private int inc(int[] indices, int maxIndex, int depth) throws IllegalStateException {
    if(depth == indices.length) {
        throw new IllegalStateException("The End");
    }
    if(indices[depth] < maxIndex) {
        indices[depth] = indices[depth]+1;
    } else {
        indices[depth] = inc(indices, maxIndex-1, depth+1)+1;
    }
    return indices[depth];
}

private boolean inc() {
    try {
        inc(indices, elements.size() - 1, 0);
        return true;
    } catch (IllegalStateException e) {
        return false;
    }
}

public Collection<T> next() {
    Collection<T> result = new ArrayList<T>(indices.length);
    for(int i=indices.length-1; i>=0; i--) {
        result.add(elements.get(indices[i]));
    }
    hasNext = inc();
    return result;
}

public void remove() {
    throw new UnsupportedOperationException();
}

}

}

回答by afsantos

I've had the same problem today, of generating all k-sizedsubsets of a n-sizedset.

我今天遇到了同样的问题,即生成n大小集合的所有k 大小子集。

I had a recursive algorithm, written in Haskell, but the problem required that I wrote a new version in Java.
In Java, I thought I'd probably have to use memoization to optimize recursion. Turns out, I found a way to do it iteratively. I was inspired by this image, from Wikipedia, on the article about Combinations.

我有一个递归算法,用 Haskell 编写,但问题要求我用 Java 编写一个新版本。
在 Java 中,我想我可能不得不使用记忆来优化递归。事实证明,我找到了一种迭代方法。我的灵感来自维基百科关于组合的文章中的这张图片

Method to calculate all k-sizedsubsets:

计算所有k 大小子集的方法

public static int[][] combinations(int k, int[] set) {
    // binomial(N, K)
    int c = (int) binomial(set.length, k);
    // where all sets are stored
    int[][] res = new int[c][Math.max(0, k)];
    // the k indexes (from set) where the red squares are
    // see image above
    int[] ind = k < 0 ? null : new int[k];
    // initialize red squares
    for (int i = 0; i < k; ++i) { ind[i] = i; }
    // for every combination
    for (int i = 0; i < c; ++i) {
        // get its elements (red square indexes)
        for (int j = 0; j < k; ++j) {
            res[i][j] = set[ind[j]];
        }
        // update red squares, starting by the last
        int x = ind.length - 1;
        boolean loop;
        do {
            loop = false;
            // move to next
            ind[x] = ind[x] + 1;
            // if crossing boundaries, move previous
            if (ind[x] > set.length - (k - x)) {
                --x;
                loop = x >= 0;
            } else {
                // update every following square
                for (int x1 = x + 1; x1 < ind.length; ++x1) {
                    ind[x1] = ind[x1 - 1] + 1;
                }
            }
        } while (loop);
    }
    return res;
}

Method for the binomial:
(Adapted from Python example, from Wikipedia)

二项式的方法:(
改编自 Python 示例,来自维基百科)

private static long binomial(int n, int k) {
    if (k < 0 || k > n) return 0;
    if (k > n - k) {    // take advantage of symmetry
        k = n - k;
    }
    long c = 1;
    for (int i = 1; i < k+1; ++i) {
        c = c * (n - (k - i));
        c = c / i;
    }
    return c;
}

Of course, combinations will always have the problem of space, as they likely explode.
In the context of my own problem, the maximum possible is about 2,000,000 subsets. My machine calculated this in 1032 milliseconds.

当然,组合总会有空间问题,因为它们很可能会爆炸。
在我自己的问题的上下文中,最大可能是大约 2,000,000 个子集。我的机器在 1032 毫秒内计算出来。

回答by Jez

Inspired by afsantos's answer :-)... I decided to write a C# .NET implementation to generate all subset combinations of a certain size from a full set. It doesn't need to calc the total number of possible subsets; it detects when it's reached the end. Here it is:

受到 afsantos 的回答的启发:-)... 我决定编写一个 C# .NET 实现来从一个完整的集合中生成特定大小的所有子集组合。不需要计算可能子集的总数;它检测何时到达终点。这里是:

public static List<object[]> generateAllSubsetCombinations(object[] fullSet, ulong subsetSize) {
    if (fullSet == null) {
        throw new ArgumentException("Value cannot be null.", "fullSet");
    }
    else if (subsetSize < 1) {
        throw new ArgumentException("Subset size must be 1 or greater.", "subsetSize");
    }
    else if ((ulong)fullSet.LongLength < subsetSize) {
        throw new ArgumentException("Subset size cannot be greater than the total number of entries in the full set.", "subsetSize");
    }

    // All possible subsets will be stored here
    List<object[]> allSubsets = new List<object[]>();

    // Initialize current pick; will always be the leftmost consecutive x where x is subset size
    ulong[] currentPick = new ulong[subsetSize];
    for (ulong i = 0; i < subsetSize; i++) {
        currentPick[i] = i;
    }

    while (true) {
        // Add this subset's values to list of all subsets based on current pick
        object[] subset = new object[subsetSize];
        for (ulong i = 0; i < subsetSize; i++) {
            subset[i] = fullSet[currentPick[i]];
        }
        allSubsets.Add(subset);

        if (currentPick[0] + subsetSize >= (ulong)fullSet.LongLength) {
            // Last pick must have been the final 3; end of subset generation
            break;
        }

        // Update current pick for next subset
        ulong shiftAfter = (ulong)currentPick.LongLength - 1;
        bool loop;
        do {
            loop = false;

            // Move current picker right
            currentPick[shiftAfter]++;

            // If we've gotten to the end of the full set, move left one picker
            if (currentPick[shiftAfter] > (ulong)fullSet.LongLength - (subsetSize - shiftAfter)) {
                if (shiftAfter > 0) {
                    shiftAfter--;
                    loop = true;
                }
            }
            else {
                // Update pickers to be consecutive
                for (ulong i = shiftAfter+1; i < (ulong)currentPick.LongLength; i++) {
                    currentPick[i] = currentPick[i-1] + 1;
                }
            }
        } while (loop);
    }

    return allSubsets;
}

回答by AHH

This solutionworked for me:

这个解决方案对我有用:

 private static void findSubsets(int array[])
    {
      int numOfSubsets = 1 << array.length; 

      for(int i = 0; i < numOfSubsets; i++)
     {
        int pos = array.length - 1;
       int bitmask = i;

       System.out.print("{");
       while(bitmask > 0)
       {
        if((bitmask & 1) == 1)
         System.out.print(array[pos]+",");
        bitmask >>= 1;
        pos--;
       }
       System.out.print("}");
     }
    }

回答by pauln

Swift implementation:

迅捷实现:

Below are two variants on the answer provided by afsantos.

以下是afsantos提供的答案的两个变体。

The first implementation of the combinationsfunction mirrors the functionality of the original Java implementation.

combinations函数的第一个实现反映了原始 Java 实现的功能。

The second implementation is a general case for finding all combinations of kvalues from the set [0, setSize). If this is really all you need, this implementation will be a bit more efficient.

第二种实现是k从集合中查找所有值组合的一般情况[0, setSize)。如果这真的是你所需要的,那么这个实现会更有效率一些。

In addition, they include a few minor optimizations and a smidgin logic simplification.

此外,它们还包括一些小的优化和 smidgin 逻辑简化。

/// Calculate the binomial for a set with a subset size
func binomial(setSize: Int, subsetSize: Int) -> Int
{
    if (subsetSize <= 0 || subsetSize > setSize) { return 0 }

    // Take advantage of symmetry
    var subsetSizeDelta = subsetSize
    if (subsetSizeDelta > setSize - subsetSizeDelta)
    {
        subsetSizeDelta = setSize - subsetSizeDelta
    }

    // Early-out
    if subsetSizeDelta == 0 { return 1 }

    var c = 1
    for i in 1...subsetSizeDelta
    {
        c = c * (setSize - (subsetSizeDelta - i))
        c = c / i
    }
    return c
}

/// Calculates all possible combinations of subsets of `subsetSize` values within `set`
func combinations(subsetSize: Int, set: [Int]) -> [[Int]]?
{
    // Validate inputs
    if subsetSize <= 0 || subsetSize > set.count { return nil }

    // Use a binomial to calculate total possible combinations
    let comboCount = binomial(setSize: set.count, subsetSize: subsetSize)
    if comboCount == 0 { return nil }

    // Our set of combinations
    var combos = [[Int]]()
    combos.reserveCapacity(comboCount)

    // Initialize the combination to the first group of set indices
    var subsetIndices = [Int](0..<subsetSize)

    // For every combination
    for _ in 0..<comboCount
    {
        // Add the new combination
        var comboArr = [Int]()
        comboArr.reserveCapacity(subsetSize)
        for j in subsetIndices { comboArr.append(set[j]) }
        combos.append(comboArr)

        // Update combination, starting with the last
        var x = subsetSize - 1
        while true
        {
            // Move to next
            subsetIndices[x] = subsetIndices[x] + 1

            // If crossing boundaries, move previous
            if (subsetIndices[x] > set.count - (subsetSize - x))
            {
                x -= 1
                if x >= 0 { continue }
            }
            else
            {
                for x1 in x+1..<subsetSize
                {
                    subsetIndices[x1] = subsetIndices[x1 - 1] + 1
                }
            }
            break
        }
    }
    return combos
}


/// Calculates all possible combinations of subsets of `subsetSize` values within a set
/// of zero-based values for the set [0, `setSize`)
func combinations(subsetSize: Int, setSize: Int) -> [[Int]]?
{
    // Validate inputs
    if subsetSize <= 0 || subsetSize > setSize { return nil }

    // Use a binomial to calculate total possible combinations
    let comboCount = binomial(setSize: setSize, subsetSize: subsetSize)
    if comboCount == 0 { return nil }

    // Our set of combinations
    var combos = [[Int]]()
    combos.reserveCapacity(comboCount)

    // Initialize the combination to the first group of elements
    var subsetValues = [Int](0..<subsetSize)

    // For every combination
    for _ in 0..<comboCount
    {
        // Add the new combination
        combos.append([Int](subsetValues))

        // Update combination, starting with the last
        var x = subsetSize - 1
        while true
        {
            // Move to next
            subsetValues[x] = subsetValues[x] + 1

            // If crossing boundaries, move previous
            if (subsetValues[x] > setSize - (subsetSize - x))
            {
                x -= 1
                if x >= 0 { continue }
            }
            else
            {
                for x1 in x+1..<subsetSize
                {
                    subsetValues[x1] = subsetValues[x1 - 1] + 1
                }
            }
            break
        }
    }
    return combos
}