从n返回k个元素的所有组合的算法

时间:2020-03-06 14:40:07  来源:igfitidea点击:

我想编写一个函数,该函数采用字母数组作为参数,并选择多个字母。

假设我们提供8个字母组成的数组,并希望从中选择3个字母。然后我们将获得:

8! / ((8 - 3)! * 3!) = 56

返回由3个字母组成的数组(或者单词)。

解决方案

《计算机编程艺术》第4卷:分册3比我所描述的更适合特定情况。

格雷码

我们会遇到的一个问题当然是内存,而且很快,集合-20C3 = 1140中的20个元素就会有问题。而且,如果要遍历集合,最好使用修改后的格雷码算法,因此我们并没有将所有这些都保留在内存中。这些将根据之前的内容生成下一个组合,并避免重复。其中有许多用于不同用途。我们是否要最大化连续组合之间的差异?最小化?等等。

一些描述格雷码的原始论文:

  • 某些汉密尔顿路径和最小变化算法
  • 相邻立交组合生成算法

以下是涉及该主题的其他一些论文:

  • Eades,Hickey,Read相邻互换组合生成算法(PDF,带有Pascal中的代码)的有效实现
  • 组合发电机
  • 组合格雷码调查(PostScript)
  • 格雷码算法

蔡斯的小提琴(算法)

菲利普·J·蔡斯(Phillip J Chase),"算法382:N个对象中的M个组合"(1970年)

C语言中的算法

词典顺序的组合索引(Buckles算法515)

我们也可以按组合索引(按字典顺序)引用组合。意识到索引应该是基于索引从右到左的一些变化,我们可以构造一些应该恢复组合的东西。

因此,我们有一组{1,2,3,4,5,6} ...并且我们想要三个元素。假设{1,2,3},我们可以说元素之间的差异是一个,并且顺序是最小的。 {1,2,4}有一个变化,按字典顺序是数字2. 因此,最后一位的"变化"数量说明了字典顺序中的一个变化。发生变化的第二个位置{1,3,4}发生了一个变化,但是由于它位于第二位(与原始集合中元素的数量成正比),因此说明了更多的变化。

我所描述的方法似乎是一种解构,从设置到索引,我们需要做相反的事情-这要复杂得多。这是扣具解决问题的方式。我写了一些C来计算它们,并做了些微改动-我使用集合的索引而不是数字范围来表示集合,因此我们始终从0 ... n开始工作。
笔记:

  • 由于组合是无序的,因此{1,3,2} = {1,2,3}-我们将它们排序为词典顺序。
  • 此方法具有一个隐式0,以开始第一个差异的集合。

词典顺序的组合索引(McCaffrey)

还有另一种方法:它的概念更易于掌握和编程,但是却没有优化Buckles。幸运的是,它也不会产生重复的组合:

最大化的集合,其中<img src =" https://i.stack.imgur.com/vIeiI.gif" alt =" C(n,r)= {n选择r}">。

例如:27 = C(6,4)+ C(5,3)+ C(2,2)+ C(1,1)。因此,第四项的第27个字典组合为:{1,2,5,6},这些是我们要查看的任何集合的索引。下面的示例(OCaml)需要choose函数,留给读者:

(* this will find the [x] combination of a [set] list when taking [k] elements *)
let combination_maccaffery set k x =
    (* maximize function -- maximize a that is aCb              *)
    (* return largest c where c < i and choose(c,i) <= z        *)
    let rec maximize a b x =
        if (choose a b ) <= x then a else maximize (a-1) b x
    in
    let rec iterate n x i = match i with
        | 0 -> []
        | i ->
            let max = maximize n i x in
            max :: iterate n (x - (choose max i)) (i-1)
    in
    if x < 0 then failwith "errors" else
    let idxs =  iterate (List.length set) x k in
    List.map (List.nth set) (List.sort (-) idxs)

假设字母数组如下所示:" ABCDEFGH"。我们有三个索引(i,j,k),它们指示当前单词要使用的字母,我们从以下位置开始:

A B C D E F G H
^ ^ ^
i j k

首先,我们改变k,所以下一步看起来像这样:

A B C D E F G H
^ ^   ^
i j   k

如果到达末尾,则继续并依次改变j和k。

A B C D E F G H
^   ^ ^
i   j k

A B C D E F G H
^   ^   ^
i   j   k

一旦我们达到G,我们也将开始改变i。

A B C D E F G H
  ^ ^ ^
  i j k

A B C D E F G H
  ^ ^   ^
  i j   k
...

用代码写的看起来像这样

void print_combinations(const char *string)
{
    int i, j, k;
    int len = strlen(string);

    for (i = 0; i < len - 2; i++)
    {
        for (j = i + 1; j < len - 1; j++)
        {
            for (k = j + 1; k < len; k++)
                printf("%c%c%c\n", string[i], string[j], string[k]);
        }
    }
}

static IEnumerable<string> Combinations(List<string> characters, int length)
{
    for (int i = 0; i < characters.Count; i++)
    {
        // only want 1 character, just return this one
        if (length == 1)
            yield return characters[i];

        // want more than one character, return this one plus all combinations one shorter
        // only use characters after the current one for the rest of the combinations
        else
            foreach (string next in Combinations(characters.GetRange(i + 1, characters.Count - (i + 1)), length - 1))
                yield return characters[i] + next;
    }
}

如果我们可以使用SQL语法说,如果我们正在使用LINQ来访问结构或者数组的字段,或者直接访问具有仅包含一个字符字段" Letter"的名为" Alphabet"的表的数据库,则可以改编以下代码:

SELECT A.Letter, B.Letter, C.Letter
FROM Alphabet AS A, Alphabet AS B, Alphabet AS C
WHERE A.Letter<>B.Letter AND A.Letter<>C.Letter AND B.Letter<>C.Letter
AND A.Letter<B.Letter AND B.Letter<C.Letter

即使我们在"字母"表中有多少个字母(也可以是3、8、10、27等),这将返回3个字母的所有组合。

如果我们想要的是所有排列,而不是组合(即我们希望" ACB"和" ABC"算作不同,而不是只出现一次),只需删除最后一行(AND),即可完成。

编辑后:重新阅读问题后,我意识到需要的是通用算法,而不仅仅是选择3个项目的特定算法。亚当·休斯(Adam Hughes)的回答是完整的,很遗憾,我还不能投票赞成。这个答案很简单,但仅适用于我们只需要3个项目的情况。

以下递归算法从有序集合中选择所有k元素组合:

  • 选择组合的第一个元素" i"
  • 将" i"与从大于" i"的元素集中递归选择的" k-1"个元素的每种组合结合起来。

对集合中的每个i进行上述迭代。

必须选择大于" i"的其余元素,以避免重复。这样,[3,5]将仅被选择一次,因为[3]与[5]组合,而不是两次(条件消除了[5] + [3])。没有这种条件,我们将获得变化而不是组合。

这是我在C ++中的主张

我试图对迭代器类型施加尽可能小的限制,因此此解决方案仅假定转发迭代器,并且它可以是const_iterator。这适用于任何标准容器。如果参数没有意义,则抛出std :: invalid_argumnent

#include <vector>
#include <stdexcept>

template <typename Fci> // Fci - forward const iterator
std::vector<std::vector<Fci> >
enumerate_combinations(Fci begin, Fci end, unsigned int combination_size)
{
    if(begin == end && combination_size > 0u)
        throw std::invalid_argument("empty set and positive combination size!");
    std::vector<std::vector<Fci> > result; // empty set of combinations
    if(combination_size == 0u) return result; // there is exactly one combination of
                                              // size 0 - emty set
    std::vector<Fci> current_combination;
    current_combination.reserve(combination_size + 1u); // I reserve one aditional slot
                                                        // in my vector to store
                                                        // the end sentinel there.
                                                        // The code is cleaner thanks to that
    for(unsigned int i = 0u; i < combination_size && begin != end; ++i, ++begin)
    {
        current_combination.push_back(begin); // Construction of the first combination
    }
    // Since I assume the itarators support only incrementing, I have to iterate over
    // the set to get its size, which is expensive. Here I had to itrate anyway to  
    // produce the first cobination, so I use the loop to also check the size.
    if(current_combination.size() < combination_size)
        throw std::invalid_argument("combination size > set size!");
    result.push_back(current_combination); // Store the first combination in the results set
    current_combination.push_back(end); // Here I add mentioned earlier sentinel to
                                        // simplyfy rest of the code. If I did it 
                                        // earlier, previous statement would get ugly.
    while(true)
    {
        unsigned int i = combination_size;
        Fci tmp;                            // Thanks to the sentinel I can find first
        do                                  // iterator to change, simply by scaning
        {                                   // from right to left and looking for the
            tmp = current_combination[--i]; // first "bubble". The fact, that it's 
            ++tmp;                          // a forward iterator makes it ugly but I
        }                                   // can't help it.
        while(i > 0u && tmp == current_combination[i + 1u]);

        // Here is probably my most obfuscated expression.
        // Loop above looks for a "bubble". If there is no "bubble", that means, that
        // current_combination is the last combination, Expression in the if statement
        // below evaluates to true and the function exits returning result.
        // If the "bubble" is found however, the ststement below has a sideeffect of 
        // incrementing the first iterator to the left of the "bubble".
        if(++current_combination[i] == current_combination[i + 1u])
            return result;
        // Rest of the code sets posiotons of the rest of the iterstors
        // (if there are any), that are to the right of the incremented one,
        // to form next combination

        while(++i < combination_size)
        {
            current_combination[i] = current_combination[i - 1u];
            ++current_combination[i];
        }
        // Below is the ugly side of using the sentinel. Well it had to haave some 
        // disadvantage. Try without it.
        result.push_back(std::vector<Fci>(current_combination.begin(),
                                          current_combination.end() - 1));
    }
}

我在python中有一个用于项目euler的置换算法:

def missing(miss,src):
    "Returns the list of items in src not present in miss"
    return [i for i in src if i not in miss]

def permutation_gen(n,l):
    "Generates all the permutations of n items of the l list"
    for i in l:
        if n<=1: yield [i]
        r = [i]
        for j in permutation_gen(n-1,missing([i],l)):  yield r+j

如果

n<len(l)

我们应该没有重复就拥有所有需要的组合,我们需要吗?

它是一个生成器,因此我们可以在以下方式中使用它:

for comb in permutation_gen(3,list("ABCDEFGH")):
    print comb

在Python中,例如Andrea Ambu,但没有为选择三个而进行硬编码。

def combinations(list, k):
    """Choose combinations of list, choosing k elements(no repeats)"""
    if len(list) < k:
        return []
    else:
        seq = [i for i in range(k)]
        while seq:
            print [list[index] for index in seq]
            seq = get_next_combination(len(list), k, seq)

def get_next_combination(num_elements, k, seq):
        index_to_move = find_index_to_move(num_elements, seq)
        if index_to_move == None:
            return None
        else:
            seq[index_to_move] += 1

            #for every element past this sequence, move it down
            for i, elem in enumerate(seq[(index_to_move+1):]):
                seq[i + 1 + index_to_move] = seq[index_to_move] + i + 1

            return seq

def find_index_to_move(num_elements, seq):
        """Tells which index should be moved"""
        for rev_index, elem in enumerate(reversed(seq)):
            if elem < (num_elements - rev_index - 1):
                return len(seq) - rev_index - 1
        return None

我为此在SQL Server 2005中创建了一个解决方案,并将其发布在我的网站上:http://www.jessemclain.com/downloads/code/sql/fn_GetMChooseNCombos.sql.htm

这是显示用法的示例:

SELECT * FROM dbo.fn_GetMChooseNCombos('ABCD', 2, '')

结果:

Word
----
AB
AC
AD
BC
BD
CD

(6 row(s) affected)

在C ++中,以下例程将生成范围[first,last)之间的长度距离(first,k)的所有组合:

#include <algorithm>

template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
   /* Credits: Mark Nelson http://marknelson.us */
   if ((first == last) || (first == k) || (last == k))
      return false;
   Iterator i1 = first;
   Iterator i2 = last;
   ++i1;
   if (last == i1)
      return false;
   i1 = last;
   --i1;
   i1 = k;
   --i2;
   while (first != i1)
   {
      if (*--i1 < *i2)
      {
         Iterator j = k;
         while (!(*i1 < *j)) ++j;
         std::iter_swap(i1,j);
         ++i1;
         ++j;
         i2 = k;
         std::rotate(i1,j,last);
         while (last != j)
         {
            ++j;
            ++i2;
         }
         std::rotate(k,i2,last);
         return true;
      }
   }
   std::rotate(first,k,last);
   return false;
}

可以这样使用:

#include <string>
#include <iostream>

int main()
{
    std::string s = "12345";
    std::size_t comb_size = 3;
    do
    {
        std::cout << std::string(s.begin(), s.begin() + comb_size) << std::endl;
    } while (next_combination(s.begin(), s.begin() + comb_size, s.end()));

    return 0;
}

这将打印以下内容:

123
124
125
134
135
145
234
235
245
345

在C#中:

public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
{
  return k == 0 ? new[] { new T[0] } :
    elements.SelectMany((e, i) =>
      elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] {e}).Concat(c)));
}

用法:

var result = Combinations(new[] { 1, 2, 3, 4, 5 }, 3);

结果:

123
124
125
134
135
145
234
235
245
345

这是我最近用Java写的代码,该代码计算并返回" outOf"元素中" num"元素的所有组合。

// author: Sourabh Bhat ([email protected])

public class Testing
{
    public static void main(String[] args)
    {

// Test case num = 5, outOf = 8.

        int num = 5;
        int outOf = 8;
        int[][] combinations = getCombinations(num, outOf);
        for (int i = 0; i < combinations.length; i++)
        {
            for (int j = 0; j < combinations[i].length; j++)
            {
                System.out.print(combinations[i][j] + " ");
            }
            System.out.println();
        }
    }

    private static int[][] getCombinations(int num, int outOf)
    {
        int possibilities = get_nCr(outOf, num);
        int[][] combinations = new int[possibilities][num];
        int arrayPointer = 0;

        int[] counter = new int[num];

        for (int i = 0; i < num; i++)
        {
            counter[i] = i;
        }
        breakLoop: while (true)
        {
            // Initializing part
            for (int i = 1; i < num; i++)
            {
                if (counter[i] >= outOf - (num - 1 - i))
                    counter[i] = counter[i - 1] + 1;
            }

            // Testing part
            for (int i = 0; i < num; i++)
            {
                if (counter[i] < outOf)
                {
                    continue;
                } else
                {
                    break breakLoop;
                }
            }

            // Innermost part
            combinations[arrayPointer] = counter.clone();
            arrayPointer++;

            // Incrementing part
            counter[num - 1]++;
            for (int i = num - 1; i >= 1; i--)
            {
                if (counter[i] >= outOf - (num - 1 - i))
                    counter[i - 1]++;
            }
        }

        return combinations;
    }

    private static int get_nCr(int n, int r)
    {
        if(r > n)
        {
            throw new ArithmeticException("r is greater then n");
        }
        long numerator = 1;
        long denominator = 1;
        for (int i = n; i >= r + 1; i--)
        {
            numerator *= i;
        }
        for (int i = 2; i <= n - r; i++)
        {
            denominator *= i;
        }

        return (int) (numerator / denominator);
    }
}

这是一些简单的代码,可以打印所有C(n,m)组合。它通过初始化并移动指向下一个有效组合的一组数组索引来工作。索引被初始化为指向最低的m个索引(按字典顺序,最小的组合)。然后,从第m个索引开始,我们尝试将索引向前移动。如果索引已达到其极限,则尝试使用先前的索引(一直向下到索引1)。如果我们可以向前移动索引,那么我们将重置所有更大的索引。

m=(rand()%n)+1; // m will vary from 1 to n

for (i=0;i<n;i++) a[i]=i+1;

// we want to print all possible C(n,m) combinations of selecting m objects out of n
printf("Printing C(%d,%d) possible combinations ...\n", n,m);

// This is an adhoc algo that keeps m pointers to the next valid combination
for (i=0;i<m;i++) p[i]=i; // the p[.] contain indices to the a vector whose elements constitute next combination

done=false;
while (!done)
{
    // print combination
    for (i=0;i<m;i++) printf("%2d ", a[p[i]]);
    printf("\n");

    // update combination
    // method: start with p[m-1]. try to increment it. if it is already at the end, then try moving p[m-2] ahead.
    // if this is possible, then reset p[m-1] to 1 more than (the new) p[m-2].
    // if p[m-2] can not also be moved, then try p[m-3]. move that ahead. then reset p[m-2] and p[m-1].
    // repeat all the way down to p[0]. if p[0] can not also be moved, then we have generated all combinations.
    j=m-1;
    i=1;
    move_found=false;
    while ((j>=0) && !move_found)
    {
        if (p[j]<(n-i)) 
        {
            move_found=true;
            p[j]++; // point p[j] to next index
            for (k=j+1;k<m;k++)
            {
                p[k]=p[j]+(k-j);
            }
        }
        else
        {
            j--;
            i++;
        }
    }
    if (!move_found) done=true;
}

这是Scala中一个优雅的通用实现,如99 Scala Problems中所述。

object P26 {
  def flatMapSublists[A,B](ls: List[A])(f: (List[A]) => List[B]): List[B] = 
    ls match {
      case Nil => Nil
      case sublist@(_ :: tail) => f(sublist) ::: flatMapSublists(tail)(f)
    }

  def combinations[A](n: Int, ls: List[A]): List[List[A]] =
    if (n == 0) List(Nil)
    else flatMapSublists(ls) { sl =>
      combinations(n - 1, sl.tail) map {sl.head :: _}
    }
}

我可以提出我的递归Python解决方案吗?

def choose_iter(elements, length):
    for i in xrange(len(elements)):
        if length == 1:
            yield (elements[i],)
        else:
            for next in choose_iter(elements[i+1:len(elements)], length-1):
                yield (elements[i],) + next
def choose(l, k):
    return list(choose_iter(l, k))

用法示例:

>>> len(list(choose_iter("abcdefgh",3)))
56

我喜欢它的简单性。

在这里,我们有一个用C#编码的算法的惰性评估版:

static bool nextCombination(int[] num, int n, int k)
    {
        bool finished, changed;

        changed = finished = false;

        if (k > 0)
        {
            for (int i = k - 1; !finished && !changed; i--)
            {
                if (num[i] < (n - 1) - (k - 1) + i)
                {
                    num[i]++;
                    if (i < k - 1)
                    {
                        for (int j = i + 1; j < k; j++)
                        {
                            num[j] = num[j - 1] + 1;
                        }
                    }
                    changed = true;
                }
                finished = (i == 0);
            }
        }

        return changed;
    }

    static IEnumerable Combinations<T>(IEnumerable<T> elements, int k)
    {
        T[] elem = elements.ToArray();
        int size = elem.Length;

        if (k <= size)
        {
            int[] numbers = new int[k];
            for (int i = 0; i < k; i++)
            {
                numbers[i] = i;
            }

            do
            {
                yield return numbers.Select(n => elem[n]);
            }
            while (nextCombination(numbers, size, k));
        }
    }

并测试部分:

static void Main(string[] args)
    {
        int k = 3;
        var t = new[] { "dog", "cat", "mouse", "zebra"};

        foreach (IEnumerable<string> i in Combinations(t, k))
        {
            Console.WriteLine(string.Join(",", i));
        }
    }

希望这对我们有所帮助!