python 将数字列表划分为 2 个相等的总和列表的算法

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/890171/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-11-03 21:04:13  来源:igfitidea点击:

Algorithm to Divide a list of numbers into 2 equal sum lists

pythonalgorithmdynamic-programmingnp-completeknapsack-problem

提问by Lakshman Prasad

There is a list of numbers.

有一个数字列表。

The list is to be divided into 2 equal sized lists, with a minimal difference in sum. The sums have to be printed.

该列表将被分成 2 个大小相等的列表,总和的差异最小。必须打印总和。

#Example:
>>>que = [2,3,10,5,8,9,7,3,5,2]
>>>make_teams(que)
27 27

Is there an error in the following code algorithm for some case?

在某些情况下,以下代码算法中是否存在错误?

How do I optimize and/or pythonize this?

我如何优化和/或pythonize这个?

def make_teams(que):
    que.sort()
    if len(que)%2: que.insert(0,0)
    t1,t2 = [],[]
    while que:
    val = (que.pop(), que.pop())
    if sum(t1)>sum(t2):
        t2.append(val[0])
        t1.append(val[1])
    else:
        t1.append(val[0])
        t2.append(val[1])
    print min(sum(t1),sum(t2)), max(sum(t1),sum(t2)), "\n"

Question is from http://www.codechef.com/problems/TEAMSEL/

问题来自http://www.codechef.com/problems/TEAMSEL/

采纳答案by Unknown

New Solution

新解决方案

This is a breadth-first search with heuristics culling. The tree is limited to a depth of players/2. The player sum limit is totalscores/2. With a player pool of 100, it took approximately 10 seconds to solve.

这是使用启发式剔除的广度优先搜索。树的深度限制为玩家/2。玩家总和限制为总分/2。玩家池为 100 人时,大约需要 10 秒才能解决。

def team(t):
    iterations = range(2, len(t)/2+1)

    totalscore = sum(t)
    halftotalscore = totalscore/2.0

    oldmoves = {}

    for p in t:
        people_left = t[:]
        people_left.remove(p)
        oldmoves[p] = people_left

    if iterations == []:
        solution = min(map(lambda i: (abs(float(i)-halftotalscore), i), oldmoves.keys()))
        return (solution[1], sum(oldmoves[solution[1]]), oldmoves[solution[1]])

    for n in iterations:
        newmoves = {}
        for total, roster in oldmoves.iteritems():
            for p in roster:
                people_left = roster[:]
                people_left.remove(p)
                newtotal = total+p
                if newtotal > halftotalscore: continue
                newmoves[newtotal] = people_left
        oldmoves = newmoves

    solution = min(map(lambda i: (abs(float(i)-halftotalscore), i), oldmoves.keys()))
    return (solution[1], sum(oldmoves[solution[1]]), oldmoves[solution[1]])

print team([90,200,100])
print team([2,3,10,5,8,9,7,3,5,2])
print team([1,1,1,1,1,1,1,1,1,9])
print team([87,100,28,67,68,41,67,1])
print team([1, 1, 50, 50, 50, 1000])

#output
#(200, 190, [90, 100])
#(27, 27, [3, 9, 7, 3, 5])
#(5, 13, [1, 1, 1, 1, 9])
#(229, 230, [28, 67, 68, 67])
#(150, 1002, [1, 1, 1000])

Also note that I attempted to solve this using GS's description, but it is impossible to get enough information simply by storing the running totals. And if you stored boththe number of items and totals, then it would be the same as this solution except you kept needless data. Because you only need to keep the n-1 and n iterations up to numplayers/2.

另请注意,我尝试使用 GS 的描述来解决此问题,但仅通过存储运行总数不可能获得足够的信息。如果您同时存储项目数和总数,那么它与此解决方案相同,只是您保留了不必要的数据。因为您只需要将 n-1 和 n 次迭代保持到 numplayers/2。

I had an old exhaustive one based on binomial coefficients (look in history). It solved the example problems of length 10 just fine, but then I saw that the competition had people of up to length 100.

我有一个基于二项式系数的旧的详尽的(查看历史)。它很好地解决了长度为 10 的示例问题,但后来我看到比赛有长达 100 的人。

回答by Georg Sch?lly

Dynamic programmingis the solution you're looking for.

动态规划是您正在寻找的解决方案。

Example with [4, 3, 10, 3, 2, 5]:

以 [4, 3, 10, 3, 2, 5] 为例:

X-Axis: Reachable sum of group.        max = sum(all numbers) / 2    (rounded up)
Y-Axis: Count elements in group.       max = count numbers / 2       (rounded up)

      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  |  |  | 4|  |  |  |  |  |  |  |  |  |  |       //  4
 2  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
 3  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  |  | 3| 4|  |  |  |  |  |  |  |  |  |  |       //  3
 2  |  |  |  |  |  |  | 3|  |  |  |  |  |  |  |
 3  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  |  | 3| 4|  |  |  |  |  |10|  |  |  |  |       // 10
 2  |  |  |  |  |  |  | 3|  |  |  |  |  |10|10|
 3  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  |  | 3| 4|  |  |  |  |  |10|  |  |  |  |       //  3
 2  |  |  |  |  |  | 3| 3|  |  |  |  |  |10|10|
 3  |  |  |  |  |  |  |  |  |  | 3|  |  |  |  |
      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  | 2| 3| 4|  |  |  |  |  |10|  |  |  |  |       //  2
 2  |  |  |  |  | 2| 3| 3|  |  |  |  | 2|10|10|
 3  |  |  |  |  |  |  |  | 2| 2| 3|  |  |  |  |
      1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  |  | 2| 3| 4| 5|  |  |  |  |10|  |  |  |  |       //  5
 2  |  |  |  |  | 2| 3| 3| 5| 5|  |  | 2|10|10|
 3  |  |  |  |  |  |  |  | 2| 2| 3| 5| 5|  |  |
                                       ^

12 is our lucky number! Backtracing to get the group:

12是我们的幸运数字!回溯获取组:

12 - 5 = 7        {5}
 7 - 3 = 4        {5, 3}
 4 - 4 = 0        {5, 3, 4}

The other set can then be calculated: {4,3,10,3,2,5} - {5,3,4} = {10,3,2}

然后可以计算另一组:{4,3,10,3,2,5} - {5,3,4} = {10,3,2}

All fields with a number are possible solutions for one bag. Choose the one that is furthest in the bottom right corner.

所有带有数字的字段都是一个包的可能解决方案。选择右下角最远的那个。

BTW: It's called the knapsack-problem.

顺便说一句:这叫做背包问题

If all weights (w1, ..., wn and W) are nonnegative integers, the knapsack problem can be solved in pseudo-polynomial time using dynamic programming.

如果所有的权重 (w1, ..., wn 和 W) 都是非负整数,背包问题可以使用动态规划在伪多项式时间内解决。

回答by unj2

Q. Given a multiset S of integers, is there a way to partition S into two subsetsS1 and S2 such that the sumof the numbers in S1 equals the sum of the numbers in S2?

问:给定一个整数的多集小号,有没有办法来划分S成两个子集S1和S2,使得总和在S1的数字等于在S2的数字的总和?

A.Set Partition Problem.

A.设置分区问题

Best of luck approximating. : )

祝你好运近似。:)

回答by Sean Turner

Well, you can find a solution to a percentage precision in polynomial time, but to actually find the optimal (absolute minimal difference) solution, the problem is NP-complete. This means that there is no polynomial time solution to the problem. As a result, even with a relatively small list of numbers, it is too compute intensive to solve. If you really need a solution, take a look at some of the approximation algorithms for this.

好吧,您可以在多项式时间内找到百分比精度的解决方案,但要真正找到最佳(绝对最小差异)解决方案,问题是 NP 完全的。这意味着该问题没有多项式时间解。因此,即使数字列表相对较小,计算量也太大而无法求解。如果您确实需要解决方案,请查看一些用于此的近似算法。

http://en.wikipedia.org/wiki/Subset_sum_problem

http://en.wikipedia.org/wiki/Subset_sum_problem

回答by odwl

Note that it is also an heuristic and I moved the sort out of the function.

请注意,它也是一种启发式方法,我将排序移出了该函数。

 def g(data):
   sums = [0, 0]
   for pair in zip(data[::2], data[1::2]):
     item1, item2 = sorted(pair)
     sums = sorted([sums[0] + item2, sums[1] + item1])
   print sums

data = sorted([2,3,10,5,8,9,7,3,5,2])
g(data)

回答by tom10

A test case where your method doesn't work is

您的方法不起作用的测试用例是

que = [1, 1, 50, 50, 50, 1000]

The problem is that you're analyzing things in pairs, and in this example, you want all the 50's to be in the same group. This should be solved though if you remove the pair analysis aspect and just do one entry at a time.

问题是您要成对分析事物,在本例中,您希望所有 50 人都在同一组中。如果您删除配对分析方面并一次只输入一个条目,这应该可以解决。

Here's the code that does this

这是执行此操作的代码

def make_teams(que):
    que.sort()
    que.reverse()
    if len(que)%2: que.insert(0,0)
    t1,t2 = [],[]
    while que:
        if abs(len(t1)-len(t2))>=len(que):
            [t1, t2][len(t1)>len(t2)].append(que.pop(0))
        else:
            [t1, t2][sum(t1)>sum(t2)].append(que.pop(0))
    print min(sum(t1),sum(t2)), max(sum(t1),sum(t2)), "\n"

if __name__=="__main__":
    que = [2,3,10,5,8,9,7,3,5,2]
    make_teams(que)
    que = [1, 1, 50, 50, 50, 1000]
    make_teams(que)

This give 27, 27 and 150, 1002 which are the answers that make sense to me.

这给出了 27, 27 和 150, 1002 这些对我来说有意义的答案。

Edit:In review, I find this to not actually work, though in the end, I'm not quite sure why. I'll post my test code here though, as it might be useful. The test just generates random sequence that have equal sums, puts these together and compares (with sad results).

编辑:在中,我发现这实际上不起作用,但最后,我不太确定为什么。我会在这里发布我的测试代码,因为它可能有用。该测试仅生成总和相等的随机序列,将它们放在一起并进行比较(结果令人遗憾)。

Edit #2:Based in the example pointed out by Unknown, [87,100,28,67,68,41,67,1], it's clear why my method doesn't work. Specifically, to solve this example, the two largest numbers need to both be added to the same sequence to get a valid solution.

编辑 #2:根据 Unknown 指出的示例[87,100,28,67,68,41,67,1],很明显为什么我的方法不起作用。具体来说,要解决这个例子,需要将两个最大的数都加到同一个序列中,才能得到一个有效的解决方案。

def make_sequence():
    """return the sums and the sequence that's devided to make this sum"""
    while 1:
        seq_len = randint(5, 200)
        seq_max = [5, 10, 100, 1000, 1000000][randint(0,4)]
        seqs = [[], []]
        for i in range(seq_len):
            for j in (0, 1):
                seqs[j].append(randint(1, seq_max))
        diff = sum(seqs[0])-sum(seqs[1])
        if abs(diff)>=seq_max: 
            continue
        if diff<0:
            seqs[0][-1] += -diff
        else:
            seqs[1][-1] += diff
        return sum(seqs[0]), sum(seqs[1]), seqs[0], seqs[1]

if __name__=="__main__":

    for i in range(10):
        s0, s1, seq0, seq1 = make_sequence()
        t0, t1 = make_teams(seq0+seq1)
        print s0, s1, t0, t1
        if s0 != t0 or s1 != t1:
            print "FAILURE", s0, s1, t0, t1

回答by klochner

It's actually PARTITION, a special case of KNAPSACK.

其实就是PARTITION,KNAPSACK的一个特例。

It is NP Complete, with pseudo-polynomial dp algorithms. The pseudo in pseudo-polynomial refers to the fact that the run time depends on the range of the weights.

它是 NP Complete,具有伪多项式 dp 算法。伪多项式中的伪是指运行时间取决于权重范围的事实。

In general you will have to first decide if there is an exact solution before you can admit a heuristic solution.

通常,您必须首先确定是否存在精确解,然后才能承认启发式解决方案。

回答by Graham Toal

They are obviously looking for a dynamic programming knapsack solution. So after my first effort (a pretty good original heuristic I thought), and my second effort (a really sneaky exact combinatorial solution that worked for shortish data sets, and even for sets up to 100 elements as long as the number of uniquevalues was low), I finally succumbed to peer pressure and wrote the one they wanted (not too hard - handling duplicated entries was the trickiest part - the underlying algorithm I based it on only works if all the inputs are unique - I'm sure glad that long longis big enough to hold 50 bits!).

他们显然在寻找动态编程背包解决方案。因此,在我的第一次努力(我认为是一个很好的原始启发式)和我的第二次努力之后(一个非常狡猾的精确组合解决方案,适用于较短的数据集,甚至适用于多达 100 个元素的集合,只要唯一值的数量是低),我终于屈服于同侪压力并写了他们想要的(不太难 - 处理重复条目是最棘手的部分 - 我基于它的底层算法只有在所有输入都是唯一的情况下才有效 - 我很高兴long long足以容纳 50 位!)。

So for all the test data and awkward edge cases I put together while testing my first two efforts, it gives the same answer. At least for the ones I checked with the combinatorial solver, I knowthey're correct. But I'm still failing the submission with some wrong answer!

因此,对于我在测试前两项工作时放在一起的所有测试数据和尴尬的边缘情况,它给出了相同的答案。至少对于我使用组合求解器检查过的那些,我知道它们是正确的。但我仍然没有通过一些错误的答案提交!

I'm not asking for anyone to fix my code here, but I would be very greatful if anyone can find a case for which the code below generates the wrong answer.

我不是要求任何人在这里修复我的代码,但如果有人能找到以下代码生成错误答案的案例,我将非常感激。

Thanks,

谢谢,

Graham

格雷厄姆

PS This code does always execute within the time limit but it is farfrom optimised. i'm keeping it simple until it passes the test, then I have some ideas to speed it up, maybe by a factor of 10 or more.

PS 此代码确实总是在时间限制内执行,但远未优化。我一直保持简单,直到它通过测试,然后我有一些想法可以加快它的速度,可能提高 10 倍或更多。

#include <stdio.h>

#define TRUE (0==0)
#define FALSE (0!=0)

static int debug = TRUE;

//int simple(const void *a, const void *b) {
//  return *(int *)a - *(int *)b;
//}

int main(int argc, char **argv) {
  int p[101];
  char *s, line[128];
  long long mask, c0[45001], c1[45001];
  int skill, players, target, i, j, tests, total = 0;

  debug = (argc == 2 && argv[1][0] == '-' && argv[1][1] == 'd' && argv[1][2] == '
import random
def f(data, nb_iter=20):
  diff = None
  sums = (None, None)
  for _ in xrange(nb_iter):
    random.shuffle(data)
    mid = len(data)/2
    sum1 = sum(data[:mid])
    sum2 = sum(data[mid:])
    if diff is None or abs(sum1 - sum2) < diff:
      sums = (sum1, sum2)
  print sums
'); s = fgets(line, 127, stdin); tests = atoi(s); while (tests --> 0) { for (i = 0; i < 45001; i++) {c0[i] = 0LL;} s = fgets(line, 127, stdin); /* blank line */ s = fgets(line, 127, stdin); /* no of players */ players = atoi(s); for (i = 0; i < players; i++) {s = fgets(line, 127, stdin); p[i] = atoi(s);} if (players == 1) { printf("0 %d\n", p[0]); } else { if (players&1) p[players++] = 0; // odd player fixed by adding a single player of 0 strength //qsort(p, players, sizeof(int), simple); total = 0; for ( i = 0; i < players; i++) total += p[i]; target = total/2; // ok if total was odd and result rounded down - teams of n, n+1 mask = 1LL << (((long long)players/2LL)-1LL); for (i = 0; i < players; i++) { for (j = 0; j <= target; j++) {c1[j] = 0LL;} // memset would be faster skill = p[i]; //add this player to every other player and every partial subset for (j = 0; j <= target-skill; j++) { if (c0[j]) c1[j+skill] = c0[j]<<1; // highest = highest j+skill for later optimising } c0[skill] |= 1; // so we don't add a skill number to itself unless it occurs more than once for (j = 0; j <= target; j++) {c0[j] |= c1[j];} if (c0[target]&mask) break; // early return for perfect fit! } for (i = target; i > 0; i--) { if (debug || (c0[i] & mask)) { fprintf(stdout, "%d %d\n", i, total-i); if (debug) { if (c0[i] & mask) printf("******** ["); else printf(" ["); for (j = 0; j <= players; j++) if (c0[i] & (1LL<<(long long)j)) printf(" %d", j+1); printf(" ]\n"); } else break; } } } if (tests) printf("\n"); } return 0; }

回答by Graham Toal

In an earlier comment I hypothesized that the problem as set was tractable because they had carefully chosen the test data to be compatible with various algorithms within the time alloted. This turned out not to be the case - instead it is the problem constraints - numbers no higher than 450 and a final set no larger than 50 numbers is the key. These are compatible with solving the problem using the dynamic programming solution I put up in a later post. None of the other algorithms (heuristics, or exhaustive enumeration by a combinatorial pattern generator) can possibly work because there will be test cases large enough or hard enough to break those algorithms. It's rather annoying to be honest because those other solutions are more challenging and certainly more fun. Note that without a lot of extra work, the dynamic programming solution just says whether a solution is possible with N/2 for any given sum, but it doesn't tell you the contents of either partition.

在之前的评论中,我假设设置的问题是易于处理的,因为他们在分配的时间内仔细选择了与各种算法兼容的测试数据。事实证明并非如此——而是问题的限制——数字不超过 450 并且最终集合不超过 50 个数字是关键。这些与使用我在稍后的帖子中提出的动态规划解决方案解决问题兼容。其他算法(启发式算法或组合模式生成器的详尽枚举)都不可能工作,因为会有足够大或足够硬的测试用例来破坏这些算法。老实说这很烦人,因为其他解决方案更具挑战性,当然也更有趣。请注意,无需大量额外工作,

回答by odwl

After some thinking, for not too big problem, I think that the best kind of heuristics will be something like:

经过一番思考,对于不太大的问题,我认为最好的启发式方法是这样的:

##代码##

You can adjust nb_iter if the problem is bigger.

如果问题更大,您可以调整 nb_iter。

It solves all the problem mentioned above mostly all the times.

它几乎始终解决了上述所有问题。