Python 运行总计的列表理解

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3432830/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-18 11:04:03  来源:igfitidea点击:

List comprehension for running total

pythonlist-comprehensionrunning-total

提问by Kit

I want to get a running total from a list of numbers.

我想从数字列表中得到一个运行总数。

For demo purposes, I start with a sequential list of numbers using range

出于演示目的,我首先使用数字的顺序列表 range

a = range(20)

runningTotal = []
for n in range(len(a)):
    new = runningTotal[n-1] + a[n] if n > 0 else a[n]
    runningTotal.append(new)

# This one is a syntax error
# runningTotal = [a[n] for n in range(len(a)) if n == 0 else runningTotal[n-1] + a[n]]

for i in zip(a, runningTotal):
    print "{0:>3}{1:>5}".format(*i)

yields

产量

  0    0
  1    1
  2    3
  3    6
  4   10
  5   15
  6   21
  7   28
  8   36
  9   45
 10   55
 11   66
 12   78
 13   91
 14  105
 15  120
 16  136
 17  153
 18  171
 19  190

As you can see, I initialize an empty list [], then append()in each loop iteration. Is there a more elegant way to this, like a list comprehension?

如您所见,我初始化了一个空列表[],然后append()在每个循环迭代中。有没有更优雅的方法,比如列表理解?

采纳答案by Alex Martelli

A list comprehension has no good (clean, portable) way to refer to the very list it's building. One good and elegant approach might be to do the job in a generator:

列表理解没有好的(干净的、可移植的)方式来引用它正在构建的列表。一种好的和优雅的方法可能是在生成器中完成这项工作:

def running_sum(a):
  tot = 0
  for item in a:
    tot += item
    yield tot

to get this as a list instead, of course, use list(running_sum(a)).

要将其作为列表来代替,当然,请使用list(running_sum(a)).

回答by pjz

I'm not sure about 'elegant', but I think the following is much simpler and more intuitive (at the cost of an extra variable):

我不确定“优雅”,但我认为以下更简单、更直观(以额外的变量为代价):

a = range(20)

runningTotal = []

total = 0
for n in a:
  total += n
  runningTotal.append(total)

The functional way to do the same thing is:

做同样事情的功能方法是:

a = range(20)
runningTotal = reduce(lambda x, y: x+[x[-1]+y], a, [0])[1:]

...but that's much less readable/maintainable, etc.

...但那可读性/可维护性要差得多,等等。

@Omnifarous suggests this should be improved to:

@Omnifarous 建议这应该改进为:

a = range(20)
runningTotal = reduce(lambda l, v: (l.append(l[-1] + v) or l), a, [0])

...but I still find that less immediately comprehensible than my initial suggestion.

...但我仍然觉得这比我最初的建议更不容易理解。

Remember the words of Kernighan: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

记住 Kernighan 的话:“调试的难度是编写代码的两倍。因此,如果您尽可能聪明地编写代码,那么根据定义,您就不够聪明来调试它。”

回答by Mr Fooz

If you can use numpy, it has a built-in function named cumsumthat does this.

如果您可以使用numpy,它有一个名为的内置函数cumsum来执行此操作。

import numpy
tot = numpy.cumsum(a)  # returns a numpy.ndarray
tot = list(tot)        # if you prefer a list

回答by aaronasterling

I would use a coroutine for this:

我会为此使用协程:

def runningTotal():
    accum = 0
    yield None
    while True:
        accum += yield accum

tot = runningTotal()
next(tot)
running_total = [tot.send(i) for i in xrange(N)]

回答by satoru

This can be implemented in 2 lines in Python.

这可以在 Python 中用 2 行代码实现。

Using a default parameter eliminates the need to maintain an aux variable outside, and then we just do a mapto the list.

使用默认参数就不需要在外面维护一个 aux 变量,然后我们只map需要对列表做一个。

def accumulate(x, l=[0]): l[0] += x; return l[0];
map(accumulate, range(20))

回答by Tony Veijalainen

This is inefficient as it does it every time from beginning but possible it is:

这是低效的,因为它从一开始就每次都这样做,但可能是:

a = range(20)
runtot=[sum(a[:i+1]) for i,item in enumerate(a)]
for line in zip(a,runtot):
    print line

回答by Cirdec

You are looking for two things: fold (reduce) and a funny function that keeps a list of the results of another function, which I have called running. I made versions both with and without an initial parameter; either way these need to go to reduce with an initial [].

您正在寻找两件事:折叠(减少)和一个有趣的函数,它保留另一个函数的结果列表,我称之为运行。我制作了带有和不带有初始参数的版本;无论哪种方式,这些都需要使用初始 [] 来减少。

def last_or_default(list, default):
    if len(list) > 0:
        return list[-1]
    return default

def initial_or_apply(list, f, y):
    if list == []:
        return [y]
    return list + [f(list[-1], y)]

def running_initial(f, initial):
    return (lambda x, y: x + [f(last_or_default(x,initial), y)])

def running(f):
    return (lambda x, y: initial_or_apply(x, f, y))

totaler = lambda x, y: x + y
running_totaler = running(totaler)
running_running_totaler = running_initial(running_totaler, [])

data = range(0,20)
running_total = reduce(running_totaler, data, [])
running_running_total = reduce(running_running_totaler, data, [])

for i in zip(data, running_total, running_running_total):
    print "{0:>3}{1:>4}{2:>83}".format(*i)

These will take a long time on really large lists due to the + operator. In a functional language, if done correctly, this list construction would be O(n).

由于 + 运算符,这些将在非常大的列表上花费很长时间。在函数式语言中,如果正确完成,这个列表构造将是 O(n)。

Here are the first few lines of output:

这是输出的前几行:

0   0                      [0]
1   1                   [0, 1]
2   3                [0, 1, 3]
3   6             [0, 1, 3, 6]
4  10         [0, 1, 3, 6, 10]
5  15     [0, 1, 3, 6, 10, 15]
6  21 [0, 1, 3, 6, 10, 15, 21]

回答by user1908017

I wanted to do the same thing to generate cumulative frequencies that I could use bisect_left over - this is the way I've generated the list;

我想做同样的事情来生成我可以使用 bisect_left over 的累积频率 - 这就是我生成列表的方式;

[ sum( a[:x] ) for x in range( 1, len(a)+1 ) ]

回答by mleyfman

Use itertools.accumulate(). Here is an example:

使用itertools.accumulate(). 下面是一个例子:

from itertools import accumulate

a = range(20)
runningTotals = list(accumulate(a))

for i in zip(a, runningTotals):
    print "{0:>3}{1:>5}".format(*i)

This only works on Python 3. On Python 2 you can use the backport in the more-itertoolspackage.

这仅适用于 Python 3。在 Python 2 上,您可以使用more-itertools包中的 backport 。

回答by April Arcus

When we take the sum of a list, we designate an accumulator (memo) and then walk through the list, applying the binary function "x+y" to each element and the accumulator. Procedurally, this looks like:

当我们对一个列表求和时,我们指定一个累加器 ( memo) 然后遍历列表,将二元函数“x+y”应用于每个元素和累加器。在程序上,这看起来像:

def mySum(list):
    memo = 0
    for e in list:
        memo = memo + e
    return memo

This is a common pattern, and useful for things other than taking sums — we can generalize it to any binary function, which we'll supply as a parameter, and also let the caller specify an initial value. This gives us a function known as reduce, foldl, or inject[1]:

这是一个常见的模式,对于求和以外的事情很有用——我们可以将它推广到任何二元函数,我们将其作为参数提供,并且还让调用者指定一个初始值。这给我们称为函数reducefoldlinject[1]

def myReduce(function, list, initial):
    memo = initial
    for e in list:
        memo = function(memo, e)
    return memo

def mySum(list):
    return myReduce(lambda memo, e: memo + e, list, 0)

In Python 2, reducewas a built-in function, but in Python 3 it's been moved to the functoolsmodule:

在 Python 2 中,reduce是一个内置函数,但在 Python 3 中,它被移到了functools模块中:

from functools import reduce

We can do all kinds of cool stuff with reducedepending on the function we supply as its the first argument. If we replace "sum" with "list concatenation", and "zero" with "empty list", we get the (shallow) copyfunction:

reduce根据我们提供的函数作为第一个参数,我们可以做各种很酷的事情。如果我们用“列表连接”替换“和”,用“空列表”替换“零”,我们得到(浅)copy函数:

def myCopy(list):
    return reduce(lambda memo, e: memo + [e], list, [])

myCopy(range(10))
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

If we add a transformfunction as another parameter to copy, and apply it before concatenating, we get map:

如果我们将一个transform函数作为另一个参数添加到copy,并在连接之前应用它,我们得到map

def myMap(transform, list):
    return reduce(lambda memo, e: memo + [transform(e)], list, [])

myMap(lambda x: x*2, range(10))
> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

If we add a predicatefunction that takes eas a parameter and returns a boolean, and use it to decide whether or not to concatenate, we get filter:

如果我们添加一个predicate函数e作为参数并返回一个布尔值,并使用它来决定是否连接,我们得到filter

def myFilter(predicate, list):
    return reduce(lambda memo, e: memo + [e] if predicate(e) else memo, list, [])

myFilter(lambda x: x%2==0, range(10))
> [0, 2, 4, 6, 8]

mapand filterare sort of unfancy ways of writing list comprehensions —?we could also have said [x*2 for x in range(10)]or [x for x in range(10) if x%2==0]. There's no corresponding list comprehension syntax for reduce, because reduceisn't required to return a list at all (as we saw with sum, earlier, which Python also happens to offer as a built-in function).

map并且filter是编写列表推导式的一种奇怪的方式——我们也可以说[x*2 for x in range(10)][x for x in range(10) if x%2==0]。没有对应的列表解析语法reduce,因为reduce根本不需要返回列表(正如我们之前看到的sum,Python 也恰好作为内置函数提供)。

It turns out that for computing a running sum, the list-building abilities of reduceare exactly what we want, and probably the most elegant way to solve this problem, despite its reputation (along with lambda) as something of an un-pythonic shibboleth. The version of reducethat leaves behind copies of its old values as it runs is called reductionsor scanl[1], and it looks like this:

事实证明,对于计算运行总和, 的列表构建能力reduce正是我们想要的,并且可能是解决这个问题的最优雅的方法,尽管它的声誉(以及lambda)作为某种非 Pythonic shibboleth 的东西。该版本在运行时reduce留下其旧值的副本称为reductionsscanl[1],它看起来像这样:

def reductions(function, list, initial):
    return reduce(lambda memo, e: memo + [function(memo[-1], e)], list, [initial])

So equipped, we can now define:

如此配备,我们现在可以定义:

def running_sum(list):
    first, rest = list[0], list[1:]
    return reductions(lambda memo, e: memo + e, rest, first)

running_sum(range(10))
> [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]

While conceptually elegant, this precise approach fares poorly in practice with Python. Because Python's list.append()mutates a list in place but doesn't return it, we can't use it effectively in a lambda, and have to use the +operator instead. This constructs a whole new list, which takes time proportional to the length of the accumulated list so far (that is, an O(n) operation). Since we're already inside the O(n) forloop of reducewhen we do this, the overall time complexity compounds to O(n2).

虽然在概念上很优雅,但这种精确的方法在 Python 的实践中表现不佳。因为 Python 会list.append()在原地改变一个列表但不返回它,所以我们不能在 lambda 中有效地使用它,而必须改用+运算符。这将构建一个全新的列表,其花费的时间与迄今为止累积列表的长度成正比(即 O(n) 操作)。由于我们已经在执行此操作的 O(n)for循环中reduce,因此整体时间复杂度复合为 O(n 2)。

In a language like Ruby[2], where array.push ereturns the mutated array, the equivalent runs in O(n) time:

在像 Ruby [2]这样的语言中,array.push e返回 mutated array,等效的运行时间为 O(n):

class Array
  def reductions(initial, &proc)
    self.reduce [initial] do |memo, e|
      memo.push proc.call(memo.last, e)
    end
  end
end

def running_sum(enumerable)
  first, rest = enumerable.first, enumerable.drop(1)
  rest.reductions(first, &:+)
end

running_sum (0...10)
> [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]

same in JavaScript[2], whose array.push(e)returns e(not array), but whose anonymous functions allow us to include multiple statements, which we can use to separately specify a return value:

与 JavaScript [2]相同,其array.push(e)返回值e(不是array),但其匿名函数允许我们包含多个语句,我们可以使用这些语句来单独指定返回值:

function reductions(array, callback, initial) {
    return array.reduce(function(memo, e) {
        memo.push(callback(memo[memo.length - 1], e));
        return memo;
    }, [initial]);
}

function runningSum(array) {
    var first = array[0], rest = array.slice(1);
    return reductions(rest, function(memo, e) {
        return x + y;
    }, first);
}

function range(start, end) {
    return(Array.apply(null, Array(end-start)).map(function(e, i) {
        return start + i;
    }
}

runningSum(range(0, 10));
> [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]

So, how can we solve this while retaining the conceptual simplicity of a reductionsfunction that we just pass lambda x, y: x + yto in order to create the running sum function? Let's rewrite reductionsprocedurally. We can fix the accidentally quadraticproblem, and while we're at it, pre-allocate the result list to avoid heap thrashing[3]:

那么,我们如何解决这个问题,同时保留reductions我们刚刚传递的函数的概念简单性,lambda x, y: x + y以便创建运行 sum 函数?让我们按reductions程序重写。我们可以修复意外的二次问题,并且在解决这个问题的同时,预先分配结果列表以避免堆抖动[3]

def reductions(function, list, initial):
    result = [None] * len(list)
    result[0] = initial
    for i in range(len(list)):
        result[i] = function(result[i-1], list[i])
    return result

def running_sum(list):
    first, rest = list[0], list[1:]
    return reductions(lambda memo, e: memo + e, rest, first)

running_sum(range(0,10))
> [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]

This is the sweet spot for me: O(n) performance, and the optimized procedural code is tucked away under a meaningful name where it can be re-used the next time you need to write a function that accumulates intermediate values into a list.

这对我来说是最佳点:O(n) 性能,并且优化的过程代码隐藏在一个有意义的名称下,下次您需要编写一个将中间值累积到列表中的函数时可以重新使用它。

  1. The names reduce/reductionscome from the LISP tradition, foldl/scanlfrom the ML tradition, and injectfrom the Smalltalk tradition.
  2. Python's Listand Ruby's Arrayare both implementations of an automatically resizing data structure known as a "dynamic array" (or std::vectorin C++). JavaScript's Arrayis a little more baroque, but behaves identically provided you don't assign to out of bounds indices or mutate Array.length.
  3. The dynamic array that forms the backing store of the list in the Python runtime will resize itself every time the list's length crosses a power of two. Resizing a list means allocating a new list on the heap of twice the size of the old one, copying the contents of the old list into the new one, and returning the old list's memory to the system. This is an O(n) operation, but because it happens less and less frequently as the list grows larger and larger, the time complexity of appending to a list works out to O(1) in the average case. However, the "hole" left by the old list can sometimes be difficult to recycle, depending on its position in the heap. Even with garbage collection and a robust memory allocator, pre-allocating an array of known size can save the underlying systems some work. In an embedded environment without the benefit of an OS, this kind of micro-management becomes very important.
  1. 名称reduce/reductions来自 LISP 传统、foldl/scanl来自 ML 传统和injectSmalltalk 传统。
  2. PythonList和 RubyArray都是自动调整大小的数据结构的实现,称为“动态数组”(或std::vector在 C++ 中)。JavaScriptArray有点巴洛克风格,但如果您不分配越界索引或 mutate ,则其行为相同Array.length
  3. 在 Python 运行时中形成列表后备存储的动态数组将在每次列表长度超过 2 的幂时调整自身大小。调整列表大小意味着在堆上分配一个两倍于旧列表大小的新列表,将旧列表的内容复制到新列表中,并将旧列表的内存返回给系统。这是一个 O(n) 操作,但是因为随着列表变得越来越大,它发生的频率越来越低,所以在平均情况下,附加到列表的时间复杂度为 O(1)。但是,旧列表留下的“洞”有时很难回收,这取决于它在堆中的位置。即使使用垃圾收集和强大的内存分配器,预先分配已知大小的数组也可以为底层系统节省一些工作。