Python lambda versus list comprehension performance

Note: this page is a translated mirror of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/1632902/

Date: 2020-11-03 22:44:38  Source: igfitidea

lambda versus list comprehension performance

python, lambda, set, list-comprehension

Asked by TallPaul

I recently posted a question using a lambda function and in a reply someone had mentioned lambda is going out of favor, to use list comprehensions instead. I am relatively new to Python. I ran a simple test:

import time

S=[x for x in range(1000000)]
T=[y**2 for y in range(300)]
#
#
time1 = time.time()
N=[x for x in S for y in T if x==y]
time2 = time.time()
print 'time diff [x for x in S for y in T if x==y]=', time2-time1
#print N
#
#
time1 = time.time()
N=filter(lambda x:x in S,T)
time2 = time.time()
print 'time diff filter(lambda x:x in S,T)=', time2-time1
#print N
#
#
#http://snipt.net/voyeg3r/python-intersect-lists/
time1 = time.time()
N = [val for val in S if val in T]
time2 = time.time()
print 'time diff [val for val in S if val in T]=', time2-time1
#print N
#
#
time1 = time.time()
N= list(set(S) & set(T))
time2 = time.time()
print 'time diff list(set(S) & set(T))=', time2-time1
#print N  #the results will be unordered as compared to the other ways!!!
#
#
time1 = time.time()
N=[]
for x in S:
    for y in T:
        if x==y:
            N.append(x)
time2 = time.time()
print 'time diff using traditional for loop', time2-time1
#print N

They all print the same N, so I commented that print statement out (except for the last method, whose result is unordered), but the resulting time differences were interesting over repeated tests, as seen in this one example:

time diff [x for x in S for y in T if x==y]= 54.875
time diff filter(lambda x:x in S,T)= 0.391000032425
time diff [val for val in S if val in T]= 12.6089999676
time diff list(set(S) & set(T))= 0.125
time diff using traditional for loop 54.7970001698

So while I find list comprehensions on the whole easier to read, there seem to be some performance issues, at least in this example.

So, two questions:

  1. Why is lambda etc being pushed aside?

  2. For the list comprehension ways, is there a more efficient implementation and how would you KNOW it's more efficient without testing? I mean, lambda/map/filter was supposed to be less efficient because of the extra function calls, but it seems to be MORE efficient.

Paul

Answered by Greg Hewgill

Your tests are doing very different things. With S being 1M elements and T being 300:

[x for x in S for y in T if x==y]= 54.875

This option does 300M equality comparisons.

filter(lambda x:x in S,T)= 0.391000032425

This option does 300 linear searches through S.

[val for val in S if val in T]= 12.6089999676

This option does 1M linear searches through T.

list(set(S) & set(T))= 0.125

This option does two set constructions and one set intersection.

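The speed of the set-based version comes from hashing, not from the syntax. As a rough sketch (written in Python 3 syntax, unlike the Python 2 listings elsewhere on this page), the same intersection can be expressed and sanity-checked like this:

```python
S = [x for x in range(1000000)]
T = [y ** 2 for y in range(300)]

# Two O(n) set constructions, then a hash-based intersection:
# each membership probe is O(1) on average, instead of a linear scan
# or a pairwise equality comparison.
N = list(set(S) & set(T))

# Same elements as the nested-loop version, just in arbitrary order.
assert sorted(N) == sorted(y ** 2 for y in range(300))
```

Since every square below 300**2 is under 1,000,000, the intersection here is all 300 elements of T.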

The differences in performance between these options are much more related to the algorithms each one is using, rather than any difference between list comprehensions and lambda.

Answered by Omnifarious

When I fix your code so that the list comprehension and the call to filter are actually doing the same work, things change a whole lot:

import time

S=[x for x in range(1000000)]
T=[y**2 for y in range(300)]
#
#
time1 = time.time()
N=[x for x in T if x in S]
time2 = time.time()
print 'time diff [x for x in T if x in S]=', time2-time1
#print N
#
#
time1 = time.time()
N=filter(lambda x:x in S,T)
time2 = time.time()
print 'time diff filter(lambda x:x in S,T)=', time2-time1
#print N

Then the output is more like:

time diff [x for x in T if x in S]= 0.414485931396
time diff filter(lambda x:x in S,T)= 0.466315984726

So the list comprehension has a time that's generally pretty close to and usually less than the lambda expression.

The reason lambda expressions are being phased out is that many people think they are a lot less readable than list comprehensions. I sort of reluctantly agree.

Answered by steveha

Q: Why is lambda etc being pushed aside?

A: List comprehensions and generator expressions are generally considered to be a nice mix of power and readability. The pure functional-programming style where you use map(), reduce(), and filter() with functions (often lambda functions) is considered not as clear. Also, Python has added built-in functions that nicely handle all the major uses for reduce().

Suppose you wanted to sum a list. Here are two ways of doing it.

lst = range(10)
print reduce(lambda x, y: x + y, lst)

print sum(lst)

Sign me up as a fan of sum() and not a fan of reduce() to solve this problem. Here's another, similar problem:

lst = range(10)
print reduce(lambda x, y: bool(x or y), lst)

print any(lst)

Not only is the any() solution easier to understand, but it's also much faster; it has short-circuit evaluation, such that it will stop evaluating as soon as it has found any true value. The reduce() has to crank through the entire list. This performance difference would be stark if the list was a million items long and the first item evaluated true. By the way, any() was added in Python 2.5; if you don't have it, here is a version for older versions of Python:

def any(iterable):
    for x in iterable:
        if x:
            return True
    return False

Suppose you wanted to make a list of squares of even numbers from some list.

lst = range(10)
print map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst))

print [x**2 for x in lst if x % 2 == 0]

Now suppose you wanted to sum that list of squares.

lst = range(10)
print sum(map(lambda x: x**2, filter(lambda x: x % 2 == 0, lst)))

# list comprehension version of the above
print sum([x**2 for x in lst if x % 2 == 0])

# generator expression version; note the lack of '[' and ']'
print sum(x**2 for x in lst if x % 2 == 0)

The generator expression actually just returns an iterable object. sum() takes the iterable and pulls values from it, one by one, summing as it goes, until all the values are consumed. This is the most efficient way you can solve this problem in Python. In contrast, the map() solution, and the equivalent solution with a list comprehension inside the call to sum(), must first build a list; this list is then passed to sum(), used once, and discarded. The time to build the list and then delete it again is just wasted. (EDIT: and note that the version with both map and filter must build two lists, one built by filter and one built by map; both lists are discarded.) (EDIT: But in Python 3.0 and newer, map() and filter() are now both "lazy" and produce an iterator instead of a list, so this point is less true than it used to be. Also, in Python 2.x you were able to use itertools.imap() and itertools.ifilter() for iterator-based map and filter. But I continue to prefer the generator expression solutions over any map/filter solutions.)

By composing map(), filter(), and reduce() in combination with lambda functions, you can do many powerful things. But Python has idiomatic ways to solve the same problems which are simultaneously better performing and easier to read and understand.

Answered by Alex Martelli

Many people have already pointed out that you're comparing apples with oranges, etc, etc. But I think nobody's shown how to do a really simple comparison -- list comprehension vs map plus lambda with little else to get in the way -- and that might be:

$ python -mtimeit -s'L=range(1000)' 'map(lambda x: x+1, L)'
1000 loops, best of 3: 328 usec per loop
$ python -mtimeit -s'L=range(1000)' '[x+1 for x in L]'
10000 loops, best of 3: 129 usec per loop

Here, you can see very sharply the cost of lambda -- about 200 microseconds, which in the case of a sufficiently simple operation such as this one swamps the operation itself.

Numbers are very similar with filter of course, since the problem is not filter or map, but rather the lambda itself:

$ python -mtimeit -s'L=range(1000)' '[x for x in L if not x%7]'
10000 loops, best of 3: 162 usec per loop
$ python -mtimeit -s'L=range(1000)' 'filter(lambda x: not x%7, L)'
1000 loops, best of 3: 334 usec per loop

No doubt the fact that lambda can be less clear, or its weird connection with Sparta (Spartans had a Lambda, for "Lakedaimon", painted on their shields -- this suggests that lambda is pretty dictatorial and bloody;-) has at least as much to do with its slowly falling out of fashion as its performance costs do. But the latter are quite real.

Answered by ChristopheD

First of all, test like this:

import timeit

S=[x for x in range(10000)]
T=[y**2 for y in range(30)]

print "v1", timeit.Timer('[x for x in S for y in T if x==y]',
             'from __main__ import S,T').timeit(100)
print "v2", timeit.Timer('filter(lambda x:x in S,T)',
             'from __main__ import S,T').timeit(100)
print "v3", timeit.Timer('[val for val in T if val in S]',
             'from __main__ import S,T').timeit(100)
print "v4", timeit.Timer('list(set(S) & set(T))',
             'from __main__ import S,T').timeit(100)

And basically you are doing different things each time you test. When you would rewrite the list-comprehension for example as

[val for val in T if val in S]

performance would be on par with the 'lambda/filter' construct.

Answered by John La Rooy

Sets are the correct solution for this. However, try swapping S and T and see how long it takes!

filter(lambda x:x in T,S)

$ python -m timeit -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' 'filter(lambda x:x in S,T)'
10 loops, best of 3: 485 msec per loop
$ python -m timeit -r1 -n1 -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' 'filter(lambda x:x in T,S)'
1 loops, best of 1: 19.6 sec per loop

So you see that the order of S and T is quite important.

Changing the order of the list comprehension to match the filter gives

$ python -m timeit  -s'S=[x for x in range(1000000)];T=[y**2 for y in range(300)]' '[x for x in T if x in S]'
10 loops, best of 3: 441 msec per loop

So in fact the list comprehension is slightly faster than the lambda on my computer.

Answered by Ants Aasma

Your list comprehension and lambda are doing different things, the list comprehension matching the lambda would be [val for val in T if val in S].

Efficiency isn't the reason why list comprehensions are preferred (while they actually are slightly faster in almost all cases). The reason why they are preferred is readability.

Try it with a smaller loop body and larger loops, for example make T a set and iterate over S. In that case, on my machine, the list comprehension is nearly twice as fast.

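That suggestion can be sketched as follows (Python 3 syntax; the "nearly twice as fast" figure is the answerer's measurement, not reproduced here). Converting T to a set turns each membership test into an average O(1) hash lookup:

```python
import time

S = [x for x in range(1000000)]
T = set(y ** 2 for y in range(300))   # hash-based membership tests

t1 = time.time()
N = [val for val in S if val in T]    # 1M O(1) lookups instead of linear scans
t2 = time.time()
print('list comprehension with set T:', t2 - t1)

t1 = time.time()
N2 = list(filter(lambda x: x in T, S))
t2 = time.time()
print('filter/lambda with set T:', t2 - t1)

# Both produce the same (ordered) result.
assert N == N2
```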
Answered by Jochen Ritzel

Your profiling is done wrong. Take a look at the timeit module and try again.

lambda defines anonymous functions. Their main problem is that many people don't know the whole Python library and use them to re-implement functions that already exist in the operator, functools, etc. modules (and are much faster).

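For instance, a hand-written addition lambda can be replaced by the ready-made, C-implemented operator.add (a sketch in Python 3 syntax, where reduce lives in functools rather than the builtins used elsewhere on this page):

```python
import operator
from functools import reduce  # reduce moved out of the builtins in Python 3

lst = list(range(10))

# Re-implementing addition with a lambda...
total_lambda = reduce(lambda x, y: x + y, lst)

# ...versus the ready-made, C-implemented operator.add
total_op = reduce(operator.add, lst)

print(total_lambda, total_op)  # both 45
```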
List comprehensions have nothing to do with lambda. They are equivalent to the standard filter and map functions from functional languages. LCs are preferred because they can be used as generators too, not to mention readability.

Answered by Harriv

This is pretty fast:

def binary_search(a, x, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        midval = a[mid]
        if midval < x:
            lo = mid+1
        elif midval > x: 
            hi = mid
        else:
            return mid
    return -1

time1 = time.time()
N = [x for x in T if binary_search(S, x) >= 0]
time2 = time.time()
print 'time diff binary search=', time2-time1

Simply: fewer comparisons, less time.

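For reference, the standard library's bisect module already provides this binary search; a hedged equivalent of the hand-written version above, in Python 3 syntax (the contains() helper is an illustrative name introduced here, not a bisect API):

```python
import bisect

S = [x for x in range(1000000)]   # must be sorted for binary search
T = [y ** 2 for y in range(300)]

def contains(a, x):
    """Return True if sorted list a contains x, via binary search."""
    i = bisect.bisect_left(a, x)  # leftmost insertion point for x in a
    return i < len(a) and a[i] == x

N = [x for x in T if contains(S, x)]
```

Each lookup is O(log len(S)) instead of the O(len(S)) linear scan of `x in S`.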
Answered by RedGlyph

List comprehensions can make a bigger difference if you have to process your filtered results. In your case you just build a list, but if you had to do something like this:

n = [f(i) for i in S if some_condition(i)]

you would gain from LC optimization over this:

n = map(f, filter(some_condition, S))

simply because the latter has to build an intermediate list (or tuple, or string, depending on the nature of S). As a consequence you will also notice a different impact on the memory used by each method; the LC's will stay lower.

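Worth noting: in Python 3, map() and filter() return lazy iterators, so this particular intermediate list disappears there. A small sketch in Python 3 syntax (the f and some_condition functions below are hypothetical stand-ins for the placeholders above):

```python
S = range(20)

# Stand-ins for the f / some_condition placeholders used above.
def f(i):
    return i * i

def some_condition(i):
    return i % 2 == 0

lazy = map(f, filter(some_condition, S))    # no intermediate list in Python 3
n = [f(i) for i in S if some_condition(i)]  # list comprehension version

# Materializing the lazy pipeline yields the same result.
assert list(lazy) == n
```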
The lambda in itself does not matter.
