什么时候不适合使用 python 生成器?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/245792/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
When is not a good time to use python generators?
提问 by David Eyk
This is rather the inverse of What can you use Python generator functions for?: python generators, generator expressions, and the itertools module are some of my favorite features of python these days. They're especially useful when setting up chains of operations to perform on a big pile of data--I often use them when processing DSV files.
这个问题与“Python 生成器函数可以用来做什么?”正好相反:Python 生成器、生成器表达式和 itertools 模块是我近来最喜欢的一些 Python 特性。在为一大堆数据搭建一连串操作时,它们特别有用,我在处理 DSV 文件时经常用到它们。
So when is it not a good time to use a generator, or a generator expression, or an itertools function?
那么,什么时候不适合使用生成器、生成器表达式或 itertools 函数呢?
- When should I prefer zip() over itertools.izip(), or range() over xrange(), or [x for x in foo] over (x for x in foo)?
- 我什么时候应该优先用 zip() 而不是 itertools.izip(),用 range() 而不是 xrange(),或者用 [x for x in foo] 而不是 (x for x in foo)?
Obviously, we eventually need to "resolve" a generator into actual data, usually by creating a list or iterating over it with a non-generator loop. Sometimes we just need to know the length. This isn't what I'm asking.
显然,我们最终需要将生成器“解析”为实际数据,通常是通过创建一个列表或使用非生成器循环对其进行迭代。有时我们只需要知道长度。这不是我要问的。
We use generators so that we're not assigning new lists into memory for interim data. This especially makes sense for large datasets. Does it make sense for small datasets too? Is there a noticeable memory/cpu trade-off?
我们使用生成器,是为了避免为中间数据在内存中分配新的列表。这一点对大型数据集尤其有意义。对小数据集也有意义吗?内存与 CPU 之间是否存在明显的权衡?
I'm especially interested if anyone has done some profiling on this, in light of the eye-opening discussion of list comprehension performance vs. map() and filter(). (alt link)
鉴于那场关于列表推导式与 map()、filter() 性能对比的令人大开眼界的讨论,如果有人对此做过性能分析,我会特别感兴趣。(替代链接)
采纳答案 by Raymond Hettinger
Use a list instead of a generator when:
在以下情况下使用列表而不是生成器:
1) You need to access the data multiple times (i.e. cache the results instead of recomputing them); see the sketch after this list:
1) 您需要多次访问这些数据(也就是把结果缓存下来,而不是每次重新计算;示例见本列表之后):
for i in outer:           # used once, okay to be a generator or return a list
    for j in inner:       # used multiple times, reusing a list is better
        ...
2) You need random access (or any access other than forward sequential order):
2) 您需要随机访问(或任何非正向顺序的访问方式):
for i in reversed(data): ... # generators aren't reversible
s[i], s[j] = s[j], s[i] # generators aren't indexable
3) You need to join strings (which requires two passes over the data):
3) 您需要用 join 拼接字符串(这需要对数据做两次遍历):
s = ''.join(data) # lists are faster than generators in this use case
4) You are using PyPy, which sometimes can't optimize generator code as much as it can with normal function calls and list manipulations.
4) 您在使用 PyPy,它对生成器代码的优化有时不如对普通函数调用和列表操作那样充分。
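A minimal sketch of points 1 and 2 above (Python 3 syntax; the data and names are only illustrative): a generator is consumed after a single pass and supports neither indexing nor reversed(), while a list can be reused freely.
下面是针对上面第 1、2 点的一个最简示意(使用 Python 3 语法,数据和变量名仅作举例):生成器遍历一次就被耗尽,也不支持索引和 reversed();而列表可以随意重复使用。

data = [3, 1, 4, 1, 5]

gen = (x * x for x in data)   # generator: lazy, good for a single pass
lst = [x * x for x in data]   # list: materialized once, reusable

print(sum(gen))               # 52 -- this pass consumes the generator
print(sum(gen))               # 0  -- already exhausted, nothing left

print(sum(lst))               # 52
print(sum(lst))               # 52 -- a list can be iterated as often as needed

print(lst[0], list(reversed(lst)))   # indexing and reversed() work on lists
# gen[0] or reversed(gen) would raise TypeError: generators are neither
# indexable nor reversible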
回答 by Ryan Ginstrom
In general, don't use a generator when you need list operations, like len(), reversed(), and so on.
一般来说,当你需要列表操作时不要使用生成器,比如 len()、reversed() 等等。
There may also be times when you don't want lazy evaluation (e.g. to do all the calculation up front so you can release a resource). In that case, a list expression might be better.
有时您也可能不想要惰性求值(例如,预先完成所有计算,以便释放某个资源)。在这种情况下,列表推导式可能更好。
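For example (a sketch; the file path parameter is hypothetical): materializing the lines into a list lets the file be closed immediately, while returning a generator keeps the file handle alive until the caller finishes consuming it.
例如(只是一个示意,文件路径参数是假设的):把所有行先读进一个列表,文件可以立刻关闭;而返回生成器则会让文件句柄一直保持打开,直到调用方把它消费完。

def read_eager(path):
    """Do all the work up front so the file can be released right away."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]   # file is closed when this returns

def read_lazy(path):
    """Lazy version: the open file stays alive as long as the generator does."""
    f = open(path)
    return (line.rstrip("\n") for line in f)

With the lazy version the work is deferred, but so is releasing the resource.
惰性版本推迟了工作,但也推迟了资源的释放。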
回答 by Jerub
Profile, Profile, Profile.
性能分析,性能分析,还是性能分析。
Profiling your code is the only way to know if what you're doing has any effect at all.
对代码做性能分析,是了解你的改动到底有没有效果的唯一方法。
Most usages of xrange, generators, etc are over static size, small datasets. It's only when you get to large datasets that it really makes a difference. range() vs. xrange() is mostly just a matter of making the code look a tiny little bit more ugly, and not losing anything, and maybe gaining something.
xrange、生成器等的大多数用法,针对的都是大小固定的小数据集。只有到了大数据集上,它们才真正能带来差别。range() 与 xrange() 的取舍,多半只是让代码看起来稍微难看一点点,不会损失什么,或许还能有所收获。
Profile, Profile, Profile.
性能分析,性能分析,还是性能分析。
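One rough way to do that profiling yourself is the standard timeit module (a sketch; absolute numbers depend entirely on your interpreter and machine):
自己动手做这种性能分析,一个粗略的办法是用标准库的 timeit 模块(只是示意,具体数字完全取决于您的解释器和机器):

import timeit

setup = "data = list(range(10000))"

cases = {
    "sum over list comprehension": "sum([x * x for x in data])",
    "sum over generator expression": "sum(x * x for x in data)",
}

for label, stmt in cases.items():
    seconds = timeit.timeit(stmt, setup=setup, number=1000)
    print("%-32s %.3f s" % (label, seconds))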
回答 by Steven Huwig
You should never favor zip over izip, range over xrange, or list comprehensions over generator comprehensions. In Python 3.0 range has xrange-like semantics and zip has izip-like semantics.
你永远不应该偏爱 zip 而不是 izip、range 而不是 xrange,也不应该偏爱列表推导式而不是生成器推导式。在 Python 3.0 中,range 具有类似 xrange 的语义,zip 具有类似 izip 的语义。
List comprehensions are actually clearer like list(frob(x) for x in foo) for those times you need an actual list.
当你确实需要一个真正的列表时,像 list(frob(x) for x in foo) 这样写其实更清晰。
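In Python 3 this is easy to see for yourself (a small sketch; frob() from the answer is replaced by x * x just so the snippet runs): range() and zip() already return lazy objects, and wrapping a generator expression in list(...) is how you get an actual list.
在 Python 3 里可以直接验证这一点(一个小示意,答案中的 frob() 在这里换成了 x * x,只是为了让代码能运行):range() 和 zip() 返回的本来就是惰性对象,而在生成器表达式外面套一层 list(...) 就能得到真正的列表。

r = range(5)
z = zip("abc", [1, 2, 3])
print(r)          # range(0, 5)        -- lazy, not a materialized list
print(z)          # <zip object ...>   -- lazy as well

squares = list(x * x for x in r)   # materialize only when a real list is needed
print(squares)                     # [0, 1, 4, 9, 16]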
回答 by monkut
As you mention, "This especially makes sense for large datasets", I think this answers your question.
正如您提到的,“这对于大型数据集尤其有意义”,我认为这回答了您的问题。
If you're not hitting any walls, performance-wise, you can still stick to lists and standard functions. Then, when you run into problems with performance, make the switch.
如果您在性能上还没有碰到瓶颈,完全可以继续使用列表和标准函数;等真的遇到性能问题时再做切换。
As mentioned by @u0b34a0f6ae in the comments, however, using generators at the start can make it easier for you to scale to larger datasets.
然而,正如@u0b34a0f6ae 在评论中所提到的,在开始时使用生成器可以让您更轻松地扩展到更大的数据集。
回答 by Ryan Ginstrom
Regarding performance: if using psyco, lists can be quite a bit faster than generators. In the example below, lists are almost 50% faster when using psyco.full()
关于性能:如果使用 psyco,列表可能比生成器快很多。在下面的示例中,列表在使用 psyco.full() 时快了近 50%
import psyco
import time
import cStringIO

def time_func(func):
    """The amount of time it requires func to run"""
    start = time.clock()
    func()
    return time.clock() - start

def fizzbuzz(num):
    """That algorithm we all know and love"""
    if not num % 3 and not num % 5:
        return "%d fizz buzz" % num
    elif not num % 3:
        return "%d fizz" % num
    elif not num % 5:
        return "%d buzz" % num
    return None

def with_list(num):
    """Try getting fizzbuzz with a list comprehension and range"""
    out = cStringIO.StringIO()
    for fibby in [fizzbuzz(x) for x in range(1, num) if fizzbuzz(x)]:
        print >> out, fibby
    return out.getvalue()

def with_genx(num):
    """Try getting fizzbuzz with generator expression and xrange"""
    out = cStringIO.StringIO()
    for fibby in (fizzbuzz(x) for x in xrange(1, num) if fizzbuzz(x)):
        print >> out, fibby
    return out.getvalue()

def main():
    """
    Test speed of generator expressions versus list comprehensions,
    with and without psyco.
    """

    # our variables
    nums = [10000, 100000]
    funcs = [with_list, with_genx]

    # try without psyco 1st
    print "without psyco"
    for num in nums:
        print " number:", num
        for func in funcs:
            print func.__name__, time_func(lambda: func(num)), "seconds"
        print

    # now with psyco
    print "with psyco"
    psyco.full()
    for num in nums:
        print " number:", num
        for func in funcs:
            print func.__name__, time_func(lambda: func(num)), "seconds"
        print

if __name__ == "__main__":
    main()
Results:
结果:
without psyco
number: 10000
with_list 0.0519102208309 seconds
with_genx 0.0535933367509 seconds
number: 100000
with_list 0.542204280744 seconds
with_genx 0.557837353115 seconds
with psyco
number: 10000
with_list 0.0286369007033 seconds
with_genx 0.0513424889137 seconds
number: 100000
with_list 0.335414877839 seconds
with_genx 0.580363490491 seconds
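psyco only ever worked on Python 2 and is long unmaintained; a rough re-run of the same comparison on a current CPython might look like this sketch (the timings will differ from the numbers above, so measure on your own machine):
psyco 只支持 Python 2,而且早已停止维护;在当前的 CPython 上重新做同样的对比,大致可以写成下面这个示意(耗时会和上面的数字不同,请在自己的机器上实际测量):

import io
import timeit

def fizzbuzz(num):
    """That algorithm we all know and love"""
    if not num % 3 and not num % 5:
        return "%d fizz buzz" % num
    elif not num % 3:
        return "%d fizz" % num
    elif not num % 5:
        return "%d buzz" % num
    return None

def with_list(num):
    """fizzbuzz via a list comprehension"""
    out = io.StringIO()
    for fibby in [fizzbuzz(x) for x in range(1, num) if fizzbuzz(x)]:
        print(fibby, file=out)
    return out.getvalue()

def with_genx(num):
    """fizzbuzz via a generator expression"""
    out = io.StringIO()
    for fibby in (fizzbuzz(x) for x in range(1, num) if fizzbuzz(x)):
        print(fibby, file=out)
    return out.getvalue()

for func in (with_list, with_genx):
    print(func.__name__, timeit.timeit(lambda: func(100000), number=10), "seconds")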
回答 by minty
You should prefer list comprehensions if you need to keep the values around for something else later and the size of your set is not too large.
如果您之后还需要把这些值留着另作他用,而且数据集不太大,那么应该优先使用列表推导式。
For example: you are creating a list that you will loop over several times later in your program.
例如:您正在创建一个列表,稍后您将在程序中多次循环该列表。
To some extent you can think of generators as a replacement for iteration (loops) vs. list comprehensions as a type of data structure initialization. If you want to keep the data structure then use list comprehensions.
在某种程度上,可以把生成器看作迭代(循环)的替代品,而把列表推导式看作一种数据结构的初始化方式。如果想保留这个数据结构,就使用列表推导式。
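A tiny sketch of that idea: initialize the list once, then let the rest of the program walk over it as many times as it likes.
对这个想法的一个小示意:列表只初始化一次,之后程序的其余部分想遍历多少次都可以。

words = ["spam", "egg", "bacon", "spam"]
lengths = [len(w) for w in words]   # materialized once, kept as a data structure

total = sum(lengths)                # first pass
longest = max(lengths)              # second pass over the same data
average = total / len(lengths)      # len() also needs a real sequence
print(total, longest, average)      # 15 5 3.75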
回答 by Jason Baker
As far as performance is concerned, I can't think of any times that you would want to use a list over a generator.
就性能而言,我想不出有什么场合您会想用列表而不是生成器。
回答 by Jeremy Cantrell
I've never found a situation where generators would hinder what you're trying to do. There are, however, plenty of instances where using generators would not help you any more than not using them.
我从未遇到过生成器会妨碍您要做的事情的情况。不过,在很多情况下,使用生成器并不会比不使用它们带来更多好处。
For example:
例如:
sorted(xrange(5))
Does not offer any improvement over:
并不比下面这种写法有任何改进:
sorted(range(5))
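Both spellings give the same result, because sorted() has to pull the whole input into memory before it can sort anyway; a quick check in Python 3 (where range() is already lazy):
两种写法得到的结果完全一样,因为 sorted() 反正都要先把全部输入读进内存才能排序;在 Python 3 中(range() 本身就是惰性的)可以快速验证:

print(sorted(x % 7 for x in range(20)) == sorted([x % 7 for x in range(20)]))   # True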