How to clone a Python generator object?
Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/4945155/
Asked by Paulo Freitas
Consider this scenario:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

walk = os.walk('/home')

for root, dirs, files in walk:
    for pathname in dirs + files:
        print(os.path.join(root, pathname))

# The generator is already exhausted here, so this loop prints nothing.
for root, dirs, files in walk:
    for pathname in dirs + files:
        print(os.path.join(root, pathname))
I know that this example is kind of redundant, but consider that we need to use the same walk data more than once. I have a benchmark scenario, and using the same walk data is mandatory to get helpful results.
I've tried walk2 = walk to clone it and use the copy in the second iteration, but it didn't work. The question is: how can I copy it? Is it even possible?
Thank you in advance.
Accepted answer by Sven Marnach
You can use itertools.tee():
walk, walk2 = itertools.tee(walk)
Note that this might "need significant extra storage", as the documentation points out.
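As a minimal sketch of how tee() behaves, the snippet below uses a small stand-in generator (the hypothetical numbers() function, not part of the original answer) instead of os.walk, so it can run anywhere. It shows that both iterators see the same items, and that the storage cost comes from tee() buffering whatever one iterator has consumed and the other has not.

```python
import itertools

def numbers():
    # Hypothetical stand-in for os.walk('/home'): any one-shot generator works.
    for n in range(3):
        yield n

source = numbers()

# tee() returns independent iterators that buffer items as needed.
# After this call, `source` itself should no longer be advanced directly.
first, second = itertools.tee(source)

first_items = list(first)    # consumes the source; items are buffered for `second`
second_items = list(second)  # served from tee's internal buffer
```

If one iterator is run to completion before the other starts, as here, tee() ends up buffering every item, so in that case building a list is usually simpler.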
Answer by S.Lott
Define a function
def walk_home():
    for r in os.walk('/home'):
        yield r
Or even this
def walk_home():
    return os.walk('/home')
Both are used like this:
for root, dirs, files in walk_home():
    for pathname in dirs + files:
        print(os.path.join(root, pathname))
Answer by shang
If you know you are going to iterate through the whole generator for every usage, you will probably get the best performance by unrolling the generator to a list and using the list multiple times.
walk = list(os.walk('/home'))
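To illustrate the point, here is a small sketch that takes the snapshot once and then iterates over it twice. It builds a throwaway directory with tempfile so that it does not depend on /home; the helper all_paths() and the file names are assumptions made for the example.

```python
import os
import tempfile

def all_paths(entries):
    # Flatten (root, dirs, files) tuples into full path names.
    return [os.path.join(root, name)
            for root, dirs, files in entries
            for name in dirs + files]

# Build a tiny throwaway tree: <top>/sub/a.txt
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    open(os.path.join(top, "sub", "a.txt"), "w").close()

    walk = list(os.walk(top))      # the filesystem is scanned only once

    first_pass = all_paths(walk)
    second_pass = all_paths(walk)  # a list can be iterated again and again
```

Both passes work on identical data, which is what a benchmark like the one in the question needs.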
Answer by Six
This answer aims to extend and elaborate on what the other answers have expressed. The solution will necessarily vary depending on what exactly you aim to achieve.
If you want to iterate over the exact same result of os.walk multiple times, you will need to initialize a list from the os.walk iterable's items (i.e. walk = list(os.walk(path))).
If you must guarantee the data remains the same, that is probably your only option. However, there are several scenarios in which this is not possible or desirable.
- It will not be possible to list() an iterable if the output is of sufficient size (e.g. attempting to list() an entire filesystem may freeze your computer).
- It is not desirable to list() an iterable if you wish to acquire "fresh" data prior to each use.
In the event that list() is not suitable, you will need to run your generator on demand. Note that generators are exhausted after each use, so this poses a slight problem. In order to "rerun" your generator multiple times, you can use the following pattern:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

class WalkMaker:
    def __init__(self, path):
        self.path = path

    def __iter__(self):
        for root, dirs, files in os.walk(self.path):
            for pathname in dirs + files:
                yield os.path.join(root, pathname)

walk = WalkMaker('/home')

for path in walk:
    pass

# do something...

for path in walk:
    pass
The aforementioned design pattern will allow you to keep your code DRY.
Answer by Rob Truxal
This is a good use case for functools.partial(), which makes a quick generator factory:
from functools import partial
import os
walk_factory = partial(os.walk, '/home')
walk1, walk2, walk3 = walk_factory(), walk_factory(), walk_factory()
What functools.partial() does is hard to describe in plain words, but this is what it's for.
It partially fills out a function's parameters without executing that function. Consequently, it acts as a function/generator factory.
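A small sketch of that behavior, using a hypothetical countdown() generator in place of os.walk so the result is predictable: partial() only stores the arguments, and every call to the resulting factory creates a brand-new generator.

```python
from functools import partial

def countdown(start, step):
    # Hypothetical generator used in place of os.walk for this example.
    while start > 0:
        yield start
        start -= step

# partial() stores the arguments; nothing runs until the factory is called.
countdown_factory = partial(countdown, 6, step=2)

run1 = list(countdown_factory())  # a brand-new generator
run2 = list(countdown_factory())  # another one, starting from scratch
```

Note that each call runs the underlying function again. With os.walk that means the filesystem is scanned each time, so the results can differ between runs if files change in the meantime.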
Answer by Erik Aronesty
This "Python Generator Listeners" code allows you to have many listeners on a single generator, like os.walk, and even have someone "chime in" later.
def walkme(): return os.walk('/home')
m1 = Muxer(walkme)
m2 = Muxer(walkme)
Then m1 and m2 can even run in separate threads and process items at their leisure.
See: https://gist.github.com/earonesty/cafa4626a2def6766acf5098331157b3
import queue
from threading import Lock
from collections import namedtuple

class Muxer():
    Entry = namedtuple('Entry', 'genref listeners, lock')

    # Shared registry: one underlying generator per source function.
    already = {}
    top_lock = Lock()

    def __init__(self, func, restart=False):
        self.restart = restart
        self.func = func
        self.queue = queue.Queue()

        with self.top_lock:
            if func not in self.already:
                self.already[func] = self.Entry([func()], [], Lock())

            ent = self.already[func]

        self.genref = ent.genref
        self.lock = ent.lock
        self.listeners = ent.listeners

        self.listeners.append(self)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            e = self.queue.get_nowait()
        except queue.Empty:
            with self.lock:
                # Check again: another listener may have queued an item meanwhile.
                try:
                    e = self.queue.get_nowait()
                except queue.Empty:
                    try:
                        e = next(self.genref[0])
                        # Hand the new item to every other listener.
                        for other in self.listeners:
                            if other is not self:
                                other.queue.put(e)
                    except StopIteration:
                        if self.restart:
                            self.genref[0] = self.func()
                        raise
        return e

    def __del__(self):
        with self.top_lock:
            try:
                self.listeners.remove(self)
            except ValueError:
                pass
            if not self.listeners and self.func in self.already:
                del self.already[self.func]

