How to clone a Python generator object?

Question

Asked by Paulo Freitas

Consider this scenario:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

walk = os.walk('/home')

for root, dirs, files in walk:
    for pathname in dirs+files:
        print(os.path.join(root, pathname))

for root, dirs, files in walk:
    for pathname in dirs+files:
        print(os.path.join(root, pathname))

I know that this example is kinda redundant, but you should consider that we need to use the same walk data more than once. I have a benchmark scenario, and using the same walk data is mandatory to get helpful results.

I've tried walk2 = walk to clone it and use the clone in the second iteration, but it didn't work. The question is... How can I copy it? Is it even possible?

Thank you in advance.

Answer 1

Accepted answer by Sven Marnach

You can use itertools.tee():

import itertools

walk, walk2 = itertools.tee(walk)

Note that this might "need significant extra storage", as the documentation points out.
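
As a minimal sketch of how tee() behaves, the snippet below uses a small hypothetical generator, numbers(), as a stand-in for os.walk('/home') (it is not part of the original answer). It suggests that both copies yield the same items, and that the original generator is drained in the process, so only the copies should be used afterwards.

```python
import itertools

def numbers():
    # Hypothetical stand-in for os.walk('/home'): yields three values.
    for n in (1, 2, 3):
        yield n

source = numbers()
first, second = itertools.tee(source)

# Each copy can be consumed independently of the other.
first_pass = list(first)
second_pass = list(second)

# tee() pulled every item out of the original generator, so nothing is left in it.
leftover = list(source)
```

Because the first copy was fully consumed before the second was touched, tee() had to buffer every item in between, which is where the extra storage goes.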

Answer 2

Answered by S.Lott

Define a function:

def walk_home():
    for r in os.walk('/home'):
        yield r

Or even this:

def walk_home():
    return os.walk('/home')

Both are used like this:

for root, dirs, files in walk_home():
    for pathname in dirs+files:
        print(os.path.join(root, pathname))

Answer 3

Answered by shang

If you know you are going to iterate through the whole generator for every usage, you will probably get the best performance by unrolling the generator to a list and using the list multiple times.

walk = list(os.walk('/home'))
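
A small sketch of the difference, assuming a throwaway temporary directory with a made-up layout (one subdirectory and one file) rather than /home: a second pass over the generator yields nothing, while the list can be traversed as often as needed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout used only for this illustration.
    os.mkdir(os.path.join(top, 'sub'))
    open(os.path.join(top, 'a.txt'), 'w').close()

    gen = os.walk(top)
    gen_first = list(gen)     # consumes the generator
    gen_second = list(gen)    # generator is already exhausted

    walk = list(os.walk(top))
    list_first = [root for root, dirs, files in walk]
    list_second = [root for root, dirs, files in walk]
```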

Answer 4

Answered by Six

This answer aims to extend/elaborate on what the other answers have expressed. The solution will necessarily vary depending on what exactly you aim to achieve.

If you want to iterate over the exact same result of os.walk multiple times, you will need to initialize a list from the os.walk iterable's items (i.e. walk = list(os.walk(path))).

If you must guarantee the data remains the same, that is probably your only option. However, there are several scenarios in which this is not possible or desirable.

It will not be possible to list() an iterable if the output is too large (e.g. attempting to list() an entire filesystem may freeze your computer).
It is not desirable to list() an iterable if you wish to acquire "fresh" data prior to each use.
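
The "fresh data" point can be sketched as follows, assuming a temporary directory and a hypothetical helper count_files() (neither is from the original answer). A list taken before the directory changes keeps the old view, while calling os.walk again picks up the change.

```python
import os
import tempfile

def count_files(walk_items):
    # Total number of files across all (root, dirs, files) triples.
    return sum(len(files) for _root, _dirs, files in walk_items)

with tempfile.TemporaryDirectory() as top:
    open(os.path.join(top, 'a.txt'), 'w').close()

    snapshot = list(os.walk(top))    # frozen at this moment

    # The directory changes after the snapshot was taken.
    open(os.path.join(top, 'b.txt'), 'w').close()

    snapshot_count = count_files(snapshot)     # still reflects one file
    fresh_count = count_files(os.walk(top))    # a new walk sees both files
```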

In the event that list() is not suitable, you will need to run your generator on demand. Note that generators are exhausted after each use, so this poses a slight problem. In order to "rerun" your generator multiple times, you can use the following pattern:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os

class WalkMaker:
    def __init__(self, path):
        self.path = path
    def __iter__(self):
        for root, dirs, files in os.walk(self.path):
            for pathname in dirs + files:
                yield os.path.join(root, pathname)

walk = WalkMaker('/home')

for path in walk:
    pass

# do something...

for path in walk:
    pass

The aforementioned design pattern will allow you to keep your code DRY.

Answer 5

Answered by Rob Truxal

This is a good use case for functools.partial() to make a quick generator factory:

from functools import partial
import os

walk_factory = partial(os.walk, '/home')

walk1, walk2, walk3 = walk_factory(), walk_factory(), walk_factory()

What functools.partial() does is hard to describe in plain words, but this is what it's for.

It partially fills in a function's parameters without executing that function. Consequently, it acts as a function/generator factory.
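
To make that a bit more concrete, here is a rough sketch with a hypothetical generator function, countdown(), standing in for os.walk (the name and its parameters are made up for illustration). partial() only stores the function and the first argument; the generator is created each time the resulting factory is called.

```python
from functools import partial

def countdown(start, step):
    # Hypothetical generator function standing in for os.walk.
    while start > 0:
        yield start
        start -= step

# Nothing runs here: partial only stores countdown and the argument 5.
from_five = partial(countdown, 5)

# Each call supplies the remaining argument and returns a fresh generator.
by_ones = list(from_five(1))
by_twos = list(from_five(2))
```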

Answer 6

Answered by Erik Aronesty

This "Python Generator Listeners" code allows you to have many listeners on a single generator, like os.walk, and even have someone "chime in" later.

def walkme():
    return os.walk('/home')

m1 = Muxer(walkme)
m2 = Muxer(walkme)

Then m1 and m2 can even run in separate threads and process items at their leisure.

See: https://gist.github.com/earonesty/cafa4626a2def6766acf5098331157b3

import queue
from threading import Lock
from collections import namedtuple

class Muxer():
    Entry = namedtuple('Entry', 'genref listeners lock')

    already = {}
    top_lock = Lock()

    def __init__(self, func, restart=False):
        self.restart = restart
        self.func = func
        self.queue = queue.Queue()

        with self.top_lock:
            if func not in self.already:
                self.already[func] = self.Entry([func()], [], Lock())
            ent = self.already[func]

        self.genref = ent.genref
        self.lock = ent.lock
        self.listeners = ent.listeners

        self.listeners.append(self)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            e = self.queue.get_nowait()
        except queue.Empty:
            with self.lock:
                try:
                    e = self.queue.get_nowait()
                except queue.Empty:
                    try:
                        e = next(self.genref[0])
                        for other in self.listeners:
                            if other is not self:
                                other.queue.put(e)
                    except StopIteration:
                        if self.restart:
                            self.genref[0] = self.func()
                        raise
        return e

    def __del__(self):
        with self.top_lock:
            try:
                self.listeners.remove(self)
            except ValueError:
                pass
            if not self.listeners and self.func in self.already:
                del self.already[self.func]
