Python for loop and iterator behavior

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/29403401/

Time: 2020-08-19 04:31:43  Source: igfitidea

Tags: python, iterator

Asked by Matteo

I wanted to understand a bit more about iterators, so please correct me if I'm wrong.

An iterator is an object which has a pointer to the next object and is read as a buffer or stream (i.e. a linked list). They're particularly efficient because all they do is tell you what is next by reference instead of using indexing.

However, I still don't understand why the following behavior happens:

In [1]: iter = (i for i in range(5))

In [2]: for _ in iter:
   ....:     print _
   ....:     
0
1
2
3
4

In [3]: for _ in iter:
   ....:     print _
   ....:     

In [4]: 

After the first loop through the iterator (In [2]), it's as if it had been consumed and left empty, so the second loop (In [3]) prints nothing.

However, I never assigned a new value to the iter variable.

What is really happening under the hood of the for loop?

Accepted answer by Rick supports Monica

Your suspicion is correct: the iterator has been consumed.

In actuality, your iterator is a generator, which is an object that can be iterated through only once.

type((i for i in range(5))) # says it's type generator

def another_generator():
    yield 1 # the yield expression makes this a generator function, not a regular function

type(another_generator()) # calling it returns a generator
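
A quick way to see the difference described above: a list can be looped over any number of times, because each loop asks it for a fresh iterator, whereas a generator is its own iterator and stays exhausted. A minimal sketch (the variable names are just for illustration):

```python
numbers = [0, 1, 2, 3, 4]       # a list is an iterable, not an iterator
gen = (i for i in range(5))     # a generator expression is an iterator

# Each loop over the list calls iter(numbers) and gets a brand-new iterator.
first_pass = [n for n in numbers]
second_pass = [n for n in numbers]

# A generator's __iter__ returns the generator itself, so both loops share one state.
gen_first = [n for n in gen]
gen_second = [n for n in gen]   # nothing left: the first loop consumed everything

print(first_pass, second_pass)  # [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
print(gen_first, gen_second)    # [0, 1, 2, 3, 4] []
print(iter(gen) is gen)         # True
print(iter(numbers) is numbers) # False
```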

The reason they are efficient has nothing to do with telling you what is next "by reference." They are efficient because they only generate the next item upon request; the items are not all generated at once. In fact, you can have an infinite generator:

def my_gen():
    while True:
        yield 1 # again: yield makes this a generator function, not a regular function

for _ in my_gen(): print(_) # hit Ctrl+C to stop this infinite loop!
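
Because values are produced only on request, a generator over an enormous range costs almost nothing until you ask for items, and an infinite generator can be stopped without Ctrl+C by taking only what you need. A small sketch; range(10**12) is an arbitrary, deliberately huge bound, and itertools.islice is one standard way to take a fixed number of items:

```python
import itertools

# No squares are computed yet: the generator only knows how to produce them.
squares = (i * i for i in range(10**12))

# Pull just the first three values; the rest are never computed.
first_three = [next(squares) for _ in range(3)]
print(first_three)    # [0, 1, 4]

# The generator remembers where it stopped, so the next value continues from there.
print(next(squares))  # 9

def my_gen():
    while True:
        yield 1

# islice stops asking for values after five, so the infinite loop never runs away.
first_five = list(itertools.islice(my_gen(), 5))
print(first_five)     # [1, 1, 1, 1, 1]
```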

Some other corrections to help improve your understanding:

  • The generator is not a pointer, and does not behave like a pointer as you might be familiar with in other languages.
  • One of the differences from other languages: as said above, each result of the generator is generated on the fly. The next result is not produced until it is requested.
  • The keyword combination for ... in accepts an iterable object as its second argument.
  • The iterable object can be a generator, as in your example case, but it can also be any other iterable object, such as a list, or dict, or a str object (string), or a user-defined type that provides the required functionality.
  • The iter function is applied to the object to get an iterator (by the way: don't use iter as a variable name in Python, as you have done. It is not a keyword, but it shadows the built-in function iter). Actually, to be more precise, the object's __iter__ method is called (which is, for the most part, all the iter function does anyway; __iter__ is one of Python's so-called "magic methods").
  • If the call to __iter__ is successful, the function next() is applied to the returned iterator over and over again, in a loop, and the loop variable of for ... in is assigned the result of each next() call. (Remember: the iterable object could be a generator, or a container object, or any other iterable object.) Actually, to be more precise: it calls the iterator object's __next__ method, which is another "magic method".
  • The for loop ends when next() raises the StopIteration exception (which usually happens when the iterator has no more objects to yield when next() is called).

You can "manually" implement a for loop in Python this way (probably not perfect, but close enough):

iterable = [1, 2, 3]  # stands in for whatever object follows "in"

try:
    temp = iterable.__iter__()
except AttributeError:  # catch the class; "except AttributeError():" would itself fail
    raise TypeError("'{}' object is not iterable".format(type(iterable).__name__))
else:
    while True:
        try:
            _ = temp.__next__()
        except StopIteration:
            break
        except AttributeError:
            raise TypeError("iter() returned non-iterator of type '{}'".format(type(temp).__name__))
        # this is the "body" of the for loop
        print(_)

There is pretty much no difference between the above and your example code.
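
To watch a real for loop make those calls, here is a minimal sketch with a hypothetical Countdown class (not from the original answer) that records every call to __iter__ and __next__ in a list:

```python
class Countdown:
    """Hypothetical iterator that logs the protocol calls a for loop makes."""

    def __init__(self, start):
        self.current = start
        self.calls = []

    def __iter__(self):
        self.calls.append("__iter__")
        return self  # this object is its own iterator

    def __next__(self):
        self.calls.append("__next__")
        if self.current <= 0:
            raise StopIteration  # tells the for loop to stop
        self.current -= 1
        return self.current + 1


c = Countdown(3)
values = []
for value in c:
    values.append(value)

print(values)   # [3, 2, 1]
print(c.calls)  # ['__iter__', '__next__', '__next__', '__next__', '__next__']
```

The loop calls __iter__ once, then __next__ once per value plus one final call that raises StopIteration.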

Actually, the more interesting part of a for loop is not the for, but the in. Using in by itself produces a different effect than for ... in, but it is very useful to understand what in does with its arguments, since for ... in implements very similar behavior.

  • When used by itself, the in keyword first calls the object's __contains__ method, which is yet another "magic method" (note that this step is skipped when using for ... in). Using in by itself on a container, you can do things like this:

    1 in [1, 2, 3] # True
    'He' in 'Hello' # True
    3 in range(10) # True
    'eH' in 'Hello'[::-1] # True
    
  • If the iterable object is NOT a container (i.e. it doesn't have a __contains__ method), in next tries to call the object's __iter__ method. As was said previously: the __iter__ method returns what is known in Python as an iterator. Basically, an iterator is an object that you can use the built-in generic function next() on [1]. A generator is just one type of iterator.

  • If the call to __iter__ is successful, the in keyword applies the function next() to the returned iterator over and over again, until it finds a matching item or the iterator raises StopIteration. (Remember: the iterable object could be a generator, or a container object's iterator, or any other iterable object.) Actually, to be more precise: it calls the iterator object's __next__ method.
  • If the object doesn't have an __iter__ method to return an iterator, in then falls back on the old-style iteration protocol using the object's __getitem__ method [2].
  • If all of the above attempts fail, you'll get a TypeError exception.
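
A short sketch of how in behaves on the kinds of objects described above, using two hypothetical classes. Note that when in has to fall back to iterating an iterator such as a generator, it consumes items up to the first match, so repeating the same test can give a different answer:

```python
class AlwaysContains:
    """Hypothetical container: only __contains__ is defined."""
    def __contains__(self, obj):
        return obj == 42


class OnlyIterable:
    """Hypothetical iterable: no __contains__, so `in` falls back to __iter__."""
    def __iter__(self):
        yield from (1, 2, 3)


print(42 in AlwaysContains())  # True, answered by __contains__
print(2 in OnlyIterable())     # True, found by iterating
print(5 in OnlyIterable())     # False, iteration ran until StopIteration

gen = (i for i in range(5))
print(3 in gen)   # True: consumes 0, 1, 2, 3
print(3 in gen)   # False: only 4 was left, and it is consumed too
print(list(gen))  # []: the generator is now exhausted
```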

If you wish to create your own object type to iterate over (i.e., one you can use for ... in, or just in, on), it's useful to know about the yield keyword, which is used in generators (as mentioned above).

class MyIterable():
    def __iter__(self):
        yield 1

m = MyIterable()
for _ in m: print(_) # 1
1 in m # True    

The presence of yield turns a function or method into a generator function instead of a regular function/method. You don't need to write a __next__ method if __iter__ is a generator function (the generator it returns brings __next__ along with it automatically).
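
Because such an __iter__ is a generator function, every for loop (or in test) calls it again and receives a fresh generator. So unlike the generator in the question, the object can be iterated repeatedly. A minimal sketch using a hypothetical variant of MyIterable that yields several values:

```python
class MyRepeatable:
    """Hypothetical variant of MyIterable that yields several values."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Because of the yield, each call returns a new generator
        # without running this body yet.
        for i in range(self.n):
            yield i


m = MyRepeatable(3)
print(list(m))  # [0, 1, 2]
print(list(m))  # [0, 1, 2] again: each iter(m) builds a new generator

it = iter(m)
print(type(it).__name__)   # generator
print(next(it), next(it))  # 0 1 (__next__ came with the generator)
```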

If you wish to create your own container object type (i.e., one you can use in on by itself, but NOT for ... in), you just need the __contains__ method.

class MyUselessContainer():
    def __contains__(self, obj):
        return True

m = MyUselessContainer()
1 in m # True
'Foo' in m # True
TypeError in m # True
None in m # True
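
As stated above, such a container supports in but not for ... in. A sketch reusing MyUselessContainer shows that a for loop over it raises TypeError, since there is no __iter__ (or __getitem__) to fall back on:

```python
class MyUselessContainer():
    def __contains__(self, obj):
        return True


m = MyUselessContainer()
print('anything' in m)  # True, via __contains__

try:
    for item in m:
        pass
except TypeError as exc:
    print(exc)  # 'MyUselessContainer' object is not iterable
```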


[1] Note that, to be an iterator, an object must implement the iterator protocol. This only means that both the __next__ and __iter__ methods must be correctly implemented (generators come with this functionality "for free", so you don't need to worry about it when using them). Also note that the __next__ method is actually named next (no underscores) in Python 2.

[2] See this answer for the different ways to create iterable classes.

Answer by Marcin

A for loop basically calls the next method (__next__ in Python 3) of the iterator it is looping over.

You can simulate this simply by doing:

iter = (i for i in range(5))

print(next(iter))
print(next(iter))
print(next(iter))
print(next(iter))
print(next(iter))

# this prints 0 1 2 3 4

At this point there is no next element in the input object. So doing this:

print(next(iter))  

This will raise a StopIteration exception, and that is the point at which a for loop stops. An iterator can be any object that responds to the next() function and raises that exception when there are no more elements. It does not have to be a pointer or reference (Python has no such things in the C/C++ sense anyway), a linked list, etc.

Answer by MadMan2064

There is an iterator protocol in Python that defines how the for statement behaves with lists and dicts, and other things that can be looped over.

It's described in the Python docs here and here.

A typical way to implement the iterator protocol is with a Python generator. We yield a value for as long as we have one, and when the generator function reaches its end, StopIteration is raised automatically for the caller.

So let's write our own iterator:

def my_iter():
    yield 1
    yield 2
    yield 3
    # no explicit raise needed: returning ends the generator with StopIteration

for i in my_iter():
    print(i)

The result is:

1
2
3
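
One caveat for current Python: since Python 3.7 (PEP 479), a StopIteration raised inside a generator body is converted into a RuntimeError instead of quietly ending the loop. Letting the generator function finish is what raises StopIteration for the caller. A sketch comparing the two styles (old_style_iter is a hypothetical name):

```python
def my_iter():
    yield 1
    yield 2
    yield 3
    # Falling off the end (or a bare `return`) ends the generator;
    # the caller's next() then raises StopIteration.


def old_style_iter():
    yield 1
    raise StopIteration()  # Python 2 habit; becomes a RuntimeError on 3.7+


print(list(my_iter()))  # [1, 2, 3]

try:
    list(old_style_iter())
except RuntimeError as exc:
    print("RuntimeError:", exc)  # RuntimeError: generator raised StopIteration
```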

A couple of things to note about that: my_iter is a generator function, and calling my_iter() returns an iterator (a generator object).

If I had used the iterator directly instead, like this:

j = my_iter()    # j is the iterator that my_iter() returns
for i in j:
    print(i)  # this loop runs until the iterator is exhausted

for i in j:
    print(i)  # the iterator is exhausted, so we never reach this line

The result is the same as above: the iterator is exhausted by the time we enter the second for loop.

But that's rather simplistic. What about something more complicated, such as yielding from inside a loop?

def capital_iter(name):
    for x in name:
        yield x.upper()
    # no explicit raise StopIteration needed (it is a RuntimeError on Python 3.7+)

for y in capital_iter('bobert'):
    print(y)

When it runs, the inner for loop uses the iterator that the built-in str type provides (the one iter() returns for a string). That, in turn, lets us loop over the characters and yield the results until we are done.

B
O
B
E
R
T

This raises the question: what happens between yields in the iterator?

j = capital_iter("bobert")
print(next(j))
print(next(j))
print(next(j))

print("Hey there!")

print(next(j))
print(next(j))
print(next(j))

print(next(j))  # Raises StopIteration

The answer is that the function is paused at the yield, waiting for the next call to next().

B
O
B
Hey there!
E
R
T
Traceback (most recent call last):
  File "<stdin>", line 13, in <module>
StopIteration
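
The standard library's inspect.getgeneratorstate() makes that pause visible: a generator starts in GEN_CREATED (its body has not run at all), sits in GEN_SUSPENDED at each yield, and ends in GEN_CLOSED once it is exhausted. A sketch using a Python 3 version of capital_iter without the explicit raise:

```python
import inspect


def capital_iter(name):
    for x in name:
        yield x.upper()


j = capital_iter("bob")
print(inspect.getgeneratorstate(j))  # GEN_CREATED: nothing has run yet

print(next(j))                       # B
print(inspect.getgeneratorstate(j))  # GEN_SUSPENDED: paused at the yield

print(list(j))                       # ['O', 'B']: the remaining values
print(inspect.getgeneratorstate(j))  # GEN_CLOSED: exhausted
```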

Answer by Abhijit

Concept 1

All generators are iterators, but not all iterators are generators.

Concept 2

An iterator is an object with a next (Python 2) or __next__ (Python 3) method.

Concept 3

Quoting from the Python wiki page Generators: "Generator functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop."

In your case (Python 2):

>>> it = (i for i in range(5))
>>> type(it)
<type 'generator'>
>>> callable(getattr(it, 'iter', None))
False
>>> callable(getattr(it, 'next', None))
True
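
Concept 1 can be checked in Python 3 with the Iterator abstract base class from collections.abc and GeneratorType from types: a generator passes both tests, while a list iterator is an iterator but not a generator. A minimal sketch:

```python
import types
from collections.abc import Iterator

gen = (i for i in range(5))
list_it = iter([0, 1, 2])

print(isinstance(gen, Iterator), isinstance(gen, types.GeneratorType))          # True True
print(isinstance(list_it, Iterator), isinstance(list_it, types.GeneratorType))  # True False

# In Python 3 the methods are spelled __iter__ and __next__.
print(hasattr(gen, "__next__"), hasattr(gen, "next"))  # True False
```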

Answer by Ethan Furman

Some additional details about the behaviour of iter() with __getitem__ classes that lack their own __iter__ method.

Before __iter__ there was __getitem__. If __getitem__ works with ints from 0 to len(obj)-1, then iter() supports these objects. It will construct a new iterator that repeatedly calls __getitem__ with 0, 1, 2, ... until it gets an IndexError, which it converts to a StopIteration.
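
A sketch of that fallback with a hypothetical Squares class that defines only __getitem__: iter() wraps it in an iterator that tries indexes 0, 1, 2, ... and stops at the first IndexError, and in relies on the same protocol:

```python
class Squares:
    """Hypothetical sequence-like class with __getitem__ but no __iter__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)  # the iterator turns this into StopIteration
        return index * index


sq = Squares(4)
print(list(sq))        # [0, 1, 4, 9]
print(9 in sq)         # True
print(next(iter(sq)))  # 0
```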

See this answer for more details of the different ways to create an iterator.

Answer by drewteriyaki

Excerpt from the Python Practice book:

5. Iterators & Generators

5.1. Iterators

We use the for statement for looping over a list.

>>> for i in [1, 2, 3, 4]:
...     print i,
...
1 2 3 4

If we use it with a string, it loops over its characters.

>>> for c in "python":
...     print c
...
p
y
t
h
o
n

If we use it with a dictionary, it loops over its keys.

>>> for k in {"x": 1, "y": 2}:
...     print k
...
y
x

If we use it with a file, it loops over lines of the file.

>>> for line in open("a.txt"):
...     print line,
...
first line
second line

So there are many types of objects which can be used with a for loop. These are called iterable objects.

There are many functions which consume these iterables.

>>> ",".join(["a", "b", "c"])
'a,b,c'
>>> ",".join({"x": 1, "y": 2})
'y,x'
>>> list("python")
['p', 'y', 't', 'h', 'o', 'n']
>>> list({"x": 1, "y": 2})
['y', 'x']
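
Those same functions also accept iterators, including generators, but they consume them, which is exactly the behaviour from the original question. A small Python 3 sketch:

```python
letters = (c for c in "abc")  # a generator, i.e. a one-shot iterator

print(",".join(letters))  # a,b,c
print(list(letters))      # []: join() already consumed every item

word = "abc"              # a string is an iterable you can loop over again
print(",".join(word))     # a,b,c
print(list(word))         # ['a', 'b', 'c']
```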

5.1.1. The Iteration Protocol

The built-in function iter takes an iterable object and returns an iterator.

>>> x = iter([1, 2, 3])
>>> x
<listiterator object at 0x1004ca850>
>>> x.next()
1
>>> x.next()
2
>>> x.next()
3
>>> x.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Each call to the next method on the iterator gives us the next element. If there are no more elements, it raises a StopIteration.

Iterators can be implemented as classes. Here is an iterator that works like the built-in xrange function.

class yrange:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def next(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()

    __next__ = next  # Python 3 calls __next__; this alias makes the class work there too

The __iter__ method is what makes an object iterable. Behind the scenes, the iter function calls the __iter__ method on the given object.

The return value of __iter__ is an iterator. It should have a next method and raise StopIteration when there are no more elements.

Let's try it out:

>>> y = yrange(3)
>>> y.next()
0
>>> y.next()
1
>>> y.next()
2
>>> y.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 14, in next
StopIteration

Many built-in functions accept iterators as arguments.

>>> list(yrange(5))
[0, 1, 2, 3, 4]
>>> sum(yrange(5))
10

In the above case, both the iterable and the iterator are the same object. Notice that the __iter__ method returned self. This need not always be the case.

class zrange:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return zrange_iter(self.n)

class zrange_iter:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        # Iterators are iterables too.
        # Adding this function makes them so.
        return self

    def next(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()

    __next__ = next  # Python 3 calls __next__; this alias makes the class work there too

If both the iterable and the iterator are the same object, it is consumed in a single iteration.

>>> y = yrange(5)
>>> list(y)
[0, 1, 2, 3, 4]
>>> list(y)
[]
>>> z = zrange(5)
>>> list(z)
[0, 1, 2, 3, 4]
>>> list(z)
[0, 1, 2, 3, 4]

5.2. Generators

Generators simplify the creation of iterators. A generator is a function that produces a sequence of results instead of a single value.

def yrange(n):
    i = 0
    while i < n:
        yield i
        i += 1

Each time the yield statement is executed the function generates a new value.

>>> y = yrange(3)
>>> y
<generator object yrange at 0x401f30>
>>> y.next()
0
>>> y.next()
1
>>> y.next()
2
>>> y.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

So a generator is also an iterator. You don't have to worry about the iterator protocol.

The word “generator” is confusingly used to mean both the function that generates and what it generates. In this chapter, I'll use the word “generator” to mean the generated object and “generator function” to mean the function that generates it.

Can you think of how it works internally?

When a generator function is called, it returns a generator object without even beginning execution of the function. When the next method is called for the first time, the function starts executing until it reaches a yield statement. The yielded value is returned by that next call.

The following example demonstrates the interplay between yield and calls to the next method on a generator object.

>>> def foo():
...     print "begin"
...     for i in range(3):
...         print "before yield", i
...         yield i
...         print "after yield", i
...     print "end"
...
>>> f = foo()
>>> f.next()
begin
before yield 0
0
>>> f.next()
after yield 0
before yield 1
1
>>> f.next()
after yield 1
before yield 2
2
>>> f.next()
after yield 2
end
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Let's see an example:

def integers():
    """Infinite sequence of integers."""
    i = 1
    while True:
        yield i
        i = i + 1

def squares():
    for i in integers():
        yield i * i

def take(n, seq):
    """Returns first n values from the given sequence."""
    seq = iter(seq)
    result = []
    try:
        for i in range(n):
            result.append(next(seq))  # built-in next() works in Python 2.6+ and 3
    except StopIteration:
        pass
    return result

print(take(5, squares())) # prints [1, 4, 9, 16, 25]
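
The standard library already covers this pattern. itertools.islice() lazily slices any iterable, including an infinite one, and itertools.count() can play the role of integers(). A Python 3 sketch of the same computation with those two swapped in for the hand-written helpers (this take is an alternative, not the book's version):

```python
from itertools import count, islice


def squares():
    # count(1) yields 1, 2, 3, ... forever, like integers() above
    for i in count(1):
        yield i * i


def take(n, seq):
    """Return the first n values of seq (fewer if seq runs out)."""
    return list(islice(seq, n))


print(take(5, squares()))  # [1, 4, 9, 16, 25]
print(take(5, [1, 2]))     # [1, 2]
```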