Python for loop and iterator behavior
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/29403401/
Asked by Matteo
I wanted to understand a bit more about iterators, so please correct me if I'm wrong.
An iterator is an object which has a pointer to the next object and is read as a buffer or stream (i.e. a linked list). They're particularly efficient because all they do is tell you what is next by reference instead of using indexing.
However, I still don't understand why the following behavior is happening:
In [1]: iter = (i for i in range(5))
In [2]: for _ in iter:
   ....:     print _
   ....:
0
1
2
3
4
In [3]: for _ in iter:
   ....:     print _
   ....:
In [4]:
After a first loop through the iterator (In [2]), it's as if it was consumed and left empty, so the second loop (In [3]) prints nothing.
However, I never assigned a new value to the iter variable.
What is really happening under the hood of the for loop?
Accepted answer by Rick supports Monica
Your suspicion is correct: the iterator has been consumed.
In actuality, your iterator is a generator, which is an object that can be iterated through only once.
type((i for i in range(5)))  # reports the type as generator
def another_generator():
    yield 1  # the yield expression makes this a generator function, not a regular function
type(another_generator())  # also a generator
The reason they are efficient has nothing to do with telling you what is next "by reference." They are efficient because they only generate the next item upon request; the items are not all generated at once. In fact, you can have an infinite generator:
def my_gen():
    while True:
        yield 1  # again: yield means it is a generator function, not a regular function
for _ in my_gen(): print(_)  # hit Ctrl+C to stop this infinite loop!
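To see the laziness without an endless loop, the sketch below takes only the first few items from an infinite generator. The names `counting_gen` and `produced` are illustrative assumptions, not part of the original answer; `itertools.islice` is the standard-library way to take a bounded slice of any iterator.

```python
from itertools import islice

produced = []  # records every value the generator actually computes


def counting_gen():
    """Infinite generator: yields 0, 1, 2, ... one value per request."""
    n = 0
    while True:
        produced.append(n)
        yield n
        n += 1


first_three = list(islice(counting_gen(), 3))
print(first_three)  # [0, 1, 2]
print(produced)     # [0, 1, 2]  (nothing beyond what was requested)
```

Only three values were ever computed, even though the generator itself never ends.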
Some other corrections to help improve your understanding:
- The generator is not a pointer, and does not behave like a pointer as you might be familiar with in other languages.
- One of the differences from other languages: as said above, each result of the generator is generated on the fly. The next result is not produced until it is requested.
- The keyword combination for in accepts an iterable object as its second argument.
- The iterable object can be a generator, as in your example case, but it can also be any other iterable object, such as a list, or dict, or a str object (string), or a user-defined type that provides the required functionality.
- The iter function is applied to the object to get an iterator (by the way: don't use iter as a variable name in Python, as you have done; it is not a reserved keyword, but the name shadows the built-in function). Actually, to be more precise, the object's __iter__ method is called (which is, for the most part, all the iter function does anyway; __iter__ is one of Python's so-called "magic methods").
- If the call to __iter__ is successful, the function next() is applied to the resulting iterator over and over again, in a loop, and the first variable supplied to for in is assigned the result of each next() call. (Remember: the iterator could be a generator, or a container object's iterator, or any other iterator.) Actually, to be more precise: it calls the iterator object's __next__ method, which is another "magic method".
- The for loop ends when next() raises the StopIteration exception (which usually happens when the iterator does not have another object to yield when next() is called).
You can "manually" implement a for loop in Python this way (probably not perfect, but close enough):
try:
    temp = iterable.__iter__()
except AttributeError:
    raise TypeError("'{}' object is not iterable".format(type(iterable).__name__))
else:
    while True:
        try:
            _ = temp.__next__()
        except StopIteration:
            break
        except AttributeError:
            raise TypeError("iter() returned non-iterator of type '{}'".format(type(temp).__name__))
        # this is the "body" of the for loop
        continue
There is pretty much no difference between the above and your example code.
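The same idea can be packaged as a small function and run against the generator from the question. This is only a sketch: `manual_for` and its `body` callback are hypothetical names introduced here, and the built-ins `iter()` and `next()` stand in for the direct `__iter__` and `__next__` calls.

```python
def manual_for(iterable, body):
    """Run body(item) for each item, mimicking `for item in iterable: body(item)`."""
    iterator = iter(iterable)      # raises TypeError if the object is not iterable
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                  # normal end of the loop
        body(item)


seen = []
gen = (i for i in range(3))
manual_for(gen, seen.append)  # first pass consumes the generator
manual_for(gen, seen.append)  # second pass: StopIteration at once, body never runs
print(seen)                   # [0, 1, 2]
```

The second call adds nothing to `seen`, which is the behavior the question observed.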
Actually, the more interesting part of a for loop is not the for, but the in. Using in by itself produces a different effect than for in, but it is very useful to understand what in does with its arguments, since for in implements very similar behavior.
- When used by itself, the in keyword first calls the object's __contains__ method, which is yet another "magic method" (note that this step is skipped when using for in). Using in by itself on a container, you can do things like this:
    1 in [1, 2, 3]         # True
    'He' in 'Hello'        # True
    3 in range(10)         # True
    'eH' in 'Hello'[::-1]  # True
- If the iterable object is NOT a container (i.e. it doesn't have a __contains__ method), in next tries to call the object's __iter__ method. As was said previously: the __iter__ method returns what is known in Python as an iterator. Basically, an iterator is an object that you can use the built-in generic function next() on [1]. A generator is just one type of iterator.
- If the call to __iter__ is successful, the in keyword applies the function next() to the resulting iterator over and over again. (Remember: the iterator could be a generator, or a container object's iterator, or any other iterator.) Actually, to be more precise: it calls the iterator object's __next__ method.
- If the object doesn't have an __iter__ method to return an iterator, in then falls back on the old-style iteration protocol using the object's __getitem__ method [2].
- If all of the above attempts fail, you'll get a TypeError exception.
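The fallback order described above can be observed with four tiny classes. All class names here are made up for illustration; each one supports exactly one of the mechanisms, and the last supports none.

```python
class WithContains:
    def __contains__(self, obj):   # step 1: used directly by `in`
        return obj == "x"


class WithIter:
    def __iter__(self):            # step 2: `in` walks the returned iterator
        return iter([1, 2, 3])


class WithGetitem:
    def __getitem__(self, index):  # step 3: old-style protocol, indexes 0, 1, 2, ...
        if index >= 3:
            raise IndexError(index)
        return index * 10


class Nothing:                     # none of the three methods
    pass


print("x" in WithContains())  # True
print(2 in WithIter())        # True
print(20 in WithGetitem())    # True

try:
    1 in Nothing()
    outcome = "no error"
except TypeError:
    outcome = "TypeError"
print(outcome)                # TypeError
```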
If you wish to create your own object type to iterate over (i.e., you can use for in, or just in, on it), it's useful to know about the yield keyword, which is used in generators (as mentioned above).
class MyIterable():
    def __iter__(self):
        yield 1
m = MyIterable()
for _ in m: print(_)  # 1
1 in m  # True
The presence of yield turns a function or method into a generator instead of a regular function/method. You don't need the __next__ method if you use a generator (it brings __next__ along with it automatically).
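One consequence worth seeing in action: an object whose __iter__ is written with yield hands out a fresh generator on every loop, so the object itself can be iterated repeatedly, while each individual generator is still single-use. A minimal sketch, with `Countdown` as a made-up example class:

```python
class Countdown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Because of `yield`, calling __iter__ returns a new generator each time.
        n = self.start
        while n > 0:
            yield n
            n -= 1


c = Countdown(3)
print(list(c))  # [3, 2, 1]
print(list(c))  # [3, 2, 1]  (a new generator is created for each pass)

g = iter(c)     # one specific generator
print(list(g))  # [3, 2, 1]
print(list(g))  # []  (this generator is now exhausted)
```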
If you wish to create your own container object type (i.e., you can use in on it by itself, but NOT for in), you just need the __contains__ method.
class MyUselessContainer():
    def __contains__(self, obj):
        return True
m = MyUselessContainer()
1 in m  # True
'Foo' in m  # True
TypeError in m  # True
None in m  # True
[1] Note that, to be an iterator, an object must implement the iterator protocol. This only means that both the __next__ and __iter__ methods must be correctly implemented (generators come with this functionality "for free", so you don't need to worry about it when using them). Also note that the __next__ method is actually next (no underscores) in Python 2.
[2] See this answer for the different ways to create iterable classes.
Answer by Marcin
A for loop basically calls the next method of the object it is applied to (__next__ in Python 3).
You can simulate this simply by doing:
iter = (i for i in range(5))
print(next(iter))
print(next(iter))
print(next(iter))
print(next(iter))
print(next(iter))
# this prints 0, 1, 2, 3, 4 (one per line)
At this point there is no next element in the input object. So doing this:
print(next(iter))
This will result in a StopIteration exception being raised. At this point, for will stop. An iterator can be any object that responds to the next() function and raises the exception when there are no more elements. It does not have to be a pointer or reference (there are no such things in Python in the C/C++ sense anyway), a linked list, etc.
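For completeness, here is a small sketch of both ways to deal with exhaustion when calling next() by hand: catching StopIteration, or passing a default as the second argument to the built-in next(). The variable names are illustrative only.

```python
numbers = (i for i in range(2))

first = next(numbers)             # 0
second = next(numbers)            # 1
fallback = next(numbers, "done")  # exhausted, so the default is returned

try:
    next(numbers)                 # no default this time
    raised = False
except StopIteration:
    raised = True

print(first, second, fallback, raised)  # 0 1 done True
```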
Answer by MadMan2064
There is an iterator protocol in Python that defines how the for statement will behave with lists and dicts, and other things that can be looped over.
It's in the Python docs here and here.
The iterator protocol is typically implemented in the form of a Python generator. We yield a value as long as we have one; once we reach the end, StopIteration is raised.
So let's write our own iterator:
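def my_iter():
    yield 1
    yield 2
    yield 3
    # falling off the end raises StopIteration for the caller; an explicit
    # "raise StopIteration()" here becomes a RuntimeError on Python 3.7+ (PEP 479)
for i in my_iter():
    print(i)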
The result is:
1
2
3
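One caveat for readers on current Python: since Python 3.7 (PEP 479), a StopIteration raised explicitly inside a generator body is converted into a RuntimeError, so a generator should simply end or use return. A minimal sketch, with function names chosen here for illustration:

```python
def ends_with_return():
    yield 1
    yield 2
    return  # finishing the function is what signals StopIteration to the caller


def raises_explicitly():
    yield 1
    raise StopIteration()  # converted to RuntimeError on Python 3.7+


print(list(ends_with_return()))  # [1, 2]

try:
    list(raises_explicitly())
    outcome = "no error"
except RuntimeError as exc:
    outcome = "RuntimeError: {}".format(exc)
print(outcome)
```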
A couple of things to note about that: my_iter is a function, and my_iter() returns an iterator.
If I had used the iterator like this instead:
j = my_iter()  # j is the iterator that my_iter() returns
for i in j:
    print(i)  # this loop runs until the iterator is exhausted
for i in j:
    print(i)  # the iterator is exhausted so we never reach this line
And the result is the same as above. The iterator is exhausted by the time we enter the second for loop.
But that's rather simplistic. What about something more complicated? Perhaps in a loop, why not?
def capital_iter(name):
    for x in name:
        yield x.upper()
    # no explicit "raise StopIteration()": the generator ends when the loop does
for y in capital_iter('bobert'):
    print(y)
When it runs, we use the iterator of the string type (which iter supports out of the box). This, in turn, allows us to run a for loop on it and yield the results until we are done.
B
O
B
E
R
T
So now this raises the question: what happens between yields in the iterator?
j = capital_iter("bobert")
print(next(j))
print(next(j))
print(next(j))
print("Hey there!")
print(next(j))
print(next(j))
print(next(j))
print(next(j))  # raises StopIteration
The answer is that the function is paused at the yield, waiting for the next call to next().
B
O
B
Hey there!
E
R
T
Traceback (most recent call last):
File "", line 13, in
StopIteration
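The pausing can also be made visible by logging from inside the generator. In this sketch the `log` list and the shortened input are assumptions added for illustration; the point is the order in which the entries appear.

```python
log = []


def capital_iter(name):
    for ch in name:
        log.append("computing " + ch.upper())  # runs only when a value is requested
        yield ch.upper()


j = capital_iter("bob")
log.append("generator created")  # nothing inside capital_iter has run yet
log.append("got " + next(j))     # runs the body up to the first yield
log.append("doing other work")   # the generator stays paused meanwhile
log.append("got " + next(j))     # resumes right after the previous yield
print("\n".join(log))
```

The log shows "generator created" first, then alternating "computing" and "got" entries, with "doing other work" in between while the generator is suspended.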
Answer by Abhijit
Concept 1
All generators are iterators, but not all iterators are generators.
Concept 2
An iterator is an object with a next (Python 2) or __next__ (Python 3) method.
Concept 3
Quoting from the wiki page on Generators: Generator functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop.
In your case
>>> it = (i for i in range(5))
>>> type(it)
<type 'generator'>
>>> callable(getattr(it, 'iter', None))
False
>>> callable(getattr(it, 'next', None))
True
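On Python 3 the same relationships can be checked with the abstract base classes in collections.abc and with types.GeneratorType. This is a sketch added for illustration; the variable names are arbitrary.

```python
import types
from collections.abc import Iterable, Iterator

gen = (i for i in range(5))
lst = [1, 2, 3]
list_it = iter(lst)

# A generator is both a generator and an iterator.
print(isinstance(gen, types.GeneratorType), isinstance(gen, Iterator))          # True True
# A list iterator is an iterator, but not a generator.
print(isinstance(list_it, Iterator), isinstance(list_it, types.GeneratorType))  # True False
# A list is iterable, but not itself an iterator.
print(isinstance(lst, Iterable), isinstance(lst, Iterator))                     # True False
```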
Answer by Ethan Furman
Some additional details about the behaviour of iter() with __getitem__ classes that lack their own __iter__ method.
Before __iter__ there was __getitem__. If __getitem__ works with ints from 0 to len(obj)-1, then iter() supports these objects. It will construct a new iterator that repeatedly calls __getitem__ with 0, 1, 2, ... until it gets an IndexError, which it converts to a StopIteration.
See this answer for more details of the different ways to create an iterator.
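A minimal sketch of that fallback, using a made-up class `Squares` that defines only __getitem__:

```python
class Squares:
    """No __iter__ here: only __getitem__ that accepts 0, 1, 2, ..."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        if not 0 <= index < self.count:
            raise IndexError(index)  # iter() turns this into StopIteration
        return index * index


s = Squares(4)
print(list(iter(s)))   # [0, 1, 4, 9]
print([x for x in s])  # [0, 1, 4, 9]  (for in uses the same fallback)
```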
Answer by drewteriyaki
Excerpt from the Python Practice book:
5. Iterators & Generators
5.1. Iterators
We use the for statement to loop over a list.
>>> for i in [1, 2, 3, 4]:
...     print i,
...
1 2 3 4
If we use it with a string, it loops over its characters.
>>> for c in "python":
...     print c
...
p
y
t
h
o
n
If we use it with a dictionary, it loops over its keys.
>>> for k in {"x": 1, "y": 2}:
...     print k
...
y
x
If we use it with a file, it loops over lines of the file.
>>> for line in open("a.txt"):
...     print line,
...
first line
second line
So there are many types of objects which can be used with a for loop. These are called iterable objects.
There are many functions which consume these iterables.
>>> ",".join(["a", "b", "c"])
'a,b,c'
>>> ",".join({"x": 1, "y": 2})
'y,x'
>>> list("python")
['p', 'y', 't', 'h', 'o', 'n']
>>> list({"x": 1, "y": 2})
['y', 'x']
5.1.1. The Iteration Protocol
The built-in function iter takes an iterable object and returns an iterator.
>>> x = iter([1, 2, 3])
>>> x
<listiterator object at 0x1004ca850>
>>> x.next()
1
>>> x.next()
2
>>> x.next()
3
>>> x.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Each time we call the next method on the iterator, it gives us the next element. If there are no more elements, it raises a StopIteration.
Iterators are implemented as classes. Here is an iterator that works like the built-in xrange function.
class yrange:
    def __init__(self, n):
        self.i = 0
        self.n = n
    def __iter__(self):
        return self
    def next(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()
The __iter__ method is what makes an object iterable. Behind the scenes, the iter function calls the __iter__ method on the given object.
The return value of __iter__ is an iterator. It should have a next method and raise StopIteration when there are no more elements.
Let's try it out:
>>> y = yrange(3)
>>> y.next()
0
>>> y.next()
1
>>> y.next()
2
>>> y.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 14, in next
StopIteration
Many built-in functions accept iterators as arguments.
>>> list(yrange(5))
[0, 1, 2, 3, 4]
>>> sum(yrange(5))
10
In the above case, the iterable and the iterator are the same object. Notice that the __iter__ method returned self. This need not always be the case.
class zrange:
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return zrange_iter(self.n)
class zrange_iter:
    def __init__(self, n):
        self.i = 0
        self.n = n
    def __iter__(self):
        # Iterators are iterables too.
        # Adding this method makes them so.
        return self
    def next(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()
If the iterable and the iterator are the same object, it is consumed in a single iteration.
>>> y = yrange(5)
>>> list(y)
[0, 1, 2, 3, 4]
>>> list(y)
[]
>>> z = zrange(5)
>>> list(z)
[0, 1, 2, 3, 4]
>>> list(z)
[0, 1, 2, 3, 4]
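The excerpt targets Python 2, where the iterator method is spelled next. A Python 3 rendering of the same contrast might look like the sketch below; the capitalized class names are chosen here only to keep them distinct from the book's yrange and zrange.

```python
class YRange:
    """Iterable and iterator are the same object, so it is single-use."""

    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):  # spelled `next` in Python 2
        if self.i >= self.n:
            raise StopIteration
        value = self.i
        self.i += 1
        return value


class ZRange:
    """Iterable that hands out a fresh iterator on every iter() call."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return YRange(self.n)


y, z = YRange(3), ZRange(3)
print(list(y), list(y))  # [0, 1, 2] []
print(list(z), list(z))  # [0, 1, 2] [0, 1, 2]
```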
5.2. Generators
Generators simplify the creation of iterators. A generator is a function that produces a sequence of results instead of a single value.
def yrange(n):
    i = 0
    while i < n:
        yield i
        i += 1
Each time the yield statement is executed, the function generates a new value.
>>> y = yrange(3)
>>> y
<generator object yrange at 0x401f30>
>>> y.next()
0
>>> y.next()
1
>>> y.next()
2
>>> y.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
So a generator is also an iterator. You don't have to worry about the iterator protocol.
The word “generator” is confusingly used to mean both the function that generates and what it generates. In this chapter, I'll use the word “generator” to mean the generated object and “generator function” to mean the function that generates it.
Can you think about how it is working internally?
When a generator function is called, it returns a generator object without even beginning execution of the function. When the next method is called for the first time, the function starts executing until it reaches a yield statement. The yielded value is returned by the next call.
The following example demonstrates the interplay between yield and calls to the next method on a generator object.
>>> def foo():
...     print "begin"
...     for i in range(3):
...         print "before yield", i
...         yield i
...         print "after yield", i
...     print "end"
...
>>> f = foo()
>>> f.next()
begin
before yield 0
0
>>> f.next()
after yield 0
before yield 1
1
>>> f.next()
after yield 1
before yield 2
2
>>> f.next()
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Let's see an example:
def integers():
    """Infinite sequence of integers."""
    i = 1
    while True:
        yield i
        i = i + 1
def squares():
    for i in integers():
        yield i * i
def take(n, seq):
    """Returns first n values from the given sequence."""
    seq = iter(seq)
    result = []
    try:
        for i in range(n):
            result.append(next(seq))  # next(seq) works on Python 2.6+ and Python 3
    except StopIteration:
        pass
    return result
print(take(5, squares()))  # prints [1, 4, 9, 16, 25]
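The standard library already covers both helpers: itertools.count produces the infinite integer sequence and itertools.islice takes the first n items of any iterable. The following is an alternative sketch using those tools, not the book's own code.

```python
from itertools import count, islice


def squares():
    """Infinite generator of 1, 4, 9, ..."""
    for i in count(1):
        yield i * i


def take(n, iterable):
    """Return the first n values (or fewer, if the iterable runs out)."""
    return list(islice(iterable, n))


print(take(5, squares()))  # [1, 4, 9, 16, 25]
print(take(5, [1, 2]))     # [1, 2]
```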