Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/3345785/

Date: 2020-08-18 10:34:03  Source: igfitidea

Getting number of elements in an iterator in Python

python iterator

Question

Is there an efficient way to know how many elements are in an iterator in Python, in general, without iterating through each and counting?

Accepted answer by Tomasz Wysocki

No. It's not possible.

Example:

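import random

def gen(n):
    # Yield each i in range(n) with probability 1/2, so the number of
    # items produced differs from run to run.
    for i in range(n):  # range replaces Python 2's xrange
        if random.randint(0, 1) == 0:
            yield i

iterator = gen(10)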

The length of iterator is unknown until you iterate through it.

Answer by Daenyth

No, any method will require you to resolve every result. You can do

iter_length = len(list(iterable))

but running that on an infinite iterator will of course never return. It will also consume the iterator, which will need to be re-created if you want to use the contents.
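
A minimal sketch of the consumption issue, assuming a throwaway generator named numbers (a hypothetical name, not from the question). Once it has been counted through list(), the generator yields nothing more, so keep the list if the contents are still needed:

```python
# `numbers` is a hypothetical example iterator, not part of the question.
numbers = (n * n for n in range(5))

items = list(numbers)      # reads every element into memory
iter_length = len(items)   # the count comes from the stored list

leftover = list(numbers)   # the generator is exhausted, so this is empty
```

Here items still holds the five squares, while leftover is an empty list.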

Telling us what real problem you're trying to solve might help us find you a better way to accomplish your actual goal.

Edit: Using list() will read the whole iterable into memory at once, which may be undesirable. Another way is to do

sum(1 for _ in iterable)

as another person posted. That will avoid keeping it in memory.

Answer by John Howard

This code should work:

>>> it = (i for i in range(50))  # named `it` to avoid shadowing the built-in iter()
>>> sum(1 for _ in it)
50

Although it does iterate through each item and count them, it is a fast and memory-efficient way to do so.

It also works when the iterator has no items:

>>> sum(1 for _ in range(0))
0

Of course, it runs forever for an infinite input, so remember that iterators can be infinite:

>>> import itertools
>>> sum(1 for _ in itertools.count())
[nothing happens, forever]

Also, be aware that the iterator will be exhausted by doing this, and further attempts to use it will see no elements. That's an unavoidable consequence of the Python iterator design. If you want to keep the elements, you'll have to store them in a list or something similar.
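
If the input might be infinite, one possible guard is to cap the count. The sketch below assumes a hypothetical helper named count_up_to (not from the original answer) built on itertools.islice. A result equal to the limit only means "at least that many":

```python
import itertools

def count_up_to(iterable, limit):
    """Hypothetical helper: count items, but stop after `limit` of them."""
    return sum(1 for _ in itertools.islice(iterable, limit))

finite_count = count_up_to(range(50), 1000)           # whole input is counted
capped_count = count_up_to(itertools.count(), 1000)   # stops at the cap
```

The iterator is still consumed up to the cap, so the caveat about exhaustion applies here as well.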

Answer by Jesus Ramos

An iterator is just an object that holds a pointer to the next object to be read from some kind of buffer or stream. It's like a linked list: you don't know how many items there are until you iterate through them. Iterators are meant to be efficient because all they do is tell you what comes next by reference instead of by index (but, as you saw, you lose the ability to see how many entries remain).

Answer by Wayne Werner

There are two ways to get the length of "something" on a computer.

The first way is to store a count - this requires anything that touches the file/data to modify it (or a class that only exposes interfaces -- but it boils down to the same thing).

The other way is to iterate over it and count how big it is.
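
The two approaches can be combined for iterators: a wrapper can maintain a stored count while the data is iterated. This is only a sketch, and CountingIterator is a hypothetical class name introduced for illustration. The count is complete only once the wrapper has been fully consumed:

```python
class CountingIterator:
    """Hypothetical wrapper that counts items as they pass through."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.count = 0  # number of items handed out so far

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)  # StopIteration propagates when exhausted
        self.count += 1
        return item

wrapped = CountingIterator("abc")
consumed = list(wrapped)   # use the data as usual
total = wrapped.count      # the stored count is now available
```

Every consumer has to go through the wrapper for the count to stay correct, which is the "anything that touches the data must update it" requirement in practice.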

Answer by badp

Kinda. You could check the __length_hint__ method, but be warned that (at least up to Python 3.4, as gsnedders helpfully points out) it's an undocumented implementation detail (see the following message in the thread) that could very well vanish or summon nasal demons instead.
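
Since Python 3.4 the standard library also offers operator.length_hint(), which consults __length_hint__ for you. The sketch below shows what it reports for a list iterator and for a generator. The value is only an estimate, and the 0 returned for the generator is the function's default, not a real count:

```python
from operator import length_hint

list_iter = iter([10, 20, 30])
hint_before = length_hint(list_iter)   # list iterators report remaining items
next(list_iter)
hint_after = length_hint(list_iter)    # one fewer after consuming an item

gen = (x for x in range(3))
gen_hint = length_hint(gen)            # generators give no hint: default 0
```

Because a missing hint and an empty iterator both produce the default, the hint is useful for preallocation but not as a reliable length.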

Otherwise, no. An iterator is just an object that exposes only the next() method (spelled __next__() in Python 3). You can call it as many times as required, and it may or may not eventually raise StopIteration. Luckily, this behaviour is transparent to the coder most of the time. :)

Answer by tom10

It's common practice to put this type of information in the file header, and for pysam to give you access to this. I don't know the format, but have you checked the API?

As others have said, you can't know the length from the iterator.

Answer by Kevin Jacobs

Regarding your original question, the answer is still that there is no way in general to know the length of an iterator in Python.

Given that your question is motivated by an application of the pysam library, I can give a more specific answer: I'm a contributor to PySAM, and the definitive answer is that SAM/BAM files do not provide an exact count of aligned reads. Nor is this information easily available from a BAM index file. The best one can do is to estimate the number of alignments by reading some of them, noting the location of the file pointer, and extrapolating based on the total size of the file. This is enough to implement a progress bar, but it is not a method of counting alignments in constant time.
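
The extrapolation idea can be sketched with an ordinary binary stream standing in for a BAM file. This does not use the pysam API. The helper name estimate_total_records and the sample size are assumptions made for illustration, and the estimate is only as good as the assumption that records are roughly the same size:

```python
import io

def estimate_total_records(stream, total_size, sample=100):
    """Hypothetical helper: read up to `sample` records, then extrapolate."""
    read = 0
    for _ in stream:          # one line stands in for one alignment
        read += 1
        if read >= sample:
            break
    consumed_bytes = stream.tell()   # file pointer after the sample
    if consumed_bytes == 0:
        return 0                     # empty input
    return round(read * total_size / consumed_bytes)

# Synthetic data: 1000 fixed-width records in an in-memory binary stream.
data = b"".join(b"record %03d\n" % i for i in range(1000))
estimate = estimate_total_records(io.BytesIO(data), len(data))
```

With fixed-width records the estimate is exact. With variable-length records it drifts, which is why it suits a progress bar rather than an exact count.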

Answer by zuo

You cannot (unless the type of a particular iterator implements some specific methods that make it possible).

Generally, you can count an iterator's items only by consuming the iterator. Probably one of the most efficient ways:

import itertools
from collections import deque

def count_iter_items(iterable):
    """
    Consume an iterable without reading it into memory; return the number of items.
    """
    counter = itertools.count()
    # A zero-length deque discards every pair, consuming the zip at C speed.
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)  # counter advanced once per item, so this is the total

(For Python 2.x, replace zip with itertools.izip.)

Answer by FCAlive

This is against the very definition of an iterator, which is a pointer to an object, plus information about how to get to the next object.

An iterator does not know how many more times it will be able to iterate until terminating. This could be infinite, so infinity might be your answer.
