Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, credit the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1915170/

Time: 2020-11-03 23:20:37 Source: igfitidea

split a generator/iterable every n items in python (splitEvery)

python iterator split

Question by James Brooks

I'm trying to write the Haskell function 'splitEvery' in Python. Here is its definition:

splitEvery :: Int -> [e] -> [[e]]
    @'splitEvery' n@ splits a list into length-n pieces.  The last
    piece will be shorter if @n@ does not evenly divide the length of
    the list.

The basic version of this works fine, but I want a version that works with generator expressions, lists, and iterators. And, if there is a generator as an input it should return a generator as an output!

Tests

import itertools

# should not enter infinite loop with generators or lists
splitEvery(10, itertools.count())
splitEvery(10, range(1000))

# last piece must be shorter if n does not evenly divide
assert splitEvery(5, range(9)) == [[0, 1, 2, 3, 4], [5, 6, 7, 8]]

# should give same correct results with generators
tmp = itertools.islice(itertools.count(), 9)
assert list(splitEvery(5, tmp)) == [[0, 1, 2, 3, 4], [5, 6, 7, 8]]

Current Implementation

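Here is the code I currently have, but it doesn't work with a simple list: islice is applied to the list itself, so every call starts again from the first element and the loop never ends.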

import itertools

def splitEvery_1(n, iterable):
    # Works for iterators, but loops forever on a list (see above)
    res = list(itertools.islice(iterable, n))
    while len(res) != 0:
        yield res
        res = list(itertools.islice(iterable, n))
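A minimal sketch of the usual fix, under the assumption that one function should serve both lists and generators: call iter() once up front, so each islice call continues where the previous one stopped. The name splitEvery_1_fixed is hypothetical, chosen only for this illustration.

```python
import itertools

def splitEvery_1_fixed(n, iterable):
    # Take a single iterator up front so successive islice calls continue
    # from where the previous one stopped, even when the input is a list
    it = iter(iterable)
    res = list(itertools.islice(it, n))
    while res:
        yield res
        res = list(itertools.islice(it, n))

# Lists and generators now behave the same way
print(list(splitEvery_1_fixed(5, list(range(9)))))
print(list(splitEvery_1_fixed(5, (x for x in range(9)))))
```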

This one doesn't work with a generator expression, because generators support neither len() nor slicing (thanks to jellybean for fixing it):

def splitEvery_2(n, iterable): 
    return [iterable[i:i+n] for i in range(0, len(iterable), n)]

There has to be a simple piece of code that does the splitting. I know I could just have different functions, but it seems like it should be an easy thing to do. I'm probably getting stuck on an unimportant problem, but it's really bugging me.



It is similar to grouper from http://docs.python.org/library/itertools.html#itertools.groupby, but I don't want it to fill extra values.

from itertools import izip_longest  # Python 2; named zip_longest in Python 3

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

It does mention a method that truncates the last group. This isn't what I want either.

The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using izip(*[iter(s)]*n).

list(izip(*[iter(range(9))]*5)) == [(0, 1, 2, 3, 4)]
# should be [[0, 1, 2, 3, 4], [5, 6, 7, 8]]
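In Python 3, izip_longest and izip became itertools.zip_longest and the built-in zip. A short sketch of both behaviours the question rules out, padding the last group versus silently dropping it:

```python
from itertools import zip_longest

data = range(9)

# Python 3 spelling of the grouper recipe: the last group is padded
padded = list(zip_longest(*[iter(data)] * 5, fillvalue=None))

# The izip idiom (plain zip in Python 3) drops the incomplete tail
truncated = list(zip(*[iter(data)] * 5))

print(padded)     # [(0, 1, 2, 3, 4), (5, 6, 7, 8, None)]
print(truncated)  # [(0, 1, 2, 3, 4)]
```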

Answer by Roberto Bonvallet

from itertools import islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))

Some tests:

>>> list(split_every(5, range(9)))
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]

>>> list(split_every(3, (x**2 for x in range(20))))
[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81, 100, 121], [144, 169, 196], [225, 256, 289], [324, 361]]

>>> [''.join(s) for s in split_every(6, 'Hello world')]
['Hello ', 'world']

>>> list(split_every(100, []))
[]
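Because the generator only pulls n items per chunk, it also passes the question's first test: it does not loop forever on itertools.count(). A quick check that takes just the first two chunks with islice:

```python
from itertools import count, islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))

# split_every is lazy, so it is safe on an infinite iterator as long as
# the caller only asks for a finite number of chunks
first_two = list(islice(split_every(3, count()), 2))
print(first_two)  # [[0, 1, 2], [3, 4, 5]]
```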

Answer by Elliot Cameron

Here's a quick one-liner version. Like Haskell's, it is lazy.

from itertools import islice, takewhile, repeat
split_every = (lambda n, it:
    takewhile(bool, (list(islice(it, n)) for _ in repeat(None))))

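This requires that you call iter on the input before calling split_every. With a plain list, every islice call would start again from the beginning, so takewhile would never see an empty chunk.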

Example:

list(split_every(5, iter(xrange(9))))
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]

Although not a one-liner, the version below doesn't require you to call iter first, a step that is easy to forget.

from itertools import islice, takewhile, repeat

def split_every(n, iterable):
    """
    Slice an iterable into chunks of n elements
    :type n: int
    :type iterable: Iterable
    :rtype: Iterator
    """
    iterator = iter(iterable)
    return takewhile(bool, (list(islice(iterator, n)) for _ in repeat(None)))

(Thanks to @eli-korvigo for improvements.)

Answer by pylang

more_itertools has a chunked function:

import more_itertools as mit


list(mit.chunked(range(9), 5))
# [[0, 1, 2, 3, 4], [5, 6, 7, 8]]
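If adding a dependency is not an option, Python 3.12 added itertools.batched(iterable, n) to the standard library; it yields tuples rather than lists, and the last batch may be shorter. A sketch that uses it when available and otherwise falls back to an islice loop; the wrapper name chunked_tuples and the fallback are hypothetical, for illustration only.

```python
import itertools
from itertools import islice

def chunked_tuples(iterable, n):
    # Python 3.12+: delegate to the standard-library implementation
    if hasattr(itertools, "batched"):
        yield from itertools.batched(iterable, n)
        return
    # Older Pythons: an equivalent islice loop, also yielding tuples
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch

print(list(chunked_tuples(range(9), 5)))  # [(0, 1, 2, 3, 4), (5, 6, 7, 8)]
```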

Answer by acushner

Building on the accepted answer and using a lesser-known form of iter (when passed a second argument, it calls the first argument repeatedly until the returned value equals the second), you can do this really easily:

python3:

from itertools import islice

def split_every(n, iterable):
    iterable = iter(iterable)
    yield from iter(lambda: list(islice(iterable, n)), [])

python2:

from itertools import islice

def split_every(n, iterable):
    iterable = iter(iterable)
    for chunk in iter(lambda: list(islice(iterable, n)), []):
        yield chunk
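The two-argument form of iter is also the standard way to read a stream in fixed-size blocks, which is the same chunking idea applied to I/O. A small sketch using an in-memory text stream as a stand-in for a real file:

```python
import io

stream = io.StringIO("Hello world")
# read(4) returns "" at the end of the stream, so "" is the sentinel
blocks = list(iter(lambda: stream.read(4), ""))
print(blocks)  # ['Hell', 'o wo', 'rld']
```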

Answer by Andrey Cizov

A one-liner, inlineable solution (supports Python 2 and 3 and iterators, and uses only the standard library and a single generator expression):

import itertools
def split_groups(iter_in, group_size):
    return ((x for _, x in item)
            for _, item in itertools.groupby(enumerate(iter_in),
                                             key=lambda x: x[0] // group_size))
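One caveat worth showing: the inner generators are views into a single groupby object, and the itertools documentation notes that once groupby advances, the previous group is no longer visible. A sketch that consumes each group before moving on:

```python
import itertools

def split_groups(iter_in, group_size):
    # Key each item by its chunk index, position // group_size
    return ((x for _, x in item)
            for _, item in itertools.groupby(enumerate(iter_in),
                                             key=lambda x: x[0] // group_size))

# Materialize each inner generator before the outer one advances
chunks = [list(g) for g in split_groups((n * n for n in range(7)), 3)]
print(chunks)  # [[0, 1, 4], [9, 16, 25], [36]]
```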

Answer by Ashley Waite

I came across this because I'm also trying to chop up batches, but on a generator fed from a stream, so most of the solutions here aren't applicable or don't work in Python 3.

For people still stumbling upon this, here's a general solution using itertools:

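from itertools import islice, chain

def iter_in_slices(iterator, size=None):
    iterator = iter(iterator)
    while True:
        slice_iter = islice(iterator, size)
        try:
            # Fetch the first item to detect the end of the input
            peek = next(slice_iter)
        except StopIteration:
            # Since Python 3.7 (PEP 479), a StopIteration escaping a
            # generator becomes RuntimeError, so return explicitly
            return
        # Put the first object back and return the slice
        yield chain([peek], slice_iter)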
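Because every chunk is a lazy view over the same underlying iterator, the order of consumption matters. A sketch, using a copy of the function with the end of input handled explicitly, that shows the intended usage and the pitfall of collecting the chunks before reading them:

```python
from itertools import islice, chain

def iter_in_slices(iterator, size=None):
    iterator = iter(iterator)
    while True:
        slice_iter = islice(iterator, size)
        try:
            peek = next(slice_iter)
        except StopIteration:
            return  # input exhausted
        yield chain([peek], slice_iter)

# Correct: consume each chunk before requesting the next one
ok = [list(chunk) for chunk in iter_in_slices(range(7), 3)]
print(ok)  # [[0, 1, 2], [3, 4, 5], [6]]

# Pitfall: collecting all chunks first means each new islice starts right
# after the previous peek, so every chunk ends up with a single item
chunks = list(iter_in_slices(range(7), 3))
bad = [list(chunk) for chunk in chunks]
print(bad)  # [[0], [1], [2], [3], [4], [5], [6]]
```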

Answer by fortran

I think those questions are almost equal.

With a small change so that the last group is cropped instead of padded, I think a good solution for the generator case would be:

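from itertools import islice

def iter_grouper(n, iterable):
    it = iter(iterable)
    # An islice object is always truthy, so materialize each chunk as a
    # list and stop at the first empty one
    item = list(islice(it, n))
    while item:
        yield item
        item = list(islice(it, n))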

For objects that support slicing (lists, strings, tuples), we can do:

def slice_grouper(n, sequence):
    return [sequence[i:i+n] for i in range(0, len(sequence), n)]

Now it's just a matter of dispatching to the correct function:

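from collections.abc import Sequence

def grouper(n, iter_or_seq):
    # __getslice__ only exists in Python 2; the Sequence ABC check also
    # works in Python 3 (lists, strings, tuples, range)
    if isinstance(iter_or_seq, Sequence):
        return slice_grouper(n, iter_or_seq)
    return iter_grouper(n, iter_or_seq)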
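A self-contained Python 3 sketch of the same dispatch idea, showing the type-preserving behaviour it buys: sequences come back as a list of slices of the same type, while generators get a lazy generator of lists.

```python
from collections.abc import Sequence
from itertools import islice

def slice_grouper(n, sequence):
    # Slicing keeps the input type: strings give substrings, tuples give tuples
    return [sequence[i:i + n] for i in range(0, len(sequence), n)]

def iter_grouper(n, iterable):
    # Lazy path for generators and other one-shot iterators
    it = iter(iterable)
    while chunk := list(islice(it, n)):
        yield chunk

def grouper(n, iter_or_seq):
    # Sequences (list, str, tuple, range) support len() and slicing
    if isinstance(iter_or_seq, Sequence):
        return slice_grouper(n, iter_or_seq)
    return iter_grouper(n, iter_or_seq)

print(grouper(6, "Hello world"))                # ['Hello ', 'world']
print(list(grouper(5, (x for x in range(9)))))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8]]
```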

I think you could polish it a little bit more :-)

Answer by justhalf

This is an answer that works for both list and generator:

from itertools import count, groupby
def split_every(size, iterable):
    c = count()
    for k, g in groupby(iterable, lambda x: next(c)//size):
        yield list(g) # or yield g if you want to output a generator
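The trick is that groupby calls the key function exactly once per item, in order, so next(c) numbers the items and integer division by size turns those positions into chunk ids. A small self-contained check of that reasoning on a string:

```python
from itertools import count, groupby

def split_every(size, iterable):
    c = count()
    # Positions 0, 1, 2, ... become chunk ids 0, 0, 0, 1, 1, 1, 2, ... for size=3
    for _, g in groupby(iterable, lambda x: next(c) // size):
        yield list(g)

print(list(split_every(3, "abcdefg")))  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```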

Answer by Johannes Charra

Why not do it like this? It looks almost like your splitEvery_2 function.

def splitEveryN(n, it):
    return [it[i:i+n] for i in range(0, len(it), n)]

Actually it only takes away the unnecessary step interval from the slice in your solution. :)
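This version relies on len() and slicing, so it only handles sequences. A short check of what it returns for different inputs, and of the error a generator produces:

```python
def splitEveryN(n, it):
    return [it[i:i + n] for i in range(0, len(it), n)]

# Slicing preserves the sequence type of each piece
print(splitEveryN(2, "abcde"))         # ['ab', 'cd', 'e']
print(splitEveryN(2, (1, 2, 3)))       # [(1, 2), (3,)]
print(splitEveryN(5, list(range(9))))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8]]

# Generators support neither len() nor slicing
try:
    splitEveryN(2, (x for x in range(3)))
except TypeError as exc:
    print("TypeError:", exc)
```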

Answer by Hamish Grubijan

Here is how you deal with list vs iterator:

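def isList(L):  # returns True or False
    return isinstance(L, list)  # one possible implementation
...
# result: the chunks as a lazy iterator; list input -> list, otherwise as-is
return (lambda x: x, list)[int(isList(L))](result)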
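A self-contained sketch of this idea. It assumes isList is a plain isinstance check (the answer leaves the implementation open), builds result with the two-argument iter idiom, and uses a conditional expression instead of indexing into a tuple of functions, which reads more directly but does the same job.

```python
from itertools import islice

def isList(L):
    # One possible check: only real lists count as lists
    return isinstance(L, list)

def split_every(n, L):
    it = iter(L)
    # Lazy chunks; iteration stops when islice returns an empty list
    result = iter(lambda: list(islice(it, n)), [])
    # List in -> list of chunks out; anything else -> the lazy iterator
    return list(result) if isList(L) else result

print(split_every(5, list(range(9))))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8]]
lazy = split_every(5, (x for x in range(9)))
print(list(lazy))                      # [[0, 1, 2, 3, 4], [5, 6, 7, 8]]
```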