Python: How to split a dictionary into multiple dictionaries fast

Note: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/22878743/

How to split dictionary into multiple dictionaries fast

python, python-2.7, dictionary

Asked by badc0re

I have found a solution but it is really slow:

def chunks(self, data, SIZE=10000):
    # Slow: data.items() builds the full list of pairs on every
    # iteration of the loop, and the slice copies it again (Python 2)
    for i in xrange(0, len(data), SIZE):
        yield dict(data.items()[i:i+SIZE])

Do you have any ideas for doing this without external modules (numpy, etc.)?

Accepted answer by thefourtheye

Since the dictionary is so big, it would be better to keep everything involved as iterators and generators, like this:

from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)  # a single iterator over the keys, shared by all chunks
    for i in xrange(0, len(data), SIZE):
        # islice consumes the next SIZE keys without copying the dict
        yield {k: data[k] for k in islice(it, SIZE)}

Sample run:

for item in chunks({i:i for i in xrange(10)}, 3):
    print item

Output

{0: 0, 1: 1, 2: 2}
{3: 3, 4: 4, 5: 5}
{8: 8, 6: 6, 7: 7}
{9: 9}
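
As a side note, the same approach ports to Python 3 with only mechanical changes; here is a sketch (xrange becomes range, print becomes a function, and dicts preserve insertion order from 3.7 on):

from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)
    for i in range(0, len(data), SIZE):
        yield {k: data[k] for k in islice(it, SIZE)}

for item in chunks({i: i for i in range(10)}, 3):
    print(item)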

Answered by ndpu

Another method is zipping iterators:

>>> from itertools import izip_longest, ifilter
>>> d = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6, 'g':7, 'h':8}

Create a list that repeats the same dict iterator (the number of repeats is the number of elements per result dict). Passing each entry of the chunks list to izip_longest then pulls that many items from the source dict per output tuple (ifilter is used to drop the None padding from the zip results). With a generator expression you can lower memory usage:

>>> chunks = [d.iteritems()]*3  # the same iterator object, repeated 3 times
>>> g = (dict(ifilter(None, v)) for v in izip_longest(*chunks))
>>> list(g)
[{'a': 1, 'c': 3, 'b': 2},
 {'e': 5, 'd': 4, 'g': 7},
 {'h': 8, 'f': 6}]
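
On Python 3 the same trick would look roughly like this (a sketch: izip_longest is now itertools.zip_longest, iteritems is gone, and since the (key, value) tuples are always truthy, filter(None, ...) still strips only the padding):

from itertools import zip_longest

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8}
chunks = [iter(d.items())] * 3
g = (dict(filter(None, group)) for group in zip_longest(*chunks))
print(list(g))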

Answered by gies0r

import numpy as np

chunk_size = 3
chunked_data = [[k, v] for k, v in d.items()]
# note: the second argument of np.array_split is the number of
# sub-arrays to produce, not the size of each chunk
chunked_data = np.array_split(chunked_data, chunk_size)

Afterwards you have a list of ndarrays, which is iterable like this:

for chunk in chunked_data:
    for key, value in chunk:
        print(key)
        print(value)

Each chunk can be re-assembled into a dict using a simple for loop, as sketched below.
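
A minimal sketch of that loop (assuming chunked_data from above; note that np.array_split coerces keys and values to a common dtype, strings in this example, so types may need converting back):

list_of_dicts = []
for chunk in chunked_data:
    list_of_dicts.append({key: value for key, value in chunk})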

Answered by Pratibha Gupta

This code takes a large dictionary and splits it into a list of small dictionaries. The max_limit variable sets the maximum number of key-value pairs allowed in each sub-dictionary. Splitting is cheap: it needs just one complete pass over the dictionary object.

import copy

def split_dict_to_multiple(input_dict, max_limit=200):
    """Splits a dict into multiple dicts with a given maximum size.
    Returns a list of dictionaries."""
    chunks = []
    curr_dict = {}
    for k, v in input_dict.items():
        if len(curr_dict) < max_limit:
            curr_dict.update({k: v})
        else:
            # current chunk is full; store it and start a new one
            chunks.append(copy.deepcopy(curr_dict))
            curr_dict = {k: v}
    # append the last, possibly partially filled, chunk
    chunks.append(curr_dict)
    return chunks
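
A quick usage sketch (hypothetical data, assuming the function above):

big = {i: i * i for i in range(1000)}
parts = split_dict_to_multiple(big, max_limit=200)
print(len(parts), [len(p) for p in parts])
# 5 [200, 200, 200, 200, 200]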