Python: How to split a dictionary into multiple dictionaries fast

Note: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/22878743/

How to split dictionary into multiple dictionaries fast

python, python-2.7, dictionary

Asked by badc0re

I have found a solution but it is really slow:

def chunks(self, data, SIZE=10000):
    # Slow: data.items() builds the full list of pairs on every
    # iteration of the loop, and the slice copies it again (Python 2)
    for i in xrange(0, len(data), SIZE):
        yield dict(data.items()[i:i+SIZE])

Do you have any ideas for doing this without external modules (numpy, etc.)?

Accepted answer by thefourtheye

Since the dictionary is so big, it would be better to keep everything involved as iterators and generators, like this:

from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)  # a single iterator over the keys, shared by all chunks
    for i in xrange(0, len(data), SIZE):
        # islice consumes the next SIZE keys without copying the dict
        yield {k: data[k] for k in islice(it, SIZE)}

Sample run:

for item in chunks({i:i for i in xrange(10)}, 3):
    print item

Output

{0: 0, 1: 1, 2: 2}
{3: 3, 4: 4, 5: 5}
{8: 8, 6: 6, 7: 7}
{9: 9}
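
As a side note, the same approach ports to Python 3 with only mechanical changes; here is a sketch (xrange becomes range, print becomes a function, and dicts preserve insertion order from 3.7 on):

from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)
    for i in range(0, len(data), SIZE):
        yield {k: data[k] for k in islice(it, SIZE)}

for item in chunks({i: i for i in range(10)}, 3):
    print(item)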

Answered by ndpu

Another method is zipping iterators:

>>> from itertools import izip_longest, ifilter
>>> d = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6, 'g':7, 'h':8}

Create a list that repeats the same dict iterator (the number of repeats is the number of elements per result dict). Passing each entry of the chunks list to izip_longest then pulls that many items from the source dict per output tuple (ifilter is used to drop the None padding from the zip results). With a generator expression you can lower memory usage:

>>> chunks = [d.iteritems()]*3  # the same iterator object, repeated 3 times
>>> g = (dict(ifilter(None, v)) for v in izip_longest(*chunks))
>>> list(g)
[{'a': 1, 'c': 3, 'b': 2},
 {'e': 5, 'd': 4, 'g': 7},
 {'h': 8, 'f': 6}]
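
On Python 3 the same trick would look roughly like this (a sketch: izip_longest is now itertools.zip_longest, iteritems is gone, and since the (key, value) tuples are always truthy, filter(None, ...) still strips only the padding):

from itertools import zip_longest

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8}
chunks = [iter(d.items())] * 3
g = (dict(filter(None, group)) for group in zip_longest(*chunks))
print(list(g))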

Answered by gies0r

import numpy as np

chunk_size = 3
chunked_data = [[k, v] for k, v in d.items()]
# note: the second argument of np.array_split is the number of
# sub-arrays to produce, not the size of each chunk
chunked_data = np.array_split(chunked_data, chunk_size)

Afterwards you have a list of ndarrays, which is iterable like this:

for chunk in chunked_data:
    for key, value in chunk:
        print(key)
        print(value)

Each chunk can be re-assembled into a dict using a simple for loop, as sketched below.
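
A minimal sketch of that loop (assuming chunked_data from above; note that np.array_split coerces keys and values to a common dtype, strings in this example, so types may need converting back):

list_of_dicts = []
for chunk in chunked_data:
    list_of_dicts.append({key: value for key, value in chunk})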

Answered by Pratibha Gupta

This code takes a large dictionary and splits it into a list of small dictionaries. The max_limit variable sets the maximum number of key-value pairs allowed in each sub-dictionary. Splitting is cheap: it needs just one complete pass over the dictionary object.

import copy

def split_dict_to_multiple(input_dict, max_limit=200):
    """Splits a dict into multiple dicts with a given maximum size.
    Returns a list of dictionaries."""
    chunks = []
    curr_dict = {}
    for k, v in input_dict.items():
        if len(curr_dict) < max_limit:
            curr_dict.update({k: v})
        else:
            # current chunk is full; store it and start a new one
            chunks.append(copy.deepcopy(curr_dict))
            curr_dict = {k: v}
    # append the last, possibly partially filled, chunk
    chunks.append(curr_dict)
    return chunks
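
A quick usage sketch (hypothetical data, assuming the function above):

big = {i: i * i for i in range(1000)}
parts = split_dict_to_multiple(big, max_limit=200)
print(len(parts), [len(p) for p in parts])
# 5 [200, 200, 200, 200, 200]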