Python group by
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/3749512/
Asked by Hellnar
Assume that I have a list of data pairs where index 0 is the value and index 1 is the type:
input = [
    ('11013331', 'KAT'),
    ('9085267', 'NOT'),
    ('5238761', 'ETH'),
    ('5349618', 'ETH'),
    ('11788544', 'NOT'),
    ('962142', 'ETH'),
    ('7795297', 'ETH'),
    ('7341464', 'ETH'),
    ('9843236', 'KAT'),
    ('5594916', 'ETH'),
    ('1550003', 'ETH')
]
I want to group them by their type (the string at index 1), like this:
result = [
    {
        'type': 'KAT',
        'items': ['11013331', '9843236']
    },
    {
        'type': 'NOT',
        'items': ['9085267', '11788544']
    },
    {
        'type': 'ETH',
        'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003']
    }
]
How can I achieve this in an efficient way?
Accepted answer by kennytm
Do it in 2 steps. First, create a dictionary.
>>> input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'), ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'), ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'), ('5594916', 'ETH'), ('1550003', 'ETH')]
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for v, k in input: res[k].append(v)
...
Then, convert that dictionary into the expected format.
>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}]
It is also possible with itertools.groupby but it requires the input to be sorted first.
>>> from itertools import groupby
>>> from operator import itemgetter
>>> sorted_input = sorted(input, key=itemgetter(1))
>>> groups = groupby(sorted_input, key=itemgetter(1))
>>> [{'type':k, 'items':[x[0] for x in v]} for k, v in groups]
[{'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}]
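To see why the sort matters, here is a small sketch (Python 3; the names `pairs` and `group_keys` are ours, not from the answer) that runs groupby on a shortened, unsorted sample. groupby only merges consecutive items with equal keys, so a key that reappears later starts a new group:

```python
from itertools import groupby
from operator import itemgetter

# Illustrative sample: the first few pairs from the question, in original order.
pairs = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
         ('5349618', 'ETH'), ('11788544', 'NOT')]

def group_keys(seq):
    """Return the keys that groupby emits for seq, in emission order."""
    return [k for k, _ in groupby(seq, key=itemgetter(1))]

unsorted_keys = group_keys(pairs)
sorted_keys = group_keys(sorted(pairs, key=itemgetter(1)))

print(unsorted_keys)  # 'NOT' appears twice because its items are not adjacent
print(sorted_keys)    # each key appears exactly once after sorting
```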
Note that neither of these preserves the original order of the keys: groupby yields the keys in sorted order, and before Python 3.7 a plain dict (and therefore a defaultdict) made no ordering guarantee. Use an OrderedDict if you need to keep the order.
>>> from collections import OrderedDict
>>> res = OrderedDict()
>>> for v, k in input:
...     if k in res: res[k].append(v)
...     else: res[k] = [v]
...
>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}]
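On Python 3.7 and later the built-in dict is guaranteed to keep insertion order, so the same first-seen ordering can be had without OrderedDict. A minimal sketch under that assumption; the function name `group_pairs` and the shortened `data` sample are ours, not part of the original answer:

```python
def group_pairs(pairs):
    """Group (value, type) pairs by type, keeping first-seen type order.

    Relies on dict preserving insertion order (guaranteed since Python 3.7).
    """
    grouped = {}
    for value, kind in pairs:
        # setdefault creates the list the first time a type is seen
        grouped.setdefault(kind, []).append(value)
    return [{'type': kind, 'items': values} for kind, values in grouped.items()]

# Shortened sample taken from the question's data.
data = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
        ('5349618', 'ETH'), ('11788544', 'NOT'), ('9843236', 'KAT')]
result = group_pairs(data)
print(result)
```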
Answer by PaulMcG
Python's standard-library itertools module actually has a groupby function, but to use it the elements must first be sorted so that elements with the same key are contiguous in the list:
from operator import itemgetter
sortkeyfn = itemgetter(1)
input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'), ('7795297', 'ETH'),
('7341464', 'ETH'), ('9843236', 'KAT'), ('5594916', 'ETH'), ('1550003', 'ETH')]
input.sort(key=sortkeyfn)
Now input looks like:
[('5238761', 'ETH'), ('5349618', 'ETH'), ('962142', 'ETH'), ('7795297', 'ETH'),
('7341464', 'ETH'), ('5594916', 'ETH'), ('1550003', 'ETH'), ('11013331', 'KAT'),
('9843236', 'KAT'), ('9085267', 'NOT'), ('11788544', 'NOT')]
groupby returns an iterator of 2-tuples of the form (key, values_iterator). What we want is to turn this into a list of dicts where 'type' is the key and 'items' is a list of the 0th elements of the tuples yielded by values_iterator. Like this:
from itertools import groupby
result = []
for key,valuesiter in groupby(input, key=sortkeyfn):
    result.append(dict(type=key, items=list(v[0] for v in valuesiter)))
Now result contains the list of dicts you asked for in your question.
You might consider, though, making a single dict out of this instead, keyed by type, with each value holding the list of values for that type. In your current form, to find the values for a particular type you have to iterate over the list to find the dict with the matching 'type' key, and then get its 'items' element. If you use a single dict instead of a list of small dicts, you can find the items for a particular type with a single keyed lookup into the master dict. Using groupby, this would look like:
result = {}
for key,valuesiter in groupby(input, key=sortkeyfn):
    result[key] = list(v[0] for v in valuesiter)
result now contains this dict (this is similar to the intermediate res defaultdict in @KennyTM's answer):
{'NOT': ['9085267', '11788544'],
'ETH': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'],
'KAT': ['11013331', '9843236']}
If you want to reduce this to a one-liner, you can write:
result = dict((key, list(v[0] for v in valuesiter))
              for key,valuesiter in groupby(input, key=sortkeyfn))
or using the newfangled dict-comprehension form:
result = {key:list(v[0] for v in valuesiter)
for key,valuesiter in groupby(input, key=sortkeyfn)}
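The lookup difference described in this answer can be sketched as follows. The names `as_list` and `as_dict` and the shortened data are ours, used only for illustration:

```python
# The same grouped data in both shapes (small illustrative sample).
as_list = [
    {'type': 'KAT', 'items': ['11013331', '9843236']},
    {'type': 'NOT', 'items': ['9085267', '11788544']},
]
as_dict = {d['type']: d['items'] for d in as_list}

# List of dicts: scan until the dict with the matching 'type' is found.
from_list = next(d['items'] for d in as_list if d['type'] == 'NOT')

# Single dict: one keyed lookup.
from_dict = as_dict['NOT']

print(from_list == from_dict)
```

The scan walks the list until it finds a match, while the keyed lookup goes straight to the entry, which is the practical argument for the single-dict shape when lookups by type are common.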
Answer by mmj
The following function will quickly (no sorting required) group tuples of any length by a key at any index:
# given a sequence of tuples like [(3,'c',6),(7,'a',2),(88,'c',4),(45,'a',0)],
# returns a dict grouping tuples by idx-th element - with idx=1 we have:
# if merge is True {'c':(3,6,88,4), 'a':(7,2,45,0)}
# if merge is False {'c':((3,6),(88,4)), 'a':((7,2),(45,0))}
def group_by(seqs,idx=0,merge=True):
    d = dict()
    for seq in seqs:
        k = seq[idx]
        v = d.get(k,tuple()) + (seq[:idx]+seq[idx+1:] if merge else (seq[:idx]+seq[idx+1:],))
        d.update({k:v})
    return d
In the case of your question, the index of the key you want to group by is 1, therefore:
group_by(input,1)
gives
{'ETH': ('5238761','5349618','962142','7795297','7341464','5594916','1550003'),
'KAT': ('11013331', '9843236'),
'NOT': ('9085267', '11788544')}
which is not exactly the output you asked for, but might suit your needs just as well.
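As a quick check of the two modes, the sketch below repeats the function (lightly restructured with a helper variable `rest`, which is our addition) and runs it on the sample data from its own comments. The names `sample`, `merged` and `nested` are ours:

```python
def group_by(seqs, idx=0, merge=True):
    """Group tuples by their idx-th element; the key itself is dropped."""
    d = dict()
    for seq in seqs:
        k = seq[idx]
        rest = seq[:idx] + seq[idx + 1:]  # the tuple without its key element
        # merge=True flattens everything into one tuple per key;
        # merge=False keeps one inner tuple per input tuple.
        v = d.get(k, tuple()) + (rest if merge else (rest,))
        d.update({k: v})
    return d

sample = [(3, 'c', 6), (7, 'a', 2), (88, 'c', 4), (45, 'a', 0)]
merged = group_by(sample, 1)
nested = group_by(sample, 1, merge=False)

print(merged)
print(nested)
```

Because each step builds a new tuple by concatenation, very large groups may be slower to build this way than by appending to lists.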
Answer by akiva
result = []
# Make a set of your "types":
input_set = set([tpl[1] for tpl in input])
# input_set is now set(['ETH', 'KAT', 'NOT'])
# Iterate over the input_set
for type_ in input_set:
    # a dict to gather things:
    D = {}
    # filter all tuples from your input with the same type as type_
    tuples = filter(lambda tpl: tpl[1] == type_, input)
    # write them in the D:
    D["type"] = type_
    D["items"] = [tpl[0] for tpl in tuples]
    # append D to results:
    result.append(D)
# result is now (the order of the dicts may vary):
# [{'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}]
Answer by ronen
This answer is similar to @PaulMcG's answer but doesn't require sorting the input.
For those into functional programming, groupBy can be written in one line (not including imports!), and unlike itertools.groupby it doesn't require the input to be sorted:
from functools import reduce # import needed for python3; builtin in python2
from collections import defaultdict
def groupBy(key, seq):
    return reduce(lambda grp, val: grp[key(val)].append(val) or grp, seq, defaultdict(list))
(The reason for ... or grp in the lambda is that for this reduce() to work, the lambda needs to return its first argument; because list.append() always returns None, the or will always return grp. I.e. it's a hack to get around Python's restriction that a lambda can only evaluate a single expression.)
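The `or grp` trick can be checked in isolation. A small sketch (the names `returned` and `step_result` are ours) showing that append returns None and that the `or` expression therefore evaluates to the accumulator:

```python
from collections import defaultdict

grp = defaultdict(list)

# list.append mutates the list in place and returns None...
returned = grp['ETH'].append('5238761')

# ...so `None or grp` evaluates to grp, which is what reduce() needs back.
step_result = grp['ETH'].append('5349618') or grp

print(returned)            # None
print(step_result is grp)  # True
```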
This returns a dict whose keys are found by evaluating the given function and whose values are lists of the original items in their original order. For the OP's example, calling this as groupBy(lambda pair: pair[1], input) will return this dict:
{'KAT': [('11013331', 'KAT'), ('9843236', 'KAT')],
'NOT': [('9085267', 'NOT'), ('11788544', 'NOT')],
'ETH': [('5238761', 'ETH'), ('5349618', 'ETH'), ('962142', 'ETH'), ('7795297', 'ETH'), ('7341464', 'ETH'), ('5594916', 'ETH'), ('1550003', 'ETH')]}
And, as in @PaulMcG's answer, the values alone can be collected per type by wrapping that in a dict comprehension. So this will do it:
result = {key: [pair[0] for pair in values]
          for key, values in groupBy(lambda pair: pair[1], input).items()}
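If the list-of-dicts shape from the question is wanted instead, a list comprehension over the grouped dict can produce it. A sketch assuming Python 3.7+ (so the types come out in first-seen order); `groupBy` is repeated from above, and `pairs` is our name for a shortened sample of the question's data:

```python
from functools import reduce
from collections import defaultdict

def groupBy(key, seq):
    # append returns None, so `... or grp` hands the accumulator back to reduce
    return reduce(lambda grp, val: grp[key(val)].append(val) or grp,
                  seq, defaultdict(list))

# Shortened sample taken from the question's data.
pairs = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
         ('11788544', 'NOT'), ('9843236', 'KAT')]

# One {'type': ..., 'items': ...} dict per group, keeping only the values.
result = [{'type': key, 'items': [pair[0] for pair in values]}
          for key, values in groupBy(lambda pair: pair[1], pairs).items()]

print(result)
```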

