How to filter a dictionary by value in Python?

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA and attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/1241029/

Date: 2020-11-03 21:45:46  Source: igfitidea


Tags: python, dictionary

Asked by Triptych

Newbie question here, so please bear with me.


Let's say I have a dictionary looking like this:


a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

I want all items whose values are equal to each other to be moved into another dictionary.


matched = {"2323232838": ("first/dir", "hello.txt"),
           "3434221":    ("first/dir", "hello.txt"),
           "32232334":   ("first/dir", "hello.txt")}

And the remaining unmatched items should look like this:


remainder = {"2323221383": ("second/dir", "foo.txt"),
             "324234324":  ("third/dir", "dog.txt")}

Thanks in advance, and if you provide an example, please comment it as much as possible.


Answer by Triptych

The code below produces two variables, matched and remainder. matched is a list of dictionaries, with one dictionary for each set of matching items in the original dictionary. remainder will contain, as in your example, a dictionary of all the unmatched items.


Note that in your example there is only one set of matching values: ('first/dir', 'hello.txt'). If there were more than one set, each would have a corresponding entry in matched.


import itertools

# Original dict
a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Convert dict to sorted list of items
a = sorted(a.items(), key=lambda x:x[1])

# Group by value of tuple
groups = itertools.groupby(a, key=lambda x:x[1])

# Pull out groups of matching items, and combine items
# with no matches back into a single dictionary
remainder = []
matched   = []

for key, group in groups:
    group = list(group)
    if len(group) == 1:
        remainder.append(group[0])
    else:
        matched.append(dict(group))

remainder = dict(remainder)

Output:


>>> matched
[
  {
    '3434221':    ('first/dir', 'hello.txt'), 
    '2323232838': ('first/dir', 'hello.txt'), 
    '32232334':   ('first/dir', 'hello.txt')
  }
]

>>> remainder
{
  '2323221383': ('second/dir', 'foo.txt'), 
  '324234324':  ('third/dir', 'dog.txt')
}
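The same groupby approach can be wrapped in a reusable function. The sketch below is an illustration rather than part of the original answer, and the name split_matches is hypothetical. It uses a made-up input with two sets of repeated values to show that matched gets one dictionary per set:

```python
import itertools


def split_matches(d):
    """Return (matched, remainder) for dictionary d.

    matched:   list with one dict per value that occurs more than once
    remainder: dict of the items whose value occurs exactly once
    Assumes the values can be sorted (compared with <).
    """
    # groupby only merges adjacent items, so sort by value first
    items = sorted(d.items(), key=lambda kv: kv[1])
    matched = []
    remainder = {}
    for _, group in itertools.groupby(items, key=lambda kv: kv[1]):
        group = list(group)
        if len(group) == 1:
            key, value = group[0]
            remainder[key] = value
        else:
            matched.append(dict(group))
    return matched, remainder


# Hypothetical input with two sets of repeated values
b = {"k1": ("first/dir", "hello.txt"),
     "k2": ("second/dir", "foo.txt"),
     "k3": ("first/dir", "hello.txt"),
     "k4": ("second/dir", "foo.txt"),
     "k5": ("third/dir", "dog.txt")}

matched, remainder = split_matches(b)
print(matched)    # one dict for hello.txt, one for foo.txt
print(remainder)  # {'k5': ('third/dir', 'dog.txt')}
```

Because it sorts, this only works when the values can be compared with each other. The inverted-index approach below needs hashable values instead.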

As a newbie, you may not be familiar with some of the concepts in the code above, such as sorted() with a key function, lambda expressions, and itertools.groupby.

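One less obvious point in the code above is that itertools.groupby only groups consecutive items with equal keys. That is why the items are sorted by value first. Here is a small illustration using made-up values:

```python
import itertools

values = ["x", "y", "x"]

# Without sorting, the two "x" entries are not adjacent, so groupby
# produces three separate groups
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(values)]
print(unsorted_groups)  # [('x', 1), ('y', 1), ('x', 1)]

# After sorting, equal values are adjacent and end up in one group
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(values))]
print(sorted_groups)  # [('x', 2), ('y', 1)]
```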

Answer by S.Lott

What you're asking for is called an "inverted index": each distinct value is recorded just once, together with a list of the keys that map to it.


>>> from collections import defaultdict
>>> a = {"2323232838": ("first/dir", "hello.txt"),
...      "2323221383": ("second/dir", "foo.txt"),
...      "3434221": ("first/dir", "hello.txt"),
...      "32232334": ("first/dir", "hello.txt"),
...      "324234324": ("third/dir", "dog.txt")}
>>> invert = defaultdict( list )
>>> for key, value in a.items():
...     invert[value].append( key )
... 
>>> invert
defaultdict(<type 'list'>, {('first/dir', 'hello.txt'): ['3434221', '2323232838', '32232334'], ('second/dir', 'foo.txt'): ['2323221383'], ('third/dir', 'dog.txt'): ['324234324']})

The inverted dictionary maps each original value to a list of one or more keys.


Now, to get your revised dictionaries from this:


Filtering:


>>> [ invert[multi] for multi in invert if len(invert[multi]) > 1 ]
[['3434221', '2323232838', '32232334']]
>>> [ invert[uni] for uni in invert if len(invert[uni]) == 1 ]
[['2323221383'], ['324234324']]

Expanding:


>>> [ (i,multi) for multi in invert if len(invert[multi]) > 1 for i in invert[multi] ]
[('3434221', ('first/dir', 'hello.txt')), ('2323232838', ('first/dir', 'hello.txt')), ('32232334', ('first/dir', 'hello.txt'))]
>>> dict( (i,multi) for multi in invert if len(invert[multi]) > 1 for i in invert[multi] )
{'3434221': ('first/dir', 'hello.txt'), '2323232838': ('first/dir', 'hello.txt'), '32232334': ('first/dir', 'hello.txt')}

A similar (but simpler) treatment works for the items which occur once.

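Below is a Python 3 sketch of both treatments that uses dict comprehensions instead of dict() over a generator. It rebuilds invert so that it runs on its own. The comprehension style is a substitution for illustration, not the answer's original code:

```python
from collections import defaultdict

a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Inverted index: value -> list of keys that have that value
invert = defaultdict(list)
for key, value in a.items():
    invert[value].append(key)

# Keys whose value is shared with at least one other key
matched = {k: value
           for value, keys in invert.items() if len(keys) > 1
           for k in keys}

# Keys whose value occurs only once (each such list has exactly one key)
remainder = {keys[0]: value
             for value, keys in invert.items() if len(keys) == 1}

print(matched)
print(remainder)
```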

Answer by Avihu Turzion

Iterating over a dictionary in Python works much like iterating over a list, except that the loop variable takes each key in turn:


for key in dic:
    print("dic[%s] = %s" % (key, dic[key]))

This will print all of the keys and values of your dictionary.

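Iterating over a.items() gives each key and its value together, so filtering by value reduces to a comprehension. The following is a minimal sketch using the dictionary from the question. The variable name hello is only for illustration:

```python
a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# .items() yields (key, value) pairs, so no a[key] lookup is needed
for key, value in a.items():
    print("a[%s] = %s" % (key, value))

# The same iteration as a dict comprehension keeps only the items
# whose value satisfies the condition
hello = {key: value for key, value in a.items()
         if value == ("first/dir", "hello.txt")}
print(hello)
```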

Answer by buster

I assume that your unique id will be the key.
It's probably not very beautiful, but this returns a dict keyed by your unique values:


>>> dict_ = {'1': ['first/dir', 'hello.txt'],
'3': ['first/dir', 'foo.txt'], 
'2': ['second/dir', 'foo.txt'], 
'4': ['second/dir', 'foo.txt']}  
>>> dict((v[0]+"/"+v[1],k) for k,v in dict_.items())
{'second/dir/foo.txt': '4', 'first/dir/hello.txt': '1', 'first/dir/foo.txt': '3'}

I see you've updated your post:


>>> a
{'324234324': ('third/dir', 'dog.txt'), 
'2323221383': ('second/dir', 'foo.txt'), 
'3434221': ('first/dir', 'hello.txt'), 
'2323232838': ('first/dir', 'hello.txt'), 
'32232334': ('first/dir', 'hello.txt')}
>>> dict((v[0]+"/"+v[1],k) for k,v in a.items())
{'second/dir/foo.txt': '2323221383', 
'first/dir/hello.txt': '32232334', 
'third/dir/dog.txt': '324234324'}
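The value tuples are hashable, so they can serve as dictionary keys directly without first being joined into a path string. When you do want the string, "/".join(v) handles it. In this sketch, as in the answer above, only the last key seen for a repeated value survives. On Python 3.7+ that means the last one in insertion order. The names by_value and by_path are only illustrative:

```python
a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Tuple values used directly as keys; later keys overwrite earlier ones
by_value = {v: k for k, v in a.items()}

# Same mapping, but keyed by a joined path string
by_path = {"/".join(v): k for k, v in a.items()}

print(by_value)
print(by_path)
```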

Answer by SilentGhost

If you know which value you want to filter out:


known_tuple = 'first/dir','hello.txt'
b = {k:v for k, v in a.items() if v == known_tuple}

Then a would become:


a = dict(a.items() - b.items())
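One alternative to the set difference is a second comprehension with the opposite condition. The items-view difference builds a set of (key, value) pairs, so the dictionary it produces does not keep the original order, and it requires hashable values. The comprehension keeps insertion order and avoids both issues. This is a sketch, and the name rest is hypothetical:

```python
a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}
known_tuple = ("first/dir", "hello.txt")

# Items whose value equals known_tuple
b = {k: v for k, v in a.items() if v == known_tuple}

# Everything else, in the original insertion order
rest = {k: v for k, v in a.items() if v != known_tuple}

print(b)
print(rest)
```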

This is py3k notation, but I'm sure something similar can be implemented in legacy versions. If you don't know what known_tuple is, you first need to find it out, for example like this:


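c = list(a.values())
# Remove one occurrence of each distinct value;
# whatever is left over appeared more than once
for i in set(c):
    c.remove(i)
known_tuple = c[0]  # raises IndexError if no value is repeated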
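As an alternative, collections.Counter can count how many keys share each value. This finds every repeated value in one pass and returns an empty list, rather than raising IndexError, when nothing repeats. This is a sketch, and the names counts and repeated are illustrative:

```python
from collections import Counter

a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Count how many keys map to each value
counts = Counter(a.values())

# Every value that appears more than once (possibly several, possibly none)
repeated = [value for value, n in counts.items() if n > 1]
print(repeated)
```

Each entry of repeated can then be used as known_tuple in the comprehension above.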