Python: How to filter a dictionary by value?
Notice: this page is a side-by-side Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1241029/
How to filter a dictionary by value?
Asked by Triptych
Newbie question here, so please bear with me.
Let's say I have a dictionary looking like this:
a = {"2323232838": ("first/dir", "hello.txt"),
"2323221383": ("second/dir", "foo.txt"),
"3434221": ("first/dir", "hello.txt"),
"32232334": ("first/dir", "hello.txt"),
"324234324": ("third/dir", "dog.txt")}
I want all values that are equal to each other to be moved into another dictionary.
matched = {"2323232838": ("first/dir", "hello.txt"),
"3434221": ("first/dir", "hello.txt"),
"32232334": ("first/dir", "hello.txt")}
And the remaining unmatched items should look like this:
remainder = {"2323221383": ("second/dir", "foo.txt"),
"324234324": ("third/dir", "dog.txt")}
Thanks in advance, and if you provide an example, please comment it as much as possible.
Answered by Triptych
The code below will result in two variables, matched and remainder. matched is a list of dictionaries, with one dictionary for each set of matching items from the original dictionary. remainder will contain, as in your example, a dictionary of all the unmatched items.
Note that in your example, there is only one set of matching values: ('first/dir', 'hello.txt'). If there were more than one set, each would have a corresponding entry in matched.
import itertools

# Original dict
a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Convert dict to a list of (key, value) items sorted by value,
# so that equal values end up next to each other
items = sorted(a.items(), key=lambda x: x[1])

# Group consecutive items that share the same value tuple
groups = itertools.groupby(items, key=lambda x: x[1])

# Pull out matching groups of items, and combine items
# with no matches back into a single dictionary
remainder = []
matched = []
for key, group in groups:
    group = list(group)
    if len(group) == 1:
        # Value occurs once: keep the single (key, value) pair
        remainder.append(group[0])
    else:
        # Value occurs several times: one dict per set of matches
        matched.append(dict(group))

# Turn the list of leftover pairs back into a dictionary
remainder = dict(remainder)
Output:
>>> matched
[
{
'3434221': ('first/dir', 'hello.txt'),
'2323232838': ('first/dir', 'hello.txt'),
'32232334': ('first/dir', 'hello.txt')
}
]
>>> remainder
{
'2323221383': ('second/dir', 'foo.txt'),
'324234324': ('third/dir', 'dog.txt')
}
As a newbie, you're probably being introduced to a few unfamiliar concepts in the code above, notably sorted() with a key function, itertools.groupby(), and dict() built from a list of (key, value) pairs.
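One detail worth spelling out is that itertools.groupby only groups consecutive items with equal keys, which is why the answer sorts the items by value first. The following is a minimal sketch on a small made-up list of pairs; the names pairs, by_value, unsorted_groups and sorted_groups are illustrative and do not come from the answer.

```python
import itertools

# Hypothetical sample data: (key, value) pairs with a repeated value "x"
pairs = [("k1", "x"), ("k2", "y"), ("k3", "x")]


def by_value(pair):
    """Return the value part of a (key, value) pair."""
    return pair[1]


# Without sorting, the two "x" items are not adjacent,
# so groupby reports them as two separate groups
unsorted_groups = [(value, [k for k, _ in group])
                   for value, group in itertools.groupby(pairs, key=by_value)]

# After sorting by value, equal values are adjacent and form one group
sorted_groups = [(value, [k for k, _ in group])
                 for value, group in itertools.groupby(sorted(pairs, key=by_value),
                                                       key=by_value)]

print(unsorted_groups)  # [('x', ['k1']), ('y', ['k2']), ('x', ['k3'])]
print(sorted_groups)    # [('x', ['k1', 'k3']), ('y', ['k2'])]
```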
Answered by S.Lott
What you're asking for is called an "Inverted Index" -- the distinct items are recorded just once with a list of keys.
>>> from collections import defaultdict
>>> a = {"2323232838": ("first/dir", "hello.txt"),
... "2323221383": ("second/dir", "foo.txt"),
... "3434221": ("first/dir", "hello.txt"),
... "32232334": ("first/dir", "hello.txt"),
... "324234324": ("third/dir", "dog.txt")}
>>> invert = defaultdict( list )
>>> for key, value in a.items():
...     invert[value].append( key )
...
>>> invert
defaultdict(<type 'list'>, {('first/dir', 'hello.txt'): ['3434221', '2323232838', '32232334'], ('second/dir', 'foo.txt'): ['2323221383'], ('third/dir', 'dog.txt'): ['324234324']})
The inverted dictionary has the original values associated with a list of 1 or more keys.
Now, to build your revised dictionaries from this index:
Filtering:
>>> [ invert[multi] for multi in invert if len(invert[multi]) > 1 ]
[['3434221', '2323232838', '32232334']]
>>> [ invert[uni] for uni in invert if len(invert[uni]) == 1 ]
[['2323221383'], ['324234324']]
Expanding:
>>> [ (i,multi) for multi in invert if len(invert[multi]) > 1 for i in invert[multi] ]
[('3434221', ('first/dir', 'hello.txt')), ('2323232838', ('first/dir', 'hello.txt')), ('32232334', ('first/dir', 'hello.txt'))]
>>> dict( (i,multi) for multi in invert if len(invert[multi]) > 1 for i in invert[multi] )
{'3434221': ('first/dir', 'hello.txt'), '2323232838': ('first/dir', 'hello.txt'), '32232334': ('first/dir', 'hello.txt')}
A similar (but simpler) treatment works for the items which occur once.
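The treatment for the items which occur once is not spelled out in the answer. One way it might look is sketched below, using dict comprehensions over the inverted index; the variable names remainder and matched are borrowed from the question, and the comprehension form is an assumption about what "similar but simpler" means here.

```python
from collections import defaultdict

a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Build the inverted index: value -> list of keys that share that value
invert = defaultdict(list)
for key, value in a.items():
    invert[value].append(key)

# Values seen exactly once have a one-element key list, so no inner loop is needed
remainder = {keys[0]: value for value, keys in invert.items() if len(keys) == 1}

# Values seen more than once are expanded back to one entry per key
matched = {key: value
           for value, keys in invert.items() if len(keys) > 1
           for key in keys}

print(remainder)
print(matched)
```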
Answered by Avihu Turzion
Iterating over a dictionary is no different from iterating over a list in Python; the loop variable takes each key in turn:
for key in dic:
    print("dic[%s] = %s" % (key, dic[key]))
This will print all of the keys and values of your dictionary.
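As a small variation, dict.items() yields the key and the value together, which avoids the extra dic[key] lookup. The sketch below assumes a two-entry sample dictionary, since dic is not defined in the answer, and collects the formatted lines in a list named lines purely for illustration.

```python
# Hypothetical sample dictionary standing in for the undefined `dic`
dic = {"2323221383": ("second/dir", "foo.txt"),
       "324234324": ("third/dir", "dog.txt")}

# items() yields (key, value) pairs, so no second lookup is needed
lines = ["dic[%s] = %s" % (key, value) for key, value in dic.items()]

for line in lines:
    print(line)
# dic[2323221383] = ('second/dir', 'foo.txt')
# dic[324234324] = ('third/dir', 'dog.txt')
```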
Answered by buster
I assume that your unique ID will be the key.
It's probably not very pretty, but this returns a dict keyed by your unique values:
>>> dict_ = {'1': ['first/dir', 'hello.txt'],
...          '3': ['first/dir', 'foo.txt'],
...          '2': ['second/dir', 'foo.txt'],
...          '4': ['second/dir', 'foo.txt']}
>>> dict((v[0]+"/"+v[1],k) for k,v in dict_.items())
{'second/dir/foo.txt': '4', 'first/dir/hello.txt': '1', 'first/dir/foo.txt': '3'}
I see you've updated your post:
>>> a
{'324234324': ('third/dir', 'dog.txt'),
'2323221383': ('second/dir', 'foo.txt'),
'3434221': ('first/dir', 'hello.txt'),
'2323232838': ('first/dir', 'hello.txt'),
'32232334': ('first/dir', 'hello.txt')}
>>> dict((v[0]+"/"+v[1],k) for k,v in a.items())
{'second/dir/foo.txt': '2323221383',
'first/dir/hello.txt': '32232334',
'third/dir/dog.txt': '324234324'}
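Note that this approach keeps only one ID per path, because dictionary keys are unique and a later ID overwrites an earlier one. The sketch below makes that visible; the name by_path is illustrative, and the claim about which ID survives assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Join directory and file name into one path string and use it as the key;
# IDs that share a path overwrite each other, so the last one iterated wins
by_path = {"/".join(value): key for key, value in a.items()}

print(len(a), len(by_path))            # 5 3
print(by_path["first/dir/hello.txt"])  # 32232334
```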
Answered by SilentGhost
If you know which value you want to filter out:
known_tuple = 'first/dir','hello.txt'
b = {k:v for k, v in a.items() if v == known_tuple}
Then a would become:
a = dict(a.items() - b.items())
This is py3k notation, but I'm sure something similar can be implemented in legacy versions. If you don't know what the known_tuple is, then you'd need to find it first, for example like this:
c = list(a.values())
for i in set(c):
    c.remove(i)
known_tuple = c[0]
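The loop above picks out a single repeated value. If several different values may be repeated, one possible sketch (not part of the answer) counts the values with collections.Counter and splits the dictionary in one pass. The names counts, matched and remainder are chosen here to mirror the question.

```python
from collections import Counter

a = {"2323232838": ("first/dir", "hello.txt"),
     "2323221383": ("second/dir", "foo.txt"),
     "3434221": ("first/dir", "hello.txt"),
     "32232334": ("first/dir", "hello.txt"),
     "324234324": ("third/dir", "dog.txt")}

# Count how many keys share each value (values are tuples, so they are hashable)
counts = Counter(a.values())

# Items whose value occurs more than once
matched = {k: v for k, v in a.items() if counts[v] > 1}

# Items whose value occurs exactly once
remainder = {k: v for k, v in a.items() if counts[v] == 1}

print(matched)
print(remainder)
```

Unlike the set-difference version, this leaves a itself untouched and builds two new dictionaries.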