Intersecting two dictionaries in Python

Question

Asked by nicole

I am working on a search program over an inverted index. The index itself is a dictionary whose keys are terms and whose values are themselves dictionaries of short documents, with ID numbers as keys and their text content as values.

To perform an 'AND' search for two terms, I thus need to intersect their postings lists (dictionaries). What is a clear (not necessarily overly clever) way to do this in Python? I started out by trying it the long way with iter:

p1 = index[term1]  
p2 = index[term2]
i1 = iter(p1)
i2 = iter(p2)
while ...  # not sure of the 'iter != end' syntax in this case
...

Answer 1

Accepted answer by James

You can easily calculate the intersection of sets, so create sets from the keys and use them for the intersection:

keys_a = set(dict_a.keys())
keys_b = set(dict_b.keys())
intersection = keys_a & keys_b # '&' operator is used for set intersection
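
As a minimal sketch of how this could apply to the postings dictionaries from the question, the common keys can be used to pull the matching documents out of one of the postings dictionaries. The sample `index` contents and the `and_search` helper name are illustrative assumptions, not part of the original answer:

```python
def and_search(index, term1, term2):
    """Return {doc_id: text} for documents listed under both terms."""
    p1 = index[term1]
    p2 = index[term2]
    # set(p1) is equivalent to set(p1.keys()): iterating a dict yields its keys
    common_ids = set(p1) & set(p2)
    # Assumption: both postings dicts hold the same text for a given ID,
    # so reading the text from p1 is an arbitrary choice.
    return {doc_id: p1[doc_id] for doc_id in common_ids}


# Hypothetical sample index: term -> {doc_id: document text}
index = {
    "python": {1: "python dict tricks", 2: "python set basics"},
    "dict": {1: "python dict tricks", 3: "dict views explained"},
}

result = and_search(index, "python", "dict")
print(result)  # {1: 'python dict tricks'}
```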

Answer 2

Answered by Eric Urban

Just wrap the dictionary instances in a simple class that returns both of the values you want:

class DictionaryIntersection(object):
    def __init__(self,dictA,dictB):
        self.dictA = dictA
        self.dictB = dictB

    def __getitem__(self,attr):
        if attr not in self.dictA or attr not in self.dictB:
            raise KeyError('Not in both dictionaries,key: %s' % attr)

        return self.dictA[attr],self.dictB[attr]

x = {'foo' : 5, 'bar' :6}
y = {'bar' : 'meow' , 'qux' : 8}

z = DictionaryIntersection(x,y)

print(z['bar'])  # (6, 'meow')

Answer 3

Answered by Phillip Cloud

A little-known fact is that you don't need to construct sets to do this:

In Python 2:

In [78]: d1 = {'a': 1, 'b': 2}

In [79]: d2 = {'b': 2, 'c': 3}

In [80]: d1.viewkeys() & d2.viewkeys()
Out[80]: {'b'}

In Python 3, replace viewkeys with keys; the same renaming applies to viewvalues (values) and viewitems (items).
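
A short Python 3 sketch of the same idea, reusing the sample dictionaries from the session above. Note that only the keys() and items() views are set-like; values() views do not support set operators, and items() supports them only when all values are hashable:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'c': 3}

# dict.keys() returns a set-like view in Python 3, so '&' works directly
common_keys = d1.keys() & d2.keys()
print(common_keys)  # {'b'}

# dict.items() is also set-like when every value is hashable
common_items = d1.items() & d2.items()
print(common_items)  # {('b', 2)}
```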

From the documentation of viewitems:

In [113]: d1.viewitems??
Type:       builtin_function_or_method
String Form:<built-in method viewitems of dict object at 0x64a61b0>
Docstring:  D.viewitems() -> a set-like object providing a view on D's items

For larger dicts this is also slightly faster than constructing sets and then intersecting them:

In [122]: d1 = {i: rand() for i in range(10000)}

In [123]: d2 = {i: rand() for i in range(10000)}

In [124]: timeit d1.viewkeys() & d2.viewkeys()
1000 loops, best of 3: 714 μs per loop

In [125]: %%timeit
s1 = set(d1)
s2 = set(d2)
res = s1 & s2

1000 loops, best of 3: 805 μs per loop

For smaller `dict`s, `set` construction is faster:

In [126]: d1 = {'a': 1, 'b': 2}

In [127]: d2 = {'b': 2, 'c': 3}

In [128]: timeit d1.viewkeys() & d2.viewkeys()
1000000 loops, best of 3: 591 ns per loop

In [129]: %%timeit
s1 = set(d1)
s2 = set(d2)
res = s1 & s2

1000000 loops, best of 3: 477 ns per loop

We're comparing nanoseconds here, which may or may not matter to you. In any case, you get back a set, so using viewkeys/keys eliminates a bit of clutter.
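
To repeat the comparison on Python 3 without IPython, something like the following sketch could be used. It relies only on the standard `timeit` and `random` modules; the helper names `via_views` and `via_sets` and the `number=100` setting are arbitrary choices, and the timings will vary by machine and Python version:

```python
import random
import timeit

d1 = {i: random.random() for i in range(10000)}
d2 = {i: random.random() for i in range(10000)}


def via_views():
    # Intersect the set-like key views directly
    return d1.keys() & d2.keys()


def via_sets():
    # Build explicit sets first, then intersect
    return set(d1) & set(d2)


# Both approaches must produce the same set of keys
same_result = via_views() == via_sets()

# number=100 keeps the run short; increase it for more stable figures
t_views = timeit.timeit(via_views, number=100)
t_sets = timeit.timeit(via_sets, number=100)
print(f"views: {t_views:.4f}s  sets: {t_sets:.4f}s")
```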

Answer 4

Answered by emnoor

In [1]: d1 = {'a':1, 'b':4, 'f':3}

In [2]: d2 = {'a':1, 'b':4, 'd':2}

In [3]: d = {x:d1[x] for x in d1 if x in d2}

In [4]: d
Out[4]: {'a': 1, 'b': 4}

Answer 5

Answered by thodnev

Okay, here is a generalized version of the code above in Python 3. It uses comprehensions and set-like dict views, which are fast enough.

The function intersects arbitrarily many dicts and returns a dict with the common keys and, for each common key, the set of values found under that key:

def dict_intersect(*dicts):
    comm_keys = dicts[0].keys()
    for d in dicts[1:]:
        # intersect keys first
        comm_keys &= d.keys()
    # then build a result dict with nested comprehension
    result = {key:{d[key] for d in dicts} for key in comm_keys}
    return result

Usage example:

a = {1: 'ba', 2: 'boon', 3: 'spam', 4:'eggs'}
b = {1: 'ham', 2:'baboon', 3: 'sausages'}
c = {1: 'more eggs', 3: 'cabbage'}

res = dict_intersect(a, b, c)
# Here is res (the order of values may vary) :
# {1: {'ham', 'more eggs', 'ba'}, 3: {'spam', 'sausages', 'cabbage'}}

Here the dict values must be hashable. If they aren't, you could simply change the set braces { } to list brackets [ ]:

result = {key:[d[key] for d in dicts] for key in comm_keys}
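
To illustrate the hashability point, the following sketch applies the list variant to dicts whose values are lists. The function name `dict_intersect_lists` and the sample data are made up for this example:

```python
def dict_intersect_lists(*dicts):
    """Like dict_intersect above, but collects the values into lists."""
    comm_keys = set(dicts[0])
    for d in dicts[1:]:
        comm_keys &= set(d)
    return {key: [d[key] for d in dicts] for key in comm_keys}


# List values are unhashable, so they cannot be placed in a set
a = {1: ['ba'], 2: ['boon']}
b = {1: ['ham'], 3: ['spam']}

res = dict_intersect_lists(a, b)
print(res)  # {1: [['ba'], ['ham']]}

# The set-based comprehension raises TypeError for the same data
try:
    {key: {d[key] for d in (a, b)} for key in res}
    set_version_ok = True
except TypeError:
    set_version_ok = False
print(set_version_ok)  # False
```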

Answer 6

Answered by dccsillag

In Python 3, you can use

intersection = dict(dict1.items() & dict2.items())
union = dict(dict1.items() | dict2.items())
# '^' is the symmetric difference: pairs found in exactly one of the dicts
symmetric_difference = dict(dict1.items() ^ dict2.items())
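
A small sketch of what these operators produce, using made-up sample dicts. Two caveats apply: the values must be hashable, and the operations compare (key, value) pairs, so a key that maps to different values in the two dicts can show up twice in the result of `|` or `^`. Passing such a result to dict() keeps only one of the two pairs, and which one survives depends on set iteration order:

```python
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'b': 2, 'c': 30, 'd': 4}

# Pairs present in both dicts (same key AND same value)
intersection = dict(dict1.items() & dict2.items())
print(intersection)  # {'b': 2}

# Pairs present in exactly one dict; 'c' appears with two different values
sym_diff_pairs = dict1.items() ^ dict2.items()
print(sorted(sym_diff_pairs))
# [('a', 1), ('c', 3), ('c', 30), ('d', 4)]

# A plain one-sided difference uses '-' instead of '^'
only_in_dict1 = dict(dict1.items() - dict2.items())
print(only_in_dict1)  # contains 'a': 1 and 'c': 3 (key order may vary)
```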

Answer 7

Answered by WloHu

Your question isn't precise enough to give a single answer.

1. Key Intersection

If you want to intersect IDs from posts (credit to James), do:

common_ids = p1.keys() & p2.keys()

However, if you want to iterate over documents, you have to consider which post has priority; I assume it's p1. To iterate over the documents for common_ids, collections.ChainMap will be most useful:

from collections import ChainMap
# Iterating a ChainMap directly yields only keys, so use .items() for pairs
intersection = {id: document
                for id, document in ChainMap(p1, p2).items()
                if id in common_ids}
for id, document in intersection.items():
    ...

Or, if you don't want to create a separate intersection dictionary:

from collections import ChainMap
posts = ChainMap(p1, p2)
for id in common_ids:
    document = posts[id]
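
A minimal sketch of the lookup priority described above, with made-up postings for `p1` and `p2` whose texts deliberately differ so the priority is visible. A ChainMap searches its maps in the order given, so `p1` wins for IDs present in both:

```python
from collections import ChainMap

# Hypothetical postings; ID 1 appears in both with different text
p1 = {1: 'text from p1', 2: 'only in p1'}
p2 = {1: 'text from p2', 3: 'only in p2'}

common_ids = p1.keys() & p2.keys()
posts = ChainMap(p1, p2)

# Lookups search p1 first, then p2, so p1 takes priority for shared IDs
documents = {doc_id: posts[doc_id] for doc_id in common_ids}
print(documents)  # {1: 'text from p1'}

# Each distinct ID is counted once, even if it appears in both maps
print(len(posts))  # 3
```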

2. Items Intersection

If you want to intersect the items of both posts, which means matching both IDs and documents, use the code below (credit to DCPY). However, this is only useful if you're looking for duplicates across terms.

duplicates = dict(p1.items() & p2.items())
for id, document in duplicates.items():
    ...

3. Iterate over `p1` 'AND' `p2`

If by "'AND' search" and using iter you meant searching both posts, then collections.ChainMap is again the best way to iterate over (almost) all items in multiple posts. It is "almost" all because, for an ID present in both posts, only the item from p1 is yielded:

from collections import ChainMap
for id, document in ChainMap(p1, p2).items():
    ...

Answer 8

Answered by Aaron Goldman

def two_keys(term_a, term_b, index):
    doc_ids = set(index[term_a].keys()) & set(index[term_b].keys())
    doc_store = index[term_a] # index[term_b] would work also
    return {doc_id: doc_store[doc_id] for doc_id in doc_ids}

def n_keys(terms, index):
    doc_ids = set.intersection(*[set(index[term].keys()) for term in terms])
    doc_store = index[terms[0]]  # any of the terms would work
    return {doc_id: doc_store[doc_id] for doc_id in doc_ids}

In [0]: index = {'a': {1: 'a b'}, 
                 'b': {1: 'a b'}}

In [1]: two_keys('a','b', index)
Out[1]: {1: 'a b'}

In [2]: n_keys(['a','b'], index)
Out[2]: {1: 'a b'}

I would recommend changing your index from

index = {term: {doc_id: doc}}

to two indexes: one for the terms, and a separate one to hold the document text:

term_index = {term: set([doc_id])}
doc_store = {doc_id: doc}

That way you don't store multiple copies of the same data.
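
A rough sketch of an AND search over the two-index layout suggested above. The sample data and the function name `search_all_terms` are assumptions made for illustration, and the function expects at least one term:

```python
# Hypothetical sample data following the suggested layout
term_index = {
    'a': {1, 2},
    'b': {1, 3},
    'c': {1, 2, 3},
}
doc_store = {1: 'a b c', 2: 'a c', 3: 'b c'}


def search_all_terms(terms, term_index, doc_store):
    """Return {doc_id: doc} for documents that contain every term."""
    # Intersect the ID sets of all terms (requires at least one term)
    doc_ids = set.intersection(*(term_index[term] for term in terms))
    # Each document's text is stored once, in doc_store
    return {doc_id: doc_store[doc_id] for doc_id in doc_ids}


hits_ab = search_all_terms(['a', 'b'], term_index, doc_store)
hits_ac = search_all_terms(['a', 'c'], term_index, doc_store)
print(hits_ab)  # {1: 'a b c'}
print(hits_ac)  # documents 1 and 2
```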
