Union of multiple sets in Python

Question

Asked by Tapojyoti Mandal

[[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]

I have a list of lists. My aim is to check whether any sublist has anything in common with the other sublists (ignoring the first element when comparing). If it does, those sublists should be merged.

For the list above, my final answer should be something like:

[[1, '34', '44', '40', '30', '41', '42', '43']]

I understand that I should convert the sublists to sets and then use the union() and intersection() operations. What I am stuck on is how to compare each set/sublist. I can't loop over the list and compare the sublists one by one, because the contents of the list would be modified during the loop and this would lead to errors.

What I want to know: is there an efficient way to compare all the sublists (converted to sets) and get their union?

Answer 1

Answered by Ami Tavory

Using the unpacking operator *:

>>> list(set.union(*map(set, a)))
[1, '44', '30', '42', '43', '40', '41', '34']

(Thanks Raymond Hettinger for the comment!)

(Note that

set.union(*tup)

will unpack to

set.union(tup[0], tup[1], ... tup[n - 1])

)
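
As a quick check of the unpacking note above, here is a minimal sketch (the names `a`, `sets`, `unpacked` and `explicit` are only illustrative, not from the original answer). It shows that the starred call and the spelled-out call build the same set. It also shows one caveat worth knowing: with an empty list the starred call receives no arguments at all and raises TypeError.

```python
# Three of the sublists from the question, used as sample data.
a = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42']]
sets = list(map(set, a))

# The starred call and the spelled-out call are equivalent.
unpacked = set.union(*sets)
explicit = set.union(sets[0], sets[1], sets[2])
assert unpacked == explicit

# With no sublists there is nothing to unpack, so set.union() gets
# zero arguments and raises TypeError.
try:
    set.union(*map(set, []))
    empty_ok = True
except TypeError:
    empty_ok = False

print(unpacked == explicit, empty_ok)
```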

Answer 2

Answered by Ajay

In [20]: s
Out[20]: 
[[1, '34', '44'],
 [1, '40', '30', '41'],
 [1, '41', '40', '42'],
 [1, '42', '41', '43'],
 [1, '43', '42', '44'],
 [1, '44', '34', '43']]
In [31]: list({x for _list in s for x in _list})
Out[31]: [1, '44', '30', '42', '43', '40', '41', '34']

Update:

Thanks for the comments
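
One thing a set cannot give you is a stable order, while the expected output in the question happens to list the values in first-seen order. If that order matters, a possible sketch is to deduplicate through dict keys instead of a set. This assumes Python 3.7+, where dicts keep insertion order; the name `ordered_union` is illustrative.

```python
from itertools import chain

s = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'],
     [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]

# dict keys are unique and keep insertion order, so this removes
# duplicates while preserving the order of first appearance.
ordered_union = list(dict.fromkeys(chain.from_iterable(s)))
print(ordered_union)
# [1, '34', '44', '40', '30', '41', '42', '43']
```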

Answer 3

Answered by Arpit Goyal

You can use itertools to do this. Let us assume that your list is stored in a variable named A:

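import itertools

# Flatten all sublists into a single list.
single_list_with_all_values = list(itertools.chain(*A))

# No sorting is needed: set() removes duplicates regardless of order, and
# sorting a list that mixes int and str values fails in Python 3.
print(set(single_list_with_all_values))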

Answer 4

Answered by Azurtree

>>> big = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]
>>> set(reduce ( lambda l,a : l + a, big))
set([1, '44', '30', '42', '43', '40', '41', '34'])

And if you really want a list of lists as the final result:

>>> [list(set(reduce ( lambda l,a : l + a, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]

And if you don't like writing a lambda function for the list addition:

>>> [list(set(reduce ( list.__add__, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]

EDIT: after your recommendation to use itertools.chain instead of list.__add__, I ran timeit on both, using the list from the original question.

It seems that timeit reports around 2.8 seconds for list.__add__ and around 3.5 seconds for itertools.chain.

I checked on this page and yes, you were right: itertools.chain has a from_iterable method that gives a huge performance boost. See below for timings with list.__add__, itertools.chain and itertools.chain.from_iterable.

>>> timeit.timeit("[list(set(reduce ( list.__add__, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
16.051744650801993
>>> timeit.timeit("[list(set(reduce ( itertools.chain, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
54.721315866467194
>>> timeit.timeit("list(set(itertools.chain.from_iterable(big)))", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
0.040056066849501804
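
For anyone who wants to repeat a comparison like this, below is a self-contained sketch, not the exact session above. It assumes Python 3, puts the imports in the `setup` string so the timed statements can see them, and uses a smaller input (1000 sublists, 10 runs) so the quadratic `list.__add__` variant finishes quickly. The labels and the `candidates` / `timings` names are illustrative, and absolute numbers will differ by machine.

```python
import timeit

# Everything the timed statements need must be created in setup.
setup = """
from functools import reduce
from itertools import chain
big = [[10, 20, 30, 40] for _ in range(1000)]
"""

candidates = {
    "reduce + list.__add__": "set(reduce(list.__add__, big))",
    "chain.from_iterable": "set(chain.from_iterable(big))",
    "set().union(*big)": "set().union(*big)",
}

timings = {}
for label, stmt in candidates.items():
    # Total time in seconds for 10 executions of the statement.
    timings[label] = timeit.timeit(stmt, setup=setup, number=10)
    print(f"{label:>22}: {timings[label]:.4f} s")
```

The `reduce` variant builds a new, longer list on every step, which is why it falls behind as the number of sublists grows.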

Thank you very much for your advice :)

Answer 5

Answered by Raymond Hettinger

The itertools module makes short work of this problem:

>>> from itertools import chain
>>> list(set(chain.from_iterable(d)))
[1, '41', '42', '43', '40', '34', '30', '44']

Another way to do it is to unpack the list into separate arguments for union():

>>> list(set().union(*d))
[1, '41', '42', '43', '40', '34', '30', '44']

The latter way eliminates all duplicates and doesn't require that the inputs first be converted to sets. Also, it doesn't require an import.
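
To illustrate the point about not converting the inputs first, here is a small sketch with made-up sample data (the mix of list, tuple and set in `d` is an assumption for demonstration only). Because the call starts from an empty set instance, it also works when there are no sublists at all.

```python
# Inputs of different iterable types; none is converted to a set beforehand.
d = [[1, '34', '44'], (1, '40', '30'), {'41', '42'}]

merged = set().union(*d)

# With nothing to unpack, the result is simply the empty set.
empty = set().union(*[])

print(merged == {1, '34', '44', '40', '30', '41', '42'}, empty == set())
# True True
```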

Answer 6

Answered by dermen

I personally like the readability of reduce, paired with a simple conditional function, something like:

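from functools import reduce  # reduce is not a builtin in Python 3

somelists = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']] # your original lists
somesets = map(set, somelists) # your lists as sets

def condition(s1, s2): # applied cumulatively, left to right, to the sets
    if s1.intersection(s2):
        return s1.union(s2)
    return s1 # no overlap: keep the accumulated set instead of returning None

reduce(condition, somesets)
# {1, '30', '34', '40', '41', '42', '43', '44'}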

Of course, you can cast this result to a 2D list if you desire: list([reduce(...

I will note that this is something like 3x slower than the chain.from_iterable answer.
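
Note that in the question's data every sublist starts with 1, so an intersection test on the full sets always succeeds. If you want to follow the question literally and compare only the elements after the first one, a rough sketch could look like the following. The function name `merge_overlapping` is hypothetical, and two choices here are my own assumptions because the question does not specify them: a merged group keeps the first element of the sublist that was processed last, and the remaining elements are returned sorted (so they must be mutually comparable).

```python
def merge_overlapping(sublists):
    """Merge sublists whose elements after index 0 share at least one value."""
    groups = []  # each entry: [first_element, set_of_remaining_elements]
    for first, *rest in sublists:
        merged = set(rest)
        kept = []
        for g_first, g_items in groups:
            if g_items & merged:
                # Overlap found: absorb the existing group into the new one.
                merged |= g_items
            else:
                kept.append([g_first, g_items])
        # Assumption: the merged group takes the current sublist's first element.
        kept.append([first, merged])
        groups = kept
    return [[first, *sorted(items)] for first, items in groups]


data = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'],
        [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]
print(merge_overlapping(data))
# [[1, '30', '34', '40', '41', '42', '43', '44']]
```

Existing groups are always pairwise disjoint, so a single pass over them per new sublist is enough to keep them that way. Sublists that share nothing with the others stay separate, for example `merge_overlapping([[1, 'a', 'b'], [2, 'c'], [3, 'b', 'd']])` gives `[[2, 'c'], [3, 'a', 'b', 'd']]`.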

Answer 7

Answered by PeterFoster

from functools import reduce

out = list(reduce(set.union, iterable))

as long as at least the first element of iterable is a set. Otherwise,

out = list(reduce(set.union, iterable[1:], set(iterable[0])))
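
A possible variation on the same idea, shown here only as a sketch: pass an empty set as the initial value. Then the accumulator is always a set, the items themselves can be plain lists, no slicing is needed (so it also works for generators), and an empty input gives an empty set instead of an error. The sample data and the names `out` and `empty_out` are illustrative.

```python
from functools import reduce

iterable = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42']]

# The accumulator starts as set(); set.union accepts any iterable
# as its second argument, so the sublists need no conversion.
out = reduce(set.union, iterable, set())

# With an empty input, reduce just returns the initial value.
empty_out = reduce(set.union, [], set())

print(out == {1, '34', '44', '40', '30', '41', '42'}, empty_out == set())
# True True
```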
