Python 如何验证一个列表是否是另一个列表的子集?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/16579085/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-18 23:04:40  来源:igfitidea点击:

How can I verify if one list is a subset of another?

pythonlist

提问by IUnknown

I need to verify if a list is a subset of another - a boolean return is all I seek.

我需要验证一个列表是否是另一个列表的子集 - 我只寻求布尔返回。

Is testing equality on the smaller list after an intersection the fastest way to do this? Performance is of utmost importance given the number of datasets that need to be compared.

在交叉路口后测试较小列表上的相等性是最快的方法吗?考虑到需要比较的数据集数量,性能至关重要。

Adding further facts based on discussions:

根据讨论添加更多事实:

  1. Will either of the lists be the same for many tests? It does as one of them is a static lookup table.

  2. Does it need to be a list? It does not - the static lookup table can be anything that performs best. The dynamic one is a dict from which we extract the keys to perform a static lookup on.

  1. 对于许多测试,这两个列表中的任何一个是否相同?其中之一是静态查找表。

  2. 必须是清单吗?它不是 - 静态查找表可以是任何性能最好的东西。动态是一个字典,我们从中提取键来执行静态查找。

What would be the optimal solution given the scenario?

给定场景的最佳解决方案是什么?

采纳答案by Yann Vernier

The performant function Python provides for this is set.issubset. It does have a few restrictions that make it unclear if it's the answer to your question, however.

Python 为此提供的高性能函数是set.issubset。但是,它确实有一些限制,因此不清楚它是否是您问题的答案。

A list may contain items multiple times and has a specific order. A set does not. To achieve high performance sets work on hashableobjects only.

一个列表可能多次包含项目并具有特定的顺序。一套没有。为了实现高性能集,仅适用于可散列对象。

Are you asking about subset or subsequence (which means you'll want a string search algorithm)? Will either of the lists be the same for many tests? What are the datatypes contained in the list? And for that matter, does it need to be a list?

您是在询问子集还是子序列(这意味着您需要一个字符串搜索算法)?对于许多测试,这两个列表中的任何一个是否相同?列表中包含哪些数据类型?就此而言,它是否需要是一个列表?

Your other post intersect a dict and listmade the types clearer and did get a recommendation to use dictionary key views for their set-like funcitonality. In that case it was known to work because dictionary keys behave like a set (so much so that before we had sets in Python we used dictionaries). One wonders how the issue got less specific in three hours.

你的另一篇文章与字典和列表相交,使类型更清晰,并且确实得到了使用字典键视图来实现类似集合的功能的建议。在那种情况下,它是有效的,因为字典键的行为就像一个集合(以至于在我们在 Python 中使用集合之前,我们使用字典)。有人想知道这个问题是如何在三个小时内变得不那么具体的。

回答by jamylak

Assuming the items are hashable

假设项目是可散列的

>>> from collections import Counter
>>> not Counter([1, 2]) - Counter([1])
False
>>> not Counter([1, 2]) - Counter([1, 2])
True
>>> not Counter([1, 2, 2]) - Counter([1, 2])
False

If you don't care about duplicate items eg. [1, 2, 2]and [1, 2]then just use:

如果您不关心重复的项目,例如。[1, 2, 2]并且[1, 2]然后只需使用:

>>> set([1, 2, 2]).issubset([1, 2])
True

Is testing equality on the smaller list after an intersection the fastest way to do this?

在交叉路口后测试较小列表上的相等性是最快的方法吗?

.issubsetwill be the fastest way to do it. Checking the length before testing issubsetwill not improve speed because you still have O(N + M) items to iterate through and check.

.issubset将是最快的方法。在测试之前检查长度issubset不会提高速度,因为您仍然有 O(N + M) 个项目要迭代和检查。

回答by Yulan Liu

>>> a = [1, 3, 5]
>>> b = [1, 3, 5, 8]
>>> c = [3, 5, 9]
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False

>>> a = ['yes', 'no', 'hmm']
>>> b = ['yes', 'no', 'hmm', 'well']
>>> c = ['sorry', 'no', 'hmm']
>>> 
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False

回答by DevPlayer

If you are asking if one list is "contained" in another list then:

如果您询问一个列表是否“包含”在另一个列表中,则:

>>>if listA in listB: return True

If you are asking if each element in listA has an equal number of matching elements in listB try:

如果您询问 listA 中的每个元素在 listB 中是否具有相同数量的匹配元素,请尝试:

all(True if listA.count(item) <= listB.count(item) else False for item in listA)

回答by voidnologo

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

all(x in two for x in one)

Explanation: Generator creating booleans by looping through list onechecking if that item is in list two. all()returns Trueif every item is truthy, else False.

说明:生成器通过循环one检查该项目是否在 list 中来创建布尔值two。 如果每个项目都是真实的,则返回,否则all()返回。TrueFalse

There is also an advantage that allreturn False on the first instance of a missing element rather than having to process every item.

还有一个优点是all在丢失元素的第一个实例上返回 False 而不是必须处理每个项目。

回答by SuperNova

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(x in two for x in one) == set([True])

If list1 is in list 2:

如果 list1 在列表 2 中:

  • (x in two for x in one)generates a list of True.

  • when we do a set(x in two for x in one)has only one element (True).

  • (x in two for x in one)生成一个列表True

  • 当我们做一个set(x in two for x in one)只有一个元素(真)。

回答by Hamza Rashid

Pardon me if I am late to the party. ;)

如果我参加聚会迟到了,请原谅我。;)

To check if one set Ais subset of set B, Pythonhas A.issubset(B)and A <= B. It works on setonly and works great BUTthe complexity of internal implementation is unknown. Reference: https://docs.python.org/2/library/sets.html#set-objects

要检查一个set A是否是 的子集set BPython具有A.issubset(B)A <= B。它set仅适用于并且效果很好,内部实现的复杂性是未知的。参考:https: //docs.python.org/2/library/sets.html#set-objects

I came up with an algorithm to check if list Ais a subset of list Bwith following remarks.

我想出了一个算法来检查是否list Alist B具有以下评论的子集。

  • To reduce complexity of finding subset, I find it appropriate to sortboth lists first before comparing elements to qualify for subset.
  • It helped me to breakthe loopwhen value of element of second list B[j]is greater than value of element of first list A[i].
  • last_index_jis used to start loopover list Bwhere it last left off. It helps avoid starting comparisons from the start of list B(which is, as you might guess unnecessary, to start list Bfrom index 0in subsequent iterations.)
  • Complexity will be O(n ln n)each for sorting both lists and O(n)for checking for subset.
    O(n ln n) + O(n ln n) + O(n) = O(n ln n).

  • Code has lots of printstatements to see what's going on at each iterationof the loop. These are meant for understanding only.

  • 为了降低查找子集的复杂性,我发现sort在比较元素以符合子集的条件之前,它首先适用于 两个列表。
  • 它帮助我breakloop当第二列表元素的值B[j]比第一个列表元素的值A[i]
  • last_index_j用于启动looplist B其最后一次离开。它有助于避免从开头开始比较 list B(您可能猜到这是不必要的,list Bindex 0随后的iterations.)开始。
  • 复杂性将O(n ln n)分别用于对两个列表进行排序和O(n)检查子集。
    O(n ln n) + O(n ln n) + O(n) = O(n ln n).

  • 代码有很多的print报表,看看发生了什么事情,在每一个iterationloop。这些仅用于理解。

Check if one list is subset of another list

检查一个列表是否是另一个列表的子集

is_subset = True;

A = [9, 3, 11, 1, 7, 2];
B = [11, 4, 6, 2, 15, 1, 9, 8, 5, 3];

print(A, B);

# skip checking if list A has elements more than list B
if len(A) > len(B):
    is_subset = False;
else:
    # complexity of sorting using quicksort or merge sort: O(n ln n)
    # use best sorting algorithm available to minimize complexity
    A.sort();
    B.sort();

    print(A, B);

    # complexity: O(n^2)
    # for a in A:
    #   if a not in B:
    #       is_subset = False;
    #       break;

    # complexity: O(n)
    is_found = False;
    last_index_j = 0;

    for i in range(len(A)):
        for j in range(last_index_j, len(B)):
            is_found = False;

            print("i=" + str(i) + ", j=" + str(j) + ", " + str(A[i]) + "==" + str(B[j]) + "?");

            if B[j] <= A[i]:
                if A[i] == B[j]:
                    is_found = True;
                last_index_j = j;
            else:
                is_found = False;
                break;

            if is_found:
                print("Found: " + str(A[i]));
                last_index_j = last_index_j + 1;
                break;
            else:
                print("Not found: " + str(A[i]));

        if is_found == False:
            is_subset = False;
            break;

print("subset") if is_subset else print("not subset");

Output

输出

[9, 3, 11, 1, 7, 2] [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]
[1, 2, 3, 7, 9, 11] [1, 2, 3, 4, 5, 6, 8, 9, 11, 15]
i=0, j=0, 1==1?
Found: 1
i=1, j=1, 2==1?
Not found: 2
i=1, j=2, 2==2?
Found: 2
i=2, j=3, 3==3?
Found: 3
i=3, j=4, 7==4?
Not found: 7
i=3, j=5, 7==5?
Not found: 7
i=3, j=6, 7==6?
Not found: 7
i=3, j=7, 7==8?
not subset

回答by Leo Bastin

Below code checks whether a given set is a "proper subset" of another set

下面的代码检查给定的集合是否是另一个集合的“适当子集”

 def is_proper_subset(set, superset):
     return all(x in superset for x in set) and len(set)<len(superset)

回答by Eamonn Kenny

Set theory is inappropriate for lists since duplicates will result in wrong answers using set theory.

集合论不适用于列表,因为使用集合论重复会导致错误的答案。

For example:

例如:

a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
set(b) > set(a)

has no meaning. Yes, it gives a false answer but this is not correct since set theory is just comparing: 1,3,5 versus 1,3,4,5. You must include all duplicates.

没有任何意义。是的,它给出了错误的答案,但这是不正确的,因为集合论只是在比较:1、3、5 与 1、3、4、5。您必须包括所有重复项。

Instead you must count each occurrence of each item and do a greater than equal to check. This is not very expensive, because it is not using O(N^2) operations and does not require quick sort.

相反,您必须计算每个项目的每次出现次数并进行大于等于检查。这不是很昂贵,因为它不使用 O(N^2) 操作并且不需要快速排序。

#!/usr/bin/env python

from collections import Counter

def containedInFirst(a, b):
  a_count = Counter(a)
  b_count = Counter(b)
  for key in b_count:
    if a_count.has_key(key) == False:
      return False
    if b_count[key] > a_count[key]:
      return False
  return True


a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)

a = [1, 3, 3, 3, 4, 4, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)

Then running this you get:

然后运行这个你得到:

$ python contained.py 
b in a:  False
b in a:  True

回答by SuperNova

In python 3.5 you can do a [*set()][index]to get the element. It is much slower solution than other methods.

在 python 3.5 中,您可以执行 a[*set()][index]来获取元素。它比其他方法慢得多。

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

result = set(x in two for x in one)

[*result][0] == True

or just with len and set

或者只是使用 len 和 set

len(set(a+b)) == len(set(a))