Python: finding elements in one list that are not in the other

Question

Asked by CosimoCD

I need to compare two lists in order to create a new list of specific elements found in one list but not in the other. For example:

main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]

I want to loop through list_1 and append to main_list all the elements from list_2 that are not found in list_1.

The result should be:

main_list=["f", "m"]

How can I do it with Python?

Answer 1

Accepted answer by jcoderepo

TL;DR:
SOLUTION (1)

import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`

SOLUTION (2) You want a sorted list

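def setdiff_sorted(array1, array2, assume_unique=False):
    ans = np.setdiff1d(array1, array2, assume_unique=assume_unique).tolist()
    # With assume_unique=True the result is only sorted if the input was, so sort explicitly.
    if assume_unique:
        return sorted(ans)
    return ans
main_list = setdiff_sorted(list_2, list_1)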

EXPLANATIONS:
(1) You can use NumPy's setdiff1d(array1, array2, assume_unique=False).

assume_unique tells the function whether the arrays are already unique.
If False, the unique elements are determined first.
If True, the function assumes that the elements are already unique and skips determining them.

This yields the unique values in array1 that are not in array2. assume_unique is False by default.

If you are concerned with the unique elements (based on the response of Chinny84), then simply use the following (where assume_unique=False is the default value):

import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"] 
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`

(2) For those who want the answer to be sorted, I've made a custom function:

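import numpy as np
def setdiff_sorted(array1, array2, assume_unique=False):
    ans = np.setdiff1d(array1, array2, assume_unique=assume_unique).tolist()
    # With assume_unique=True the result is only sorted if the input was, so sort explicitly.
    if assume_unique:
        return sorted(ans)
    return ans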

To get the answer, run:

main_list = setdiff_sorted(list_2,list_1)

SIDE NOTES:
(a) Solution 2 (the custom function setdiff_sorted) returns a list, whereas solution 1 returns an array.

(b) If you aren't sure whether the elements are unique, just use the default setting of NumPy's setdiff1d in both solutions 1 and 2. For an example of a complication, see note (c).

(c) Things will be different if either of the two lists is not unique.
Say list_2 is not unique: list_2 = ["a", "f", "c", "m", "m"]. Keep list_1 as is: list_1 = ["a", "b", "c", "d", "e"]
Keeping the default value of assume_unique yields ["f", "m"] (in both solutions). However, if you set assume_unique=True, both solutions give ["f", "m", "m"]. Why? Because the caller asserted that the elements are unique, so no de-duplication is performed. Hence, it is better to keep assume_unique at its default value. Note that both answers are sorted.
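The two outcomes described in note (c) can be reproduced without NumPy. The sketch below is only an emulation of the observable results, not NumPy's implementation, and the helper names unique_sorted_difference and duplicate_keeping_difference are made up for illustration.

```python
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m", "m"]  # "m" appears twice


def unique_sorted_difference(first, second):
    """Values of `first` that are not in `second`, de-duplicated and sorted."""
    return sorted(set(first) - set(second))


def duplicate_keeping_difference(first, second):
    """Values of `first` that are not in `second`, keeping repeats and input order."""
    excluded = set(second)
    return [value for value in first if value not in excluded]


print(unique_sorted_difference(list_2, list_1))      # ['f', 'm']
print(duplicate_keeping_difference(list_2, list_1))  # ['f', 'm', 'm']
```

The first helper mirrors the result reported for the default setting, the second mirrors the result reported when duplicates are not removed.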

Answer 2

Answered by nrlakin

You can use sets:

main_list = list(set(list_2) - set(list_1))

Output:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> set(list_2) - set(list_1)
set(['m', 'f'])
>>> list(set(list_2) - set(list_1))
['m', 'f']

Per @JonClements' comment, here is a tidier version:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> list(set(list_2).difference(list_1))
['m', 'f']
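Sets are unordered, so the order of the resulting list can vary between runs. A minimal sketch of one way to get a predictable order is to sort the result. It also shows that difference() accepts more than one iterable. list_3 is a hypothetical extra list added only for illustration.

```python
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]
list_3 = ["m", "z"]  # hypothetical third list, for illustration only

# sorted() turns the unordered set into a list with a predictable order.
main_list = sorted(set(list_2).difference(list_1))
print(main_list)  # ['f', 'm']

# difference() accepts several iterables and removes the members of all of them.
remaining = sorted(set(list_2).difference(list_1, list_3))
print(remaining)  # ['f']
```

Sorting assumes the elements are mutually comparable, which holds for the strings used here.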

Answer 3

Answered by A.Kot

Not sure why the above explanations are so complicated when you have native methods available:

main_list = list(set(list_2)-set(list_1))

Answer 4

Answered by ettanany

Use a list comprehension like this:

main_list = [item for item in list_2 if item not in list_1]

Output:

>>> list_1 = ["a", "b", "c", "d", "e"]
>>> list_2 = ["a", "f", "c", "m"] 
>>> 
>>> main_list = [item for item in list_2 if item not in list_1]
>>> main_list
['f', 'm']

Edit:

As mentioned in the comments below, with large lists the above is not the ideal solution, because every membership test scans list_1 from the start. In that case, a better option is to convert list_1 to a set first:

set_1 = set(list_1)  # this reduces the lookup time from O(n) to O(1)
main_list = [item for item in list_2 if item not in set_1]
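If list_2 may contain repeated values and each missing value should be reported only once, in the order it first appears, the set-based comprehension can be combined with dict.fromkeys. This is a sketch under that assumption. The function name ordered_unique_difference is made up for illustration, and the items must be hashable.

```python
def ordered_unique_difference(items, excluded_items):
    """Items not in `excluded_items`, de-duplicated, in first-seen order."""
    excluded = set(excluded_items)  # O(1) average lookup per check
    # dict.fromkeys keeps the first occurrence of each item, in order (Python 3.7+).
    return [item for item in dict.fromkeys(items) if item not in excluded]


list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["m", "a", "f", "c", "m", "f"]

print(ordered_unique_difference(list_2, list_1))  # ['m', 'f']
```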

Answer 5

Answered by ShadowRanger

If you want a one-liner solution (ignoring imports) that only requires O(max(n, m)) work for inputs of length n and m, not O(n * m) work, you can do so with the itertools module:

from itertools import filterfalse

main_list = list(filterfalse(set(list_1).__contains__, list_2))

This takes advantage of the functional-style functions taking a callback on construction, which lets you create the callback once and reuse it for every element without storing it somewhere yourself (because filterfalse stores it internally); list comprehensions and generator expressions can do this too, but it's ugly (see the footnote at the end of this answer).

That gets the same results in a single line as:

main_list = [x for x in list_2 if x not in list_1]

with the speed of:

set_1 = set(list_1)
main_list = [x for x in list_2 if x not in set_1]

Of course, if the comparisons are intended to be positional, so:

list_1 = [1, 2, 3]
list_2 = [2, 3, 4]

should produce:

main_list = [2, 3, 4]

(because no value in list_2 has a match at the same index in list_1), you should definitely go with Patrick's answer, which involves no temporary lists or sets (even with set lookups being roughly O(1), they have a higher "constant" factor per check than simple equality checks) and involves O(min(n, m)) work, less than any other answer. If your problem is position sensitive, it is also the only correct solution when matching elements appear at mismatched offsets.

Footnote: The way to do the same thing with a list comprehension as a one-liner would be to abuse nested looping to create and cache value(s) in the "outermost" loop, e.g.:

main_list = [x for set_1 in (set(list_1),) for x in list_2 if x not in set_1]

which also gives a minor performance benefit on Python 3 (because now set_1 is locally scoped in the comprehension code, rather than looked up from nested scope for each check; on Python 2 that doesn't matter, because Python 2 doesn't use closures for list comprehensions; they operate in the same scope they're used in).
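As a quick sanity check, the variants discussed in this answer can be run side by side on the question's data to confirm that they agree. This is only a sketch for Python 3, and the variable names on the left are made up for illustration.

```python
from itertools import filterfalse

list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]

# One-liner: the bound method set.__contains__ is created once and reused.
with_filterfalse = list(filterfalse(set(list_1).__contains__, list_2))

# Plain comprehension with list lookups (O(n * m) work).
with_list_lookup = [x for x in list_2 if x not in list_1]

# Comprehension with a precomputed set (O(max(n, m)) work).
set_1 = set(list_1)
with_set_lookup = [x for x in list_2 if x not in set_1]

# Footnote variant: the single-element outer loop builds the set exactly once.
with_nested_loop = [x for s in (set(list_1),) for x in list_2 if x not in s]

print(with_filterfalse)  # ['f', 'm']
```

All four keep the order and any duplicates of list_2, so they differ only in how much work each membership test costs.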

Answer 6

Answered by Inconnu

main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]

for i in list_2:
    if i not in list_1:
        main_list.append(i)

print(main_list)

Output:

['f', 'm']

Answer 7

Answered by Patrick Haugh

I would zip the lists together to compare them element by element.

main_list = [b for a, b in zip(list_1, list_2) if a != b]
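The positional comparison can be tried on the question's data. Note that zip stops at the shorter list, so trailing elements of a longer list_2 are silently dropped. The sketch below shows one possible way to keep them with itertools.zip_longest. The names longer_list and _MISSING are made up for illustration.

```python
from itertools import zip_longest

list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]

# Keep the elements of list_2 that differ from list_1 at the same index.
main_list = [b for a, b in zip(list_1, list_2) if a != b]
print(main_list)  # ['f', 'm']

# zip truncates to the shorter input: "y" has no partner and is ignored.
longer_list = ["a", "f", "c", "m", "x", "y"]  # hypothetical data
truncated = [b for a, b in zip(list_1, longer_list) if a != b]
print(truncated)  # ['f', 'm', 'x']

# zip_longest pads the shorter input with a sentinel, so "y" is kept.
_MISSING = object()
padded = [
    b
    for a, b in zip_longest(list_1, longer_list, fillvalue=_MISSING)
    if b is not _MISSING and a != b
]
print(padded)  # ['f', 'm', 'x', 'y']
```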

Answer 8

Answered by Msquare

I tried two methods and found one more useful than the other. Here is my answer:

My input data:

crkmod_mpp = ['M13','M18','M19','M24']
testmod_mpp = ['M13','M14','M15','M16','M17','M18','M19','M20','M21','M22','M23','M24']

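Method 1: np.setdiff1d. I prefer this approach over the other because it preserves the position. (np.setdiff1d returns its result sorted by default, so the original order is kept only when the input is already sorted, as it is here.)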

import numpy as np

test = list(np.setdiff1d(testmod_mpp, crkmod_mpp))
print(test)
['M15', 'M16', 'M22', 'M23', 'M20', 'M14', 'M17', 'M21']

Method 2: set difference. It finds the same elements as Method 1 but does not preserve the order.

test = list(set(testmod_mpp).difference(set(crkmod_mpp)))
print(test)
['POA23', 'POA15', 'POA17', 'POA16', 'POA22', 'POA18', 'POA24', 'POA21']

Method 1 (np.setdiff1d) meets my requirements perfectly. This answer is for information.

Answer 9

Answered by MSeifert

If the number of occurrences should be taken into account, you probably need to use something like collections.Counter:

list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 
from collections import Counter
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['f', 'm']

As promised, this can also handle a differing number of occurrences as a "difference":

list_1=["a", "b", "c", "d", "e", 'a']
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['a', 'f', 'm']
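A related but different technique is Counter subtraction, which computes a multiset difference: each occurrence in list_1 cancels one occurrence in list_2, and whatever is left over is returned. This is not the same rule as the comparison above (which reports a key whenever the counts differ), so treat it as an alternative sketch with made-up sample data.

```python
from collections import Counter

list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "a", "f", "m", "m"]  # hypothetical data with repeats

# Subtraction keeps only positive counts: a: 2 - 1 = 1, f: 1, m: 2.
leftover = Counter(list_2) - Counter(list_1)

# elements() repeats each key by its remaining count.
main_list = list(leftover.elements())
print(main_list)  # ['a', 'f', 'm', 'm']
```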

Answer 10

Answered by adnan

From ser1 remove items present in ser2.

Input

import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

Solution

ser1[~ser1.isin(ser2)]
