Python find elements in one list that are not in the other
Source: Stack Overflow, http://stackoverflow.com/questions/41125909/
Licensed under CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on Stack Overflow and keep it under the same license.
Asked by CosimoCD
I need to compare two lists in order to create a new list of specific elements found in one list but not in the other. For example:
main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]
I want to loop through list_1 and append to main_list all the elements from list_2 that are not found in list_1.
The result should be:
main_list=["f", "m"]
How can I do this with Python?
Accepted answer by jcoderepo
TL;DR:
SOLUTION (1)
import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`
SOLUTION (2) You want a sorted list
def setdiff_sorted(array1, array2, assume_unique=False):
    ans = np.setdiff1d(array1, array2, assume_unique=assume_unique).tolist()
    # with assume_unique=True the result is not guaranteed to be sorted
    if assume_unique:
        return sorted(ans)
    return ans
main_list = setdiff_sorted(list_2, list_1)
EXPLANATIONS:
(1) You can use NumPy's setdiff1d(array1, array2, assume_unique=False).
assume_unique tells the function whether the input arrays are already unique.
If False, the unique elements are determined first.
If True, the function assumes that the elements are already unique and skips that step.
This yields the unique values in array1 that are not in array2. assume_unique is False by default.
If you are concerned with the unique elements (based on the response of Chinny84), then simply use the following (where assume_unique=False, the default value):
import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`
(2) For those who want answers to be sorted, I've made a custom function:
import numpy as np
def setdiff_sorted(array1, array2, assume_unique=False):
    ans = np.setdiff1d(array1, array2, assume_unique=assume_unique).tolist()
    # with assume_unique=True the result is not guaranteed to be sorted
    if assume_unique:
        return sorted(ans)
    return ans
To get the answer, run:
main_list = setdiff_sorted(list_2,list_1)
SIDE NOTES:
(a) Solution 2 (custom function setdiff_sorted) returns a list (compared to an array in solution 1).
(b) If you aren't sure whether the elements are unique, just use the default setting of NumPy's setdiff1d in both solutions 1 and 2. What can be an example of a complication? See note (c).
(c) Things will be different if either of the two lists is not unique.
Say list_2 is not unique: list_2 = ["a", "f", "c", "m", "m"]. Keep list_1 as is: list_1 = ["a", "b", "c", "d", "e"]
Keeping the default value of assume_unique yields ["f", "m"] (in both solutions). HOWEVER, if you set assume_unique=True, both solutions give ["f", "m", "m"]. Why? Because the user ASSUMED that the elements are unique. Hence, IT IS BETTER TO KEEP assume_unique at its default value. Note that both answers are sorted.
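For readers who do not have NumPy installed, the default behaviour described above (unique values, sorted) can be approximated with built-in sets. This is only a sketch under that assumption, not the accepted answer's method, and it returns a plain list rather than an array:

```python
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m", "m"]  # "m" appears twice on purpose

# Deduplicate via sets, then sort, mirroring setdiff1d's default behaviour.
main_list = sorted(set(list_2) - set(list_1))
print(main_list)  # ['f', 'm']
```

Unlike setdiff1d, this requires the elements to be hashable and mutually comparable for sorting.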
Answer by nrlakin
You can use sets:
main_list = list(set(list_2) - set(list_1))
Output:
>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> set(list_2) - set(list_1)
set(['m', 'f'])
>>> list(set(list_2) - set(list_1))
['m', 'f']
Per @JonClements' comment, here is a tidier version:
>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> list(set(list_2).difference(list_1))
['m', 'f']
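Note that the set-based output above comes back as ['m', 'f'], not in the order the elements appear in list_2, because sets are unordered. If the original order matters, one possible sketch is below. It assumes Python 3.7+ (where dict preserves insertion order), and the helper name ordered_difference is made up for illustration:

```python
def ordered_difference(source, exclude):
    """Return the unique items of `source` not in `exclude`, in first-seen order."""
    excluded = set(exclude)
    # dict.fromkeys keeps one entry per item, in insertion order (Python 3.7+).
    return [item for item in dict.fromkeys(source) if item not in excluded]

list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m", "f"]
print(ordered_difference(list_2, list_1))  # ['f', 'm']
```

Like the set version, this drops duplicates; it only adds a stable order.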
Answer by A.Kot
Not sure why the above explanations are so complicated when you have native methods available:
main_list = list(set(list_2)-set(list_1))
Answer by ettanany
Use a list comprehension like this:
main_list = [item for item in list_2 if item not in list_1]
Output:
>>> list_1 = ["a", "b", "c", "d", "e"]
>>> list_2 = ["a", "f", "c", "m"]
>>>
>>> main_list = [item for item in list_2 if item not in list_1]
>>> main_list
['f', 'm']
Edit:
As mentioned in the comments below, with large lists the above is not the ideal solution. In that case, a better option is to convert list_1 to a set first:
set_1 = set(list_1) # this reduces the lookup time from O(n) to O(1)
main_list = [item for item in list_2 if item not in set_1]
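To see the difference on your own machine, a rough timing sketch with the standard timeit module could look like this. The list sizes and the function names with_list and with_set are arbitrary choices for illustration, and the actual timings will vary:

```python
import timeit

list_1 = list(range(1000))
list_2 = list(range(500, 1500))

def with_list():
    # each membership test scans list_1: O(len(list_1)) per item
    return [item for item in list_2 if item not in list_1]

def with_set():
    set_1 = set(list_1)  # built once, then O(1) average lookups
    return [item for item in list_2 if item not in set_1]

# Both approaches must agree on the result; only the lookup cost differs.
assert with_list() == with_set()

print("list lookup:", timeit.timeit(with_list, number=5))
print("set lookup: ", timeit.timeit(with_set, number=5))
```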
Answer by ShadowRanger
If you want a one-liner solution (ignoring imports) that only requires O(max(n, m)) work for inputs of length n and m, not O(n * m) work, you can do so with the itertools module:
from itertools import filterfalse
main_list = list(filterfalse(set(list_1).__contains__, list_2))
This takes advantage of functional-style functions accepting a callback at construction time, which lets the callback be created once and reused for every element without the caller needing to store it somewhere (because filterfalse stores it internally). List comprehensions and generator expressions can do this too, but it's ugly.†
That gets the same results in a single line as:
main_list = [x for x in list_2 if x not in list_1]
with the speed of:
set_1 = set(list_1)
main_list = [x for x in list_2 if x not in set_1]
Of course, if the comparisons are intended to be positional, so that:
list_1 = [1, 2, 3]
list_2 = [2, 3, 4]
should produce:
main_list = [2, 3, 4]
(because no value in list_2 has a match at the same index in list_1), you should definitely go with Patrick's answer, which involves no temporary lists or sets (even with set lookups being roughly O(1), they have a higher "constant" factor per check than simple equality checks) and involves O(min(n, m)) work, less than any other answer. If your problem is position sensitive, it is also the only correct solution when matching elements appear at mismatched offsets.
†: The way to do the same thing with a list comprehension as a one-liner would be to abuse nested looping to create and cache value(s) in the "outermost" loop, e.g.:
main_list = [x for set_1 in (set(list_1),) for x in list_2 if x not in set_1]
which also gives a minor performance benefit on Python 3 (because set_1 is now locally scoped in the comprehension code, rather than looked up from the nested scope for each check; on Python 2 that doesn't matter, because Python 2 doesn't use closures for list comprehensions; they operate in the same scope they're used in).
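Since filterfalse returns a lazy iterator, the list(...) wrapper is only needed when you actually want a list. A small sketch of that idea follows; the helper name iter_difference is hypothetical, and like the one-liner it assumes the elements are hashable:

```python
from itertools import filterfalse

def iter_difference(items, exclude):
    """Lazily yield the items that are not in `exclude`, keeping order and duplicates."""
    return filterfalse(set(exclude).__contains__, items)

list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m", "f"]

# `items` can be any iterable, including a generator.
result = list(iter_difference((x for x in list_2), list_1))
print(result)  # ['f', 'm', 'f']
```

Unlike the set-difference answers, this keeps duplicates and the original order of list_2.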
Answer by Inconnu
main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]
for i in list_2:
    if i not in list_1:
        main_list.append(i)
print(main_list)
Output:
['f', 'm']
Answer by Patrick Haugh
I would zip the lists together to compare them element by element.
main_list = [b for a, b in zip(list_1, list_2) if a != b]
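One thing to keep in mind is that zip stops at the shorter list, so trailing elements of a longer list_2 are silently dropped. If those should count as differences (an assumption about the desired behaviour, not something the question states), a sketch with itertools.zip_longest might look like this. The names positional_difference and _MISSING are made up for illustration:

```python
from itertools import zip_longest

_MISSING = object()  # sentinel that cannot equal any real element

def positional_difference(list_1, list_2):
    """Return elements of list_2 that differ from list_1 at the same index."""
    return [
        b
        for a, b in zip_longest(list_1, list_2, fillvalue=_MISSING)
        if b is not _MISSING and a != b
    ]

print(positional_difference([1, 2, 3], [2, 3, 4]))  # [2, 3, 4]
print(positional_difference([1, 2], [1, 5, 6, 7]))  # [5, 6, 7]
```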
Answer by Msquare
I used two methods and found one more useful than the other. Here is my answer:
My input data:
crkmod_mpp = ['M13','M18','M19','M24']
testmod_mpp = ['M13','M14','M15','M16','M17','M18','M19','M20','M21','M22','M23','M24']
Method1: np.setdiff1d
I prefer this approach over the other because it preserves the position
import numpy as np
test = list(np.setdiff1d(testmod_mpp, crkmod_mpp))
print(test)
['M15', 'M16', 'M22', 'M23', 'M20', 'M14', 'M17', 'M21']
Method2: It gives the same answer as Method1 but disturbs the order
test = list(set(testmod_mpp).difference(set(crkmod_mpp)))
print(test)
['POA23', 'POA15', 'POA17', 'POA16', 'POA22', 'POA18', 'POA24', 'POA21']
Method1 (np.setdiff1d) meets my requirements perfectly.
This answer is for information.
Answer by MSeifert
If the number of occurrences should be taken into account you probably need to use something like collections.Counter:
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]
from collections import Counter
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]
>>> final
['f', 'm']
As promised, this can also handle a differing number of occurrences as a "difference":
list_1=["a", "b", "c", "d", "e", 'a']
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]
>>> final
['a', 'f', 'm']
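The comprehension above reports each differing key once. If what you want instead is a multiset difference (a different semantics from the one in this answer: how many copies of each element are left over in list_2 after removing the ones in list_1), Counter subtraction is one possible sketch:

```python
from collections import Counter

list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "a", "f", "c", "m", "m"]

# Counter subtraction keeps only positive counts: 2 "a" - 1 "a" leaves one "a".
remaining = Counter(list_2) - Counter(list_1)
main_list = list(remaining.elements())
print(main_list)  # ['a', 'f', 'm', 'm']
```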
Answer by adnan
From ser1 remove items present in ser2.
Input
import pandas as pd
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])
Solution
ser1[~ser1.isin(ser2)]