Python: Remove all the elements that occur in one list from another
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/4211209/
Remove all the elements that occur in one list from another
Asked by fandom
Let's say I have two lists, l1 and l2. I want to perform l1 - l2, which returns all elements of l1 that are not in l2.
I can think of a naive loop approach to doing this, but that is going to be really inefficient. What is a pythonic and efficient way of doing this?
As an example, if I have l1 = [1,2,6,8] and l2 = [2,3,5,8], l1 - l2 should return [1,6].
Answered by Donut
Python has a language feature called list comprehensions that is well suited to making this sort of thing easy. The following statement does exactly what you want and stores the result in l3:
l3 = [x for x in l1 if x not in l2]
l3 will contain [1, 6].
Answered by Arkku
One way is to use sets:
>>> set([1,2,6,8]) - set([2,3,5,8])
set([1, 6])
Answered by nonot1
Use the Python set type. That would be the most Pythonic. :)
Also, since it's a built-in type, it should be the most optimized method too.
See:
http://docs.python.org/library/stdtypes.html#set
http://docs.python.org/library/sets.htm (for older Python)
# Using Python 2.7 set literal format.
# Otherwise, use: l1 = set([1,2,6,8])
#
l1 = {1,2,6,8}
l2 = {2,3,5,8}
l3 = l1 - l2
Answered by Daniel Pryden
Expanding on Donut's answer and the other answers here, you can get even better results by using a generator expression instead of a list comprehension, and by using a set data structure (since the in operator is O(n) on a list but O(1) on average on a set).
So here's a function that would work for you:
def filter_list(full_list, excludes):
    s = set(excludes)
    return (x for x in full_list if x not in s)
The result will be an iterable that lazily yields the filtered items. If you need a real list object (e.g. if you need to call len() on the result), then you can easily build a list like so:
filtered_list = list(filter_list(full_list, excludes))
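A short usage sketch of the function above with the lists from the question. The function body is repeated so the snippet runs by itself, and the variable name lazy_result is only illustrative. It assumes the elements of excludes are hashable, since they are placed in a set.

```python
def filter_list(full_list, excludes):
    # Build the set once so each membership test is O(1) on average.
    s = set(excludes)
    # Generator expression: items are produced lazily, in the order of full_list.
    return (x for x in full_list if x not in s)

l1 = [1, 2, 6, 8]
l2 = [2, 3, 5, 8]

lazy_result = filter_list(l1, l2)  # nothing has been filtered yet
l3 = list(lazy_result)             # consume the generator into a real list
print(l3)       # [1, 6]
print(len(l3))  # 2
```

Because a generator can be consumed only once, call filter_list again if you need to iterate over the result a second time.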
Answered by Akshay Hazari
Alternate solution:
reduce(lambda x,y : filter(lambda z: z!=y,x) ,[2,3,5,8],[1,2,6,8])
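The one-liner above appears to be written for Python 2, where reduce is a built-in and filter returns a list. A sketch of the same idea, assuming Python 3 (reduce imported from functools, the lazy filter result converted to a list), might look like this:

```python
from functools import reduce

l1 = [1, 2, 6, 8]
l2 = [2, 3, 5, 8]

# Start from l1 and, for each y in l2, wrap the running result
# in a filter that drops every value equal to y.
lazy = reduce(lambda acc, y: filter(lambda z: z != y, acc), l2, l1)

l3 = list(lazy)  # evaluate the chained filters
print(l3)  # [1, 6]
```

Each element of l2 adds one more pass over the data, so this approach does work proportional to len(l1) * len(l2).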
Answered by Moinuddin Quadri
As an alternative, you may also use filter with a lambda expression to get the desired result. For example:
>>> l1 = [1,2,6,8]
>>> l2 = set([2,3,5,8])
# `filter` returns an iterator object (in Python 3). Here I'm converting
# it to a `list` in order to display the resulting value
>>> list(filter(lambda x: x not in l2, l1))
[1, 6]
Performance Comparison
Here I am comparing the performance of all the answers mentioned here. As expected, Arkku's set-based operation is the fastest.
Arkku's set difference - First (0.124 usec per loop)
mquadri$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2"
10000000 loops, best of 3: 0.124 usec per loop
Daniel Pryden's list comprehension with set lookup - Second (0.302 usec per loop)
mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]"
1000000 loops, best of 3: 0.302 usec per loop
Donut's list comprehension on plain list - Third (0.552 usec per loop)
mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]"
1000000 loops, best of 3: 0.552 usec per loop
Moinuddin Quadri's filter - Fourth (0.972 usec per loop)
mquadri$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "filter(lambda x: x not in l2, l1)"
1000000 loops, best of 3: 0.972 usec per loop
Akshay Hazari's combination of reduce + filter - Fifth (3.97 usec per loop)
mquadri$ python -m timeit "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2)"
100000 loops, best of 3: 3.97 usec per loop
PS: set does not maintain the order and removes duplicate elements from the list. Hence, do not use set difference if you need to keep either of these.
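To make that caveat concrete, here is a small sketch with a hypothetical l1 that contains duplicates in a non-sorted order. It compares plain set difference with a list comprehension that only uses a set for the lookup:

```python
l1 = [8, 1, 1, 6, 2, 6]
l2 = [2, 3, 5, 8]

# Set difference: duplicates collapse and iteration order is not guaranteed.
as_set = set(l1) - set(l2)

# List comprehension with a set lookup: keeps the order and duplicates of l1.
excluded = set(l2)
as_list = [x for x in l1 if x not in excluded]

print(sorted(as_set))  # [1, 6]
print(as_list)         # [1, 1, 6, 6]
```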
Answered by lbsweek
Use a set comprehension {x for x in l2} or set(l2) to get a set, then use a list comprehension to get the list:
l2set = set(l2)
l3 = [x for x in l1 if x not in l2set]
Benchmark test code:
import time
l1 = list(range(1000*10 * 3))
l2 = list(range(1000*10 * 2))
l2set = {x for x in l2}
tic = time.time()
l3 = [x for x in l1 if x not in l2set]
toc = time.time()
diffset = toc-tic
print(diffset)
tic = time.time()
l3 = [x for x in l1 if x not in l2]
toc = time.time()
difflist = toc-tic
print(difflist)
print("speedup %fx"%(difflist/diffset))
Benchmark test result:
0.0015058517456054688
3.968189239501953
speedup 2635.179227x
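A single time.time() measurement can be noisy. As a hedged alternative, the comparison can be sketched with the standard-library timeit module, which runs each statement several times. The list sizes below are deliberately smaller than in the benchmark above so the snippet finishes quickly, and the helper names with_set and with_list are hypothetical. Absolute numbers will differ between machines.

```python
import timeit

l1 = list(range(600))
l2 = list(range(400))
l2set = set(l2)

def with_set():
    # Membership test against a set: O(1) on average per element.
    return [x for x in l1 if x not in l2set]

def with_list():
    # Membership test against a list: O(n) per element.
    return [x for x in l1 if x not in l2]

# Both approaches must agree before their timings are worth comparing.
assert with_set() == with_list()

# repeat() returns one total time per run; the minimum is the least noisy.
t_set = min(timeit.repeat(with_set, number=5, repeat=3))
t_list = min(timeit.repeat(with_list, number=5, repeat=3))
print("set lookup:  %.6f s" % t_set)
print("list lookup: %.6f s" % t_list)
```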

