Python 在整数序列中查找缺失元素的有效方法
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16974047/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Efficient way to find missing elements in an integer sequence
提问by vkaul11
Suppose we have two items missing in a sequence of consecutive integers and the missing elements lie between the first and last elements. I did write a code that does accomplish the task. However, I wanted to make it efficient using less loops if possible. Any help will be appreciated. Also what about the condition when we have to find more missing items (say close to n/4) instead of 2. I think then my code should be efficient right because I am breaking out from the loop earlier?
假设我们在连续整数序列中缺少两个项目,并且缺少的元素位于第一个和最后一个元素之间。我确实编写了一个可以完成任务的代码。但是,如果可能的话,我想使用更少的循环来提高效率。任何帮助将不胜感激。另外,当我们必须找到更多丢失的项目(比如接近 n/4)而不是 2 时的条件呢?我认为我的代码应该是高效的,因为我更早地从循环中跳出来了?
def missing_elements(L,start,end,missing_num):
complete_list = range(start,end+1)
count = 0
input_index = 0
for item in complete_list:
if item != L[input_index]:
print item
count += 1
else :
input_index += 1
if count > missing_num:
break
def main():
L = [10,11,13,14,15,16,17,18,20]
start = 10
end = 20
missing_elements(L,start,end,2)
if __name__ == "__main__":
main()
采纳答案by Lie Ryan
Assuming that L is a list of integers with no duplicates, you can infer that the part of the list between start and index is completely consecutive if and only if L[index] == L[start] + (index - start)and similarly with index and end is completely consecutive if and only if L[index] == L[end] - (end - index). This combined with splitting the list into two recursively gives a sublinear solution.
假设 L 是一个没有重复的整数列表,您可以推断出 start 和 index 之间的列表部分是完全连续的当且仅当L[index] == L[start] + (index - start)并且类似地 index 和 end 是完全连续的当且仅当L[index] == L[end] - (end - index)。这与将列表递归地拆分为两个相结合,给出了一个次线性解决方案。
# python 3.3 and up, in older versions, replace "yield from" with yield loop
def missing_elements(L, start, end):
if end - start <= 1:
if L[end] - L[start] > 1:
yield from range(L[start] + 1, L[end])
return
index = start + (end - start) // 2
# is the lower half consecutive?
consecutive_low = L[index] == L[start] + (index - start)
if not consecutive_low:
yield from missing_elements(L, start, index)
# is the upper part consecutive?
consecutive_high = L[index] == L[end] - (end - index)
if not consecutive_high:
yield from missing_elements(L, index, end)
def main():
L = [10,11,13,14,15,16,17,18,20]
print(list(missing_elements(L,0,len(L)-1)))
L = range(10, 21)
print(list(missing_elements(L,0,len(L)-1)))
main()
回答by Martijn Pieters
If the input sequence is sorted, you could use sets here. Take the start and end values from the input list:
如果输入序列已排序,则可以在此处使用集合。从输入列表中获取开始和结束值:
def missing_elements(L):
start, end = L[0], L[-1]
return sorted(set(range(start, end + 1)).difference(L))
This assumes Python 3; for Python 2, use xrange()to avoid building a list first.
这假设 Python 3;对于 Python 2,用于xrange()避免首先构建列表。
The sorted()call is optional; without it a set()is returned of the missing values, with it you get a sorted list.
该sorted()调用是可选的; 没有它,将set()返回缺失值,有了它,您将获得一个排序列表。
Demo:
演示:
>>> L = [10,11,13,14,15,16,17,18,20]
>>> missing_elements(L)
[12, 19]
Another approach is by detecting gaps between subsequent numbers; using an older itertoolslibrary sliding window recipe:
另一种方法是检测后续数字之间的差距;使用较旧的itertools库滑动窗口配方:
from itertools import islice, chain
def window(seq, n=2):
"Returns a sliding window (of width n) over data from the iterable"
" s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... "
it = iter(seq)
result = tuple(islice(it, n))
if len(result) == n:
yield result
for elem in it:
result = result[1:] + (elem,)
yield result
def missing_elements(L):
missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
return list(missing)
This is a pure O(n) operation, and if you know the number of missing items, you can make sure it only produces those and then stops:
这是一个纯 O(n) 操作,如果您知道丢失项目的数量,您可以确保它只生成那些然后停止:
def missing_elements(L, count):
missing = chain.from_iterable(range(x + 1, y) for x, y in window(L) if (y - x) > 1)
return list(islice(missing, 0, count))
This will handle larger gaps too; if you are missing 2 items at 11 and 12, it'll still work:
这也将处理更大的差距;如果您在 11 和 12 处丢失了 2 个项目,它仍然可以工作:
>>> missing_elements([10, 13, 14, 15], 2)
[11, 12]
and the above sample only had to iterate over [10, 13]to figure this out.
并且上面的示例只需要迭代[10, 13]来解决这个问题。
回答by acjay
Here's a one-liner:
这是一个单行:
In [10]: l = [10,11,13,14,15,16,17,18,20]
In [11]: [i for i, (n1, n2) in enumerate(zip(l[:-1], l[1:])) if n1 + 1 != n2]
Out[11]: [1, 7]
I use the list, slicing to offset the copies by one, and use enumerate to get the indices of the missing item.
我使用列表,切片以将副本偏移一个,并使用 enumerate 获取丢失项目的索引。
For long lists, this isn't great because it's not O(log(n)), but I think it should be pretty efficient versus using a setfor small inputs. izipfrom itertools would probably make it quicker still.
对于长列表,这不是很好,因为它不是 O(log(n)),但我认为与set对小输入使用 a 相比,它应该非常有效。izip从 itertools 可能会使其更快。
回答by user2461709
missingItems = [x for x in complete_list if not x in L]
回答by mVChr
My take was to use no loops and set operations:
我的看法是不使用循环和设置操作:
def find_missing(in_list):
complete_set = set(range(in_list[0], in_list[-1] + 1))
return complete_set - set(in_list)
def main():
sample = [10, 11, 13, 14, 15, 16, 17, 18, 20]
print find_missing(sample)
if __name__ == "__main__":
main()
# => set([19, 12])
回答by aldeb
Using collections.Counter:
from collections import Counter
dic = Counter([10, 11, 13, 14, 15, 16, 17, 18, 20])
print([i for i in range(10, 20) if dic[i] == 0])
Output:
输出:
[12, 19]
回答by Brent Washburne
Simply walk the list and look for non-consecutive numbers:
只需遍历列表并查找不连续的数字:
prev = L[0]
for this in L[1:]:
if this > prev+1:
for item in range(prev+1, this): # this handles gaps of 1 or more
print item
prev = this
回答by Mike Müller
We found a missing value if the difference between two consecutive numbers is greater than 1:
如果两个连续数字之间的差大于 ,我们发现了一个缺失值1:
>>> L = [10,11,13,14,15,16,17,18,20]
>>> [x + 1 for x, y in zip(L[:-1], L[1:]) if y - x > 1]
[12, 19]
Note: Python 3. In Python 2 use itertools.izip.
注意:Python 3。在 Python 2 中使用itertools.izip.
Improved version for more than one value missing in a row:
连续丢失多个值的改进版本:
>>> import itertools as it
>>> L = [10,11,14,15,16,17,18,20] # 12, 13 and 19 missing
>>> [x + diff for x, y in zip(it.islice(L, None, len(L) - 1),
it.islice(L, 1, None))
for diff in range(1, y - x) if diff]
[12, 13, 19]
回答by dansalmo
>>> l = [10,11,13,14,15,16,17,18,20]
>>> [l[i]+1 for i, j in enumerate(l) if (l+[0])[i+1] - l[i] > 1]
[12, 19]
回答by 0x90
Using scipylib:
使用scipy库:
import math
from scipy.optimize import fsolve
def mullist(a):
mul = 1
for i in a:
mul = mul*i
return mul
a = [1,2,3,4,5,6,9,10]
s = sum(a)
so = sum(range(1,11))
mulo = mullist(range(1,11))
mul = mullist(a)
over = mulo/mul
delta = so -s
# y = so - s -x
# xy = mulo/mul
def func(x):
return (so -s -x)*x-over
print int(round(fsolve(func, 0))), int(round(delta - fsolve(func, 0)))
Timing it:
计时:
$ python -mtimeit -s "$(cat with_scipy.py)"
7 8
100000000 loops, best of 3: 0.0181 usec per loop
Other option is:
其他选项是:
>>> from sets import Set
>>> a = Set(range(1,11))
>>> b = Set([1,2,3,4,5,6,9,10])
>>> a-b
Set([8, 7])
And the timing is:
时间是:
Set([8, 7])
100000000 loops, best of 3: 0.0178 usec per loop

