Python: longest increasing subsequence

Note: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/3992697/

Time: 2020-08-18 13:44:13  Source: igfitidea

Longest increasing subsequence

Tags: python, algorithm, language-agnostic

Asked by Jungle Hunter

Given an input sequence, what is the best way to find the longest (not necessarily continuous) non-decreasing subsequence?

0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 # sequence

1, 9, 13, 15 # non-decreasing subsequence

0, 2, 6, 9, 13, 15 # longest non-decreasing subsequence (not unique)

I'm looking for the best algorithm. If there is code, Python would be nice, but anything is alright.

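To make the problem statement concrete, here is a minimal sketch that checks the two example subsequences from the question. The helper names `is_subsequence` and `is_non_decreasing` are illustrative and not part of the original question.

```python
def is_subsequence(sub, seq):
    """True if `sub` can be obtained from `seq` by deleting elements."""
    it = iter(seq)
    # `x in it` consumes the iterator up to the first match, so order is respected
    return all(x in it for x in sub)


def is_non_decreasing(values):
    """True if every element is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


seq = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
for candidate in ([1, 9, 13, 15], [0, 2, 6, 9, 13, 15]):
    # both examples are valid non-decreasing subsequences; only the second is longest
    print(candidate, is_subsequence(candidate, seq) and is_non_decreasing(candidate))
```

The elements of a subsequence must keep their relative order from the input, but they do not have to be adjacent.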

Accepted answer by Rik Poggi

I just stumbled upon this problem and came up with this Python 3 implementation:

def subsequence(seq):
    if not seq:
        return seq

    M = [None] * len(seq)    # offset by 1 (j -> j-1)
    P = [None] * len(seq)

    # Since we have at least one element in our list, we can start by
    # knowing that there's at least an increasing subsequence of length one:
    # the first element.
    L = 1
    M[0] = 0

    # Looping over the sequence starting from the second element
    for i in range(1, len(seq)):
        # Binary search: we want the largest j <= L
        #  such that seq[M[j]] < seq[i] (default j = 0),
        #  hence we want the lower bound at the end of the search process.
        lower = 0
        upper = L

        # Since the binary search will not look at the upper bound value,
        # we'll have to check that manually
        if seq[M[upper-1]] < seq[i]:
            j = upper

        else:
            # actual binary search loop
            while upper - lower > 1:
                mid = (upper + lower) // 2
                if seq[M[mid-1]] < seq[i]:
                    lower = mid
                else:
                    upper = mid

            j = lower    # this will also set the default value to 0

        P[i] = M[j-1]

        if j == L or seq[i] < seq[M[j]]:
            M[j] = i
            L = max(L, j+1)

    # Building the result: [seq[M[L-1]], seq[P[M[L-1]]], seq[P[P[M[L-1]]]], ...]
    result = []
    pos = M[L-1]
    for _ in range(L):
        result.append(seq[pos])
        pos = P[pos]

    return result[::-1]    # reversing

Since it took me some time to understand how the algorithm works I was a little verbose with comments, and I'll also add a quick explanation:

  • seq is the input sequence.
  • L is a number: it gets updated while looping over the sequence and it marks the length of the longest increasing subsequence found up to that moment.
  • M is a list. M[j-1] will point to an index of seq that holds the smallest value that could be used (at the end) to build an increasing subsequence of length j.
  • P is a list. P[i] will point to M[j], where i is the index of seq. In a few words, it tells which is the previous element of the subsequence. P is used to build the result at the end.

How the algorithm works:

  1. Handle the special case of an empty sequence.
  2. Start with a subsequence of 1 element.
  3. Loop over the input sequence with index i.
  4. With a binary search, find the j that lets seq[M[j]] be < seq[i].
  5. Update P, M and L.
  6. Trace back the result and return it reversed.

Note: The only differences from the Wikipedia algorithm are the offset of 1 in the M list, and that X is here called seq. I also tested it with a slightly improved version of the unit test shown in Eric Gustavson's answer, and it passed all tests.

Example:

seq = [30, 10, 20, 50, 40, 80, 60]

       0    1   2   3   4   5   6   <-- indexes

At the end we'll have:

M = [1, 2, 4, 6, None, None, None]
P = [None, None, 1, 2, 2, 4, 4]
result = [10, 20, 40, 60]

As you'll see, P is pretty straightforward. We have to look at it from the end, so it tells us that before 60 there's 40, before 80 there's 40, before 40 there's 20, before 50 there's 20 and before 20 there's 10, stop.

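Reading P from the end can be sketched in isolation. The snippet below hard-codes the final M, P and L from this example and only performs the traceback; the function name `trace_back` is an illustrative choice, not something from the answer.

```python
seq = [30, 10, 20, 50, 40, 80, 60]
M = [1, 2, 4, 6, None, None, None]   # final state from the example
P = [None, None, 1, 2, 2, 4, 4]      # final state from the example
L = 4                                # length of the longest increasing subsequence


def trace_back(seq, M, P, L):
    """Follow predecessor links from the tail of the best subsequence."""
    result = []
    pos = M[L - 1]                   # index of the last element (60)
    for _ in range(L):
        result.append(seq[pos])
        pos = P[pos]                 # jump to the previous element
    return result[::-1]              # collected back to front, so reverse


print(trace_back(seq, M, P, L))      # [10, 20, 40, 60]
```

The entries P[3] and P[5] (for 50 and 80) are never visited, because those elements are not on the chain that ends at 60.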

The complicated part is M. At the beginning M was [0, None, None, ...] since the last element of the subsequence of length 1 (hence position 0 in M) was at index 0: 30.

At this point we start looping over seq and look at 10. Since 10 is < 30, M will be updated:

if j == L or seq[i] < seq[M[j]]:
    M[j] = i

So now M looks like: [1, None, None, ...]. This is a good thing, because 10 has more chances to create a longer increasing subsequence. (The new 1 is the index of 10.)

Now it's the turn of 20. With 10 and 20 we have a subsequence of length 2 (index 1 in M), so M will be: [1, 2, None, ...]. (The new 2 is the index of 20.)

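Now it's the turn of 50. With 10, 20 and 50 we have a subsequence of length 3 (index 2 in M), so M will be: [1, 2, 3, None, ...]. (The new 3 is the index of 50.) As the final P and result show, 50 will not be part of the returned subsequence.

Now it's the turn of 40. It cannot extend the subsequence of length 3, but 10, 20 and 40 also form a subsequence of length 3 and 40 is a smaller tail than 50, so the entry at index 2 in M is replaced and M will be: [1, 2, 4, None, ...]. (The new 4 is the index of 40.)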

And so on...

For a complete walk through the code you can copy and paste it here:)

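For following the walk-through locally, here is a minimal sketch that prints M after every element. It is not the answer's code: it uses `bisect_left` from the standard library and rebuilds the list of tail values on every step for readability, which makes it O(n) per element. The name `trace_m` is illustrative.

```python
from bisect import bisect_left


def trace_m(seq):
    """Yield (value, M) after each element; M holds indices into seq."""
    M = []  # M[j-1] = index of the smallest tail of an increasing subsequence of length j
    for i, value in enumerate(seq):
        tails = [seq[k] for k in M]      # tail values are sorted, so bisect applies
        j = bisect_left(tails, value)    # number of tails strictly smaller than value
        if j == len(M):
            M.append(i)                  # value extends the longest subsequence
        else:
            M[j] = i                     # value becomes the tail for length j+1
        yield value, list(M)


for value, M in trace_m([30, 10, 20, 50, 40, 80, 60]):
    print(value, M)
# the last line printed is: 60 [1, 2, 4, 6]
```

Unlike the answer's code, this sketch also overwrites an entry when the new value is equal to the current tail; the example contains no duplicates, so the printed states are unaffected.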

Answer by codaddict

The most efficient algorithm for this runs in O(N log N) time and is outlined here.

Another way to solve this is to take the longest common subsequence (LCS) of the original array and its sorted version, which takes O(N^2) time.

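The LCS idea can be sketched as follows, assuming the classic dynamic-programming table for LCS. The function names are illustrative. Dropping duplicates from the sorted copy, as in the second wrapper, is an addition of this sketch to obtain a strictly increasing result.

```python
def lcs(a, b):
    """One longest common subsequence of two lists, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # walk the table from the top-left corner to recover one LCS
    result, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


def longest_non_decreasing_subsequence(seq):
    # any common subsequence with the sorted copy is non-decreasing
    return lcs(list(seq), sorted(seq))


def longest_strictly_increasing_subsequence(seq):
    # removing duplicates from the sorted copy forbids repeated values
    return lcs(list(seq), sorted(set(seq)))


seq = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
print(longest_non_decreasing_subsequence(seq))
```

This is mainly useful as a simple reference to test faster implementations against, since the table needs O(N^2) time and memory.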

Answer by Margus

Here is how to simply find longest increasing/decreasing subsequence in Mathematica:

 LIS[list_] := LongestCommonSequence[Sort[list], list];
 input={0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15};
 LIS[input]
 -1*LIS[-1*input]

Output:

{0, 2, 6, 9, 11, 15}
{12, 10, 9, 5, 3}

Mathematica also has a LongestIncreasingSubsequence function in the Combinatorica` library. If you do not have Mathematica, you can query WolframAlpha.

C++ O(nlogn) solution

There's also an O(n log n) solution based on some observations. Let $A_{i,j}$ be the smallest possible tail out of all increasing subsequences of length $j$ using elements $a_1, a_2, \dots, a_i$. Observe that, for any particular $i$, $A_{i,1} < A_{i,2} < \dots < A_{i,j}$. This suggests that if we want the longest subsequence that ends with $a_{i+1}$, we only need to look for a $j$ such that $A_{i,j} < a_{i+1} \le A_{i,j+1}$, and the length will be $j+1$. Notice that in this case $A_{i+1,j+1}$ will be equal to $a_{i+1}$, and all $A_{i+1,k}$ will be equal to $A_{i,k}$ for $k \ne j+1$. Furthermore, there is at most one difference between the set $A_i$ and the set $A_{i+1}$, which is caused by this search. Since $A$ is always ordered in increasing order, and the operation does not change this ordering, we can do a binary search for every single $a_1, a_2, \dots, a_n$.

C++ implementation (O(n log n) algorithm)

#include <vector>
using namespace std;

/* Finds longest strictly increasing subsequence. O(n log k) algorithm. */
void find_lis(vector<int> &a, vector<int> &b)
{
  vector<int> p(a.size());
  int u, v;

  if (a.empty()) return;

  b.push_back(0);

  for (size_t i = 1; i < a.size(); i++) {
      if (a[b.back()] < a[i]) {
          p[i] = b.back();
          b.push_back(i);
          continue;
      }

      for (u = 0, v = b.size()-1; u < v;) {
          int c = (u + v) / 2;
          if (a[b[c]] < a[i]) u=c+1; else v=c;
      }

      if (a[i] < a[b[u]]) {
          if (u > 0) p[i] = b[u-1];
          b[u] = i;
      }   
  }

  for (u = b.size(), v = b.back(); u--; v = p[v]) b[u] = v;
}

/* Example of usage: */
#include <cstdio>
int main()
{
  int a[] = { 1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7 };
  vector<int> seq(a, a+sizeof(a)/sizeof(a[0]));
  vector<int> lis;
  find_lis(seq, lis);

  for (size_t i = 0; i < lis.size(); i++)
      printf("%d ", seq[lis[i]]);
  printf("\n");

  return 0;
}

Source: link

I rewrote the C++ implementation in Java a while ago and can confirm it works. The Python counterpart of a vector is a list. If you want to test it yourself, here is a link to an online compiler with the example implementation loaded: link

Example data is: { 1, 9, 3, 8, 11, 4, 5, 6, 4, 19, 7, 1, 7 } and the answer is: 1 3 4 5 6 7.

Answer by zzz

Here is some Python code with tests which implements the algorithm running in O(n*log(n)). I found this on the Wikipedia talk page about the longest increasing subsequence.

import unittest


def LongestIncreasingSubsequence(X):
    """
    Find and return longest increasing subsequence of X.
    If multiple increasing subsequences exist, the one that ends
    with the smallest value is preferred, and if multiple
    occurrences of that value can end the sequence, then the
    earliest occurrence is preferred.
    """
    n = len(X)
    X = [None] + list(X)  # Pad sequence so that it starts at X[1]; list() also accepts ranges
    M = [None]*(n+1)  # Allocate arrays for M and P
    P = [None]*(n+1)
    L = 0
    for i in range(1,n+1):
        if L == 0 or X[M[1]] >= X[i]:
            # there is no j s.t. X[M[j]] < X[i]]
            j = 0
        else:
            # binary search for the largest j s.t. X[M[j]] < X[i]]
            lo = 1      # largest value known to be <= j
            hi = L+1    # smallest value known to be > j
            while lo < hi - 1:
                mid = (lo + hi)//2
                if X[M[mid]] < X[i]:
                    lo = mid
                else:
                    hi = mid
            j = lo

        P[i] = M[j]
        if j == L or X[i] < X[M[j+1]]:
            M[j+1] = i
            L = max(L,j+1)

    # Backtrack to find the optimal sequence in reverse order
    output = []
    pos = M[L]
    while L > 0:
        output.append(X[pos])
        pos = P[pos]
        L -= 1

    output.reverse()
    return output

# Try small lists and check that the correct subsequences are generated.

class LISTest(unittest.TestCase):
    def testLIS(self):
        self.assertEqual(LongestIncreasingSubsequence([]), [])
        # list(...) is needed on Python 3, where range() does not return a list
        self.assertEqual(LongestIncreasingSubsequence(list(range(10, 0, -1))), [1])
        self.assertEqual(LongestIncreasingSubsequence(list(range(10))), list(range(10)))
        self.assertEqual(LongestIncreasingSubsequence(
            [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]), [1, 2, 3, 5, 8, 9])

if __name__ == "__main__":
    unittest.main()

Answer by benben

    int[] a = {1,3,2,4,5,4,6,7};
    StringBuilder s1 = new StringBuilder();
    for(int i : a){
     s1.append(i);
    }       
    StringBuilder s2 = new StringBuilder();
    int count = findSubstring(s1.toString(), s2);       
    System.out.println(s2.reverse());

public static int findSubstring(String str1, StringBuilder s2){     
    StringBuilder s1 = new StringBuilder(str1);
    if(s1.length() == 0){
        return 0;
    }
    if(s2.length() == 0){
        s2.append(s1.charAt(s1.length()-1));
        findSubstring(s1.deleteCharAt(s1.length()-1).toString(), s2);           
    } else if(s1.charAt(s1.length()-1) < s2.charAt(s2.length()-1)){ 
        char c = s1.charAt(s1.length()-1);
        return 1 + findSubstring(s1.deleteCharAt(s1.length()-1).toString(), s2.append(c));
    }
    else{
        char c = s1.charAt(s1.length()-1);
        StringBuilder s3 = new StringBuilder();
        for(int i=0; i < s2.length(); i++){
            if(s2.charAt(i) > c){
                s3.append(s2.charAt(i));
            }
        }
        s3.append(c);
        // remove the last character once, so that both branches recurse on the same remainder
        String rest = s1.deleteCharAt(s1.length()-1).toString();
        return Math.max(findSubstring(rest, s2), findSubstring(rest, s3));
    }       
    return 0;
}

Answer by Deepak Singhvi

Here is the code and explanation in Java; maybe I will add a Python version soon.

arr = {0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15}
  1. list = {0} - Initialize the list with the first element
  2. list = {0, 8} - New largest LIS
  3. list = {0, 4} - Changed 8 to 4
  4. list = {0, 4, 12} - New largest LIS
  5. list = {0, 2, 12} - Changed 4 to 2
  6. list = {0, 2, 10} - Changed 12 to 10
  7. list = {0, 2, 6} - Changed 10 to 6
  8. list = {0, 2, 6, 14} - New largest LIS
  9. list = {0, 1, 6, 14} - Changed 2 to 1
  10. list = {0, 1, 6, 9} - Changed 14 to 9
  11. list = {0, 1, 5, 9} - Changed 6 to 5
  12. list = {0, 1, 5, 9, 13} - New largest LIS
  13. list = {0, 1, 3, 9, 13} - Changed 5 to 3
  14. list = {0, 1, 3, 9, 11} - Changed 13 to 11
  15. list = {0, 1, 3, 7, 11} - Changed 9 to 7
  16. list = {0, 1, 3, 7, 11, 15} - New largest LIS

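So the length of the LIS is 6 (the size of list). Note that the list itself is not necessarily a valid subsequence of the input: here 7 comes after 11 in arr, so only the length is guaranteed.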

import java.util.ArrayList;
import java.util.List;


public class LongestIncreasingSubsequence {
    public static void main(String[] args) {
        int[] arr = { 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 };
        increasingSubsequenceValues(arr);
    }

    public static void increasingSubsequenceValues(int[] seq) {
        List<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < seq.length; i++) {
            int j = 0;
            boolean elementUpdate = false;
            for (; j < list.size(); j++) {
                if (list.get(j) > seq[i]) {
                    list.add(j, seq[i]);
                    list.remove(j + 1);
                    elementUpdate = true;
                    break;
                }
            }
            if (!elementUpdate) {
                list.add(j, seq[i]);
            }
        }
        System.out.println("Longest Increasing Subsequence" + list);
    }


}

Output for the above code: Longest Increasing Subsequence[0, 1, 3, 7, 11, 15]

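A possible Python counterpart of the Java code above, as a sketch rather than the author's own port. It assumes `bisect_right` from the standard library in place of the linear scan for the first element greater than the current value, and it records the list after every step. The name `tails_trace` is illustrative.

```python
from bisect import bisect_right


def tails_trace(seq):
    """Return the final list and a copy of the list after every step."""
    tails, steps = [], []
    for value in seq:
        j = bisect_right(tails, value)   # position of the first element > value
        if j == len(tails):
            tails.append(value)          # new largest length
        else:
            tails[j] = value             # replace that element with the smaller value
        steps.append(list(tails))
    return tails, steps


arr = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
tails, steps = tails_trace(arr)
for n, step in enumerate(steps, 1):
    print(n, step)
print("length:", len(tails))             # length: 6
```

As with the Java version, the final list [0, 1, 3, 7, 11, 15] has the right length but is not itself a subsequence of arr, because 11 appears before 7 in the input. Recovering an actual subsequence requires predecessor links, as in the other answers.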

Answer by cmantas

Here's a compact implementation using enumerate:

def lis(l):
    # we will create a list of lists where each sub-list contains
    # the longest increasing subsequence ending at this index;
    # start with just the elements of l as contents of the sub-lists
    lises = [[e] for e in l]

    # iterate over (index, value) of l
    for i, e in enumerate(l):
        # lis-es ending at an earlier index a < i with a smaller value b < e
        lowerlises = [lises[a] for a, b in enumerate(l[:i]) if b < e]
        # if no such items, nothing to do
        if not lowerlises:
            continue
        # choose the longest one of those and add
        # the current element to it
        lises[i] = max(lowerlises, key=len) + [e]

    # return the longest of the lis-es (an empty list for empty input)
    return max(lises, key=len, default=[])

Answer by isarandi

Here's a more compact but still efficient Python implementation:

def longest_increasing_subsequence_indices(seq):
    from bisect import bisect_right

    if len(seq) == 0:
        return seq

    # m[j] in iteration i is the last index of the increasing subsequence of seq[:i]
    # that ends with the lowest possible value while having length j
    m = [None] * len(seq)
    predecessor = [None] * len(seq)
    best_len = 0

    for i, item in enumerate(seq):
        j = bisect_right([seq[k] for k in m[:best_len]], item)
        m[j] = i
        predecessor[i] = m[j-1] if j > 0 else None
        best_len = max(best_len, j+1)

    result = []
    i = m[best_len-1]
    while i is not None:
        result.append(i)
        i = predecessor[i]
    result.reverse()
    return result

def longest_increasing_subsequence(seq):
    return [seq[i] for i in longest_increasing_subsequence_indices(seq)]

Answer by arekolek

Here is a pretty general solution that:

  • runs in O(n log n) time,
  • handles increasing, nondecreasing, decreasing and nonincreasing subsequences,
  • works with any sequence objects, including list, numpy.array, str and more,
  • supports lists of objects and custom comparison methods through the key parameter that works like the one in the builtin sorted function,
  • can return the elements of the subsequence or their indices.

The code:

from bisect import bisect_left, bisect_right
from functools import cmp_to_key

def longest_subsequence(seq, mode='strictly', order='increasing',
                        key=None, index=False):

  bisect = bisect_left if mode.startswith('strict') else bisect_right

  # compute keys for comparison just once
  rank = seq if key is None else map(key, seq)
  if order == 'decreasing':
    rank = map(cmp_to_key(lambda x,y: 1 if x<y else 0 if x==y else -1), rank)
  rank = list(rank)

  if not rank: return []

  lastoflength = [0] # end position of subsequence with given length
  predecessor = [None] # penultimate element of l.i.s. ending at given position

  for i in range(1, len(seq)):
    # seq[i] can extend a subsequence that ends with a lesser (or equal) element
    j = bisect([rank[k] for k in lastoflength], rank[i])
    # update existing subsequence of length j or extend the longest
    try: lastoflength[j] = i
    except IndexError: lastoflength.append(i)
    # remember element before seq[i] in the subsequence
    predecessor.append(lastoflength[j-1] if j > 0 else None)

  # trace indices [p^n(i), ..., p(p(i)), p(i), i], where n=len(lastoflength)-1
  def trace(i):
    if i is not None:
      yield from trace(predecessor[i])
      yield i
  indices = trace(lastoflength[-1])

  return list(indices) if index else [seq[i] for i in indices]

I wrote a docstring for the function that I didn't paste above in order to show off the code:

"""
Return the longest increasing subsequence of `seq`.

Parameters
----------
seq : sequence object
  Can be any sequence, like `str`, `list`, `numpy.array`.
mode : {'strict', 'strictly', 'weak', 'weakly'}, optional
  If set to 'strict', the subsequence will contain unique elements.
  Using 'weak' an element can be repeated many times.
  Modes ending in -ly serve as a convenience to use with `order` parameter,
  because `longest_subsequence(seq, 'weakly', 'increasing')` reads better.
  The default is 'strictly'.
order : {'increasing', 'decreasing'}, optional
  By default return the longest increasing subsequence, but it is possible
  to return the longest decreasing sequence as well.
key : function, optional
  Specifies a function of one argument that is used to extract a comparison
  key from each list element (e.g., `str.lower`, `lambda x: x[0]`).
  The default value is `None` (compare the elements directly).
index : bool, optional
  If set to `True`, return the indices of the subsequence, otherwise return
  the elements. Default is `False`.

Returns
-------
elements : list, optional
  A `list` of elements of the longest subsequence.
  Returned by default and when `index` is set to `False`.
indices : list, optional
  A `list` of indices pointing to elements in the longest subsequence.
  Returned when `index` is set to `True`.
"""

Some examples:

>>> seq = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]

>>> longest_subsequence(seq)
[0, 2, 6, 9, 11, 15]

>>> longest_subsequence(seq, order='decreasing')
[12, 10, 9, 5, 3]

>>> txt = ("Given an input sequence, what is the best way to find the longest"
               " (not necessarily continuous) non-decreasing subsequence.")

>>> ''.join(longest_subsequence(txt))
' ,abdegilnorsu'

>>> ''.join(longest_subsequence(txt, 'weak'))
'              ceilnnnnrsssu'

>>> ''.join(longest_subsequence(txt, 'weakly', 'decreasing'))
'vuutttttttssronnnnngeee.'

>>> dates = [
...   ('2015-02-03', 'name1'),
...   ('2015-02-04', 'nameg'),
...   ('2015-02-04', 'name5'),
...   ('2015-02-05', 'nameh'),
...   ('1929-03-12', 'name4'),
...   ('2023-07-01', 'name7'),
...   ('2015-02-07', 'name0'),
...   ('2015-02-08', 'nameh'),
...   ('2015-02-15', 'namex'),
...   ('2015-02-09', 'namew'),
...   ('1980-12-23', 'name2'),
...   ('2015-02-12', 'namen'),
...   ('2015-02-13', 'named'),
... ]

>>> longest_subsequence(dates, 'weak')

[('2015-02-03', 'name1'),
 ('2015-02-04', 'name5'),
 ('2015-02-05', 'nameh'),
 ('2015-02-07', 'name0'),
 ('2015-02-08', 'nameh'),
 ('2015-02-09', 'namew'),
 ('2015-02-12', 'namen'),
 ('2015-02-13', 'named')]

>>> from operator import itemgetter

>>> longest_subsequence(dates, 'weak', key=itemgetter(0))

[('2015-02-03', 'name1'),
 ('2015-02-04', 'nameg'),
 ('2015-02-04', 'name5'),
 ('2015-02-05', 'nameh'),
 ('2015-02-07', 'name0'),
 ('2015-02-08', 'nameh'),
 ('2015-02-09', 'namew'),
 ('2015-02-12', 'namen'),
 ('2015-02-13', 'named')]

>>> indices = set(longest_subsequence(dates, key=itemgetter(0), index=True))

>>> [e for i,e in enumerate(dates) if i not in indices]

[('2015-02-04', 'nameg'),
 ('1929-03-12', 'name4'),
 ('2023-07-01', 'name7'),
 ('2015-02-15', 'namex'),
 ('1980-12-23', 'name2')]

This answer was in part inspired by the question over at Code Review and in part by a question asking about "out of sequence" values.

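The way the function above handles order='decreasing' relies on wrapping each rank in a key object whose comparison is reversed, so that bisect can be used unchanged. Here is a minimal sketch of that trick in isolation, computing only the length. The names `reverse_key` and `longest_decreasing_length` are illustrative and not part of the answer.

```python
from bisect import bisect_left
from functools import cmp_to_key

# key wrapper whose "<" is the reverse of the natural order:
# reverse_key(x) < reverse_key(y) holds exactly when x > y
reverse_key = cmp_to_key(lambda x, y: (x < y) - (x > y))


def longest_decreasing_length(seq):
    """Length of the longest strictly decreasing subsequence."""
    tails = []  # wrapped tails, sorted with respect to the reversed order
    for item in seq:
        k = reverse_key(item)
        j = bisect_left(tails, k)    # bisect only uses "<", so the wrapper is enough
        if j == len(tails):
            tails.append(k)
        else:
            tails[j] = k
    return len(tails)


seq = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
print(longest_decreasing_length(seq))   # 5
```

For plain numbers, negating the values (as in the Mathematica answer) gives the same effect; the wrapper has the advantage of working for any comparable objects, such as strings or tuples.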

Answer by Old Pro

There are several answers in code, but I found them a bit hard to understand, so here is an explanation of the general idea, leaving out all the optimizations. I will get to the optimizations later.

We will use the sequence 2, 8, 4, 12, 3, 10 and, to make it easier to follow, we will require the input sequence to not be empty and to not include the same number more than once.

We go through the sequence in order.

As we do, we maintain a set of sequences, the best sequences we have found so far for each length. After we find the first sequence of length 1, which is the first element of the input sequence, we are guaranteed to have a set of sequences for each possible length from 1 to the longest we have found so far. This is obvious, because if we have a sequence of length 3, then the first 2 elements of that sequence are a sequence of length 2.

So we start with the first element being a sequence of length 1 and our set looks like

 1: 2

We take the next element of the sequence (8) and look for the longest sequence we can add it to. This is sequence 1, so we get

1: 2
2: 2 8

We take the next element of the sequence (4) and look for the longest sequence we can add it to. The longest sequence we can add it to is the one of length 1 (which is just 2). Here is what I found to be the tricky (or at least non-obvious) part. Because we could not add it to the end of the sequence of length 2 (2 8), it must be a better choice to end the length 2 candidate. If the element were greater than 8, it would have tacked on to the length 2 sequence and given us a new length 3 sequence. So we know that it is less than 8 and therefore replace the 8 with the 4.

Algorithmically, what we say is that whatever is the longest sequence we can tack the element onto, that sequence plus this element is the best candidate for a sequence of the resulting length. Note that every element we process must belong somewhere (because we ruled out duplicate numbers in the input). If it is smaller than the element in length 1, it is the new length 1, otherwise it goes on the end of some existing sequence. Here, the length 1 sequence plus the element 4 becomes the new length 2 sequence and we have:

1: 2
2: 2 4 (replaces 2 8)

The next element, 12, gives us a sequence of length 3 and we have

1: 2
2: 2 4
3: 2 4 12

The next element, 3, gives us a better sequence of length 2:

1: 2
2: 2 3 (replaces 2 4)
3: 2 4 12

Note that we cannot alter the sequence of length 3 (substituting the 3 for the 4) because they did not occur in that order in the input sequence. The next element, 10, takes care of this. Because the best we can do with 10 is add it on to 2 3, it becomes the new list of length 3:

1: 2
2: 2 3
3: 2 3 10 (replaces 2 4 12)

Note that in terms of the algorithm, we really don't care what comes before the last element on any of our candidate sequences, but of course we need to keep track so that at the end we can output the full sequence.

We keep processing input elements like this: just tack each one onto the longest sequence we can and make that the new candidate sequence for the resulting length, because it is guaranteed not to be worse than the existing sequence of that length. At the end, we output the longest sequence we have found.

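The unoptimized idea described so far can be sketched directly: keep the full best candidate for every length and scan them linearly. This is only an illustration of the explanation, with an illustrative function name, and it deliberately skips the optimizations discussed next.

```python
def longest_increasing_subsequence_simple(seq):
    """Keep, for every length, the best (smallest-tail) candidate sequence in full."""
    best = []  # best[k] is the current candidate of length k + 1
    for value in seq:
        # find the longest candidate whose last element is smaller than value
        k = len(best)
        while k > 0 and best[k - 1][-1] >= value:
            k -= 1
        candidate = (best[k - 1] if k > 0 else []) + [value]
        if k == len(best):
            best.append(candidate)   # a new longest length
        else:
            best[k] = candidate      # replaces the old candidate of that length
    return best[-1] if best else []


print(longest_increasing_subsequence_simple([2, 8, 4, 12, 3, 10]))   # [2, 3, 10]
```

Storing whole sequences and scanning from the longest candidate downwards costs O(n^2) time and space in the worst case.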

Optimizations

One optimization is that we do not really need to store the entire sequence of each length. To do so would take space of O(n^2). For the most part, we can get away with just storing the last element of each sequence, since that is all we ever compare against. (I will get to why this is not entirely sufficient in a bit. See if you can figure out why before I get to it.)

So let's say we will store our set of sequences as an array M where M[x] holds the last element of the sequence of length x. If you think about it, you will realize that the elements of M are themselves in increasing order: they are sorted. If M[x+1] were less than M[x], it would have replaced M[x] instead.

Since M is sorted, the next optimization goes to something I totally glossed over above: how do we find the sequence to add on to? Well, since M is sorted, we can just do a binary search to find the largest M[x] less than the element to be added. That is the sequence we add on to.

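With only the last elements stored and a binary search over them, computing the length takes a few lines. The sketch below assumes the standard `bisect` module for the search. The `strict` flag is an addition of this sketch for inputs with repeated values, which the explanation above rules out.

```python
from bisect import bisect_left, bisect_right


def lis_length(seq, strict=True):
    """Length of the longest increasing (or, with strict=False, non-decreasing) subsequence."""
    tails = []  # tails[x-1] = smallest last element of any candidate of length x
    find = bisect_left if strict else bisect_right
    for value in seq:
        x = find(tails, value)       # length of the longest candidate we can add on to
        if x == len(tails):
            tails.append(value)      # value creates a candidate of a new, longer length
        else:
            tails[x] = value         # value is a better tail for length x + 1
    return len(tails)


print(lis_length([2, 8, 4, 12, 3, 10]))          # 3
print(lis_length([1, 1, 1]), lis_length([1, 1, 1], strict=False))   # 1 3
```

Here the list is 0-based, so the tail of the candidate of length x lives at tails[x-1], whereas the text uses M[x].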

This is great if all we want to do is find the length of the longest sequence. However, M is not sufficient to reconstruct the sequence itself. Remember, at one point our set looked like this:

1: 2
2: 2 3
3: 2 4 12

We cannot just output M itself as the sequence. We need more information in order to be able to reconstruct the sequence. For this, we make 2 more changes. First, we store the input sequence in an array seq and instead of storing the value of the element in M[x], we store the index of the element in seq, so the value is seq[M[x]].

We do this so that we can keep a record of the entire sequence by chaining subsequences. As you saw at the beginning, every sequence is created by adding a single element to the end of an already existing sequence. So, second, we keep another array P that stores the index (in seq) of the last element of the sequence we are adding on to. In order to make it chainable, since what we are storing in P is an index of seq, we have to index P itself by an index of seq.

The way this works is that when processing element i of seq, we find which sequence we are adding onto. Remember, we are going to tack seq[i] onto a sequence of length x to create a new sequence of length x+1 for some x, and we are storing i, not seq[i], in M[x+1]. Later, when we find that x+1 is the biggest length possible, we are going to want to reconstruct the sequence, but the only starting point we have is M[x+1].

What we do is set M[x+1] = i and P[i] = M[x] (which is identical to P[M[x+1]] = M[x]), which is to say that for every element i we add, we store i as the last element in the longest chain we can and we store the index of the last element of the chain we are extending in P[i]. So we have:

last element: seq[M[x]]
 before that: seq[P[M[x]]]
 before that: seq[P[P[M[x]]]]
 etc...

And now we are done. If you want to compare this to actual code, you can look at the other examples. The main differences are that they use j instead of x, may store the list of length j at M[j-1] instead of M[j] to avoid wasting the space at M[0], and may call the input sequence X instead of seq.

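Putting the pieces of this explanation together, here is a minimal sketch that follows its conventions literally: M is indexed by length (M[0] is an unused placeholder), P is indexed by position in seq, and the binary search is written out by hand. The function name is illustrative, and this is one possible rendering rather than a reference implementation.

```python
def lis_with_predecessors(seq):
    """Longest strictly increasing subsequence using the M and P arrays described above."""
    if not seq:
        return []
    M = [None]               # M[x] = index in seq of the last element of the best candidate of length x
    P = [None] * len(seq)    # P[i] = index of the element that precedes seq[i] in its chain
    for i, value in enumerate(seq):
        # binary search for the largest x with seq[M[x]] < value (x = 0 if there is none)
        lo, hi = 0, len(M) - 1
        while lo < hi:
            mid = (lo + hi + 1) // 2     # mid >= 1, so M[mid] is always a real index
            if seq[M[mid]] < value:
                lo = mid
            else:
                hi = mid - 1
        x = lo
        P[i] = M[x]                      # None when seq[i] starts a chain of length 1
        if x == len(M) - 1:
            M.append(i)                  # first candidate of length x + 1
        else:
            M[x + 1] = i                 # better candidate of length x + 1
    # follow the chain: seq[M[-1]], seq[P[M[-1]]], seq[P[P[M[-1]]]], ...
    result = []
    i = M[-1]
    while i is not None:
        result.append(seq[i])
        i = P[i]
    return result[::-1]


print(lis_with_predecessors([2, 8, 4, 12, 3, 10]))   # [2, 3, 10]
```

Because P is only ever written at the position currently being processed, a chain that has been recorded stays valid even after a later element replaces an entry of M.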