Python 列表过滤:从列表列表中删除子集

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1318935/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-11-03 21:56:27  来源:igfitidea点击:

Python list filtering: remove subsets from list of lists

pythonlist

提问by Oliver

Using Python how do you reduce a list of lists by an ordered subset match [[..],[..],..]?

使用 Python 如何通过有序子集匹配减少列表列表[[..],[..],..]

In the context of this question a list L is a subsetof list Mif Mcontains all members of L, and in the same order. For example, the list [1,2] is a subset of the list [1,2,3], but not of the list [2,1,3].

在这个问题的上下文中,列表 L 是列表的子集M如果M包含 , 的所有成员L,并且顺序相同。例如,列表 [1,2] 是列表 [1,2,3] 的子集,但不是列表 [2,1,3] 的子集。

Example input:

示例输入:

a. [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
b. [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]

Expected result:

预期结果:

a. [[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
b. [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69],  [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]

Further Examples:

进一步的例子:

L = [[1, 2, 3, 4, 5, 6, 7], [1, 2, 5, 6]]- No reduce

L = [[1, 2, 3, 4, 5, 6, 7], [1, 2, 5, 6]]- 没有减少

L = [[1, 2, 3, 4, 5, 6, 7],[1, 2, 3], [1, 2, 4, 8]]- Yes reduce

L = [[1, 2, 3, 4, 5, 6, 7],[1, 2, 3], [1, 2, 4, 8]]- 是减少

L = [[1, 2, 3, 4, 5, 6, 7], [7, 6, 5, 4, 3, 2, 1]]- No reduce

L = [[1, 2, 3, 4, 5, 6, 7], [7, 6, 5, 4, 3, 2, 1]]- 没有减少

(Sorry for causing confusion with the incorrect data set.)

(很抱歉导致与不正确的数据集混淆。)

采纳答案by Oliver

Thanks to all who suggested solutions and coping with my sometimes erroneous data sets. Using @hughdbrownsolution I modified it to what I wanted:

感谢所有提出解决方案并处理我有时错误的数据集的人。使用@hughdbrown解决方案,我将其修改为我想要的:

The modification was to use a sliding window over the target to ensure the subset sequence was found. I think I should have used a more appropriate word than 'Set' to describe my problem.

修改是在目标上使用滑动窗口以确保找到子集序列。我想我应该使用比“设置”更合适的词来描述我的问题。

def is_sublist_of_any_list(cand, lists):
    # Compare candidate to a single list
    def is_sublist_of_list(cand, target):
        try:
            i = 0            
            try:
                start = target.index(cand[0])
            except:
                return False

            while start < (len(target) + len(cand)) - start:
                if cand == target[start:len(cand)]:
                    return True
                else:
                    start = target.index(cand[0], start + 1)
        except ValueError:
            return False

    # See if candidate matches any other list
    return any(is_sublist_of_list(cand, target) for target in lists if len(cand) <= len(target))

# Compare candidates to all other lists
def super_lists(lists):
    a = [cand for i, cand in enumerate(lists) if not is_sublist_of_any_list(cand, lists[:i] + lists[i+1:])]
    return a

lists = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
expect = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69],  [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]

def test():
    out = super_lists(list(lists))

    print "In  : ", lists
    print "Out : ", out

    assert (out == expect)

Result:

结果:

In  :  [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
Out :  [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]

回答by iElectric

This could be simplified, but:

这可以简化,但是:

l = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
l2 = l[:]

for m in l:
    for n in l:
        if set(m).issubset(set(n)) and m != n:
            l2.remove(m)
            break

print l2
[[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]

回答by Triptych

This code should be rather memory efficient. Beyond storing your initial list of lists, this code uses negligible extra memory (no temporary sets or copies of lists are created).

这段代码应该具有相当的内存效率。除了存储您的初始列表列表之外,此代码使用的额外内存可以忽略不计(不会创建列表的临时集或副本)。

def is_subset(needle,haystack):
   """ Check if needle is ordered subset of haystack in O(n)  """

   if len(haystack) < len(needle): return False

   index = 0
   for element in needle:
      try:
         index = haystack.index(element, index) + 1
      except ValueError:
         return False
   else:
      return True

def filter_subsets(lists):
   """ Given list of lists, return new list of lists without subsets  """

   for needle in lists:
      if not any(is_subset(needle, haystack) for haystack in lists
         if needle is not haystack):
         yield needle

my_lists = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], 
            [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]    
print list(filter_subsets(my_lists))

>>> [[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]

And, just for fun, a one-liner:

而且,只是为了好玩,单行:

def filter_list(L):
    return [x for x in L if not any(set(x)<=set(y) for y in L if x is not y)]

回答by hughdbrown

A list is a superlist if it is not a subset of any other list. It's a subset of another list if every element of the list can be found, in order, in another list.

如果列表不是任何其他列表的子集,则该列表是超级列表。如果列表的每个元素都可以在另一个列表中按顺序找到,那么它就是另一个列表的子集。

Here's my code:

这是我的代码:

def is_sublist_of_any_list(cand, lists):
    # Compare candidate to a single list
    def is_sublist_of_list(cand, target):
        try:
            i = 0
            for c in cand:
                i = 1 + target.index(c, i)
            return True
        except ValueError:
            return False
    # See if candidate matches any other list
    return any(is_sublist_of_list(cand, target) for target in lists if len(cand) <= len(target))

# Compare candidates to all other lists
def super_lists(lists):
    return [cand for i, cand in enumerate(lists) if not is_sublist_of_any_list(cand, lists[:i] + lists[i+1:])]

if __name__ == '__main__':
    lists = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
    superlists = super_lists(lists)
    print superlists

Here are the results:

结果如下:

[[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]

Edit: Results for your later data set.

编辑:您以后的数据集的结果。

>>> lists = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17,
 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2,
 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
>>> superlists = super_lists(lists)
>>> expected = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [5
0, 69],  [2, 3, 21], [1, 2, 4, 8]]
>>> assert(superlists == expected)
>>> print superlists
[[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3,
21], [1, 2, 4, 8]]

回答by quamrana

Refined answer after new test case:

新测试用例后的精炼答案:

original= [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]

class SetAndList:
    def __init__(self,aList):
        self.list=aList
        self.set=set(aList)
        self.isUnique=True
    def compare(self,other):
        if self.set.issubset(other.set):
            #print self.list,'superceded by',other.list
            self.isUnique=False

def listReduce(lists):
    temp=[]
    for l in lists:
        s=SetAndList(l)
        for t in temp:
            t.compare(s)
            s.compare(t)
        temp.append( s )
        temp=[t for t in temp if t.isUnique]

    return [t.list for t in temp if t.isUnique]

print listReduce(original)

You didn't give the required output, but I'm guessing this is right, as [1,2,3]does not appear in the output.

您没有提供所需的输出,但我猜这是正确的,因为[1,2,3]没有出现在输出中。

回答by hughdbrown

So what you really wanted was to know if a list was a substring, so to speak, of another, with all the matching elements consecutive. Here is code that converts the candidate and the target list to comma-separated strings and does a substring comparison to see if the candidate appears within the target list

所以你真正想要的是知道一个列表是否是一个子字符串,可以说是另一个,所有匹配的元素都是连续的。这是将候选和目标列表转换为逗号分隔的字符串并进行子字符串比较以查看候选是否出现在目标列表中的代码

def is_sublist_of_any_list(cand, lists):
    def comma_list(l):
        return "," + ",".join(str(x) for x in l) + ","
    cand = comma_list(cand)
    return any(cand in comma_list(target) for target in lists if len(cand) <= len(target))


def super_lists(lists):
    return [cand for i, cand in enumerate(lists) if not is_sublist_of_any_list(cand, lists[:i] + lists[i+1:])]

The function comma_list() puts leading and trailing commas on the list to ensure that integers are fully delimited. Otherwise, [1] would be a subset of [100], for example.

函数comma_list() 将前导和尾随逗号放在列表中以确保整数被完全分隔。否则,例如,[1] 将是 [100] 的子集。

回答by Ants Aasma

Edit: I really need to improve my reading comprehension. Here's the answer to what was actually asked. It exploits the fact that "A is super of B" implies "len(A) > len(B) or A == B".

编辑:我真的需要提高我的阅读理解能力。这是对实际问题的答案。它利用了“ A is super of B”暗示“ len(A) > len(B) or A == B”的事实。

def advance_to(it, value):
    """Advances an iterator until it matches the given value. Returns False
    if not found."""
    for item in it:
        if item == value:
            return True
    return False

def has_supersequence(seq, super_sequences):
    """Checks if the given sequence has a supersequence in the list of
    supersequences.""" 
    candidates = map(iter, super_sequences)
    for next_item in seq:
        candidates = [seq for seq in candidates if advance_to(seq, next_item)]
    return len(candidates) > 0

def find_supersequences(sequences):
    """Finds the supersequences in the given list of sequences.

    Sequence A is a supersequence of sequence B if B can be created by removing
    items from A."""
    super_seqs = []
    for candidate in sorted(sequences, key=len, reverse=True):
        if not has_supersequence(candidate, super_seqs):
            super_seqs.append(candidate)
    return super_seqs

print(find_supersequences([[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3],
    [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]))
#Output: [[1, 2, 3, 4, 5, 6, 7], [1, 2, 4, 8], [2, 3, 21]]

If you need to also preserve the original order of the sequences, then the find_supersequences()function needs to keep track of the positions of the sequences and sort the output afterwards.

如果您还需要保留序列的原始顺序,则该find_supersequences()函数需要跟踪序列的位置并在之后对输出进行排序。

回答by dugres

list0=[[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]

for list1 in list0[:]:
    for list2 in list0:
        if list2!=list1:
            len1=len(list1)
            c=0
            for n in list2:
                if n==list1[c]:
                    c+=1
                if c==len1:
                    list0.remove(list1)
                    break

This filters list0 in place using a copy of it. This is good if the result is expected to be about the same size as the original, there is only a few "subset" to remove.

这会使用 list0 的副本就地过滤 list0。如果预期结果与原始大小大致相同,则这很好,只需删除几个“子集”即可。

If the result is expected to be small and the original is large, you might prefer this one who is more memory freindly as it doesn't copy the original list.

如果预期结果很小而原始列表很大,您可能更喜欢这个更友好的内存,因为它不会复制原始列表。

list0=[[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
result=[]

for list1 in list0:
    subset=False
    for list2 in list0:
        if list2!=list1:
            len1=len(list1)
            c=0
            for n in list2:
                if n==list1[c]:
                    c+=1
                if c==len1:
                    subset=True
                    break
            if subset:
                break
    if not subset:
        result.append(list1)

回答by quamrana

This seems to work:

这似乎有效:

original=[[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]

target=[[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]

class SetAndList:
    def __init__(self,aList):
        self.list=aList
        self.set=set(aList)
        self.isUnique=True
    def compare(self,aList):
        s=set(aList)
        if self.set.issubset(s):
            #print self.list,'superceded by',aList
            self.isUnique=False

def listReduce(lists):
    temp=[]
    for l in lists:
        for t in temp:
            t.compare(l)
        temp.append( SetAndList(l) )

    return [t.list for t in temp if t.isUnique]

print listReduce(original)
print target

This prints the calculated list and the target for visual comparison.

这将打印计算出的列表和用于视觉比较的目标。

Uncomment the print line in the compare method to see how various lists get superceded.

在 compare 方法中取消注释打印行以查看各种列表如何被取代。

Tested with python 2.6.2

用python 2.6.2测试

回答by Jo?o Silva

I implemented a different issubseqbecause yours doesn't say that [1, 2, 4, 5, 6]is a subsequence of [1, 2, 3, 4, 5, 6, 7], for example (besides being painfully slow). The solution I came up with looks like this:

我实现了一个不同的,issubseq因为你的并没有说这[1, 2, 4, 5, 6]是 的子序列[1, 2, 3, 4, 5, 6, 7],例如(除了痛苦地缓慢之外)。我想出的解决方案如下所示:

 def is_subseq(a, b):
    if len(a) > len(b): return False
    start = 0
    for el in a:
        while start < len(b):
            if el == b[start]:
                break
            start = start + 1
        else:
            return False
    return True

def filter_partial_matches(sets):
     return [s for s in sets if all([not(is_subseq(s, ss)) for ss in sets if s != ss])]

A simple test case, given your inputs and outputs:

一个简单的测试用例,给定您的输入和输出:

>>> test = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
>>> another_test = [[1, 2, 3, 4], [2, 4, 3], [3, 4, 5]]
>>> filter_partial_matches(test)
[[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
>>> filter_partial_matches(another_test)
[[1, 2, 3, 4], [2, 4, 3], [3, 4, 5]]

Hope it helps!

希望能帮助到你!