Unique values in Python lists

Question

Asked by Mike Mcmahon

I am new to Python and I am finding set() a bit confusing. Can someone offer some help with finding and creating a new list of unique numbers (in other words, eliminating duplicates)?

import string
import re

def go():
        import re
        file = open("C:/Cryptography/Pollard/Pollard/newfile.txt","w")
        filename = "C:/Cryptography/Pollard/Pollard/primeFactors.txt"
        with open(filename, 'r') as f:
                lines = f.read()

                found = re.findall(r'[\d]+[^\d.\d+()+\s]+[^\s]+[\d+\w+\d]+[\d+\^+\d]+[\d+\w+\d]+', lines)
                a = found
                for i in range(5):
                         a[i] = str(found[i])
                         print(a[i].split('x'))

Now

print(a[i].split('x'))

...gives the following output:

['2', '3', '1451', '40591', '258983', '11409589', '8337580729',
'1932261797039146667']

['2897', '514081', '585530047', '108785617538783538760452408483163']

['2', '3', '5', '19', '28087', '4947999059',
'2182718359336613102811898933144207']

['3', '5', '53', '293', '31159', '201911', '7511070764480753',
'22798192180727861167']

['2', '164493637239099960712719840940483950285726027116731']

How do I output a list of only non-repeating numbers? I read on the forums that set() can do this, but I have tried it to no avail. Any help is much appreciated!

Answer 1

Accepted answer by Blckknght

A set is a collection (like a list or tuple), but it does not allow duplicates and has very fast membership testing. You can use a list comprehension to filter out values in one list that have appeared in a previous list:

data = [['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
        ['2897', '514081', '585530047', '108785617538783538760452408483163'],
        ['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
        ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
        ['2', '164493637239099960712719840940483950285726027116731']]

seen = set() # set of seen values, which starts out empty

for lst in data:
    deduped = [x for x in lst if x not in seen] # filter out previously seen values
    seen.update(deduped)                        # add the new values to the set

    print(deduped)                              # do whatever with deduped list

Output:

['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667']
['2897', '514081', '585530047', '108785617538783538760452408483163']
['5', '19', '28087', '4947999059', '2182718359336613102811898933144207']
['53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']
['164493637239099960712719840940483950285726027116731']

Note that this version does not filter out values that are duplicated within a single list (unless they are already duplicates of a value in a previous list). You could work around that by replacing the list comprehension with an explicit loop that checks each individual value against the seen set (and adds it if it's new) before appending it to a list for output. Or, if the order of the items in your sub-lists is not important, you could turn them into sets of their own:

seen = set()
for lst in data:
    lst_as_set = set(lst)               # this step eliminates internal duplicates
    deduped_set = lst_as_set - seen     # set subtraction!
    seen.update(deduped_set)

    # now do stuff with deduped_set, which is iterable, but in an arbitrary order
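
If the order within each sub-list does matter, the explicit-loop variant described earlier might look like the sketch below. The function name dedupe_across_lists and the sample data are made up for illustration; only the seen set comes from the answer.

```python
def dedupe_across_lists(lists):
    """Yield each sub-list with every previously seen value removed.

    Unlike the list-comprehension version, this also drops values that
    repeat inside a single sub-list, and it keeps the original order.
    """
    seen = set()
    for lst in lists:
        deduped = []
        for x in lst:
            if x not in seen:      # first time this value appears anywhere
                seen.add(x)
                deduped.append(x)
        yield deduped


# Hypothetical sample with internal duplicates ('3' and '7' repeat).
sample = [['2', '3', '3', '5'], ['2', '5', '7', '7']]
result = list(dedupe_across_lists(sample))
print(result)  # [['2', '3', '5'], ['7']]
```

Adding each value to seen immediately, rather than once per sub-list, is what catches the internal duplicates.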

Finally, if the internal sub-lists are a red herring entirely and you simply want to filter a flattened list to get only unique values, that sounds like a job for the unique_everseen recipe from the itertools documentation:

from itertools import filterfalse  # named ifilterfalse in Python 2

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        # filterfalse skips every element that is already in the seen set
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
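
To apply an order-preserving filter like this to nested sub-lists, they first have to be flattened. As a minimal sketch, assuming Python 3.7+ (where dict keeps insertion order), itertools.chain.from_iterable can do the flattening and dict.fromkeys can stand in for the recipe when no key function is needed. The shortened data list here is only an illustration.

```python
from itertools import chain

# Shortened, made-up stand-in for the nested lists of factors.
data = [['2', '3', '1451'], ['2897', '3'], ['2', '5']]

# Flatten lazily, then keep the first occurrence of each value.
# dict.fromkeys preserves insertion order on Python 3.7+.
flat = chain.from_iterable(data)
unique_in_order = list(dict.fromkeys(flat))

print(unique_in_order)  # ['2', '3', '1451', '2897', '5']
```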

Answer 2

Answer by Anthony Kong

set should work in this case.

You can try the following:

# Concat all your lists into a single list
>>> a = ['2', '3', '1451', '40591', '258983', '11409589', '8337580729','1932261797039146667'] +['2897', '514081', '585530047', '108785617538783538760452408483163'] +['2', '3', '5', '19', '28087', '4947999059','2182718359336613102811898933144207'] + ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']+ ['2', '164493637239099960712719840940483950285726027116731']
>>> len(a)
29
>>> set(a)
set(['514081', '258983', '40591', '201911', '11409589', '585530047', '3', '2', '5',
'108785617538783538760452408483163', '22798192180727861167',
'164493637239099960712719840940483950285726027116731', '8337580729', '4947999059',
'19', '2897', '7511070764480753', '53', '28087', '2182718359336613102811898933144207',
'1451', '31159', '1932261797039146667', '293'])

>>> len(set(a))
24
>>>
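
Because a set has no defined order, the unique values come back in an arbitrary order, as the output above shows. If a sorted result is wanted, one possible follow-up (a sketch, assuming every string is a valid integer literal) is to sort with int as the key. The short factors list is made up for illustration.

```python
# Made-up sample of factor strings containing duplicates.
factors = ['2', '3', '1451', '2', '19', '3', '5']

# set() removes the duplicates; key=int sorts numerically rather than
# lexicographically (so '19' comes before '1451', and '5' before '19').
unique_sorted = sorted(set(factors), key=int)

print(unique_sorted)  # ['2', '3', '5', '19', '1451']
```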

Answer 3

Answer by elucify

If you want the unique values from the flattened list, you can use reduce() to concatenate the sub-lists into one list, then pass the result to the frozenset() constructor to drop the duplicates and convert it back to a list:

>>> data = [
   ['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
   ['2897', '514081', '585530047', '108785617538783538760452408483163'],
   ['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
   ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
   ['2', '164493637239099960712719840940483950285726027116731']]

>>> from functools import reduce  # required on Python 3; also valid on Python 2.6+
>>> print(list(frozenset(reduce((lambda a, b: a+b), data))))
['514081', '258983', '40591', '201911', '11409589', '585530047', '3',
'2', '5', '108785617538783538760452408483163', '22798192180727861167',
'164493637239099960712719840940483950285726027116731', '8337580729', 
'4947999059', '19', '2897', '7511070764480753', '53', '28087', 
'2182718359336613102811898933144207', '1451', '31159',
'1932261797039146667', '293']
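
Concatenating with a+b inside reduce() builds a new intermediate list at every step. As an alternative sketch, set().union() accepts any number of iterables and can collect the unique values without the intermediate lists. The shortened data list is again only an illustration.

```python
# Shortened, made-up stand-in for the nested lists of factors.
data = [['2', '3', '1451'], ['2897', '3'], ['2', '5']]

# Unpack the sub-lists as separate arguments; union() merges them all
# into one set, discarding duplicates along the way.
unique = set().union(*data)

# Sets are unordered, so sort numerically before printing.
print(sorted(unique, key=int))  # ['2', '3', '5', '1451', '2897']
```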
